Searchitect Developer Documentation

Description of the Searchitect Framework:

This framework is used to simplify the usage of searchable encryption (SE) technology and enables the integration of new SE schemes.

General description of SE

An SE scheme enables a server to search over an encrypted database on behalf of a client without revealing the content to the server. An SE scheme provides three protocols:

  1. Setup - First the client indexes a document collection contained in a directory. This plaintext index is encrypted by a specific implementation of an encryption scheme and uploaded to the server.
  2. Search - After Setup the client can search over the data by passing a keyword to the search protocol, which computes a search token that is sent to the server. This search token enables the server to search over the encrypted data and return the matching documents. In response-hiding schemes the results are encrypted, and therefore a second Resolve procedure is needed at the client to decrypt the document identifiers.
  3. Update - Dynamic schemes support updates of the documents contained in the encrypted index.
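
The Setup and Search protocols can be illustrated with a deliberately simplified, response-revealing toy construction. This is only a sketch and not one of the schemes shipped with the framework: the class name ToySse and all method names are assumptions made for illustration, HMAC-SHA256 stands in for the pseudorandom function, and the document identifiers are stored unencrypted for brevity, which a real scheme would not do.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Toy illustration only; names and construction are assumptions, not framework code.
class ToySse {

    // HMAC-SHA256 used as a pseudorandom function.
    static byte[] prf(byte[] key, String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Search (client side): derives the search token for one keyword.
    static byte[] searchToken(byte[] key, String keyword) {
        return prf(key, keyword);
    }

    // Setup (client side): one pseudorandom label per keyword/document pair.
    static Map<String, String> setup(byte[] key, Map<String, List<String>> index) {
        Map<String, String> edb = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : index.entrySet()) {
            byte[] token = searchToken(key, entry.getKey());
            int counter = 0;
            for (String docId : entry.getValue()) {
                edb.put(hex(prf(token, Integer.toString(counter++))), docId);
            }
        }
        return edb;
    }

    // Search (server side): recomputes labels from the token until one is missing.
    static List<String> search(Map<String, String> edb, byte[] token) {
        List<String> matches = new ArrayList<>();
        for (int counter = 0; ; counter++) {
            String docId = edb.get(hex(prf(token, Integer.toString(counter))));
            if (docId == null) {
                return matches;
            }
            matches.add(docId);
        }
    }
}
```

The server only ever sees the labels and the token, never the keyword itself. An Update protocol is not shown.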

Searchitect Framework Architecture and Design

  • Client/server architecture based on microservices
  • SOA (service oriented architecture) based on RESTful webservice

The server part of the framework is designed as a web service in a service-oriented architecture (SOA). To meet the requirements of easy integration of new SE schemes and a state-of-the-art client/server architecture, we chose a microservices design. The basic framework consists of three entities: a client, a gateway and a backend template, which holds a plaintext dummy implementation. The SE gateway maintains a small database of user account information, such as credentials and metadata of the repositories used for addressing the encrypted databases (EDBs) in the backend modules. Each client request is authenticated at the SE gateway and, if the authentication succeeds, forwarded to the backend module addressed by the query. The backend modules are the server-side instantiations of the different implemented schemes and hold the scheme-specific encrypted index data structures or databases. All interfaces between the microservices are designed in a REpresentational State Transfer (REST) manner, which allows interoperability and loosely coupled systems, in contrast to remote procedure calls (RPC) or remote method invocation (RMI), which are tightly coupled. Another benefit of RESTful services is that they are easily extensible and language independent. The content exchanged between the instances is serialized as JavaScript Object Notation (JSON), and the communication between client and gateway is encrypted using Transport Layer Security (TLS).

Software Requirements:

Functional Requirements for the Client:

  • Unauthenticated client operations:
    • create a user stating username and password
    • authenticate using username and password in order to perform operations on her account and EDBs
  • Authenticated client operations:

    • initialize an EDB stating a path to a file collection and a password
    • update an EDB stating a path to a file collection
    • search over an EDB stating a query and the EDB
    • delete an EDB
    • retrieve all account information of the user
    • delete her user account

Non-Functional Requirements for the Framework:

    • User management for authentication and EDB access

    • Secure communication - authentication and encryption

    • Scalable - easy integration of new schemes

    • Openness - no dependency on programming language

    • Easy to deploy and test

    • Dynamic documentation of the webservice API

General Design and Technology Decisions

The source code of the implementation is structured in consideration of the shared classes between the client and server projects.

The basic projects are the client and, on the server side, the gateway and the backend module template, which share some common classes. The design can be extended with a new SE scheme by adding a new backend module, a client plugin and their common classes. Only the port of the new backend module endpoint needs to be added to the application properties of the gateway, and the Docker configuration needs to be adapted. Due to this structure all plugins at the client need to share a common interface, where each specific SE scheme plugin implements the same method signatures.

Searchitect Packages

For the design we took a structured approach analogous to the model-view-controller design pattern, which splits up the functionality by operational concerns, resulting in the following packages.

  • Model package - provides classes which are the resources of the backend module and implement the business logic of the SE scheme at the server side.
  • Controller package - contains the web service in form of RESTful controller classes which allow external access on resources.
  • View package - all representational classes used to exchange serialized information.
  • Service package - provides all classes used for the abstraction of the data access layer, like wrapper for the Java Persistence API (JPA) or RocksDB encapsulation.
  • Config package - contains the configuration class which modifies the default Spring Boot web configuration to the needs of the Searchitect framework, like attaching a specific authentication provider bean and customizing the security configuration.
  • Security package - contains the classes which extend Spring Security such as a specific authentication filter and a specific provider.
  • Util package - holds some useful services like a time provider and a URL builder.
  • Util common - used for common classes between several instances.

Used Technologies and Dependencies on Software Libraries

Spring Framework Modules:

All implementations make use of the Spring framework. The benefits of Spring are modularized lightweight components, which provide life cycle control of containers. These support inversion of control, also called dependency injection, which allows separating the configuration from use. Spring Boot has a very low configuration overhead and comes with Apache Tomcat as an embedded application server. This allows easy deployment of the stand-alone tarred .jar files. Apache Maven is used for the dependency management of external libraries and the compilation. The following modules of the Spring framework have been applied:

  • Spring-boot-starter-parent - version 2.0.0 managed release, provides dependency and plugin management for applications built with Maven.
  • Spring-security-web - offers classes for security of the web service such as TLS support.
  • Spring-security-config - is needed for the security configuration.
  • Spring-security-test - enables testing with applied security features.
  • Spring-boot-starter-web - provides libraries which support RESTful web service classes like RestTemplate, Tomcat as the default embedded application server in the container and the Jackson object mapper for the serialization of objects.
  • Spring-boot-starter-test - contains classes to enable testing such as JUnit tests.
  • Spring-boot-starter-data-jpa - simplifies Java database connectivity (JDBC) by supporting object management and a persistence API, using the default Hibernate object relational mapping (ORM) configuration for the H2 database as an embedded in-memory database.

Other Used Third Party Software Libraries:

| Library | Purpose | Version | License | Link |
|---|---|---|---|---|
| Apache Lucene | Used for the indexer | 6.1.0 | Apache License, Version 2.0 | https://lucene.apache.org/core/ |
| Apache PDFBox | Extracts text from PDF files | 1.8.10 | Apache License, Version 2.0 | https://pdfbox.apache.org/ |
| Apache POI | Parses Microsoft documents | 3.15-beta1 | Apache License, Version 2.0 | https://poi.apache.org/ |
| Bouncy Castle | Basic cryptographic primitives in Java | 1.59 | MIT X Consortium based license | https://github.com/bcgit/bc-java |
| Clusion | State of the art SSE implementations | 1.0-SNAPSHOT | GNU General Public License v3 | https://github.com/encryptedsystems/Clusion |
| Docker | Containerized deployment | 1.13.1 | Apache License, Version 2.0 | https://github.com/docker/ |
| Docker-compose | Define and run multi-container Docker applications | 1.18.0 | Apache License, Version 2.0 | https://github.com/docker/compose |
| Google Guava | Contains multimap classes | 25.1-jre | Apache License, Version 2.0 | https://github.com/google/guava |
| OpenJDK 1.8 | Open source Java Runtime Environment for Linux distributions | Java 1.8 | GNU General Public License, version 2, with the Classpath Exception | http://openjdk.java.net/ |
| JJWT | JSON Web Token support for the JVM | 0.9.0 | Apache License, Version 2.0 | https://github.com/jwtk/jjwt |
| JUnit | Java framework for unit tests | 4.12 | | https://junit.org/junit4/ |
| Maven | Manages dependencies and compilation | 3.3.9 | Apache License, Version 2.0 | https://maven.apache.org/ |
| RocksDB | Persistent key-value store | 5.11.3 | | https://github.com/facebook/rocksdb/tree/master/java/src/main/java/org/rocksdb/ |
| DDTH-commons | Utility classes for RocksDB use in Java - RocksDBWrapper | Release v0.8.0 | MIT License | https://github.com/DDTH/ddth-commons |
| Springfox-swagger2 | JSON API documentation for Spring based applications | 2.9.2 | Apache License, Version 2.0 | https://springfox.github.io/springfox/docs/current/ |
| Springfox-swagger-ui | Visualizes RESTful services | 2.9.2 | Apache License, Version 2.0 | https://github.com/springfox/springfox/tree/master/springfox-swagger-ui |

Some libraries fitted the purposes of the Searchitect framework only partly and needed some adaptation to provide the required functionality. The modified classes were originally contained in the following libraries: Clusion, SCAPI, DDTH-commons and Spring Security.

Searchitect - Common

Searchitect common holds all general classes used by client, gateway and backend module:

  • Searchitect-common-client
    • ClientScheme.java - holds the plugin interface definitions.
  • Searchitect-common-crypto
    • Crypto.java - wraps encryption and decryption methods.
  • Searchitect-common-exception
    • SearchitectException.java - wraps runtime exceptions
  • Searchitect-common-view
    • RepositoryInfo.java - response of the setup method, holds the name of the EDB repository
    • SearchResult.java - response of the search method
    • Upload.java - abstract class to wrap the upload index
    • SearchToken.java - template to pass the search token (dummy implementation)
    • UploadIndex.java - template to pass the setup EDB (dummy implementation)
    • UserAccountResponse.java - response of the user creation
    • UserAuthRequest.java - passes credentials for authentication
    • UserAuthResponse.java - retrieves the JSON web token from the server
    • UserEDBResponse.java - used to retrieve EDB attributes like type of scheme and repository name
  • Searchitect-service
    • MemoryDictionary.java - implements the SearchDictionary interface and therefore provides methods similar to the RocksDBAdapter; it is used as a stub for testing
    • RocksDBAdapter.java - implements the SearchDictionary interface and wraps the project-specific data access layer to the RocksDB key-value store
    • RocksDbWrapperModified.java - this class was taken from Thanh Nguyen (package com.github.ddth.commons.rocksdb) and modified to support the needs of the Searchitect framework such as batched updates of label/value pairs
    • SearchDictionary.java - is an interface for the data access layer
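
The idea of hiding the key-value store behind one data access interface, with an in-memory stub for tests, can be sketched as follows. The names LabelStore and InMemoryLabelStore and all method signatures are hypothetical and chosen for illustration only; the actual SearchDictionary and MemoryDictionary signatures are not reproduced here.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical data access abstraction for label/value pairs.
interface LabelStore {
    Optional<String> get(String label);

    // Batched insert of label/value pairs.
    void putBatch(Map<String, String> labelValuePairs);

    int size();
}

// In-memory stub that a persistent implementation could be swapped for.
class InMemoryLabelStore implements LabelStore {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    @Override
    public Optional<String> get(String label) {
        return Optional.ofNullable(entries.get(label));
    }

    @Override
    public void putBatch(Map<String, String> labelValuePairs) {
        entries.putAll(labelValuePairs);
    }

    @Override
    public int size() {
        return entries.size();
    }
}
```

Code written against the interface does not need to know whether the entries live in memory or in a persistent store, which is what makes the stub useful for unit tests.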

Searchitect - Client

In SE most of the workload is done at the client: the client extracts the keyword/document identifier pairs using an indexer and then generates the encrypted index. All schemes make use of this process; therefore the indexer is part of the client, as are all user account related tasks. The indexer provided by the Clusion library supports different file formats in the keyword extraction, using specific libraries like Apache PDFBox for PDF documents and Apache POI for Microsoft documents; other file formats are supported as well. The indexing is based on the Apache Lucene standard analyzer for the English language. This analyzer eliminates stop words and noise such as articles and conjunctions. Due to the use of multithreading the indexing process is very efficient. In the client package the Clusion indexer is wrapped by the document parser class.

Client packages:

  • searchitect
    • ClientRunner.java - contains the main method used to call methods of the client class and output the results
  • searchitect.client
    • Client.java - performs all requests to the web service (Searchitect Gateway) and corresponding processing of the results. For the synchronous HTTP access client side the Spring RestTemplate class is used.
      • Check if backend is up
      • Check Authentication
      • Create a new user
      • Get user account
      • Delete the user account
      • Authenticate user
      • Setup a new EDB - call to SE scheme implementation
      • Search the EDB with a single keyword - call to SE scheme implementation
      • Update the EDB - call to SE scheme implementation
      • Delete the EDB
    • DocumentParser.java - used to call the Clusion indexer classes

Plugin Interface Definitions

A client object instance is related to a specific SE scheme implementation object of a plugin project. The method calls from the client class to the SE scheme implementations are generalized in the plugin interface definitions. During development and testing we found that the interface definition is a crucial point for scaling. As schemes are based on different data structures, we needed to wrap the returned data objects. The first approach was to do the serialization to the JSON transport string in the plugin class of the specific scheme. This works fine for small EDB objects but does not scale well and resulted in out-of-memory (Java heap space) errors of the JVM, because the JSON string was constructed to its full extent in memory. The second approach was to wrap the upload index in an abstract class, which is extended by the specific upload index implementations and passed by reference to the client, where the data is continuously streamed as JSON by the RestTemplate class. Another possible solution to enhance scalability would be to serialize the upload index into a file and handle the problem with streamed file uploads.
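
The difference between building the whole JSON string in memory and streaming it can be sketched with JDK classes only. This is not the framework's serialization code, which relies on RestTemplate and Jackson; the class name StreamingIndexWriter is an assumption, and the escaping covers only backslashes and double quotes, which is sufficient for hex or Base64 encoded labels.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch: writes label/value pairs to a Writer one entry at a time,
// so the complete JSON document never has to exist as a single String.
class StreamingIndexWriter {

    // Minimal escaping; assumes values without control characters.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static void write(Iterator<Map.Entry<String, String>> entries, Writer out) throws IOException {
        out.write("{");
        boolean first = true;
        while (entries.hasNext()) {
            Map.Entry<String, String> entry = entries.next();
            if (!first) {
                out.write(",");
            }
            out.write(quote(entry.getKey()));
            out.write(":");
            out.write(quote(entry.getValue()));
            first = false;
        }
        out.write("}");
        out.flush();
    }
}
```

When the Writer wraps the output stream of an HTTP request, memory use stays bounded by the size of one entry plus the I/O buffers rather than by the size of the whole index.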

The following listing shows all methods which need to be supported by a client plugin:

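import com.google.common.collect.Multimap;
// Upload, SearchResult and SearchitectException come from the Searchitect common packages.
public interface ClientScheme {

    public String getImplName();
    // Setup: builds the upload index from the plaintext inverted index.
    public Upload setup(String password, Multimap<String,String> invertedIndex, int numberOfDocuments ) throws SearchitectException;
    // Search: processes the query for the search request sent to the server.
    public String search(String query) throws SearchitectException;
    // Update: processes newly indexed keyword/document identifier pairs.
    public String update(Multimap<String,String> invertedIndex) throws SearchitectException;
    // Resolve: decrypts the document identifiers of an encrypted search result.
    public SearchResult resolve(SearchResult encrypted) throws SearchitectException;
}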

When a scheme is response revealing and does not supply a resolve algorithm, the search result is returned immediately without any need for decryption.
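
How a response-revealing plugin can treat resolve as a pass-through is sketched below. To keep the example runnable with the JDK alone, it uses a simplified stand-in interface (SimpleScheme) instead of ClientScheme; both SimpleScheme and PlaintextDummyScheme are hypothetical names, and the types differ from the real interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the plugin interface (JDK types only).
interface SimpleScheme {
    String getImplName();

    Map<String, List<String>> setup(Map<String, List<String>> invertedIndex);

    String search(String query);

    List<String> resolve(List<String> serverResult);
}

// Plaintext dummy: no encryption at all, useful only to exercise the call flow.
class PlaintextDummyScheme implements SimpleScheme {

    @Override
    public String getImplName() {
        return "plaintext-dummy";
    }

    // The "upload index" is just a defensive copy of the plaintext index.
    @Override
    public Map<String, List<String>> setup(Map<String, List<String>> invertedIndex) {
        Map<String, List<String>> upload = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : invertedIndex.entrySet()) {
            upload.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
        return upload;
    }

    // The "search token" is the keyword itself.
    @Override
    public String search(String query) {
        return query;
    }

    // Response revealing: the server result is already readable, nothing to decrypt.
    @Override
    public List<String> resolve(List<String> serverResult) {
        return serverResult;
    }
}
```

A response-hiding plugin would replace the body of resolve with the decryption of the returned document identifiers, while the calling client code stays the same.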

Gateway Interface Definitions of the RESTful API

The gateway uses the names of the backends {b_id} and their ports to address the specific backends. The names defined in application.properties for the port mapping need to be the same as those in the docker-compose file. The interface of the web service is provided by four controller classes in the Searchitect.controller package:

  • CheckController.java - used to check availability of the services.
  • AuthenticationController.java - manages authentication requests and issues JWTs.
  • UserAccountController.java - handles all tasks related to the user management.
  • RepositoryController.java - manages all encrypted search related requests to the specific backends.
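
The mapping from a backend name to an address can be sketched as a lookup in a properties object. The property key pattern backend.<name>.port, the class name BackendResolver and the port used in the usage note are assumptions for illustration; the real keys in application.properties may differ.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical sketch: resolves a backend name {b_id} to a base URL.
class BackendResolver {
    private final Properties properties;

    BackendResolver(Properties properties) {
        this.properties = properties;
    }

    // The backend name doubles as host name, matching the service name
    // that a docker-compose file would assign (assumption).
    Optional<String> baseUrl(String backendId) {
        String port = properties.getProperty("backend." + backendId + ".port");
        if (port == null) {
            return Optional.empty();
        }
        return Optional.of("http://" + backendId + ":" + Integer.parseInt(port.trim()));
    }
}
```

With a property such as backend.dynrh2lev.port=8282 (an invented value), a request for {b_id} = dynrh2lev would be forwarded to http://dynrh2lev:8282, while an unknown name yields an empty result that the gateway can answer with an error.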

RESTful interface definitions of the gateway. The A column indicates required authentication with +, required administrator privileges with A, and no required authentication with -.

| HTTP Method and Path | A | Description |
|---|---|---|
| HTTP GET | - | CheckSearchitectGate returns the string “Greetings from searchitect!” to check availability of the service |
| HTTP GET /checkbackend/{b_id} | - | CheckSebackend forwards the check request to the specific backend module and returns the result |
| HTTP GET /checkauthbackend/{b_id} | + | CheckAuthenticated validates the authentication principal, forwards the check request to the specific backend module and returns the result and username |
| HTTP POST /auth | - | CreateJwt expects the user credentials wrapped in a UserAuthRequest and after successful authentication returns a UserAuthResponse containing the JWT |
| HTTP GET /users | A | GetAllUserAccounts retrieves all user accounts from the repository if the user is authenticated as administrator |
| HTTP POST /user | - | CreateUserAccount expects credentials wrapped in a UserAuthRequest and creates a new user account |
| HTTP GET user/{u_id} | + | GetUserAccount returns the user account information if the user principal name extracted from the JWT and the {u_id} are the same |
| HTTP POST user/{u_id} | + | UpdateUserAccount expects a UserAuthRequest, validates {u_id} and user principal and updates the user account on success |
| HTTP DELETE user/{u_id} | + | DeleteUserAccountForUser validates {u_id} and user principal and deletes the user account on success |
| HTTP DELETE admin/{u_id} | A | DeleteUserAccountForAdmin validates the user principal of the administrator and deletes the user account on success |
| HTTP GET /backend/{b_id}/repositories | A | ForwardRepositories should only be enabled for administrators and returns all the repository names available at the specific backend module {b_id} |
| HTTP POST /backend/{b_id} | + | ForwardSetup passes the JSON formatted UploadIndex to the specified backend module and returns the repository name wrapped by the RepositoryInfo class; during processing this name is extracted and added to the user account along with the backend module name |
| HTTP POST /backend/{b_id}/repository/{r_id}/search | + | ForwardSearch validates the access privileges of the user for {b_id} and {r_id}; on success it forwards the search request to the specific backend and responds with the results wrapped in the SearchResult |
| HTTP POST /backend/{b_id}/repository/{r_id} | + | ForwardUpdate validates the access privileges of the user for {b_id} and {r_id}; on success it forwards the UpdateIndex to the specific backend and responds with HTTP OK |
| HTTP DELETE /backend/{b_id}/repository/{r_id} | + | ForwardDelete validates the access privileges of the user for {b_id} and {r_id}; on success it forwards the delete request and then deletes the EDB entry from the user account |
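
The access check that precedes ForwardSearch, ForwardUpdate and ForwardDelete can be sketched as a lookup of the ({b_id}, {r_id}) pair in the user account. The class EdbAccessCheck and its methods are hypothetical; the gateway's real user account model is not shown in this document.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the per-user EDB ownership check at the gateway.
class EdbAccessCheck {
    // username -> owned EDBs, each stored as "backendId/repositoryId"
    private final Map<String, Set<String>> edbsByUser = new HashMap<>();

    // Called after a successful setup, when the repository name is added to the account.
    void grant(String user, String backendId, String repositoryId) {
        edbsByUser.computeIfAbsent(user, u -> new HashSet<>()).add(backendId + "/" + repositoryId);
    }

    // Called after a successful delete, when the EDB entry is removed from the account.
    void revoke(String user, String backendId, String repositoryId) {
        Set<String> owned = edbsByUser.get(user);
        if (owned != null) {
            owned.remove(backendId + "/" + repositoryId);
        }
    }

    // Returns the backend path to forward a search to, or empty if access is denied.
    Optional<String> forwardSearchPath(String user, String backendId, String repositoryId) {
        Set<String> owned = edbsByUser.get(user);
        if (owned == null || !owned.contains(backendId + "/" + repositoryId)) {
            return Optional.empty();
        }
        return Optional.of("/repository/" + repositoryId + "/search");
    }
}
```

The user name passed in would be the principal extracted from the validated JWT, so a client cannot reach another user's repository by guessing its {r_id}.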

Preparation for Deployment of the Gateway

In order to deploy the gateway for testing purposes, a self-signed certificate first needs to be generated by issuing these two commands in the root directory of the gateway:

keytool -genkey -alias localhost -keyalg RSA -keysize 2048 -keystore src/main/resources/tomcat.keystore

keytool -importkeystore -srckeystore src/main/resources/tomcat.keystore -destkeystore src/main/resources/tomcat.keystore -deststoretype pkcs12

Security Implementations in the Searchitect Gateway

  • Transport layer encryption via TLS - currently a self-signed snakeoil certificate stored in the Apache Tomcat keystore.
  • Username-password authentication filter for the first authentication protocol; a verifier for the password is stored at the server using the bcrypt password encoder.
  • JSON Web Tokens (JWT) are used for authentication and authorization of the client queries after the successful username-password authentication of a user.

We used the JJWT library (JSON Web Token Support for the JVM) to issue and validate the JWTs. The Spring Security module needed to be extended by:

  • an implementation of an authentication filter and provider to issue JWTs to users who sent valid credentials,
  • an implementation of an authorization filter to validate requests containing JWTs,
  • a custom implementation of UserDetailsService to support Spring Security with loading user-specific data,
  • and an extension of the WebSecurityConfigurerAdapter class to customize the security framework according to our needs.

These Spring Security modifications are contained in the Searchitect.security and Searchitect.config packages in the gateway.
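
What issuing and validating a JWT involves can be sketched with JDK classes only. The gateway itself uses the JJWT library for this; the class MiniJwt below is a hypothetical illustration of the HS256 signing step, assumes a subject without characters that need JSON escaping, and does not check the expiry claim.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of HS256 token issuing and signature validation.
class MiniJwt {
    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();

    static byte[] sign(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String encode(String json) {
        return B64.encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }

    // Token layout: base64url(header) . base64url(payload) . base64url(signature)
    static String issue(byte[] key, String subject, long expiresAtEpochSeconds) {
        String header = "{\"alg\":\"HS256\",\"typ\":\"JWT\"}";
        String payload = "{\"sub\":\"" + subject + "\",\"exp\":" + expiresAtEpochSeconds + "}";
        String signingInput = encode(header) + "." + encode(payload);
        return signingInput + "." + B64.encodeToString(sign(key, signingInput));
    }

    // Recomputes the signature over header and payload and compares in constant time.
    static boolean hasValidSignature(byte[] key, String token) {
        int lastDot = token.lastIndexOf('.');
        if (lastDot < 0) {
            return false;
        }
        byte[] actual;
        try {
            actual = Base64.getUrlDecoder().decode(token.substring(lastDot + 1));
        } catch (IllegalArgumentException e) {
            return false;
        }
        return MessageDigest.isEqual(sign(key, token.substring(0, lastDot)), actual);
    }
}
```

Because the server can recompute the signature from its own secret key, it does not have to store issued tokens in order to validate later requests.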

Searchitect - Backend Module

A backend module holds and manages multiple EDBs of a specific scheme. Backend modules only implement the functionality of the server part of a specific scheme and are not aware of any user or access management; this is handled by the gateway. The EDBs are managed by a service layer, which may store the EDB objects in memory or persistently on the hard drive, depending on the implementation. The EDBs are the resources in the RESTful sense and are addressed by a randomly generated universally unique identifier (UUID), also called repository name {r_id}, which is generated during the server-side setup process. Each backend controller provides the RESTful API definitions shown in the table below. All exchanged objects are formatted in JSON, with the exception of the check responses, which are strings.

Backend Module Interface Definitions of the RESTful API

RESTful interface definitions of the backend module.

| HTTP Method and Path | Description |
|---|---|
| HTTP GET / | Check returns the string “Greetings from *” with the name of the specific backend module to check availability of the service. |
| HTTP GET /repositories | ListRepositories lists all repositories on the node. |
| HTTP POST /repository | Setup expects a specific upload index, generates a new repository and returns the repository name {r_id}. |
| HTTP POST /repository/{r_id}/search | Search expects a search token, performs the search on the EDB and returns a response containing the search results. |
| HTTP POST /repository/{r_id} | Update expects a specific update index and performs all updates on the EDB; on success it returns HTTP OK (HTTP 200). |
| HTTP DELETE /repository/{r_id} | Deletes the specified EDB. |
| HTTP DELETE /repositories | Deletes all EDBs on that node. |

A failed request results either in HTTP NOT FOUND (HTTP 404) or, in case of incorrectly formatted JSON strings, in HTTP BAD REQUEST (HTTP 400). The interface definitions can also be tested with command line curl commands when the docker-compose configuration file exposes the specific port outside the container.
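
The service layer behind these endpoints can be sketched as an in-memory store that addresses each EDB by a random UUID. RepositoryStore and its method names are hypothetical, and the EDB is reduced to a simple label/value map for the sake of the example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory service layer of a backend module.
class RepositoryStore {
    // repository name {r_id} -> EDB content
    private final Map<String, Map<String, String>> repositories = new ConcurrentHashMap<>();

    // Setup: stores the uploaded index under a freshly generated repository name.
    String setup(Map<String, String> uploadIndex) {
        String repositoryId = UUID.randomUUID().toString();
        repositories.put(repositoryId, new ConcurrentHashMap<>(uploadIndex));
        return repositoryId;
    }

    // An empty result corresponds to HTTP NOT FOUND at the controller level.
    Optional<Map<String, String>> find(String repositoryId) {
        return Optional.ofNullable(repositories.get(repositoryId));
    }

    // Returns false if no repository with that name existed.
    boolean delete(String repositoryId) {
        return repositories.remove(repositoryId) != null;
    }

    Set<String> list() {
        return repositories.keySet();
    }
}
```

A controller built on top of such a store would translate an empty find result or a false delete result into the HTTP 404 response described above.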

Build and perform unit tests of each instance manually

Run the following command at the level of each searchitect directory

mvn clean install

Deployment

Building of this software has been tested on Ubuntu 18.04 using openjdk-11.

Build Docker containers

Docker containers are built using the multistage feature. There is a builder image in the root directory and deployment images in the subprojects. A docker-compose file ties it all together.

Build using docker-compose

docker-compose build

Manual build

docker build -t searchitect_builder .
docker build searchitect-backend-dynrh2lev
...

Deploy using docker-compose

After you have built the docker containers using the commands above, you can run them on your host using

docker-compose up

or

docker-compose up -d

if you want to move the docker process to the background.

The interface description of the gateway is after deployment available at:

https://localhost:8433/swagger-ui.html

How to add a new scheme

  1. Implement your scheme in a new searchitect-common-scheme project
  2. Create a new searchitect-client-scheme-plugin project which implements the client plugin interface. This interface can be found in searchitect.common.client.ClientScheme
  3. Create a new searchitect-backend-scheme project which implements the server side; take a look at the other implementations, as the interface of the controller needs to be similar
  4. Adapt the Dockerfile of your new searchitect-backend-scheme project; take care with the port and choose one that is still available
  5. Add your new Searchitect-backend implementation in the application properties of the searchitect-gate project to the list. The name needs to be the same as in the docker-compose file (docker-compose.yml).
  6. Recompile the whole workspace and test.

Testing projects

Searchitect-Testset

Functionality:

Generates the test sets with the following parameters:

  • -synthetic: generates several setup multimaps containing 100 keywords, each mapping 10, 100, 250, 500, 750, 1000, 2000, 3000, 5000, 10000 random document identifiers. The setup multimap is serialized and stored to a file in the output path (/dstpath/) as Java object or JSON string.
  • -dynamicsynthetic: generates 20 update multimaps containing 100 keywords, each mapping 10 newly inserted document identifiers, and 10 multimaps containing 100 keywords, each mapping 100 newly inserted document identifiers.
  • -batched: expects a source path and a destination path as input. Indexes all files contained in the subdirectories of the source path and outputs them to the destination directory. Additionally, wordlists which list keywords that match the multimap are saved to a file. The keywords are selected in such a way that the list also contains keywords that have a large result set.
  • -dynamicbatched: expects a source path and a destination path as input. Divides the files in the subdirectories of the source path into groups of 100 files. Then the indexing process is performed on them to generate update multimaps named by the iteration and subdirectory name. These are stored in the destination directory. Furthermore, in each iteration a wordlist is generated containing words which match results in the update map. The selection of the words is calculated on the multimap that grows with each update, again in these three different groups of result sets.

Parameters:

  1. kind of testset: -batched, -synthetic, -dynamicbatched, -dynamicsynthetic
  2. in case of -synthetic and -dynamicsynthetic: dstpath and java or json for output
  3. in case of -batched and -dynamicbatched source path of the root directory and the

Compile

mvn clean install
cd target

Usage examples

java -jar searchitect-testset-0.0.1-SNAPSHOT.jar -batched /srcpath/ /dstpath/
java -jar searchitect-testset-0.0.1-SNAPSHOT.jar -synthetic /dstpath/ java

Searchitect-Test

Preliminary

Tested with Ubuntu 16.04 and OpenJDK 10 (Oracle JDK 10 causes a BC provider signing issue)

Functionality

  • Test the implementations. The tests are designed to find the limits; therefore all tests will end with exceptions such as OutOfMemoryError: Java heap space. They are specific to the test sets generated with searchitect-testset, but can easily be adapted if needed. For example, hard-coded source file directories are set at the beginning of the cases:
  • Synthetic: /tmp/testset/synthetic/
  • Real: /tmp/testset/synthetic/
  • ReportPath: /tmp/report/
    If you have chosen a different output path for the testset, just change these paths.

  • Test Options:

    • Synthetic case:
      • Input:
        • first parameter: scheme [dynrh2lev|dynrh2levrocks]
      • Output:
        • [scheme]synthetictest
    • Real case:
      • Input:
        • first parameter: scheme [dynrh2lev|dynrh2levrocks]
        • second parameter: path to wordlist wordlistPath
      • Output:
        • console output of testreport can be written to a file using 2>&1 | tee /tmp/reportfile.txt
        • search and setup or update report

Preliminaries

  • The docker container with searchitect server instances should be deployed
  • The serialized java objects are stored in a directory e.g. /tmp/testset/real/
  • Setting the Java Runtime Environment (JRE) heap space to 8 GB by issuing export _JAVA_OPTIONS=-Xmx8G did make a difference in the testing, but the Docker containers then start to swap, so it is better not to use this setting

Packages

  • searchitect.test
    • Testrunner - Java application used to call the static methods in Dynrh2levTest depending on the command line input parameters
    • StaticTest.java - provides the static test methods for the synthetic and real test set for dynrh2lev and dynrh2levrocks
    • DynamicTest.java - provides methods for testing dynamic updates for the synthetic and the real testset

Usage

  • Compile with maven

    mvn clean install
    
  • move to the target directory

    cd target
    

Usage examples

Static synthetic tests the Setup and Search protocol
java -jar searchitect-test-0.0.1.jar -synthetic dynrh2lev /tmp/testset/synthetic/ 
java -jar searchitect-test-0.0.1.jar -synthetic dynrh2levrocks /tmp/testset/synthetic/  
Static realistic tests the Setup and Search protocol
java -Xmx2g  -jar searchitect-test-0.0.1.jar -real dynrh2lev /tmp/testset/real/  
java -Xmx2g  -jar searchitect-test-0.0.1.jar -real dynrh2levrocks /tmp/testset/real/  
Dynamic synthetic tests the Update and Search protocol
java -jar searchitect-test-0.0.1.jar -dynamicsynthetic dynrh2lev /tmp/testset/synthetic/ 
java -jar searchitect-test-0.0.1.jar -dynamicsynthetic sophos /tmp/testset/synthetic/ 
java -jar searchitect-test-0.0.1.jar -dynamicsynthetic dynrh2levrocks /tmp/testset/synthetic/  

Dynamic real tests the Update and Search protocol
java -jar searchitect-test-0.0.1.jar -dynamicreal dynrh2lev /tmp/testset/dynamicreal/ 
java -jar searchitect-test-0.0.1.jar -dynamicreal dynrh2levrocks /tmp/testset/dynamicreal/ 
java -jar searchitect-test-0.0.1.jar -dynamicreal sophos /tmp/testset/dynamicreal/