This framework simplifies the use of searchable encryption (SE) technology and enables the integration of new SE schemes.
An SE scheme enables a server to search over an encrypted database on behalf of a client without revealing the content to the server. An SE scheme provides three protocols:
The server part of the framework is a web service designed as a service-oriented architecture (SOA). To make the integration of new SE schemes easy and to follow state-of-the-art client/server architecture, we chose a microservices design. The basic framework consists of three entities: a client, a gateway and a backend template, which holds a plaintext dummy implementation. The SE gateway handles authentication and maintains a small database of user account information, such as credentials and metadata of the repositories used for addressing the EDBs in the backend modules. Each client request is validated at the SE gateway and, if authentication succeeds, forwarded to the backend module that the query addresses. The backend modules are the server-side instantiations of the implemented schemes and hold the scheme-specific encrypted index data structures or databases (EDBs).

All interfaces between the microservices follow the REpresentational State Transfer (REST) style, which allows interoperability and loosely coupled systems, in contrast to remote procedure calls (RPC) or remote method invocation (RMI), which are tightly coupled. RESTful services are also easy to extend and language independent. The content exchanged between the instances is serialized as JavaScript Object Notation (JSON), and the communication between client and gateway is encrypted using Transport Layer Security (TLS).
The source code of the implementation is structured in consideration of the shared classes between the client and server projects.
The basic projects are the client and, on the server side, the gateway and the backend module template, which share some common classes. The design can be extended with a new SE scheme by adding a new backend module, a client plugin and their common classes. Only the port of the new backend module endpoint needs to be added to the application properties of the gateway, and the Docker configuration needs to be adapted. Because of this structure, all client plugins must share a common interface in which each SE scheme plugin implements the same method signatures.
For the design we took a structured approach analogous to the model-view-controller design pattern, which splits the functionality by operational concern, resulting in the following packages.
All implementations use the Spring framework. Its benefits are modular, lightweight components that provide life cycle control of containers. These support inversion of control through dependency injection, which separates configuration from use. Spring Boot requires very little configuration and comes with an embedded application server, Apache Tomcat. This allows easy deployment of the standalone tarred .jar files. Apache Maven is used for dependency management of external libraries and for compilation. The following modules of the Spring framework have been applied:
Library | Purpose (version, license and link) |
---|---|
Apache Lucene | Used for the indexer (Info: Version 6.1.0, Apache License, Version 2.0., Link: https://lucene.apache.org/core/) |
Apache PDFBox | Extracts text from PDF files (Info: Version 1.8.10, Apache License version 2.0., Link: https://pdfbox.apache.org/) |
Apache POI | Parse Microsoft Documents (Info: Version 3.15-beta1, Apache License version 2.0. , Link: https://poi.apache.org/) |
Bouncy Castle | Basic cryptographic primitives in Java (Info: Version 1.59, MIT X Consortium based license, Link: https://github.com/bcgit/bc-java) |
Clusion | State of the art SSE implementations (Info: Version 1.0-SNAPSHOT, GNU General Public License v3, Link: https://github.com/encryptedsystems/Clusion) |
Docker | Containerized deployment (Info: Version 1.13.1, Apache License version 2.0. , Link: https://github.com/docker/) |
Docker-compose | Define and run multi-container Docker applications (Info: Version 1.18.0, Apache License version 2.0., Link: https://github.com/docker/compose) |
Google Guava | Contains multimap classes (Info: Version 25.1-jre, Apache License version 2.0., Link: https://github.com/google/guava) |
OpenJDK 1.8 | Open source Java Runtime Environment for Linux distributions (Info: Java Version 1.8, GNU General Public License, version 2, with the Classpath Exception, Link: http://openjdk.java.net/) |
JJWT | JSON Web Token support for the JVM (Info: Version 0.9.0, Apache License version 2.0., Link: https://github.com/jwtk/jjwt) |
JUnit | Java framework for unit tests (Info: Version 4.12, Link: https://junit.org/junit4/) |
Maven | Manage dependencies and compilation (Info: Version 3.3.9, Apache License version 2.0., Link: https://maven.apache.org/) |
RocksDB | Persistent key-value store (Info: Version 5.11.3, Link: https://github.com/facebook/rocksdb/tree/master/java/src/main/java/org/rocksdb/) |
DDTH-commons | Utility classes for RocksDB use in Java - RocksDBWrapper (Info: Release v0.8.0, MIT License, Link: https://github.com/DDTH/ddth-commons) |
Springfox-swagger2 | JSON API documentation for spring based applications (Info: Version 2.9.2, Apache License version 2.0. , Link: https://springfox.github.io/springfox/docs/current/) |
Springfox-swagger-ui | Visualize RESTful services (Info: Version 2.9.2, Apache License version 2.0. , Link: https://github.com/springfox/springfox/tree/master/springfox-swagger-ui) |

Some libraries only partly fitted the purposes of the Searchitect framework and had to be adapted to provide the required functionality. The modified classes originate from the following libraries: Clusion, SCAPI, DDTH-commons and Spring Security.
Searchitect common holds all general classes used by the client, the gateway and the backend modules.
In SE most of the workload is done at the client: the client extracts the keyword/document identifier pairs using an indexer and then generates the encrypted index. All schemes use this process, so the indexer is part of the client, as are all user-account-related tasks. The indexer provided by the Clusion library supports several file formats for keyword extraction, using specific libraries such as Apache PDFBox for PDF documents and Apache POI for Microsoft documents; other file formats are supported as well. Indexing is based on the Apache Lucene standard analyzer for the English language. This analyzer eliminates stop words and noise such as articles and conjunctions. Multithreading makes the indexing process efficient. In the client package, the Clusion indexer is wrapped by the document parser class.
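The keyword extraction step can be illustrated with a minimal sketch that uses only the JDK. It is a stand-in for the Clusion/Lucene indexer, not its actual API: the class name `InvertedIndexSketch`, the tiny stop-word list and the simple tokenizer are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative stand-in for the Clusion/Lucene indexer; all names are hypothetical.
final class InvertedIndexSketch {

    // Tiny example stop-word list; the Lucene analyzer ships its own list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "or", "the", "of", "to", "in"));

    // Maps each keyword to the sorted set of document identifiers that contain it.
    // The input maps a document identifier to the extracted plain text.
    static Map<String, Set<String>> build(Map<String, String> documents) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            String text = doc.getValue().toLowerCase(Locale.ROOT);
            for (String token : text.split("[^a-z0-9]+")) {
                if (token.isEmpty() || STOP_WORDS.contains(token)) {
                    continue; // skip noise words and empty split results
                }
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }
}
```

The resulting keyword-to-identifiers map plays the role of the inverted index that a scheme plugin would then encrypt.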
A client object instance is bound to a specific SE scheme implementation object of a plugin project. The method calls from the client class to the SE scheme implementations are generalized in the plugin interface definitions. During development and testing we found that the interface definition is crucial for scaling. Because the schemes are based on different data structures, we needed to wrap the returned data objects. Our first approach was to serialize to the JSON transport string in the plugin class of the specific scheme. This works fine for small EDB objects but does not scale well: the JSON string was constructed to its full extent in memory, which resulted in out-of-memory (Java heap space) errors in the JVM. The second approach was to wrap the upload index in an abstract class, which is extended by the scheme-specific upload index implementations and passed by reference to the client, where the data is continuously streamed as JSON by the RestTemplate class. Another possible way to improve scalability would be to serialize the upload index into a file and handle the problem with streamed file uploads.
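The difference between building the whole JSON string and streaming it can be sketched as follows. This is a JDK-only illustration under stated assumptions: `StreamedUploadSketch` and `MapUploadSketch` are hypothetical names, the entries are assumed to be Base64 text (so no JSON escaping is needed), and in the real client the writer would be backed by the HTTP request body rather than an in-memory buffer.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Map;

// Hypothetical base class: each scheme-specific upload index writes itself to a stream.
abstract class StreamedUploadSketch {
    // Writes the index as JSON piece by piece instead of returning one large String.
    abstract void writeJson(Writer out) throws IOException;
}

// Example subclass holding label -> ciphertext entries (both assumed to be Base64 text).
final class MapUploadSketch extends StreamedUploadSketch {
    private final Map<String, String> entries;

    MapUploadSketch(Map<String, String> entries) {
        this.entries = entries;
    }

    @Override
    void writeJson(Writer out) throws IOException {
        out.write("{\"entries\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : entries.entrySet()) {
            if (!first) {
                out.write(",");
            }
            first = false;
            // Only one entry is held as a String at any time.
            out.write("\"" + e.getKey() + "\":\"" + e.getValue() + "\"");
        }
        out.write("}}");
    }
}
```

With this shape, memory use on the client is bounded by the size of a single entry plus the writer's buffer, rather than by the size of the complete serialized index.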
The following listing shows all methods which need to be supported by a client plugin:
public interface ClientScheme {
    String getImplName();
    Upload setup(String password, Multimap<String,String> invertedIndex, int numberOfDocuments) throws SearchitectException;
    String search(String query) throws SearchitectException;
    String update(Multimap<String,String> invertedIndex) throws SearchitectException;
    SearchResult resolve(SearchResult encrypted) throws SearchitectException;
}
When a scheme is response revealing and does not supply a resolve algorithm, the search result is returned immediately without any need for decryption.
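To make the role of a plugin concrete, here is a minimal sketch of a plaintext dummy plugin. It does not use the real `ClientScheme` types: `ClientSchemeSketch` is a simplified, hypothetical mirror of the interface in which JDK collections stand in for Guava's `Multimap`, `Upload` and `SearchResult`, and the password and update parameters are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Simplified, hypothetical mirror of ClientScheme using JDK collections only.
interface ClientSchemeSketch {
    String getImplName();
    Map<String, Set<String>> setup(Map<String, Set<String>> invertedIndex);
    String search(String query);
    List<String> resolve(List<String> serverResult);
}

// Plaintext dummy plugin: nothing is encrypted, so results are already readable.
final class PlaintextSchemeSketch implements ClientSchemeSketch {

    @Override
    public String getImplName() {
        return "plaintext-sketch";
    }

    // The "upload index" is just a defensive copy of the inverted index.
    @Override
    public Map<String, Set<String>> setup(Map<String, Set<String>> invertedIndex) {
        Map<String, Set<String>> upload = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : invertedIndex.entrySet()) {
            upload.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        return upload;
    }

    // The "search token" is the normalized keyword itself.
    @Override
    public String search(String query) {
        return query.trim().toLowerCase(Locale.ROOT);
    }

    // Response-revealing case: the server result is returned without decryption.
    @Override
    public List<String> resolve(List<String> serverResult) {
        return new ArrayList<>(serverResult);
    }
}
```

A real scheme plugin would replace the bodies of `setup`, `search` and `resolve` with its cryptographic algorithms while keeping the same method signatures, which is what lets the client treat all plugins uniformly.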
The gateway uses the backend names {b_id} and ports to address the specific backends. The names defined in application.properties for the port mapping must be the same as those in the docker-compose file. The interface of the web service is provided by four controller classes in the Searchitect.controller package:
RESTful interface definitions of the gateway. The A column indicates required authentication with +, A if administrator privileges are needed, or - if no authentication is required.
HTTP Method | A | Description |
---|---|---|
HTTP GET | - | CheckSearchitectGate returns the string “Greetings from searchitect!” to check availability of the service |
HTTP GET /checkbackend/{b_id} | - | CheckSebackend forwards the check request to the specific backend module and returns the result |
HTTP GET /checkauthbackend/{b_id} | + | CheckAuthenticated validates the authentication principal and forwards the check request to the specific backend module and returns the result and username |
HTTP POST /auth | - | CreateJwt expects the user credentials wrapped in a UserAuthRequest and after successful authentication returns a UserAuthResponse containing the JWToken |
HTTP GET /users | A | GetAllUserAccounts retrieves all user accounts from the repository if the user is authenticated as administrator |
HTTP POST /user | - | CreateUserAccount expects credentials wrapped in a UserAuthRequest and creates a new user account |
HTTP GET /user/{u_id} | + | GetUserAccount returns the user account information if the user principal name extracted from the JWT and the {u_id} are the same |
HTTP POST /user/{u_id} | + | UpdateUserAccount expects a UserAuthRequest, validates {u_id} and user principal, and updates the user account on success |
HTTP DELETE /user/{u_id} | + | DeleteUserAccountForUser validates {u_id} and user principal and deletes the user account on success |
HTTP DELETE /admin/{u_id} | A | DeleteUserAccountForAdmin validates the user principal of the administrator and deletes the user account on success |
HTTP GET /backend/{b_id}/repositories | A | ForwardRepositories should only be enabled for administrators and returns all the repository names available at the specific backend module {b_id} |
HTTP POST /backend/{b_id} | + | ForwardSetup passes the JSON formatted UploadIndex to the specified backend module and returns the repository name wrapped by RepositoryInfo class, in the processing this name is extracted and added to the user account along with the backend module name |
HTTP POST /backend/{b_id}/repository/{r_id}/search | + | ForwardSearch validates the access privileges for {b_id} and {r_id} of the user, on success it forwards the search request to the specific backend and responds back the results wrapped in the SearchResult |
HTTP POST /backend/{b_id}/repository/{r_id} | + | ForwardUpdate validates the access privileges for {b_id} and {r_id} of user, on success it forwards the UpdateIndex to the specific backend and responds with HTTP Ok |
HTTP DELETE /backend/{b_id}/repository/{r_id} | + | ForwardDelete validates the access privileges for {b_id} and {r_id} of user, on success it forwards the delete request and on success deletes the EDB entry from the user account. |
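
How the gateway might translate one of its forwarding paths into a backend address can be sketched with plain Java. This is an assumption-laden illustration: `BackendRouterSketch` is a hypothetical class, the port value used in the usage note is an example, and using the backend name as the host name presumes that it matches the docker-compose service name.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical routing helper; backend names and ports are example values.
final class BackendRouterSketch {
    // Backend name {b_id} -> port, as it could be read from application.properties.
    private final Map<String, Integer> ports = new HashMap<>();

    void register(String backendName, int port) {
        ports.put(backendName, port);
    }

    // Maps /backend/{b_id}/repository/{r_id}/search at the gateway
    // to /repository/{r_id}/search at the backend named {b_id}.
    URI searchUri(String backendId, String repositoryId) {
        Integer port = ports.get(backendId);
        if (port == null) {
            throw new IllegalArgumentException("unknown backend: " + backendId);
        }
        return URI.create("http://" + backendId + ":" + port
                + "/repository/" + repositoryId + "/search");
    }
}
```

For example, after `register("dynrh2lev", 8282)`, a call to `searchUri("dynrh2lev", rId)` yields a URI on host `dynrh2lev` and port 8282 whose path matches the backend's search endpoint.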
To deploy the gateway for testing purposes, a self-signed certificate must first be generated by issuing these two commands in the root directory of the gateway:
keytool -genkey -alias localhost -keyalg RSA -keysize 2048 -keystore src/main/resources/tomcat.keystore
keytool -importkeystore -srckeystore tomcat.keystore -destkeystore tomcat.keystore -deststoretype pkcs12
We used the JJWT library (Json Web Token Support for the JVM) to issue and validate the JWTs. The Spring Security module needed to be extended by
These Spring Security modifications are contained in the Searchitect.security and Searchitect.config packages in the gateway.
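What issuing and validating a signed token involves can be sketched with the JDK alone. This is not the JJWT API and not the gateway's code: `JwtSketch` is a hypothetical class that builds an HS256-signed token by hand, omits expiry and other claims, and does not escape the user name.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// JDK-only illustration of an HS256-signed token; this is not the JJWT API.
final class JwtSketch {
    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();

    private static byte[] hmac(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private static String encode(String json) {
        return B64.encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }

    // Builds header.payload.signature with the user name as the "sub" claim.
    // The user name is assumed to contain no characters that need JSON escaping.
    static String issue(String username, byte[] key) {
        String signingInput = encode("{\"alg\":\"HS256\",\"typ\":\"JWT\"}")
                + "." + encode("{\"sub\":\"" + username + "\"}");
        return signingInput + "." + B64.encodeToString(hmac(key, signingInput));
    }

    // Recomputes the signature over header.payload and compares in constant time.
    static boolean verify(String token, byte[] key) {
        int lastDot = token.lastIndexOf('.');
        if (lastDot < 0) {
            return false;
        }
        byte[] actual;
        try {
            actual = Base64.getUrlDecoder().decode(token.substring(lastDot + 1));
        } catch (IllegalArgumentException e) {
            return false; // signature part is not valid Base64
        }
        return MessageDigest.isEqual(hmac(key, token.substring(0, lastDot)), actual);
    }
}
```

A production setup would rely on a maintained library such as JJWT, which also handles claim parsing and expiry checks that this sketch leaves out.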
A backend module holds and manages multiple EDBs of a specific scheme. Backend modules implement only the server part of a specific scheme and are not aware of any user or access management, which is handled by the gateway. The EDBs are managed by a service layer, which may keep the EDB objects in memory or store them persistently on the hard drive, depending on the implementation. The EDBs are the resources in the RESTful sense and are addressed by a randomly generated universally unique identifier (UUID), also called the repository name {r_id}, which is generated during the server-side setup process. Each backend controller provides the RESTful API definitions shown in the table below. All exchanged objects are formatted in JSON, with the exception of the check responses, which are strings.
RESTful interface definitions of the backend module.

HTTP Method Path | Description |
---|---|
HTTP GET / | Check returns the string “Greetings from *” with the name of the specific backend module to check availability of the service. |
HTTP GET /repositories | ListRepositories lists all repositories on the node. |
HTTP POST /repository | Setup expects a specific upload index, generates a new repository and returns the repository name {r_id}. |
HTTP POST /repository/{r_id}/search | Search expects a search token, performs the search on the EDB and returns a response containing the search results. |
HTTP POST /repository/{r_id} | Update expects a specific update index and performs all updates on the EDB; on success it returns HTTP OK (HTTP 200). |
HTTP DELETE /repository/{r_id} | Deletes the specified EDB. |
HTTP DELETE /repositories | Deletes all EDBs on that node. |

A failed request results either in HTTP NOT FOUND (HTTP 404) or, in the case of malformed JSON strings, in HTTP BAD REQUEST (HTTP 400). The interface definitions can also be tested with command-line curl commands when the docker-compose configuration file exposes the specific port outside the container.
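A service layer that keeps EDBs in memory and addresses them by UUID could look like the following sketch. `RepositoryServiceSketch` is a hypothetical name and the type parameter `E` stands for whatever EDB object a scheme uses; persistence and the controller mapping are left out.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory service layer; E stands for a scheme-specific EDB type.
final class RepositoryServiceSketch<E> {
    private final Map<String, E> repositories = new ConcurrentHashMap<>();

    // Setup: stores the EDB under a fresh random UUID and returns it as {r_id}.
    String create(E edb) {
        String id = UUID.randomUUID().toString();
        repositories.put(id, edb);
        return id;
    }

    // An empty result is what a controller would translate into HTTP 404.
    Optional<E> find(String id) {
        return Optional.ofNullable(repositories.get(id));
    }

    // Returns true if the repository existed and was removed.
    boolean delete(String id) {
        return repositories.remove(id) != null;
    }

    // Snapshot of all repository names on this node.
    Set<String> list() {
        return new HashSet<>(repositories.keySet());
    }
}
```

Because the service knows nothing about users, any check of whether a caller may access a given {r_id} has to happen before the request reaches it, which matches the gateway's role described above.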
Run the following command in each searchitect directory:
mvn clean install
Building of this software has been tested on Ubuntu 18.04 using openjdk-11.
Docker containers are built using the multistage feature. There is a builder image in the root directory and deployment images in the subprojects. A docker-compose file ties it all together.
docker-compose build
docker build -t searchitect_builder .
docker build searchitect-backend-dynrh2lev
...
After you have built the docker containers using the commands above, you can run them on your host using
docker-compose up
or
docker-compose up -d
if you want to move the docker process to the background.
https://localhost:8433/swagger-ui.html
Generate the test sets with the following parameters:
mvn clean install
cd target
java -jar searchitect-testset-0.0.1-SNAPSHOT.jar -batched /srcpath/ /dstpath/
java -jar searchitect-testset-0.0.1-SNAPSHOT.jar -synthetic /dstpath/ java
Tested with Ubuntu 16.04 and OpenJDK 10 (Oracle JDK 10 causes a signing issue with the bc provider).
Test the implementations. The tests are designed to find the limits, so all tests will end with exceptions such as OutOfMemoryError: Java heap space. They are specific to the test sets generated with searchitect-testset but can easily be adapted. For example, hard-coded source file directories are set at the beginning of the test cases:
Synthetic: /tmp/testset/synthetic/
Real: /tmp/testset/real/
ReportPath: /tmp/report/

If you have chosen a different output path for the test set, change these values accordingly.
Test Options:
Compile with Maven:
mvn clean install
Move to the target directory and run one of the test options:
cd target
java -jar searchitect-test-0.0.1.jar -synthetic dynrh2lev /tmp/testset/synthetic/
java -jar searchitect-test-0.0.1.jar -synthetic dynrh2levrocks /tmp/testset/synthetic/
java -Xmx2g -jar searchitect-test-0.0.1.jar -real dynrh2lev /tmp/testset/real/
java -Xmx2g -jar searchitect-test-0.0.1.jar -real dynrh2levrocks /tmp/testset/real/
java -jar searchitect-test-0.0.1.jar -dynamicsynthetic dynrh2lev /tmp/testset/synthetic/
java -jar searchitect-test-0.0.1.jar -dynamicsynthetic sophos /tmp/testset/synthetic/
java -jar searchitect-test-0.0.1.jar -dynamicsynthetic dynrh2levrocks /tmp/testset/synthetic/
The dynamic real tests exercise the Update and Search protocols:
java -jar searchitect-test-0.0.1.jar -dynamicreal dynrh2lev /tmp/testset/dynamicreal/
java -jar searchitect-test-0.0.1.jar -dynamicreal dynrh2levrocks /tmp/testset/dynamicreal/
java -jar searchitect-test-0.0.1.jar -dynamicreal sophos /tmp/testset/dynamicreal/