Skip to content

Files

Latest commit

60cc2e2 · Nov 4, 2017

History

History
85 lines (68 loc) · 10.2 KB

proposal.md

File metadata and controls

85 lines (68 loc) · 10.2 KB

Proposal

1. Who are the members of your team?

Sergey Frolov, Albert Dayn

2. What basic problem will your project try to solve?

We want to enable the use of cloud computing while maintaining the security and privacy of the data being worked on.

3. Define the problem that you will solve as concretely as possible. Provide a scope of expected and potential results. Give a few example programs that exhibit the problem that you are trying to solve.

Computations that work with, and produce, sensitive data must be kept private from potentially malicious observers. Certain problems could be solved by computationally-intensive cryptographic algorithms, such as PIR (Private Information Recovery), that become infeasible at scale, while some problems could not be solved at all without hardware support. We hope to utilize Intel SGX (Software Guard Extension) to introduce the ability to offload private tasks to a server, without any possibility of the server learning anything about either the function being computed, the inputs, or the outputs, and we want to expose this solution natively in Python with our compiler.

Ideally, we hope to also provide further guarantees about the secrecy of the computation, such as ensuring the operation will be constant time, and potentially protecting against inconsistent memory access patterns which are both side channels that can be used to learn about the computation being performed, but these are much less likely to be fully implemented in the given time frame.

An application of this compiler would be to enable shared computation between mutually distrusting parties. A real-world example of this issue was a problem the Signal secure messaging app had when they wanted to allow users to discover which of their contacts were using Signal without exposing this contact list to the Signal server [1]. This problem required the ability to compute the intersection between an input set of contacts private to the user and another set private to the server, where the output was safe to share only with the user. In addition, such a compiler could be used for private computation on an untrusted server, such as running a computationally expensive Machine Learning model remotely without leaking confidential information about the weights being learned.

[1] https://signal.org/blog/private-contact-discovery/

4. What is the general approach that you intend to use to solve the problem?

We will create a compiler that takes in code with a mix of locally and remotely executing functions. It will locally compile the functions which a user wants to be executed remotely to a bytecode representation. A service provider will be created that receives and interprets the bytecode secretly, after which it sends back the result. This will require the ability for a compiled application to ensure they are communicating with a trusted server-side program and not being intercepted or changed by a potentially malicious server/OS. For this, we will use Intel’s SGX (Software Guard Extensions). SGX allows a CPU (in this case, the server CPU) to run with its stack and heap encrypted in RAM and only decrypted internally to the CPU in its cache and registers. This is done by running the code in what Intel calls an “enclave”. After getting all remote computation and compilation tasks accomplished, we want to add a static analysis phase to the generated bytecode so users can request that the compiler check if the bytecode can be proven to run in constant time. This will use a formalized set of inference rules (defined on either our flattened AST Python representation or the IR of our bytecode) as an optional extra security measure. We may also explore the feasibility of enforcing memory safety and/or constant runtime as one of the goals of the bytecode design itself.

5. Why do you think that approach will solve the problem? What resources (papers, book chapters, etc.) do you plan to base your solution on? Is there one in particular that you plan to follow? What about your solution will be similar? What will be different?

This approach has been explored before, mainly by a team which created a project called “Haven” [2]. Haven’s goal was to allow execution of full programs not created for SGX in an SGX enclave, by loading them in the secure execution environment and handling errors as they arose during execution. These errors would be caused by x86 instructions that Intel has disallowed in the enclave due to some potential to leak data. Our goal is similar in that we want to run programs in a secure environment not directly compiled with Intel's SDK as an enclave, but we differ in 2 main ways. First, we want to create these programs to run in the enclave so they can never cause the errors Haven needs to deal with. Second, we wish to use these programs as remote computations on data, instead of as a simple sandbox. This means that the remote connection features must be built directly into a compiled language (in our case, Python).

“Ryoan” [3] is a newer project with goals similar to what we wish to accomplish. Ryoan’s aim is a secure distributed computing network, while we are focusing on a single server remote computation as our initial goal. This difference allows us to more aggressively optimize for execution time in the remote environment as we do not need to worry about multi user synchronization. Specifically, we can design the bytecode as a subset of traditional language features which deal solely with computation. Whereas we will intend to use bytecode which has been generated to enforce runtime guarantees, Ryoan runs x86, like Haven, which must first be verified as safe using Google’s NaCL technology to protect against potential memory access violations and other security holes. This causes performance issues and potential headaches for x86 code that doesn’t actually cause issues, but can’t be proven to pass verification.

In summary, if we fully and successfully implement our research goals, our main contributions over the other attempts, which come from the fact that we are implementing secure remote code execution at the compiler level, are the ability to ensure that programs which we compile will not be slowed down due to overhead from verification of integrity at runtime, and will not crash due to no verification, along with the exposure of this security feature as a built in feature of our py-sgx language.

We hope to add onto these two contributions by extending our compiler with the ability to prove computations as constant time following the work of a paper titled “Verifying Constant-Time Implementations” [4], which formalizes a theory on proving a computation will be constant, but we leave this as a desire at this point rather than a goal due to the current complexity of our project.

[2] https://tc.gtisc.gatech.edu/bss/2014/r/haven.pdf
[3] https://www.cs.utexas.edu/users/witchel/pubs/hunt16osdi-ryoan.pdf
[4] http://haslab.uminho.pt/jba/files/16usenix.pdf

6. How do you plan to demonstrate your idea? Will you use your course compiler. If so, what specific changes do you plan to make to it (e.g., what passes need to be changed or added)?

This approach will require us to build 2 main systems:

(1) A modified version of our current compiler that will take python code, with some functions marked by the programmer in some way (likely with Python’s “@” annotations) as remote executors (functions meant to be executed remotely). It will compile the locally running segments of the program to executable x86 asm linked with a runtime library that has the ability to ensure it is talking to the server through an encrypted connection. It will compile the remote executors to an intermediate bytecode that we will have to define. This bytecode will be sent to the server. Ideally, bytecode compilation will be similar to the process of compiling x86 assembly. By using bytecode we enable extended security by not leaking errors such as null pointer exceptions and divide by 0 errors. The design of this bytecode will be tailored to constant time interpretation and security from exception raising errors.

(2) A server that uses SGX and Intel’s remote attestation products. SGX will allow the server to compute data without seeing the input or result. Remote Attestation provides the ability for the client program to ensure it is talking to an SGX secured, untampered version of our server program. The server will house the bytecode interpreter, which will run entirely in an Intel Secure Enclave to prevent secrets from leaking to a potentially malicious operating system.

If these can both be completed in a reasonable amount of time, we will begin work on a third system, a new pass during compilation which could potentially go after flattening or after instruction selection. This pass will have the purpose of ensuring that all functions marked as constant time can be proven to be constant. We will create a set of inference rules based on prior work on this problem which can be used to constructively prove if a set of statements will be constant time. We will explore formalizing other static analyses afterwards.

7. How will you evaluate your idea? What will be the measurement for success?

Our general goal is to provide a safe and secure way to perform computations remotely. We have laid out the following plan of attack for this problem, as a measurement of how far we get in accomplishing this task: First, we will focus on compiling remote executors to a statically linked binary and running on a remote server in SGX. At this point, no intermediate bytecode or interpreter will be needed. Next we will focus on designing a bytecode to facilitate verification and, if possible, provide security properties such as memory safety and constant runtime. When this is designed, it will have to be added as a potential target language of our compiler along with x86, meaning that we will add the ability to cross compile. At the same time we will be designing a parser and interpreter for this bytecode, which will run in an SGX enclave. If time permits we hope to explore the provability of security features for the remote executors at compile time. We will measure our success based on how much we can prevent the remote environment from learning about the information being computed.