Support private processing by keeping data encrypted for rented servers #707
Hey, we did some work on homomorphic encryption for private search here: https://blog.exolabs.net/day-8
Got it, thanks for the reply! Just to confirm: does each machine in the cluster know which prompt it's processing and what the final output will be? I'm wondering about the possibility of hosting our machines in different data centers (for example, one on AWS, one on DigitalOcean, and one on Linode) while keeping the data private. The idea is that privacy would hold unless someone gains access to all of the servers simultaneously, rather than just one. Would the overhead issue still apply in this scenario?
Yeah, currently each node knows the prompt. We propagate the prompt to every node for debugging and to show the user the prompt on each machine. We could of course remove that. At the very least, each node needs the intermediate embeddings, which still contain information about the prompt, so limiting each node to only those embeddings would be a weak form of privacy. I don't know of any low-overhead approach to collaborative inference that keeps data private between nodes. This would be an interesting research direction to explore.
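A minimal sketch of the "private unless every server is compromised" setup, using additive secret sharing. This is a standard multi-party computation technique, swapped in here purely for illustration; it is not something this project implements. The sketch assumes that the intermediate embedding has already been encoded as integers (for example, fixed-point) and that the layer weights are public. The modulus `P` and all function names are hypothetical. Each server sees only a uniformly random share. Applying a linear layer to every share and summing the results still gives the same answer as applying that layer to the real embedding.

```python
import secrets

P = 2**61 - 1  # hypothetical prime modulus (a Mersenne prime)

def share(vector, n_servers):
    """Split an integer vector into n_servers additive shares that sum to it mod P."""
    if n_servers < 2:
        raise ValueError("need at least two servers")
    shares = [[secrets.randbelow(P) for _ in vector] for _ in range(n_servers - 1)]
    last = [(x - sum(s[i] for s in shares)) % P for i, x in enumerate(vector)]
    return shares + [last]

def reconstruct(shares):
    """Add the shares element-wise mod P; any n_servers - 1 of them alone look random."""
    return [sum(column) % P for column in zip(*shares)]

def linear_layer(weights, share_vec):
    """Runs locally on one server: apply public integer weights to that server's share."""
    return [sum(w * x for w, x in zip(row, share_vec)) % P for row in weights]

embedding = [3, 1, 4]  # hypothetical fixed-point-encoded intermediate embedding
W = [[2, 7, 1], [1, 0, 5]]  # hypothetical public layer weights
shares = share(embedding, 3)  # e.g. one share each for AWS, DigitalOcean and Linode
outputs = [linear_layer(W, s) for s in shares]
print(reconstruct(outputs))  # [17, 23]
```

Linear layers are cheap under this scheme because each server computes on its own share without talking to the others. Nonlinear steps such as softmax or activation functions cannot be computed share by share. They require extra rounds of communication between the servers, and that is where most of the overhead comes from.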
Just messaged you on Discord.
Hey, I'm not sure whether this already exists, but I was wondering whether it would be possible to add some form of "homomorphic encryption". That way, whatever is sent to the cluster (and processed there) would stay fully encrypted, and the client would then decrypt the results.
The idea is to send your prompts to a rented remote cluster (a group of rented servers). The servers receive the prompt in encrypted form, process it while it is still encrypted, and return the results (also encrypted). When the client receives the results, it decrypts them.
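To make the encrypt, process, decrypt flow concrete, here is a toy sketch using the Paillier cryptosystem, an additively homomorphic scheme chosen here only for illustration. The tiny hard-coded primes, the integer encoding of the input, and every function name are assumptions; real keys use primes hundreds of digits long. The server combines ciphertexts to compute a weighted sum without ever seeing the plaintext, and only the client, which holds the private key, can decrypt the result.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy Paillier keys from tiny hard-coded primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid shortcut because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Client side: Enc(m) = (n + 1)^m * r^n mod n^2 with a random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Client side: m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

def encrypted_dot(n, enc_inputs, weights):
    """Server side: Enc(sum(w_i * m_i)) from ciphertexts and non-negative integer weights."""
    n2 = n * n
    acc = 1  # 1 is a valid (non-randomized) encryption of 0
    for c, w in zip(enc_inputs, weights):
        acc = (acc * pow(c, w, n2)) % n2  # Enc(a) * Enc(b)^w = Enc(a + w*b)
    return acc

n, priv = keygen()
enc = [encrypt(n, m) for m in [3, 1, 4]]  # hypothetical integer-encoded input
result = encrypted_dot(n, enc, [2, 7, 1])  # server never sees 3, 1, 4
print(decrypt(n, priv, result))  # 17
```

Paillier only supports addition and multiplication by plaintext constants, so it covers linear steps like the one above. Evaluating a full model, including its nonlinear layers, under encryption would require a fully homomorphic scheme, which is far more expensive to run.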
You might consider implementing it in a similar way to Tor's onion routing, where the data can be decrypted only if all nodes are accessed. That way, unless someone controls every machine, the data stays secure.
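A toy sketch of the layered idea: the client wraps the data in one encryption layer per server, and the original bytes come back only after every server's layer has been removed. The SHA-256-based keystream, the nonce handling, and the function names are all hypothetical, and this is not a vetted cipher. A real system would use authenticated encryption from an established library.

```python
import hashlib
import secrets

def keystream(key, nonce, length):
    """Toy keystream built from SHA-256(key || nonce || counter) blocks. Not a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_layer(key, nonce, data):
    """Add or remove one layer; XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

def wrap(message, node_keys):
    """Client side: apply one layer per node key, using a fresh random nonce."""
    nonce = secrets.token_bytes(16)
    for key in node_keys:
        message = xor_layer(key, nonce, message)
    return nonce, message

def peel(data, nonce, key):
    """Node side: remove this node's layer."""
    return xor_layer(key, nonce, data)

keys = [secrets.token_bytes(32) for _ in range(3)]  # one hypothetical key per server
nonce, ciphertext = wrap(b"my private prompt", keys)
partial = peel(peel(ciphertext, nonce, keys[0]), nonce, keys[1])  # two of three layers removed
recovered = peel(partial, nonce, keys[2])  # only now is the plaintext back
print(partial != b"my private prompt", recovered)
```

This protects the data only while it is stored or in transit. A server that has to run the model on the data still needs it in decrypted form, or needs to process it under homomorphic encryption.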
By the way, as an open-source project maintainer, I wanted to thank you and all the devs supporting this project!