IBM just released an open source software package called HELib.
The HE stands for homomorphic encryption.
Although it doesn’t sound terribly sexy or impressive, HELib is actually an interesting and important milestone in cryptography.
HE is also a surprisingly relevant topic right now, with our ever-increasing attraction to cloud computing.
Bear with me, and I’ll try to explain.
Imagine that I am your cloud provider, and I keep databases online for you.
Imagine also that I am a security-conscious vendor, so I keep all your data encrypted, both when I serve it up to you, and when I save it to disk.
That’s about as good as it gets these days from a cloud security perspective.
→ It doesn’t matter whether I’m a pure-play over-the-internet cloud provider, or just the manager of the server farm team in your own IT department. The situations are similar, though they may differ in degree: I’ve got your data, and you have no alternative but to trust me to do the right thing with it.
Now imagine that you want me to search through your data, for example to see how many ACME-WIDGETS were bought by customers called DUCKLIN in the last year.
Traditionally, the process would go something like this:
- You encrypt the search terms and upload them to me.
- I decrypt the search terms so I know what to look for.
- I decrypt your data (perhaps only record by record, not all at once – that’s a detail that doesn’t matter here) so I have somewhere to search.
- I perform the search using the decrypted data.
- I encrypt the search results, if there are any, and return them to you.
Additionally, you hope that:
- I get rid of all remnants, on disk and in memory, of both the search terms and the decrypted data once the search is complete.
- I don’t take advantage of you, since I’m decrypting your data for this search, to sneak in other searches at the same time, whether for my own benefit, or for my government, or for one of your competitors.
There’s a lot that could go wrong – for you, at any rate.
The homomorphic difference
Imagine, however, if I could simply take your encrypted search terms, leave them encrypted, search for them directly in the still-encrypted database, and get the same results.
If I can perform calulations directly on your encrypted data, yet get the same results that you get from the unencrypted data, we both win enormously from a security and privacy point of view.
You don’t need to give me any decryption keys at all, so you no longer have to trust me not to lose, steal or sell your data. (You still have to trust me to tell you the truth about any results I work out for you, but that is a completely different issue.)
And I no longer need your decryption keys, so I can’t lose or abuse your data even if I wanted to.
That’s the promise of homomorphic encryption, which we mentioned at the start.
Until 2009, no-one was sure whether homomorphic encryption was even possible.
Then, a Stanford student and IBM researcher called Craig Gentry showed that it could be done, in a PhD thesis entitled simply, “A fully homomorphic encryption scheme.”
We weren’t home and dry yet, though.
Gentry had what amounted to an existence proof, showing that homomorphic encryption could no longer be considered impossible, but he didn’t have a practicable real-world implementation of the concept.
At the time, back in July 2009, well-known cryptographic personality Bruce Schneier praised Gentry’s efforts, but pointed out that:
Gentry's scheme is completely impractical... Gentry estimates that performing a Google search with encrypted keywords - a perfectly reasonable simple application of this algorithm - would increase the amount of computing time by about a trillion.
Well, we’re now a few steps further forwards, with IBM’s release of the abovementioned software package HELib:
HElib is a software library that implements homomorphic encryption (HE). Currently available is an implementation of the Brakerski-Gentry-Vaikuntanathan (BGV) scheme, along with many optimizations to make homomorphic evaluation runs faster, focusing mostly on effective use of the Smart-Vercauteren ciphertext packing techniques and the Gentry-Halevi-Smart optimizations.
With a package description like that, it’s obvious that consumer-facing homomorphic encryption tools aren’t going to be a click away at the Play Store or the App Store for a while yet.
On the other hand, four years ago we didn’t even know whether it would be possible to have homomorphic encryption at all.
So, watch this space!
Exciting news! Not to be too nitpicky, but homomorphic encryption has been around for longer than Gentry. Just not supporting both addition AND multiplication. Gentry just proved that FULLY homomorphic encryption exists which allows both operations and thus the evaluation of any circuit.
The non-full homomorphic encryption is used, for example, in private information retrieval where additive operations and scalar multiplication are sufficient to privately retrieve a database element.
True. In the RSA public key cryptosystem, for example, multiplying the encryption of two inputs is the same as encrypting the multiplication of two inputs.
I decided not to go into partial homomorphic encryption in the article to avoid getting bogged down, but you are right to say that others got close (and usefully so) before Gentry's paper.
The Wikipedia article to which I link above (link now fixed!) lists a number of partially homomorphic cryptosystems for those who would like to trace the subject's history a bit further…
Yikes, that Wikipedia link looks broken!
Fixed it. Thanks. (I left out the 'http://' in the HREF so it turned into a relative link 🙂
no biggie- an unusable link: That's the promise of **homomorphic encryption**, which we mentioned at the start.
Until 2009, no-one was sure whether homomorphic encryption was even possible.
Fixed it. Thanks.
This is interesting academic research, but I'm sceptical that it'll be practicable in the foreseeable future, unless in some highly specialised niche application. For comparison, take the example of public key encryption: it's so much more expensive than symmetrical encryption that no one ever normally uses it to encrypt a message or file as such, but only a symmetrical encryption key under which the heavy lifting of encryption is done. This hasn't changed significantly since public key cryptography came into widespread use as unfortunately, optimisation of algorithms doesn't follow Moore's Law. Even if it did, we'd have to wait for around 60 years for a homomorphic Google search by my reckoning!
Interesting idea!
One question:
In your example, would the results also still be encrypted, or would the cloud provider somehow obtain plain text results and re-encrypt them to feed back to the customer?
There would be no plaintext visible to the cloud provider.
Please explain (for crypto newbs) how this is not as simple as Rot13, then, such that the encrypted text sent back and forth over the network could only be decrypted to become the original text.
I indeed wouldn't want my cloud provider to have access to my keys. I also would want that the only place where my data can be found in plain is in the memory of a computer when an authorized user is working with it. Not on servers, in transit or on local hard disks. But if my data can be searched while still encrypted, but I have no control over what happens to the results, it is still bad, imho.
The upside of encrypted searching is that the cloud provider knows only something along these lines:
"The word EXCJZF appeared 8 times in the text RERJVERNG… [continue until tired] …IVVDEAGB."
Searching like this does leak a bit of information, especially from traffic analysis (for example, how often your searches succeed instead of failing, or how your pattern of searching is affected by world events), but since EXCJZF is meaningless, so too, for the most part, is the outcome of the search.
Note that this is good for the cloud provider, too. Assuming he is on the level, he can't wilfully be accused of abusing his position as your provider, or of being in league with (insert a popularly-blamed co-conspirator of your choice) to use his access to your decrypted data for insider purposes.
Hmm, neat thing is that you don't have to do like with password hashes. I'd be curious as to if this can be exploited though, to make it weaker.
Doesn't it mean that a given plaintext encrypted under a given key will always give the same ciphertext? I thought that was a big no-no in cryptography, and the reason why block ciphers are always used in CBC mode or something similar.
My thoughts exactly. If your adversary can mount a chosen plaintext attack then they would be able to break the whole system.
In Paul Ducklin's example if the cloud provider can access your web site, they could do a search for a term of interest, and then see the encrypted version go into the database. They could then use that same key on other tables in the database that the original web search did not give them access to.
If I understand correctly, the search term sent to the cloud provider is itself encrypted. So you send the provider a search term of “HASDF()!@NV”, which they perform and send you back the encrypted result. They never see anything plaintext and, in fact, a plaintext search would return no results (unless by some chance the encrypted data somehow spelt out the term being searched.)
I believe the encryption algorithm adds noise so that a given plaintext will not always give the same ciphertext, making chosen plaintext attacks less of a threat.
Amazing work!
I believe i'll have to try it out first.