End-to-End Encrypted Kafka with Proxy Re-Encryption

NuCypher is excited to announce the open source availability of NuCypher Kafka, a cryptosystem for granular, end-to-end encrypted message queues and streams. This release brings enterprise-grade security to the Kafka ecosystem and helps meet PCI and HIPAA compliance requirements, a key need for financial services, healthcare, and industrial IoT use cases.

The NuCypher cryptosystem relies on proxy re-encryption (PRE), a type of public-key encryption that allows a proxy to transform a ciphertext encrypted under one public key into a ciphertext under another, without learning anything about the underlying message. PRE can be thought of as an improved, scalable PKI or multi-party TLS: whereas traditional PKI is designed for 1-to-1 encrypted communication, PRE delivers similar security and performance guarantees for N-to-N encrypted communication. The open source version uses ElGamal over elliptic curves on prime fields to encrypt and decrypt messages, and a variant of the BBS98 scheme (Blaze, Bleumer, and Strauss, 1998) for re-encryption. Our commercial offering supports proxy re-encryption for ECIES.
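To make the re-encryption step concrete, here is a minimal sketch of textbook BBS98 in Python. It is not NuCypher's code and not their variant: it replaces the elliptic curve with a tiny multiplicative group modulo the safe prime 2039, which is completely insecure, and every function name (`keygen`, `encrypt`, `rekey`, `reencrypt`, `decrypt`) is hypothetical. The point it illustrates is that the proxy only raises one ciphertext component to the power b/a, so it never sees the message or the random mask G^r.

```python
import secrets

# Toy Schnorr group: P = 2*Q + 1 with P and Q prime. Far too small to be
# secure; the real system uses elliptic curves with ~256-bit group order.
P, Q, G = 2039, 1019, 4  # G = 2**2 generates the subgroup of prime order Q


def keygen():
    """Return (private, public) = (x, G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def encrypt(pub, m):
    """BBS98 encryption of group element m: (m * G^r, pub^r) = (m * G^r, G^(a*r))."""
    r = secrets.randbelow(Q - 1) + 1
    return (m * pow(G, r, P)) % P, pow(pub, r, P)


def rekey(priv_from, priv_to):
    """Re-encryption key b / a mod Q (textbook BBS98 needs both private keys)."""
    return (priv_to * pow(priv_from, -1, Q)) % Q


def reencrypt(rk, ct):
    """Proxy step: (G^(a*r))^(b/a) = G^(b*r); c1 passes through untouched."""
    c1, c2 = ct
    return c1, pow(c2, rk, P)


def decrypt(priv, ct):
    """Recover G^r = c2^(1/priv mod Q), then m = c1 / G^r."""
    c1, c2 = ct
    g_r = pow(c2, pow(priv, -1, Q), P)
    return (c1 * pow(g_r, -1, P)) % P


priv_a, pub_a = keygen()        # e.g. producer key pair a
priv_1, pub_1 = keygen()        # e.g. consumer key pair 1
m = pow(G, 123, P)              # message encoded as a subgroup element
ct = encrypt(pub_a, m)
ct_1 = reencrypt(rekey(priv_a, priv_1), ct)
```

Textbook BBS98 keys are bidirectional (b/a can be inverted to a/b), and computing b/a requires both private keys, which matches the administrator model described in the key list.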

The basic architecture of NuCypher Kafka is shown below. For simplicity, we assume one broker and one channel; the design extends directly to multiple brokers and channels.

The public/private key pairs used here are:

  • priva/puba .. privc/pubc: key pairs under which producers encrypt the data;
  • priv1/pub1 .. priv3/pub3: consumer key pairs; each consumer decrypts the data it receives with its own private key;
  • the administrator knows the private keys of all producers and consumers (or, depending on the proxy re-encryption variant, only the consumers' public keys). Alternatively, each producer can act as its own administrator, so no one else ever learns its private key.

When producer a connects to the broker, it generates a random per-session AES data encryption key, DEKa. It attaches an encrypted copy of that key, EDEKa = enc(puba, DEKa), to every message. The message payload is encrypted with DEKa, while the topic stays in plaintext.
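The producer side can be sketched as follows. This is an illustration under stated assumptions, not the NuCypher Kafka client. It uses the same toy (insecure) group as a stand-in for elliptic curves. Instead of encrypting an AES key directly, it follows a hypothetical KEM-style approach: it picks a random group element k, derives a 32-byte DEK by hashing k, and encrypts k under the producer's public key. Python's standard library has no AES, so `keystream_xor` (a SHA-256 counter-mode XOR with no integrity protection) stands in for it. `ProducerSession` and the message field names are hypothetical.

```python
import hashlib
import secrets

# Toy Schnorr group (P = 2*Q + 1, both prime). NOT secure; illustration only.
P, Q, G = 2039, 1019, 4


def elgamal_encrypt(pub, m):
    """ElGamal/BBS98-style encryption of group element m: (m * G^r, pub^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return (m * pow(G, r, P)) % P, pow(pub, r, P)


def keystream_xor(key, nonce, data):
    """Stand-in for AES: XOR with a SHA-256 counter-mode keystream (no integrity)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ProducerSession:
    """One producer session: one DEK, and one EDEK attached to every message."""

    def __init__(self, pub):
        # Hypothetical KEM-style choice: random group element k -> DEK = SHA-256(k),
        # and EDEK = enc(pub, k), so only holders of the right private key recover DEK.
        k = pow(G, secrets.randbelow(Q - 1) + 1, P)
        self.dek = hashlib.sha256(k.to_bytes(2, "big")).digest()
        self.edek = elgamal_encrypt(pub, k)

    def produce(self, topic, payload):
        nonce = secrets.token_bytes(16)  # fresh per message, so the DEK can be reused
        return {
            "topic": topic,              # plaintext, so the broker can route it
            "edek": self.edek,           # identical for the whole session
            "nonce": nonce,
            "ciphertext": keystream_xor(self.dek, nonce, payload),
        }


priv_a = secrets.randbelow(Q - 1) + 1
pub_a = pow(G, priv_a, P)
session = ProducerSession(pub_a)
m1 = session.produce("payments", b"amount=42")
m2 = session.produce("payments", b"amount=7")
```

Because the EDEK is computed once per session, the public-key operation is paid once per session rather than once per message.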

On the broker side, there are re-encryption keys k(producer→consumer) that transform ciphertexts under a producer's key into ciphertexts under a consumer's key. The system administrator responsible for granting permissions knows all the private keys and generates these re-encryption keys. The administrator is not part of the broker-side infrastructure.
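A minimal sketch of the administrator's job, assuming textbook BBS98 over the same toy (insecure) group rather than NuCypher's elliptic-curve variant: for every granted (producer, consumer) pair it computes k(producer→consumer) = x_consumer / x_producer mod Q. The `grants` permission table and the `make_rekeys` helper are hypothetical.

```python
import secrets

# Toy Schnorr group (P = 2*Q + 1, both prime). NOT secure; illustration only.
P, Q, G = 2039, 1019, 4


def keygen():
    """Return (private, public) = (x, G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def make_rekeys(privs, grants):
    """Administrator side: one key k(src -> dst) = x_dst / x_src mod Q per grant."""
    return {
        (src, dst): (privs[dst] * pow(privs[src], -1, Q)) % Q
        for src, dst in grants
    }


# Producer keys a..c and consumer keys 1..3, mirroring the key list above.
privs = {name: keygen()[0] for name in ["a", "b", "c", "1", "2", "3"]}
# Hypothetical permissions: consumer 1 may read producers a and b, consumer 2 only a.
grants = [("a", "1"), ("b", "1"), ("a", "2")]
broker_rekeys = make_rekeys(privs, grants)  # only this dict is handed to the broker
```

Revoking a consumer's access in this model amounts to deleting the corresponding entries from the broker's key table; no private key ever leaves the administrator.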

Each consumer has its own public/private key pair; take priv1/pub1 as an example. The broker holds the re-encryption key k(a→1), which it uses to transform EDEKa into EDEK1, readable by consumer 1. If this EDEKa has already been re-encrypted for consumer 1, the broker reuses the cached EDEK1 instead of re-encrypting it again.
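The broker's re-encrypt-and-cache step can be sketched like this, again assuming textbook BBS98 over a toy (insecure) group. The `Broker` class is hypothetical, and it assumes the broker knows which producer key a message was encrypted under (for example from message metadata), which the text does not specify. The cache is an unbounded dict for brevity; a real broker would bound it.

```python
# Toy Schnorr group (P = 2*Q + 1, both prime). NOT secure; illustration only.
P, Q, G = 2039, 1019, 4


class Broker:
    """Holds only re-encryption keys; caches each re-encrypted EDEK per consumer."""

    def __init__(self, rekeys):
        self.rekeys = rekeys  # {(producer, consumer): k(producer -> consumer)}
        self.cache = {}       # {(edek, consumer): re-encrypted edek}, unbounded here
        self.reencryptions = 0

    def deliver(self, producer, consumer, message):
        edek = message["edek"]
        slot = (edek, consumer)
        if slot not in self.cache:
            rk = self.rekeys[(producer, consumer)]   # KeyError if no grant exists
            c1, c2 = edek
            self.cache[slot] = (c1, pow(c2, rk, P))  # BBS98 proxy step: c2 ** rk
            self.reencryptions += 1
        return {**message, "edek": self.cache[slot]}  # payload is never touched


# Demo with fixed toy values: producer key a = 17, consumer key 1 = 29.
priv_a, priv_1, r = 17, 29, 101
rk_a1 = priv_1 * pow(priv_a, -1, Q) % Q
edek_a = (pow(G, 55 + r, P), pow(G, priv_a * r, P))  # EDEKa hiding secret G^55
broker = Broker({("a", "1"): rk_a1})
msg = {"topic": "payments", "edek": edek_a, "ciphertext": b"..."}
out1 = broker.deliver("a", "1", msg)
out2 = broker.deliver("a", "1", msg)  # served from the cache
```

Since a producer reuses one EDEK for a whole session, the broker performs one exponentiation per (session, consumer) pair rather than one per message.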

Consumer 1 then decrypts EDEK1 with its own private key priv1 to recover DEKa. For performance, it can cache DEKa and reuse it for every later message that carries the same EDEK1. Finally, it decrypts the bulk of the message data with DEKa.
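The consumer side, with its DEK cache, might look like the following sketch. As before, it assumes textbook BBS98 over a toy (insecure) group, a hypothetical KEM-style DEK derivation (DEK = SHA-256 of the secret group element), and a SHA-256 keystream standing in for AES. To stay self-contained, the demo builds EDEK1 directly in its already re-encrypted form instead of running a producer and broker.

```python
import hashlib

# Toy Schnorr group (P = 2*Q + 1, both prime). NOT secure; illustration only.
P, Q, G = 2039, 1019, 4


def keystream_xor(key, nonce, data):
    """Stand-in for AES: XOR with a SHA-256 counter-mode keystream (no integrity)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], pad))
    return bytes(out)


class Consumer:
    def __init__(self, priv):
        self.priv = priv
        self.dek_cache = {}  # {edek: dek}, so each EDEK is decrypted only once
        self.decryptions = 0

    def dek_for(self, edek):
        if edek not in self.dek_cache:
            c1, c2 = edek
            g_r = pow(c2, pow(self.priv, -1, Q), P)  # G^r = c2^(1/priv mod Q)
            k = c1 * pow(g_r, -1, P) % P             # secret group element
            self.dek_cache[edek] = hashlib.sha256(k.to_bytes(2, "big")).digest()
            self.decryptions += 1
        return self.dek_cache[edek]

    def consume(self, message):
        dek = self.dek_for(message["edek"])
        return keystream_xor(dek, message["nonce"], message["ciphertext"])


# Demo: an already re-encrypted EDEK1 = (k * G^r, G^(priv_1 * r)) from toy values.
priv_1, r = 29, 101
k_secret = pow(G, 77, P)
dek = hashlib.sha256(k_secret.to_bytes(2, "big")).digest()
edek_1 = (k_secret * pow(G, r, P) % P, pow(G, priv_1 * r, P))


def make_msg(i, text):
    nonce = i.to_bytes(16, "big")
    return {"edek": edek_1, "nonce": nonce, "ciphertext": keystream_xor(dek, nonce, text)}


consumer = Consumer(priv_1)
results = [consumer.consume(make_msg(0, b"amount=42")),
           consumer.consume(make_msg(1, b"amount=7"))]
```

Both messages decrypt correctly, while the public-key decryption runs only once because the second message hits the DEK cache.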

There can be as many consumers per encrypted topic as needed, and for each consumer a new re-encryption key is created.

We leave out the details of message and producer authentication, since existing mechanisms already address them. Note, however, that if producers sign their messages, the broker can convert those signatures into “per-channel” signatures if desired.

It is also possible to grant access to parts of a message (for example, individual fields of Avro or JSON messages). We'll cover the technical details in upcoming blog posts.

To get started, head over to the GitHub repo or join the conversation on Slack.