
My ZKP Experiment
This week I had two (unrelated) meetings with people who work with zk-SNARKs, the first time I have talked to people who actually use zero-knowledge proofs (ZKPs) as part of their jobs! I had mixed feelings after the meetings. On the one hand, it was exciting to talk to people who work with ZKPs in real life and interesting to hear about their applications; on the other, it is a bit intimidating just how much there is to learn in this field, and not everything is useful. New frameworks, languages and zkVMs have popped up within the last five or six years, created mainly to address two issues: (a) time-consuming computation and (b) user-unfriendly, complex proof logic. The underlying maths and cryptography used for zero knowledge are pretty stable. The problem is that, with languages abstracted further and further away from the proof logic and with the priority on speed, the cost has shifted to security and privacy-protection properties. This article gives a very high-level but direct comparison of existing ZKP languages/zkVMs based on their "zk'ness", which could be useful if you don't know where to start and if privacy is important in your use case.
My first prototype using ZKPs to tackle carbon emissions claims was written in Circom. The prototype was built for a use case in which a customer of a cloud provider wants to know the carbon emissions attributable to their usage. The customer's business runs on servers hosted by their cloud provider, and they want to know their Scope 3 emissions. Existing systems and methodologies for carbon emissions reporting rely on customers either trusting the data from their providers unconditionally or recruiting independent third-party auditors to verify it. With ZKPs, customers can automate the verification as frequently as needed, and providers do not need to reveal the confidential inputs that go into the emissions accounting: for example, they might not want to reveal their business volume by giving away the total power consumption at any one of their data centres, nor would they want to reveal data related to their electricity suppliers.
There is one tricky bit in this prototype: how can we ensure that the customer's share of the power consumption is accurate? We could apply the "Completeness Principle" of the Greenhouse Gas Protocol, under which all sources of emissions have to be accounted for, so we can assume that the divided power consumption must add up to 100% of the total. We could therefore make it a requirement that providers also publish a transparency log with encrypted customer data, and then use homomorphic cryptography to prove that all customer shares, in percent, add up to 100. Moreover, if the data in the log is arranged in a Merkle tree, customers can also verify that they are indeed part of this customer base. Unfortunately this is not bulletproof: providers can still cheat by adding fake customers to the log. I will say more about this problem in future blog posts.
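To make the membership check concrete, here is a minimal Python sketch of a transparency log committed to with a binary SHA-256 Merkle tree. The hash choice, tree layout and all names are my assumptions for illustration, not the prototype's actual implementation.

```python
# Minimal Merkle tree sketch (illustrative assumptions: SHA-256, binary tree,
# odd levels padded by duplicating the last node).
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                    # pad odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Collect (sibling hash, is_right_child) pairs from leaf to root."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append((level[index ^ 1], index % 2))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    node = h(leaf)
    for sibling, is_right in proof:
        node = h(sibling + node) if is_right else h(node + sibling)
    return node == root

# Hypothetical log of encrypted customer shares; customer 1 checks inclusion.
log = [b"enc_share_customer_0", b"enc_share_customer_1", b"enc_share_customer_2"]
root = merkle_root(log)
assert verify(log[1], merkle_proof(log, 1), root)
```

The provider would publish only the root; each customer receives their own proof path and can check it locally without seeing anyone else's entry.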
Now back to the proof that all customer shares add up to 100. I can use the Paillier cryptosystem [1] for this. Given that each customer share is encrypted using Paillier, we can do a homomorphic addition to prove that the shares add up to 100 without knowing any individual share, hence protecting the private data. This can be done outside of the zk-SNARK, but we still need to check that the encrypted share used in the carbon emissions calculation is the right one!
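As a sanity check of the idea, here is a minimal Python sketch of Paillier key generation, encryption and homomorphic addition. The toy 512-bit primes, the use of sympy for prime generation, and all names are illustrative assumptions, not production parameters.

```python
# Paillier sketch (toy parameters, not constant-time, not for production).
# Multiplying ciphertexts adds the underlying plaintexts, so the encrypted
# customer shares can be summed and the total checked against 100.
import math
import secrets
from sympy import randprime  # assumption: sympy is available

def keygen(bits=512):
    p = randprime(2**(bits - 1), 2**bits)
    q = randprime(2**(bits - 1), 2**bits)
    while q == p:
        q = randprime(2**(bits - 1), 2**bits)
    n = p * q
    g = n + 1                                 # standard choice of generator
    lam = math.lcm(p - 1, q - 1)
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pk, m):
    n, g = pk
    r = secrets.randbelow(n - 1) + 1          # random nonce, assumed coprime to n
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

pk, sk = keygen()
shares = [40, 35, 25]                         # private per-customer shares (percent)
total_ct = 1
for c in (encrypt(pk, s) for s in shares):
    total_ct = (total_ct * c) % (pk[0] ** 2)  # homomorphic addition

assert decrypt(pk, sk, total_ct) == 100       # shares sum to 100%
```

Note that only the party holding the private key can decrypt; in the protocol the provider would publish the ciphertexts and a proof that the total decrypts to 100, rather than the key itself.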
To achieve that, I added a circuit that performs Paillier encryption on the (private) customer share. In this circuit the encrypted customer share published in the transparency log is checked against the customer share encrypted inside the circuit. As it turns out, this encryption is computationally expensive! Paillier ciphertexts are computed modulo the square of the product of two primes, and to achieve a high security level (at least 128-bit security strength) we need the modulus to be 3072 bits. Such a big number does not fit in a single field element and must be divided into several of them for the arithmetic operations. The bigger the modulus, the higher the number of constraints generated by the circuit; my laptop cannot complete a run with a modulus size bigger than 200 bits.
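To give a feel for this decomposition, here is a small Python sketch that splits a big integer into fixed-width limbs and counts how many limbs a 3072-bit modulus needs at a few widths. The chosen widths and names are illustrative assumptions, not the circuit's actual parameters.

```python
# Limb decomposition sketch: a value too large for one field element is split
# into fixed-width little-endian limbs, and circuit constraints grow with the
# limb count (schoolbook multiplication needs ~n_limbs**2 limb products).
def to_limbs(x: int, limb_bits: int, n_limbs: int) -> list[int]:
    """Split x into n_limbs little-endian limbs of limb_bits bits each."""
    mask = (1 << limb_bits) - 1
    return [(x >> (i * limb_bits)) & mask for i in range(n_limbs)]

def from_limbs(limbs: list[int], limb_bits: int) -> int:
    """Recombine little-endian limbs into the original integer."""
    return sum(limb << (i * limb_bits) for i, limb in enumerate(limbs))

modulus_bits = 3072
for limb_bits in (32, 64, 120):               # illustrative widths
    n_limbs = -(-modulus_bits // limb_bits)   # ceiling division
    print(f"{limb_bits:>3}-bit limbs -> {n_limbs} limbs, "
          f"~{n_limbs**2} limb products per multiplication")

x = 12345678901234567890
assert from_limbs(to_limbs(x, 64, 2), 64) == x
```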
I did some benchmarking and plotted the results to find out the relationship between the field element sizes and the number of constraints generated. The results show that, for the same key size, the more bits packed into each field element, the fewer constraints are generated.
So how do we improve on this? What is the maximum number of bits that can be packed into a single field element? Tune in to the next blog post!
[1] Paillier, P. (1999). Public-key cryptosystems based on composite degree residuosity classes. In Advances in Cryptology, EUROCRYPT '99 (pp. 223-238). Springer, Berlin, Heidelberg.