Recursive Curse

May 30, 2025

The prototype I am working on at the moment relates to the first cloud computing use case mentioned in my previous post, *Zero trust, always verify*. The prototype involves five actors:

  • Data centre operator
  • Data centre customer
  • Electricity supplier
  • Smart meter manufacturer
  • Trusted certificate authority

In this use case we assume that, for regulatory or business reputation purposes, a data centre customer wants to publish carbon emissions data covering all three scopes of emissions. They therefore need the carbon emissions figures from their cloud providers, and they want to be able to verify those figures. The data centre operator acts as the prover in this scenario: they have all the data needed to produce the carbon emissions report, but they don’t want to reveal the related business-sensitive information in the process of doing so.

I have written a circuit that generates a proof from the private data input by the data centre operator. The proof can be serialised and sent to their customers, who can run the verification in a separate process using only public data and the proof. This proof actually consists of multiple sub-proofs: the operator wants to prove not only that a customer’s emissions were calculated correctly based on their usage, but also that the smart meter readings and the customers’ shares can be trusted. So I have also written a circuit that verifies all the signatures in the smart meter and carbon intensity chains, and another that verifies that all customer shares add up to 100% of the total carbon emissions.

The challenge I am facing at the moment is scalability.

Take the “customer shares add up to 100%” scenario, for example. A data centre can have over a million customers. In my prototype, I assume that customer records (each containing a customer ID and their share of the total emissions) are encrypted and put on a Merkle tree by the prover (i.e. the data centre operator). The initial idea for generating a proof for the root of the tree was first to generate a proof for each leaf, and then recursively generate a proof for each node at each level up to the root. The final proof, the root proof, should have a public output value of 100%. Running the circuits on my laptop, each base proof (i.e. per leaf) takes ~10-14s, and each recursive proof takes a few seconds more. Call it 12s for a base proof and 16s for a recursive proof.

For a million customers (padded to 2^20 = 1,048,576 leaves) it would take ~12,582,912s, i.e. close to 146 days, to finish all the base proofs. The Merkle tree has 21 levels in total, so the number of nodes above the leaves would be (2^20)-1 = 1,048,575, and it would take ~16,777,200s (~194 days) to do all the recursive proofs. So in total, it would take almost a year running non-stop to complete all the proof generation!
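The back-of-envelope arithmetic above can be reproduced in a few lines, using the timings measured on my laptop:

```typescript
// Cost estimate for the fully recursive approach, assuming a full
// binary tree of 2^20 leaves and my measured per-proof timings.
const LEAVES = 2 ** 20;        // 1,048,576 leaf slots for ~1M customers
const BASE_PROOF_S = 12;       // ~12s per base (leaf) proof
const RECURSIVE_PROOF_S = 16;  // ~16s per recursive (inner node) proof

const innerNodes = LEAVES - 1; // nodes above the leaves: 2^20 - 1
const baseTotalS = LEAVES * BASE_PROOF_S;               // 12,582,912 s
const recursiveTotalS = innerNodes * RECURSIVE_PROOF_S; // 16,777,200 s

const days = (s: number) => s / 86_400;
console.log(days(baseTotalS).toFixed(0));                   // ~146 days
console.log(days(recursiveTotalS).toFixed(0));              // ~194 days
console.log(days(baseTotalS + recursiveTotalS).toFixed(0)); // ~340 days
```

Even with generous parallelism, that is a prohibitive amount of compute for a single reporting run.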

I am now trying a different approach. In theory, each customer only needs one proof, the root proof, to do the verification. Therefore, instead of using recursive proofs to produce the final sum, I could build the Merkle tree with the running sums stored alongside the hashes. I tested building such a tree and it took only 1 hour and 35 minutes for a million customer records. The trick now is to generate a cryptographically provable witness for the proof. The o1js framework I am using has example code I can base my work on; I haven’t got it fully working yet, but it looks very promising! Perhaps in a future post I could write up the cryptographic properties of the circuits I have built.
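To make the sum-tree idea concrete, here is a minimal sketch of building such a tree outside any circuit. This is not my o1js code: SHA-256 from Node’s crypto module stands in for the in-circuit hash (o1js uses Poseidon), the customer names are hypothetical, and shares are kept in basis points (10,000 bps = 100%) to avoid floating point. The point is that each node commits to both a hash and the running sum of the shares below it, so the root commits to the total share in a single O(n) pass.

```typescript
import { createHash } from 'node:crypto';

// Each node carries a hash and the sum (in basis points) of all shares
// beneath it. The root's sum should equal 10,000 bps, i.e. 100%.
type Node = { hash: string; sumBps: number };

const h = (s: string) => createHash('sha256').update(s).digest('hex');

const leaf = (encryptedRecord: string, shareBps: number): Node => ({
  hash: h(`${encryptedRecord}:${shareBps}`),
  sumBps: shareBps,
});

// A parent hashes both children's hashes AND sums, binding the
// running total into the commitment at every level.
const parent = (l: Node, r: Node): Node => ({
  hash: h(`${l.hash}:${l.sumBps}:${r.hash}:${r.sumBps}`),
  sumBps: l.sumBps + r.sumBps,
});

// Build the tree bottom-up (assumes a power-of-two number of leaves).
// A single O(n) pass like this is why the full build finishes in well
// under two hours for 2^20 records, versus months of recursive proving.
function buildRoot(leaves: Node[]): Node {
  let level = leaves;
  while (level.length > 1) {
    const next: Node[] = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(parent(level[i], level[i + 1]));
    }
    level = next;
  }
  return level[0];
}

// Four hypothetical customers whose shares sum to 100%:
const root = buildRoot([
  leaf('enc(cust-a)', 2_500),
  leaf('enc(cust-b)', 2_500),
  leaf('enc(cust-c)', 3_000),
  leaf('enc(cust-d)', 2_000),
]);
console.log(root.sumBps); // 10000, i.e. 100%
```

The circuit then only has to verify one Merkle path plus the sum constraints along it, rather than prove every node in the tree.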

The scalability challenge for the carbon intensity, meter readings, and signature chains is another story for another day, but it’s equally interesting!