The most ingenious ZK application: A look back at the principles and business logic of Tornado Cash

2023-09-08 17:39:36

Collection

The privacy project represented by Tornado truly utilizes the zero-knowledge property of the ZK-SNARK algorithm, while most Rollups claiming to use ZK only leverage the simplicity of ZK-SNARK.

Written by: Faust, Geek Web3

Introduction: Recently, Vitalik and several scholars jointly published a new paper, which mentioned how Tornado Cash implements anti-money laundering schemes (essentially requiring the withdrawer to prove that their deposit record belongs to a set that does not contain dirty money). However, the paper lacks a detailed interpretation of the business logic and principles of Tornado Cash, leaving readers somewhat confused.

It is also worth mentioning that privacy projects represented by Tornado truly utilize the zero-knowledge nature of the ZK-SNARK algorithm, while most Rollups that claim to use ZK only leverage the simplicity of ZK-SNARK. Many times, people often confuse the difference between Validity Proof and ZK, and Tornado happens to be an excellent case for understanding ZK applications.

The author of this article happened to write an article about the principles of Tornado in Web3 Caff Research in 2022. Today, some paragraphs from that article will be excerpted and expanded upon to help everyone systematically understand Tornado Cash.

The Principle of "Tornado"

Tornado Cash is a mixer protocol that utilizes zero-knowledge proofs. The old version was launched in 2019, and the new version started its beta in late 2021. The old version of Tornado basically achieved decentralization, with the on-chain contracts being open-source and without multi-signature control, and the front-end code being open-source and backed up on the IPFS network. Due to the simpler and more understandable overall structure of the old version, this article will focus on interpreting the old version.

The main idea of Tornado is: to mix a large number of deposit and withdrawal actions together. After depositors deposit tokens into Tornado, they present a ZK Proof to prove that they have deposited, and then withdraw using a new address, thereby severing the association between the deposit and withdrawal addresses.

More specifically, Tornado is like a glass box filled with coins deposited by many people. We can see who deposited the coins, but these coins are highly homogeneous. If a stranger takes a coin from the glass box, it is very difficult to know who originally deposited that coin.

(Image source: rareskills)

This scenario seems quite common: when we swap a few ETH from a Uniswap pool, we cannot know who provided the ETH that was taken out because there are too many people who have provided liquidity to Uniswap. However, the difference is that each time we take tokens from Uniswap, we need to use other tokens as equivalent costs, and we cannot transfer funds "privately" to others; whereas a mixer only requires the withdrawer to present proof of deposit.

To make the deposit and withdrawal actions appear homogeneous, the deposit addresses in the Tornado pool keep the amounts deposited consistent each time, and the withdrawal addresses keep the amounts withdrawn consistent each time. For example, in a pool with 100 depositors and 100 withdrawers, although they are publicly visible, they appear to have no connection, and the amounts deposited and withdrawn by each person are the same. This way, it can confuse the observer, making it impossible to determine the association based on the deposit and withdrawal amounts, thus severing the trace of fund transfers, which obviously provides a natural convenience for money laundering.

But there is a key question: how does the withdrawer prove that they have deposited? The address initiating the withdrawal to the mixer is not associated with any of the deposit addresses, so how can we determine their eligibility to withdraw? The most direct method seems to be for the withdrawer to disclose which deposit record belongs to them, but this would directly reveal their identity. At this point, zero-knowledge proofs come into play.

The withdrawer provides a ZK Proof to prove that they have a deposit record in the Tornado contract and that this deposit has not yet been withdrawn, allowing them to successfully initiate a withdrawal. Zero-knowledge proofs inherently achieve privacy protection; the outside world only knows that the withdrawer has indeed deposited into the fund pool, but does not know which depositor they correspond to.

To prove "I have deposited in the Tornado fund pool," it can be transformed into "my deposit record can be found in the Tornado contract." If we denote the deposit record as Cn, the problem can be summarized as:

Given that the set of Tornado deposit records is {C1, C2, … C100…}, the withdrawer Bob proves that he has generated a certain Cn in the deposit records using his key, but does not disclose which Cn it is through ZK.

This requires the special property of Merkle Proof. Because all deposit records of Tornado are stored in a Merkle Tree constructed on-chain, as its lowest-level leaf nodes, and the total number of leaves is about 2 to the power of 20, which is greater than 1 million, most of which are in a blank state (assigned an initial value). Whenever a new deposit occurs, the contract writes the corresponding characteristic value Commitment into a leaf and then updates the root of the Merkle Tree.

For example, if Bob's deposit operation is the 10,000th in Tornado's history, then a characteristic value Cn associated with this deposit will be written into the 10,000th leaf node of the Merkle Tree, i.e., C10000 = Cn. The contract will then automatically calculate the new root and update it. (ps: To save computation, the Tornado contract will cache the data of previously changed nodes, such as Fs1 and Fs2, Fs0 in the figure below)

(Image source: RareSkills)

And Merkle Proof itself is very concise and lightweight; it utilizes the simplicity of tree data structures in the retrieval/tracing process. To prove that a transaction TD exists in the Merkle Tree, it is sufficient to provide the Merkle Proof corresponding to the root (as shown in the right part of the figure below), which is quite concise. If the Merkle Tree is particularly large, with 2 to the power of 20 leaves, containing 1 million deposit records, the Merkle Proof only needs to include the values of 21 nodes, which is very short.

If we want to prove that a transaction H3 is indeed included in the Merkle Tree, we need to prove that using H3 and other parts of the Merkle Tree data, we can generate the root, and the data needed to generate the root (including Td) constitutes the Merkle Proof.

When Bob withdraws, he needs to prove that the certificate he possesses corresponds to a recorded deposit hash Cn in the Merkle Tree. In other words, he needs to prove two things:

Cn exists in the Merkle Tree of the Tornado contract, which can specifically construct a Merkle Proof that includes Cn;
Cn is associated with the deposit certificate in Bob's possession.

Detailed Explanation of Tornado's Business Logic

The front-end code of the Tornado user interface has many functions implemented in advance. When a depositor opens the Tornado Cash webpage and clicks the deposit button, the front-end code's program will locally generate two random numbers K and r, then calculate the value of Cn = Hash(K, r), and pass Cn (which is the commitment in the figure below) into the Tornado contract, inserting it into the Merkle Tree recorded by the latter. In simple terms, K and r are equivalent to private keys. They are very important, and the system will prompt the user to save them properly. They will still be needed for withdrawals later.

(The encryptedNote here is optional, allowing users to encrypt the certificate K and r with a private key and store it on-chain to prevent forgetting.)

It is important to note that all of the above work occurs off-chain, meaning that neither the Tornado contract nor external observers know K and r. If K and r are leaked, it is akin to having the wallet's private key stolen.

When the Tornado contract receives the user's deposit and the submitted Cn = Hash(K, r), it inserts Cn into the bottom of the Merkle Tree as a new leaf node while updating the value of the root. Therefore, Cn is one-to-one associated with the user's deposit action, and the outside world can know which user corresponds to each Cn, who has deposited tokens into the mixer, and the deposit record Cn corresponding to each depositor.

In the withdrawal step, the withdrawer inputs the certificate/private key (the random numbers K and r generated during the deposit) into the front-end webpage. The Tornado Cash front-end code's program will use K and r, Cn = Hash(K, r), and the Merkle Proof corresponding to Cn as input parameters to generate a ZK Proof, proving that Cn is a deposit record that exists in the Merkle Tree, and that K and r are the certificates corresponding to Cn.

This step is equivalent to proving: I know the key corresponding to a deposit record on the Merkle Tree. When the ZK Proof is submitted to the Tornado contract, the aforementioned four parameters are hidden, and the outside world (including the Tornado contract) cannot know them, thus ensuring privacy.

Other parameters involved in generating the ZK Proof include: the root of the Merkle Tree in the Tornado contract at the time of withdrawal, a custom recipient address A, and an identifier nf to prevent replay attacks (which will be discussed later). These three parameters will be publicly published on-chain, and the outside world can know them, but it does not affect privacy.

One detail here is that when generating Cn during the deposit operation, two random numbers K and r are used instead of a single random number. This is because a single random number is not secure enough and has a certain probability of collision. For example, using a single random number may lead to two different depositors coincidentally using the same random number, resulting in a collision in the generated Cn.

As for the A in the figure above, it represents the address receiving the withdrawal, which the withdrawer fills in themselves. nf is an identifier to prevent replay attacks; its value nf = Hash(K), where K is one of the two random numbers used in generating Cn during the deposit (K and r). This way, nf is associated with Cn; in other words, each Cn has a corresponding nf, and the two are one-to-one associated.

Why prevent replay attacks? Due to the design characteristics of the mixer, when withdrawing, it does not know which leaf Cn in the Merkle Tree corresponds to the coins the user is taking, so it does not know the association between the withdrawer and which depositors, and thus does not know how many times the withdrawer has deposited. The withdrawer can exploit this feature to withdraw frequently, initiating replay attacks, repeatedly taking tokens from the mixer pool until the fund pool is drained.

Here, the nf identifier functions similarly to the transaction counter nonce that every Ethereum address has, both set to prevent a transaction from being replayed. When a withdrawal occurs, the withdrawer needs to submit an nf and check whether this nf has been used before (recorded): if it has, the withdrawal is invalid. If not, it means that nf has not been used, the withdrawal is valid, and the corresponding nf will be recorded. The next time someone submits this nf, the corresponding withdrawal action will be directly deemed invalid.

What if someone randomly generates an nf that the contract has not recorded? Of course, this is not possible because when the withdrawer generates the ZK Proof, they need to ensure nf = Hash(K), and the random number K is associated with the deposit record Cn. This means that nf is associated with a recorded deposit Cn. If someone randomly fabricates an nf, this nf will not match any deposits in the deposit records, making it impossible to successfully generate a valid ZK Proof, and subsequent operations cannot proceed, resulting in the withdrawal operation failing.

Some may also ask: Is it possible to do without nf? Since the withdrawer needs to submit a ZK proof when withdrawing to prove their association with a certain Cn, can't we just check whether the corresponding ZK Proof has been submitted to the chain each time a withdrawal occurs?

But in fact, doing so would be very costly because the Tornado Cash contract does not permanently store previously submitted ZK Proofs, as this would severely waste storage space. Instead of comparing each new ZK Proof submitted to the chain with existing proofs, it is more cost-effective to set a small identifier nf and store it permanently.

According to the code example of the withdrawal function, the required parameters and business logic are as follows:

The user submits the ZK Proof, nf (Nullifier Hash) = Hash(K), customizes a recipient address for the withdrawal, and the ZK Proof hides the values of Cn and K, r, preventing the outside world from obtaining and judging the user's identity. The recipient often fills in a clean new address, which does not leak personal information.

However, there is a small problem here: when users withdraw, they often use a newly created address to initiate the withdrawal transaction, and at this time, the new address does not have ETH to pay for gas fees. Therefore, when the withdrawal address initiates the withdrawal, it must explicitly declare a relayer to pay for the gas fees, and then the mixer contract will directly deduct a portion from the user's withdrawal to pay the relayer as a reward.

In summary, Tornado Cash can obscure the association between withdrawers and depositors. In a scenario with a large user base, it is like a crowded downtown area; once a criminal blends into the crowd, it becomes difficult for the police to track them. The withdrawal process requires the use of ZK-SNARK, and the hidden witness part contains key information about the withdrawer, which is the most critical point of the entire mixer. At present, Tornado may be one of the most ingenious application layer projects related to ZK.

References

https://etherscan.io/address/0xa160cdab225685da1d56aa342ad8841c3b53f291#codeTornado Contract source code

https://mirror.xyz/mazemax.eth/BTbTOrEKzGkc-XoDcFtLPfJPtQ1Mt96BZYsW83m33IUTornado.cash Comparison of old and new mechanisms

https://www.youtube.com/watch?v=Z0s4W3UBxM8AnonymousPayments

https://medium.com/taipei-ethereum-meetup/zkp-study-group-tornado-cash-fdbb84d44b93[ZKP Reading Group] Tornado Cash

Case analysis