Scaling Bitcoin workshop : Stanford 2017
Flyclient super light clients for cryptocurrencies
Benedikt Buenz, Mahdi Zamani
As you know, the blockchain is a chain because each block connects to the previous block through these hashes. In the header of each block is a commitment to all the transactions. It's using this merkle hash tree where at each level the parent node is the hash of its children, and this gives you some nice properties which I will talk about later. I need to check consistency though. The first thing I need to check is that the transactions don't spend more than they have... then I need to check that the tree was constructed correctly. Finally, the blockheaders need to be correct and they need to link to each other, and the proof-of-work requirement is that the block hashes to a number that starts with a bunch of zeroes. The way this works is that miners can choose the nonce freely, and they try nonces until they find one such that the hash meets the target.
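The proof-of-work check described here can be sketched in a few lines. This is a simplified illustration and not Bitcoin's exact rule: Bitcoin compares the double SHA-256 of the header against a numeric target, while this sketch only counts leading zero bits. The function names and the toy header prefix are assumptions made for the example.

```python
import hashlib

def header_hash(header: bytes) -> bytes:
    # Bitcoin hashes the block header twice with SHA-256.
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()

def leading_zero_bits(digest: bytes) -> int:
    # Number of zero bits at the front of the digest, read as a big-endian number.
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()

def check_pow(prefix: bytes, nonce: int, zero_bits: int) -> bool:
    # Valid if the hash of prefix + nonce starts with at least `zero_bits` zero bits.
    digest = header_hash(prefix + nonce.to_bytes(4, "little"))
    return leading_zero_bits(digest) >= zero_bits

def mine(prefix: bytes, zero_bits: int) -> int:
    # Try nonces one by one until the check passes (toy difficulty only).
    nonce = 0
    while not check_pow(prefix, nonce, zero_bits):
        nonce += 1
    return nonce
```

Checking a nonce takes one hash, while finding one takes about 2^zero_bits attempts on average, which is the asymmetry the rest of the talk relies on.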
Now I can check whether the blockchain that I have received from someone is correct. What if I have two people saying there are two different blockchains? How will I know which one is the correct one? What is Alice going to do in this case? She can use the longest chain rule, which is only one of the rules of course. She can look for which chain has more PoW. And then she could assume it's right. The intuition there is that it costs more money to forge history, it takes real resources, real energy to produce chains. To summarize this PoW conjecture, it says that honest mining is an equilibrium, and in equilibrium the dominant strategy is to follow the rules of the network. The other part of the conjecture is that the majority of nodes are rational, so the majority of nodes will mine honestly and follow the rules. This matters because it implies that the network will actually be honest and follow the rules of the network and it also has this nice ... property... that you can always distinguish an honest chain from a non-honest chain, after being offline. So when you wake up, you download all the chains, and you can find the honest one by looking at the blockheaders and checking validity using all of the rules of the network. This property does not necessarily hold for proof-of-stake, but that's a topic for another talk.
Let's switch gears and say let's download the blockchain on my mobile client. There's a big problem and it's why we're here. The blockchain grows and grows and it's now 150 gigabytes. If I have a mobile client, how am I going to store and verify 150 gigabytes of data with pruning? Seems infeasible. Satoshi was aware of this and he came up with this idea of a simplified payment verification (SPV) client. It throws away the transactions, stores the blockheaders and it requires fraud proofs which we don't have yet. It can verify PoW and blockheaders. It can check chain length and cumulative PoW on different chains. If the PoW conjecture holds, then the SPV client should be able to download the best chain. The SPV client can't verify all the transactions and this is okay because we assume that the longest chain is honest and that all the rules were followed.
You can give a merkle inclusion proof to show that a transaction was in a block by using the merkle root of the transaction set tree. The SPV clients have several properties and some problems. The nice thing about them is that they do not grow with the number of transactions. They still grow with the number of blocks. You need all the blockheaders. This isn't a problem with bitcoin because each blockheader is just 80 bytes, and there aren't many, only produced once every 10 minutes. For SPV clients.. 2.2 gigabytes. Really large already, too large for many phones. This is especially bad if you want to have multi-chain clients or clients that work on multiple sidechains or whatever. So this grows expensively and it's not a workable solution.
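A merkle inclusion proof of the kind mentioned here can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a single SHA-256 per node (Bitcoin uses double SHA-256), it expects a non-empty list of transactions, and the function names are made up for the example.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def next_level(level):
    # Pair up nodes; an odd node out is paired with itself, as Bitcoin does.
    if len(level) % 2:
        level = level + [level[-1]]
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        level = next_level(level)
    return level[0]

def merkle_proof(leaves, index):
    # Collect the sibling hash at every level on the path from leaf to root.
    level = [h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))
        level = next_level(level)
        index //= 2
    return proof

def verify_inclusion(leaf, proof, root):
    # Recompute the path to the root using only the leaf and the sibling hashes.
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

The verifier only needs the root from the blockheader plus one hash per tree level, so the proof is logarithmic in the number of transactions in the block.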
So can we build a light client that is sublinear and doesn't need to download all the blockheaders? There are some generic solutions, like using zero-knowledge proofs or a zk-SNARK, which is completely impractical here. There are other solutions. There are non-interactive proofs of proof-of-work. They are NiPoPoWs. Based on Kiayias, Lamprou, and Stouka 16. Say I'm looking for a hash of x that starts with a bunch of zeroes, say 70 zeroes. By the time I find one of them, on average I will have found 2 hashes that have 69 zeroes, and 4 hashes that have 68 zeroes, and so on and so forth. So if I have a specific PoW target, say like 66 or 70 zeroes, then the best proof-of-work or the best quality of proof-of-work in the chain is a really good indicator for how much work is in the chain in general. This beautiful idea is used in NiPoPoWs and you can use a skip list to point to high quality PoW proofs and then you get a blockchain where you... rather small. Really small, like log(n) times log(n)... These high quality blocks are really important for NiPoPoWs. You get a regular reward for including this in the chain. What if I could bribe a rational miner-- I would tell him on the main chain, don't include these really high value blocks. I'll pay you twice the block reward if you don't include them. I'll give you the money. Throw them away. The main chain looks worse and I can easily fake a chain that looks better than the main chain even if I don't have as much mining power as the main chain. I bribe the miners to make the chain look worse, and it wasn't that expensive. It doesn't violate the NiPoPoWs proof because the assumption was that the main chain had honest miners. But what if they were willing to be bribed? So that's an attack. We need another NiPoPoWs without high quality blocks perhaps.
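The counting argument behind high quality blocks can be made concrete. The sketch below assumes each hash behaves like uniformly random bits, so a hash has at least z leading zero bits with probability 2^-z. It reads "has 69 zeroes" as "has at least 69 leading zero bits". The function names are assumptions for the example.

```python
import hashlib
from fractions import Fraction

def leading_zero_bits(digest: bytes) -> int:
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()

def expected_count(total_hashes: int, zero_bits: int) -> Fraction:
    # Expected number of hashes with at least `zero_bits` leading zero bits.
    return Fraction(total_hashes, 2 ** zero_bits)

def observed_count(total_hashes: int, zero_bits: int) -> int:
    # Hash the integers 0..total_hashes-1 and count how many meet the threshold.
    return sum(
        leading_zero_bits(hashlib.sha256(i.to_bytes(8, "big")).digest()) >= zero_bits
        for i in range(total_hashes)
    )

# After about 2**70 attempts we expect one hash at the 70-bit level,
# two at the 69-bit level and four at the 68-bit level.
for z in (70, 69, 68):
    print(z, expected_count(2 ** 70, z))
```

Running `observed_count` at a toy scale (for example 4096 hashes and thresholds of 4 or 5 bits) shows the same halving pattern empirically, up to random variation.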
One of the tools we want to use are these merkle mountain ranges or these merkle trees... it's basically just a merkle tree idea from petertodd where you can build on, so you can extend it, you can append to it, it keeps growing. One of the nice properties is that if I have access to this root, I can check that this tree is a subtree of this tree, or I can check that this tree is a subtree of this other one. I can use these roots. There is a logarithmic sized proof that one tree was built on the other tree.
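The append-only behaviour of a merkle mountain range can be sketched as a list of "peaks" that get merged whenever two of them have the same height. This is a minimal illustration and not a reference implementation: the domain-separation prefix bytes and the right-to-left way of combining peaks into one root are assumptions, and real implementations differ on those details.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class MerkleMountainRange:
    def __init__(self):
        self.peaks = []  # list of (height, hash), tallest peak first
        self.size = 0    # number of leaves appended so far

    def append(self, leaf: bytes) -> None:
        node = (0, h(b"\x00" + leaf))
        # Merge with the rightmost peak while both have the same height.
        while self.peaks and self.peaks[-1][0] == node[0]:
            height, left = self.peaks.pop()
            node = (height + 1, h(b"\x01" + left + node[1]))
        self.peaks.append(node)
        self.size += 1

    def root(self) -> bytes:
        # Combine the peaks right-to-left into a single commitment.
        # Requires at least one appended leaf.
        acc = self.peaks[-1][1]
        for _, peak in reversed(self.peaks[:-1]):
            acc = h(b"\x02" + peak + acc)
        return acc
```

Appending never rewrites an existing complete subtree, which is why an older tree stays visible inside the newer one: after four leaves there is a single peak of height 2, and after a fifth leaf that same peak is still there next to a new peak of height 0.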
How does flyclient work? As we recall, every block has this previous hash in the blockheader. Well, what if we don't include just the previous block, but what if we include a root of a merkle tree and the merkle tree commits to all the... So in every blockheader, you have a reference to all the nodes in the blockchain. You can easily do lookups to say, what's the block number-- 13, was 13 included in this merkle tree? You can do a logarithmic sized proof that the block was indeed included.
The nice thing is that the client only needs to store the head of the chain. If I want to prove to you that a transaction was included, I give you a merkle proof that the block was part of the chain and then I give you another merkle proof that the transaction was part of the block.
What if I get two chains? Which one is the right one? We should ask the... just sample some random blocks, like give me k different blocks. We're going to ask for k different blocks. And then, get a merkle inclusion proof. What do we know? We know-- say the chains claim to have the same length. The malicious chain is going to have to have a lot of holes. It can't be as long as the honest chain. It has, by assumption, less mining power. In the same time it took to create the honest chain, the malicious chain would not have had enough blocks created. So some of the blocks are going to be invalid (from a PoW perspective). This assumes that the honest chain has majority PoW. We have a reasonably high probability of catching a malicious chain. We need to sample only 80 blocks to get a really high probability of finding it.
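The sampling argument can be put into numbers with a simplified model. Assume that at most a fraction c of the positions in the malicious chain can hold valid blocks and that the verifier samples positions independently and uniformly. Then all k samples land on valid blocks with probability c^k. This model and the function names are assumptions for illustration, and the exact analysis behind the numbers quoted in the talk may differ.

```python
from fractions import Fraction

def evasion_probability(valid_fraction: Fraction, samples: int) -> Fraction:
    # Chance that every one of `samples` uniform samples hits a valid block.
    return valid_fraction ** samples

def samples_needed(valid_fraction: Fraction, target: Fraction) -> int:
    # Smallest k such that valid_fraction**k <= target.
    if not 0 < valid_fraction < 1:
        raise ValueError("valid_fraction must be strictly between 0 and 1")
    k, p = 0, Fraction(1)
    while p > target:
        p *= valid_fraction
        k += 1
    return k

# Example: samples needed so that an attacker who can fill a third of the
# positions evades detection with probability at most 2**-80.
print(samples_needed(Fraction(1, 3), Fraction(1, 2 ** 80)))
```

The point of the model is that the evasion probability falls exponentially in the number of samples, so a constant number of samples is enough regardless of how long the chain is.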
What about forking? What happens if the malicious chain is actually a fork of the main chain? For most of the chain it matches the honest chain. At the end it has a couple blocks say. In this fork, the same property must hold. Only a third of these blocks can be there. But the problem is that we might not catch them. Most of the samples would be here. Maybe the prover gets lucky on the few samples we have here. What's important is that if we knew where it forked off, then that would be sufficient. Then we would just sample in this area and then again we only have to sample a constant number of blocks. So that suffices to determine whether it's an honest chain or a malicious fork.
How can we find the fork point? Let's have another strawman argument. The two provers here and the verifier could use an interactive binary search. So they first look at the first and last block, then the middle block. This takes a logarithmic number of interactions and then they find it. Every time you use these merkle proofs to check that this is actually a block from this chain. But if we then know this fork point, then it's very easy, we just check the constant number of blocks afterwards. This works. The only problem is that it requires the two provers and the verifier to interact, and we really don't want that. Why would a prover be willing to do that? It just seems cumbersome and yeah, it's a problem. It's not nice. So let's try to do something non-interactive.
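The interactive binary search from this strawman can be sketched as below. The two provers are modelled as hypothetical callables that return the block hash at a given height. The sketch assumes that two hash-linked chains agree on every block before the fork and on none after it, and it leaves out the merkle proofs that would accompany each answer.

```python
def find_fork_point(prover_a, prover_b, length: int) -> int:
    """Return the first height where the two chains differ, or `length` if none."""
    lo, hi = 0, length  # the chains agree below lo and differ from hi onwards
    while lo < hi:
        mid = (lo + hi) // 2
        if prover_a(mid) == prover_b(mid):
            lo = mid + 1   # still on the common prefix, fork is later
        else:
            hi = mid       # already diverged, fork is here or earlier
    return lo

# Toy chains that share the first 700 blocks and then diverge.
common = [("c", i) for i in range(700)]
chain_a = common + [("a", i) for i in range(700, 1000)]
chain_b = common + [("b", i) for i in range(700, 1000)]
print(find_fork_point(chain_a.__getitem__, chain_b.__getitem__, 1000))
```

Each loop iteration is one round trip to both provers, so the number of interactions is logarithmic in the chain length, which is exactly the cost the talk wants to avoid.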
We can use a similar idea where we just have to find the forking point. What if we can say, I don't know where the forking point is, but I know it's after some value. I know the fork must have happened after a certain number of blocks. It's in the last 10 blocks or last half million blocks. Well, so what can we do here? Here comes the key insight. We can sample enough blocks such that at least say 2/3rds of them would have to be created honestly to pass the check. If the miner only has 1/3rd of... if we know that in the honest chain we know that the miner can only create 1/3rd of the blocks, this gives us a bound on when the fork must have happened. We can calculate the min fork point by saying, well, if it passes the 2/3rd check, the fork must have happened somewhere after the half-way point. So step 3 is to rinse and repeat. We get 3/4s, then 7/8s, and finally we just check a couple of blocks at the end to prevent these short forks from breaking this scheme. You just check a constant number of last blocks such that the malicious prover doesn't have a chance of being just lucky. So this is the key idea.
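The rinse-and-repeat step can be sketched as a sampling plan: sample a constant number of blocks from the whole chain, then from the second half, then from the last quarter, and so on, and finally check every block in a short tail. The halving schedule, the parameter names and the tail handling below are a simplification made for this example, not the exact protocol.

```python
import random

def sampling_plan(n: int, k: int, tail: int, rng: random.Random):
    """Return ([(interval_start, sampled_heights)], tail_heights) for a chain of n blocks."""
    plan = []
    start = 0
    while n - start > tail:
        # Sample k heights uniformly from the interval [start, n).
        plan.append((start, [rng.randrange(start, n) for _ in range(k)]))
        # Move the interval start halfway towards the tip of the chain.
        start += (n - start + 1) // 2
    # The last `tail` blocks are always checked one by one.
    tail_heights = list(range(max(0, n - tail), n))
    return plan, tail_heights

plan, tail_heights = sampling_plan(n=1024, k=81, tail=16, rng=random.Random(0))
print([start for start, _ in plan])
```

For a chain of 1024 blocks and a tail of 16, the interval starts are 0, 512, 768, 896, 960 and 992, so the number of intervals grows with log(n) while the number of samples per interval stays constant.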
We have to check a constant number of blocks in each interval. It doesn't depend on how wide the interval is. It doesn't matter. Always check 81 blocks. The number of blocks that we check is dependent on how strong we assume the attacker is. We have to check log(n) intervals. For each block, we have to do a merkle inclusion proof of size log(n), which is also logarithmic in the number of blocks. Overall this is a complexity of log(n)^2, so log(n) times log(n), not log(n^2). To sync up, I only need to download 3 megabytes, and then I can throw it away later.
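A back-of-the-envelope estimate shows where the log(n) times log(n) comes from. The sketch below takes the 81 samples per interval and the 80-byte header from the talk, and assumes 32-byte hashes, one interval per halving, and one sibling hash per tree level. It ignores any extra header fields and encoding overhead, so it is only a rough order-of-magnitude check and does not try to reproduce the 3 megabyte figure exactly.

```python
def estimated_proof_bytes(n_blocks: int, samples_per_interval: int = 81,
                          header_bytes: int = 80, hash_bytes: int = 32) -> int:
    # Number of halvings needed to cover n_blocks, i.e. ceil(log2(n_blocks)).
    levels = (n_blocks - 1).bit_length()
    intervals = levels                              # log(n) sampling intervals
    sampled_blocks = samples_per_interval * intervals
    per_block = header_bytes + levels * hash_bytes  # header plus log(n)-sized merkle proof
    return sampled_blocks * per_block

print(estimated_proof_bytes(500_000))  # 1058832 bytes, roughly 1 MB
```

Doubling the chain length adds one interval and one level to each merkle proof, so the total grows quadratically in log(n) rather than linearly in n.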
This is cool, but there's a problem. This is still interactive. The verifier has to ask for a certain number of blocks. The verifier asks for random blocks. There's no specific maintenance. He just says give me some random blocks. These public coin protocols-- I'll talk about this more in the afternoon. You can turn these into non-interactive protocols by using a hash function to get the randomness. In this specific case, we could get the randomness from the block headers. You use a slightly different hash function-- use sha3 instead of sha256 maybe. To hash the block. And then you get some random bits from that. And there's a paper on how exactly to get those beacons. And this is good enough to request these random blocks and it turns out that because creating multiple ... is difficult.. you don't need this 2^(-128)... You don't need as much as you normally need. It's quite difficult to do those multiple hashes.
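Deriving the verifier's random choices from a hash, as described here, can be sketched like this. The sketch hashes the head blockheader together with a counter using SHA3-256 and reduces each digest to a block height. The counter construction and the function name are assumptions for illustration, and the paper mentioned in the talk may derive the randomness differently.

```python
import hashlib

def derive_sample_heights(head_header: bytes, n: int, k: int) -> list:
    """Derive k pseudo-random heights in [0, n) from the head blockheader."""
    heights = []
    for counter in range(k):
        # A different hash function than the one used for mining (SHA3 here).
        digest = hashlib.sha3_256(head_header + counter.to_bytes(4, "big")).digest()
        # A 256-bit value reduced mod n; the bias is negligible for realistic n.
        heights.append(int.from_bytes(digest, "big") % n)
    return heights

samples = derive_sample_heights(b"toy head blockheader", n=500_000, k=81)
```

Because anyone can recompute the same heights from the same header, the prover can build the proof once and the verifier can check that the right blocks were opened without sending any challenge.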
As a prover, if it's non-interactive, I can create a proof for a blockheight once, and then other people can forward it. There's no interaction anymore. You know. You turn it on, you have some nice server that gives your mobile client the header for the chain and an SPV proof, and then you have multiple services-- you have to make sure you get the same data of course, you were assuming the server was honest. And then you check which one is right, it's hard for them to fool you.
Kiayias et al.
Q: ... similar things.. right?
A: Non-inclusion proofs are very difficult. Nonmembership proofs require an accumulator or something. There are some ideas with bloom filters, but in general, it's not easy. You would have to check every block for even if you have a normal... it's not really possible.
Q: The challenging question.. if you are going to rely on any... then.. the question we should ask is, can we... provide a proof.. that it is valid.. then we don't rely on... rely on the... servers? Non-inclusion. Then we can... use it for inclusion.
A: You could, but I don't think you have to, since we have nice cryptographic solutions.
Q: What if someone tries to attack a specific SPV node by continuously forking? Continuous forks to try to get the next block to be a malicious one for that SPV client. And after that the real block is found...
A: The same thing for.. you shouldn't trust transactions that aren't deep in the blockchain. You can always do a 3 block depth fork, if you have some mining power. It's going to be harder to do that at 10 blocks deep, though. In the protocol, you manually check the last constant number of blocks to prevent that situation as well. That's possible, yeah. That's why you can't trust transactions that are only 3 blocks deep.
Q: How would this affect bandwidth? If cell phones were doing fraud proofs or trying to figure out where the forking block is.. how would that work on 2G, 3G, bad reception cell phone? Are you still here? Okay, this one went out into the aether I guess.
Next time we will be doing Scaling WiFi and Scaling HDMI.