ZeroTier is a "a smart ethernet switch for earth". It creates encrypted virtual L2 network which can be joined by any authorized device. The devices are identified by 5 byte strings, which are hashes of Curve25519 public keys.
The hash function used for identity derivation is not standard one like SHA256 - with only 2^40 possible values it would too easy to spoof someone identity. Instead, a custom proof of work function is used. This function is claimed to be memory-hard (i.e. requiring a lot of memory to compute) - this property would make it harder to crack it on GPU or custom ASICs.
I show that the proof-of-work function isn't in fact memory hard. For now, it is mostly theoretical result - the computation still takes long time. Most importantly, the full public keys are cached by devices and centralized root servers. Even when someone spoofs identity, he would still need much luck and collaboration with ZeroTier Inc, which runs the root servers. I have notified ZeroTier Inc about 3 months ago and they weren't too worried about the consequences of my 'attack' (which is an opinion I agree with), saying they will eventually redesign the identity derivation function during transistion to post-quantum ciphers.
The hash function
The hash function used for identity derivation is not specified anywhere, so I have extracted its description from ZeroTier code.
computeMemoryHardHash(publicKey: string) -> string digest = sha512(publicKey) # 64 bytes initialize Salsa20 stream with IV=digest[0:32] key=digest[32:64] genmem = empty 2MB array current = 64 zero bytes # the genmem array is initialized with CBC-like scheme for i=0..2000000/64: # loop 1 current = encrypt with salsa(current) genmem[64*i:64*(i+1)] = current for i=0..2000000/16: # loop 2 num1 = convert genmem[8*i:8*(i+1)] to big-endian integer num2 = convert genmem[8*(i+1):8*(i+2)] to big-endian integer idx1 = num1 % 8 idx2 = num2 % (2000000/8) swap digest[8*idx1:8*(idx1+1)], genmem[8*idx2:8*(idx2+1)] digest = encrypt with salsa(digest) if digest >= 17: if generating identity, retry with different public key if validating identity, abort (this part makes it easier to validate identity, than to generate one - a defining characteristic of a proof-of-work function) return digest[59:64] # last 5 bytes
We can see that
computeMemoryHardHash tries to shuffle memory around in a way that depends on results of previous computation. However, if we look at this code long enough, we can see that sometimes it miserably fails. Specifically, if during the last iteration of loop 2
idx1 == 7, then the result (last 5 bytes of
digest) only depends on
X=gemem[8*idx2:8*(idx2+1)]. This happens, because Salsa is a stream cipher, so
digest = encrypt with salsa(digest) encrypts
digest[59:64] in a way than only depends on old
At first glance,
X still can't be computed without lots of memory - it could be changed at any time by
swap in loop 2. However, it turns out that the most values of
genmem won't ever be modified by loop 2. The loop runs for
2000000/16 iterations and each time psuedorandomly selects one of
2000000/8 indexes to modify - so any particular index will be modified with probability
(1-2000000/8)**(2000000/16) ≈ 39%.
With this knowledge, we can create a sketch of memory-easy version of the hash function:
computeMemoryEasyHash(publicKey: string) -> string digest = sha512(publicKey) # 64 bytes initialize Salsa20 stream with IV=digest[0:32] key=digest[32:64] current = 64 zero bytes # the genmem array is initialized with CBC-like scheme for i=0..sizeof genmem/64: # loop 1 current = encrypt with salsa(current) # this is the same as last num1 with probability 61% lastnum1 = convert current[48:56] to big-endian integer # this is the same as last num2 with probability 61% lastnum2 = convert current[56:64] to big-endian integer lastidx1 = num1 % 8 lastidx2 = num2 % (2000000/8) if lastidx1 == 7: # probability 12% retry initialize Salsa20 stream with IV=digest[0:32] key=digest[32:64] # generate genmem again, now to fetch block that contains `genmem[8*lastidx2:8*(lastidx2+1)]` current = 64 zero bytes for i=0..lastidx2/8: # loop 1 current = encrypt with salsa(current) # 61% that swappedvalue == genmem[8*lastidx2:8*(lastidx2+1)] swappedvalue = current[lastidx2%8] return swappedvalue[3:8]
Overall, the probability that
computeMemoryEasyHash returns correct results is about 1/35.
Suppose Eve wants to spoof identity of Alice. She needs to generate same identity as Alice, but with private/public key known by her ("preimage"). She will run
computeMemoryEasyHash on her GPU or ASIC. When the function returns the identity of Alice, Eve will compute the identity again using
computeMemoryHardHash (on CPU). If both functions return the same result, she has what she wants. Otherwise she just repeats the whole procedure.
computeMemoryEasyHash is correct with probability 1/35, but Eve also needs to fullfil proof-of-work condition (
digest >= 17), so she will need to call
35*(256/17) ≈ 527 times to find the preimage.
The whole operation requires
2^26*2^40*(256/17)*35 ≈ 2^74 Salsa20 operations. Even with ASIC acceleration (assuming 60Gops/s), the computation would take 10.000 years. Algorithmic improvements to
computeMemoryEasyHash are almost surely possible.
Quite messy proof of concept is available.