ZeroTier is a "a smart ethernet switch for earth". It creates encrypted virtual L2 network which can be joined by any authorized device. The devices are identified by 5 byte strings, which are hashes of Curve25519 public keys.
The hash function used for identity derivation is not standard one like SHA256 - with only 2^40 possible values it would too easy to spoof someone identity. Instead, a custom proof of work function is used. This function is claimed to be memory-hard (i.e. requiring a lot of memory to compute) - this property would make it harder to crack it on GPU or custom ASICs.
I show that the proof-of-work function isn't in fact memory hard. For now, it is mostly theoretical result - the computation still takes long time. Most importantly, the full public keys are cached by devices and centralized root servers. Even when someone spoofs identity, he would still need much luck and collaboration with ZeroTier Inc, which runs the root servers. I have notified ZeroTier Inc about 3 months ago and they weren't too worried about the consequences of my 'attack' (which is an opinion I agree with), saying they will eventually redesign the identity derivation function during transistion to post-quantum ciphers.
The hash function
The hash function used for identity derivation is not specified anywhere, so I have extracted its description from ZeroTier code.
computeMemoryHardHash(publicKey: string) -> string
digest = sha512(publicKey) # 64 bytes
initialize Salsa20 stream with IV=digest[0:32] key=digest[32:64]
genmem = empty 2MB array
current = 64 zero bytes
# the genmem array is initialized with CBC-like scheme
for i=0..2000000/64: # loop 1
current = encrypt with salsa(current)
genmem[64*i:64*(i+1)] = current
for i=0..2000000/16: # loop 2
num1 = convert genmem[8*i:8*(i+1)] to big-endian integer
num2 = convert genmem[8*(i+1):8*(i+2)] to big-endian integer
idx1 = num1 % 8
idx2 = num2 % (2000000/8)
swap digest[8*idx1:8*(idx1+1)], genmem[8*idx2:8*(idx2+1)]
digest = encrypt with salsa(digest)
if digest[0] >= 17:
if generating identity, retry with different public key
if validating identity, abort
(this part makes it easier to validate identity, than to generate one - a defining characteristic of a proof-of-work function)
return digest[59:64] # last 5 bytes
We can see that computeMemoryHardHash
tries to shuffle memory around in a way that depends on results of previous computation. However, if we look at this code long enough, we can see that sometimes it miserably fails. Specifically, if during the last iteration of loop 2 idx1 == 7
, then the result (last 5 bytes of digest
) only depends on X=gemem[8*idx2:8*(idx2+1)]
. This happens, because Salsa is a stream cipher, so digest = encrypt with salsa(digest)
encrypts digest[59:64]
in a way than only depends on old digest[59:64]
.
At first glance, X
still can't be computed without lots of memory - it could be changed at any time by swap
in loop 2. However, it turns out that the most values of genmem
won't ever be modified by loop 2. The loop runs for 2000000/16
iterations and each time psuedorandomly selects one of 2000000/8
indexes to modify - so any particular index will be modified with probability (1-2000000/8)**(2000000/16) ≈ 39%
.
With this knowledge, we can create a sketch of memory-easy version of the hash function:
computeMemoryEasyHash(publicKey: string) -> string
digest = sha512(publicKey) # 64 bytes
initialize Salsa20 stream with IV=digest[0:32] key=digest[32:64]
current = 64 zero bytes
# the genmem array is initialized with CBC-like scheme
for i=0..sizeof genmem/64: # loop 1
current = encrypt with salsa(current)
# this is the same as last num1 with probability 61%
lastnum1 = convert current[48:56] to big-endian integer
# this is the same as last num2 with probability 61%
lastnum2 = convert current[56:64] to big-endian integer
lastidx1 = num1 % 8
lastidx2 = num2 % (2000000/8)
if lastidx1 == 7: # probability 12%
retry
initialize Salsa20 stream with IV=digest[0:32] key=digest[32:64]
# generate genmem again, now to fetch block that contains `genmem[8*lastidx2:8*(lastidx2+1)]`
current = 64 zero bytes
for i=0..lastidx2/8: # loop 1
current = encrypt with salsa(current)
# 61% that swappedvalue == genmem[8*lastidx2:8*(lastidx2+1)]
swappedvalue = current[lastidx2%8]
return swappedvalue[3:8]
Overall, the probability that computeMemoryEasyHash
returns correct results is about 1/35.
Suppose Eve wants to spoof identity of Alice. She needs to generate same identity as Alice, but with private/public key known by her ("preimage"). She will run computeMemoryEasyHash
on her GPU or ASIC. When the function returns the identity of Alice, Eve will compute the identity again using computeMemoryHardHash
(on CPU). If both functions return the same result, she has what she wants. Otherwise she just repeats the whole procedure.
computeMemoryEasyHash
is correct with probability 1/35, but Eve also needs to fullfil proof-of-work condition (digest[0] >= 17
), so she will need to call computeMemoryHardHash
about 35*(256/17) ≈ 527
times to find the preimage.
The whole operation requires 2^26*2^40*(256/17)*35 ≈ 2^74
Salsa20 operations. Even with ASIC acceleration (assuming 60Gops/s), the computation would take 10.000 years. Algorithmic improvements to computeMemoryEasyHash
are almost surely possible.
Quite messy proof of concept is available.