The LMD Hash Family

The family of LMD hashes has multiple uses, from simple error detection and duplicate-file warnings to cryptographically secure hashing. The LMD family consists of LMD2, LMD4, LMD5, and LMD6, in increasing order of “toughness”.

LMD6 is a revolutionary and highly secure cryptographic hash that is baked into our Karacell encryption system. Details of the whole family, focusing mostly on LMD6, can be found here:

LMD Hash Family External Blog

and this includes explanations, proofs, algorithms, code, and characterizations. Just like Karacell, LMD6 is based on the provable mathematics of insanely long-cycle iterators, meaning its robustness (i.e., the probability of hash collisions and the like) can be well studied and understood. Note that the comment “This post won’t make any sense unless you’ve read the previous one” can be taken with a pinch of salt – that’s more for the mathematical purist, so by all means, feel free to investigate!

This page is dedicated to the discussion and comments about the LMD-x family of hash functions.

One response

13 02 2013

The Birthday Attack Myth
Hashes may be much stronger (or weaker) than you think…

In cryptographic circles, it’s often said that a hash is at best as strong as the square root of the size of its state space. In other words, assume that we have a hash consisting of B bits. Then the number of possible hashes is at most H=2^B. Cracking the hash then requires at most O(H^(1/2)) work, or O(N), where N=H^(1/2). This is the fundamental assertion behind a birthday attack, which exploits the surprisingly high probability of finding 2 different messages which happen to have the same hash. (This might allow someone to “prove” that bogus credentials were legitimate, for instance.)
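The N=H^(1/2) bound is easy to sanity-check numerically. A minimal sketch, assuming a toy 32-bit hash space and the standard birthday approximation 1 − exp(−k(k−1)/2H) for the collision probability among k samples (the function name is illustrative, not from the original):

```python
import math

def collision_probability(k: int, H: int) -> float:
    """Approximate probability that k uniformly random hashes drawn from
    H possible values contain at least one collision, via the standard
    birthday approximation 1 - exp(-k*(k-1)/(2H))."""
    return 1.0 - math.exp(-k * (k - 1) / (2.0 * H))

H = 2 ** 32                    # a toy 32-bit hash space
N = int(math.isqrt(H))         # N = H**(1/2) samples
print(collision_probability(N, H))   # ~0.39: sqrt(H) samples already make a collision likely
```

With only H^(1/2) samples the collision probability is already about 39%, which is why the square root sets the conventional strength estimate.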

The first problem is that this attack only gives us 2 messages with the same hash, which is much less useful than the case in which one of those messages is the legitimate one. We have to hope that the hash collision is then actually exploitable, which it probably isn’t. At most, we might be able to create a denial-of-service attack.

The second problem is that this result ignores the laws of physics. What use is the discussion of computational complexity without the discussion of computers?

Cracking hashes using a birthday attack is fundamentally different from just trying a series of bogus messages in order to find one which has the same hash as the legitimate one: the former process involves every compute node talking to every other, whereas the latter is embarrassingly parallel. (We could execute either algorithm serially, but the object is to crack the hash as quickly as possible.)

And why so much communication? Well, we’ll end up with a mapping of messages to hashes across O(N) connected compute nodes, given 1 node for each hash sample. This means that every node must communicate with every other node, in order to find a pair containing the same hash. (The fact that there might be no such pair, or several of them, doesn’t change the difficulty much.)

Let’s take the most optimistic case: we have a quantum computer with lots of qubits at our disposal. We know from the birthday attack theory that if we generate O(N) bogus messages, then the expectation is that 1 pair of them will have the same hash. Grover’s algorithm can search O(N) hashes in O(N^(1/2)) time. (We assume that all O(N) hashes are generated in parallel in order to create the database in the first place, which I think would take the same O(N^(1/2)) time.) We need to repeat this search across O(N) parallel quantum compute nodes, each one searching for the repetition of a single specific hash. The amount of additional time required to identify which computer, if any, found a matching pair is O(log(N)), which is negligible by comparison. Thus the time taken is O(N^(1/2)), or O(H^(1/4)). (This is much smaller than the computational complexity, which is hidden by virtue of having O(N) parallel quantum computers.) In other words, the birthday attack may one day be a gross overestimate of hash strength, but only in the face of currently unimaginable computing power.

Absent a quantum computer, let’s now assume that we have at our disposal the most thermodynamically efficient classical computer imaginable. As the checkered history of operating systems has proven, we know hardly anything about such a computer, but a few obvious properties come to mind.

First of all, it must be very dense in the sense that the number of compute units per unit volume be as large as possible. (Think of logic elements as molecules in a 3D-printed sphere, or an asteroid remodeled into a computer using a nanobot swarm, or an ultradense computer made of quark-degenerate matter.) So absent the higher dimensional physics of string theory, this suggests that we should be thinking in terms of 3D spherical topology. (Perhaps you’re imagining something akin to the Death Star, in which the compute nodes are arranged like the layers of an onion, with connections in 3D.) The actual shape of a given node is of some concern, but can only improve computational efficiency by a constant factor over boring cubical nodes, as relates to packing problems.

But there’s a problem: heat. We need to eject heat as fast as it’s produced. Unfortunately, the radiation rate of a sphere is proportional to its surface area — not its volume. This means that the number of compute elements can only grow as fast as the surface area. Should they grow any faster, we would then need to reduce the clock frequency proportionally, because dynamic power is proportional to frequency. (Granted, in a real computer, not all nodes would always operate at the same frequency, but the same argument applies to the average frequency.)

If arranged in a 3D sphere, the average communication latency between random nodes would be at least O(N^(1/3)). But on account of the radiation problem, the sphere needs to be large enough to have a surface area of O(N), which implies a volume of O(N^(3/2)). So the latency is actually O((N^(3/2))^(1/3)), or O(N^(1/2)) — equivalent to what it would be, were all the nodes located on the surface. Reducing the clock frequency produces the same result. In other words, the best we can do in the steady state limit is to arrange our compute nodes on the surface of a sphere. (This result is uncannily similar to the Holographic principle, particularly the part which asserts that the entropy of a black hole is proportional to its surface area rather than its volume.)
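The exponent bookkeeping in this argument can be checked mechanically. A toy sketch that tracks the exponents of N as exact fractions (variable names are illustrative, not from the original):

```python
from fractions import Fraction as F

# Each variable holds the exponent of N in the corresponding quantity.
surface = F(1)               # surface area must scale as O(N) to shed heat
radius = surface / 2         # area ~ r^2, so r ~ N^(1/2)
volume = 3 * radius          # volume ~ r^3, so volume ~ N^(3/2)
latency = volume * F(1, 3)   # light delay ~ volume^(1/3), so latency ~ N^(1/2)
print(radius, volume, latency)   # 1/2 3/2 1/2
```

The final 1/2 confirms that the heat-limited sphere has the same O(N^(1/2)) latency as a surface-only arrangement.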

But because each node needs to compare its hash to O(N) other hashes which were received from as many nodes, it would then appear that the overall compute time (to say nothing of the complexity) is O(N^(4/3)), or O(H^(2/3)).

But wait! Although we don’t know which hash lives where, we do know that every node needs to communicate with every other, exactly once. We can facilitate this by constructing a Hamiltonian path of all the nodes. On each clock cycle, we can advance this path by 1 node, allowing each node to see each hash in series. The time taken is then just O(N), or indeed O(H^(1/2)). We no longer need to pay the O(N^(1/3)) light delay because neighbors are only talking to neighbors.
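The rotating-path trick can be simulated in a few lines. A toy sketch, assuming the “nodes” are just list slots and treating the path as a cycle so the traveling copies can keep advancing (one list rotation per clock cycle):

```python
# Each node holds one resident hash; a copy of every hash travels around
# the ring. After n-1 one-step rotations, every node has compared its
# resident hash against every other node's.
def find_collision_by_rotation(hashes):
    n = len(hashes)
    traveling = list(hashes)                           # the copies that move
    for step in range(1, n):                           # n-1 clock ticks
        traveling = traveling[-1:] + traveling[:-1]    # advance by one node
        for node in range(n):
            if hashes[node] == traveling[node]:        # resident == visitor
                return node, (node - step) % n         # indices of the pair
    return None

print(find_collision_by_rotation([7, 3, 9, 3, 5]))     # (3, 1): slots 1 and 3 collide
```

Total work is n-1 rotations of n simultaneous neighbor-to-neighbor comparisons, matching the O(N) time claimed above.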

Unfortunately, this still doesn’t sync up with the physics, because we have a signal path whose length grows with the problem size. A few failed nodes would effectively disable the computation. So the actual compute time would be O(H^(1/2)) divided by the probability (asymptotically, zero!) that no disabling degree of failure occurs, to say nothing of clock skew dragging the entire computation rate to the switching speed of the slowest node.

Fortunately, we could do this serially instead. Imagine instead a single compute node which produces a hash in constant time. It then sets a bit in a bitmap of size O(H), corresponding to this hash. If the bit was already set, then a repeated hash has been found. It would take O(N) time to produce the hashes, and O(H^(1/3)) light delay to communicate with the bitmap, for an overall execution time of O(N^(5/3)), or O(H^(5/6)) — not much worse!
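The serial bitmap scan sketches easily, assuming a toy hash space small enough that the O(H)-bit map fits in memory (the function name is illustrative):

```python
# One bit per possible hash value; the first repeated hash tries to set
# a bit that is already set.
def first_repeat(hash_stream, bits=16):
    bitmap = bytearray(2 ** bits // 8)   # O(H) bits of storage
    for h in hash_stream:
        byte, mask = h >> 3, 1 << (h & 7)
        if bitmap[byte] & mask:          # bit already set: repeated hash found
            return h
        bitmap[byte] |= mask
    return None

print(first_repeat([12, 7, 300, 7, 9]))  # 7: the first value seen twice
```

The loop itself is O(N); it’s the round trip to the gargantuan O(H)-bit map that contributes the O(H^(1/3)) light delay per probe in the argument above.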

Either way, and especially in the second case, we need gargantuan storage. Realistically, we’ll always be stuck with much smaller storage space, as hashes are developed with such intractable requirements in mind. In turn, this implies that we’ll have to attack the problem in an embarrassingly parallel fashion, which may sound good, but for the fact that we need to regenerate O(N) hashes just to test a single hash against each of them. This implies an execution time of O(N^2), or O(H) — the square of what the birthday attack literature would have us believe.
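The storage-starved case can be illustrated by simply counting hash evaluations: with no room to keep a table, testing each of N candidates means regenerating the other N−1 hashes from scratch. A toy count (not an attack; the function name is illustrative):

```python
# Count the hash evaluations needed when nothing can be stored:
# each of n candidates is checked against n-1 freshly regenerated hashes.
def count_hash_evaluations(n):
    evaluations = 0
    for candidate in range(n):      # each of N candidate messages...
        for other in range(n):      # ...is checked against the others,
            if other != candidate:  # regenerating each hash on the fly
                evaluations += 1
    return evaluations              # ~n**2, i.e. the full O(H) cost

print(count_hash_evaluations(100))  # 9900, essentially N**2
```

For N = H^(1/2) candidates this is ~H evaluations, which is the O(H) execution time claimed above.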

Let me be clear. I’m not saying that O(H^(1/2)) is a gross underestimate of the strength of any particular hash. Hashes have other weaknesses which have nothing to do with birthday attacks. The point is simply that the birthday attack itself presents a computational complexity which is not representative of its physical complexity — not even proportional to it — in the case where the transistor count is negligible relative to N. This is in stark contrast to the case of naively trying passwords, whose physical complexity measured in node-clocks is essentially the same, whether the attack is done in parallel or serially.

At least as applies to hashes, perhaps this is all coming around to a more utilitarian notion of computational complexity, that being computational complexity given minimally sufficient storage.
