Collision margin — Merkle-Damgård vs Bao tree

The two algorithms target the same 128-bit collision security but achieve it via different constructions. Here is what changes structurally and why it matters for protocol design.

Source: compare-collision-margin.md Type: structural / data-only

1. SHA-256: Merkle-Damgård in 30 seconds

H(M) = compress(... compress(IV, m1) ..., mn)

Pad the message; split into 512-bit blocks; thread a 256-bit chaining value through compress block by block; output the final state. Properties:

  • Length extension: given H(M) and len(M) an attacker can compute H(M || pad || M') without knowing M.
  • Multi-collision (Joux 2004): a single chain compromise enables 2k collisions on a k-stage cascade.
  • Single thread of evaluation: cannot be parallelised across input blocks.
  • Bitcoin's mitigation: SHA-256d. The outer hash absorbs exactly 32 bytes; an attacker would have to invert the inner hash to extend the outer chain, which is itself a preimage attack.

2. BLAKE3: Bao tree in 30 seconds

H(M) = root( hash_chunk(c1), hash_chunk(c2), ..., hash_chunk(cn) )

Split M into 1 KiB chunks; hash each chunk with the CHUNK_* domain flags set; combine pairs with the PARENT flag set; the final root has the ROOT flag set. Domain separation is built into the compression function via flag bits including KEYED_HASH, DERIVE_KEY_CONTEXT, DERIVE_KEY_MATERIAL.

Properties:

  • No length extension: tree mode does not expose a resumable chaining value.
  • Multi-target resistant: keyed mode uses the key as IV.
  • Parallelisable: each chunk and parent computation is independent. SIMD lanes can compute 4 / 8 / 16 chunks at once.
  • Streamable: tree can be computed incrementally with O(log n) state.

3. Side-by-side

PropertySHA-256 (MD)BLAKE3 (Bao)
Block size64 bytes64 bytes (compression)
Chunk sizen/a1024 bytes
Output size32 bytes32 bytes (or any XOF len)
Length-extension by defaultyesno
Multi-collision tractableyes (Joux)no
Parallelisablenoyes
Keyed (MAC) mode built-inno (use HMAC)yes (keyed_hash)
KDF mode built-inno (use HKDF)yes (derive_key)
Variable output length (XOF)noyes

4. Implications for B3Chain protocol design

  1. Block / tx ID hashing. SHA-256d unchanged. The MD construction does not leak via length extension because the inner hash normalises the input.
  2. PoW hashing. BLAKE3d. Same shape as SHA-256d but using BLAKE3's tree construction. The double-hash is technically redundant for length-extension (BLAKE3 doesn't have it) but kept for symmetry with the Bitcoin protocol layout and as defence-in-depth.
  3. Future MAC / KDF. When B3Chain needs a keyed primitive (e.g. peer-id derivation, BIP324 negotiation, Lightning HTLC secrets), use BLAKE3's built-in keyed_hash and derive_key rather than re-implementing HMAC/HKDF.

5. What this comparison is NOT

  • A claim that SHA-256 the algorithm is broken. It is not.
  • A claim that BLAKE3 collisions are easier to find. They are not (published cryptanalysis margin is comparable; see attack surface).
  • A claim that "tree hashes are always better". They are better for the use cases above.

6. Sources

Damgård 1989, Merkle 1989; Joux 2004 multi-collisions; BLAKE3 specification; Bao tree mode; Bitcoin SHA-256d rationale (Satoshi mailing list). Full citations in compare-collision-margin.md.