1. The differential approach
Instead of trying to test every CPU feature combination directly (an intractable matrix), the audit treats two implementations as oracles of one another:
        same input bytes
      |                  |
     SIMD     portable Python reference
      |                  |
    hash A            hash B
      |                  |
      +------- == -------+
                ^
  if not equal, FAIL (with size + first 16 bytes of diff)
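The comparison in the diagram fits in a few lines of Python. The sketch below is hypothetical: the names differential and describe_mismatch are ours, not the audit script's, and it reads "first 16 bytes of diff" as the leading 16 bytes of each digest. Two routes to SHA-256 from hashlib stand in for the two BLAKE3 implementations so the example runs with the standard library alone.

```python
import hashlib
from typing import Callable, Iterable, List, Optional

HashFn = Callable[[bytes], bytes]


def describe_mismatch(data: bytes, digest_a: bytes, digest_b: bytes) -> Optional[str]:
    """None if the digests match; otherwise a FAIL line with the input size
    and the first 16 bytes (hex) of each digest."""
    if digest_a == digest_b:
        return None
    return (f"FAIL size={len(data)} "
            f"A={digest_a[:16].hex()} B={digest_b[:16].hex()}")


def differential(impl_a: HashFn, impl_b: HashFn, inputs: Iterable[bytes]) -> List[str]:
    """Hash every input with both implementations; collect one message per mismatch."""
    failures = []
    for data in inputs:
        message = describe_mismatch(data, impl_a(data), impl_b(data))
        if message is not None:
            failures.append(message)
    return failures


# Stand-ins: two equivalent SHA-256 calls, plus one deliberately wrong variant.
def sha256_direct(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def sha256_via_new(data: bytes) -> bytes:
    return hashlib.new("sha256", data).digest()


def sha256_wrong(data: bytes) -> bytes:
    # Simulated bug: an extra trailing byte is hashed.
    return hashlib.sha256(data + b"\x00").digest()


samples = [b"", b"a" * 64, b"a" * 1025]
print(differential(sha256_direct, sha256_via_new, samples))     # []
print(len(differential(sha256_direct, sha256_wrong, samples)))  # 3
```

Taking the implementations as plain callables keeps the harness indifferent to which side is SIMD and which is the oracle.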
The portable reference is a direct port of the BLAKE3 paper's pseudocode. It is intentionally slow (no batching, no SIMD) and is used purely as a correctness oracle.
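The audit's own reference is not reproduced here. To show how little such an oracle needs, below is a minimal sketch of a pure-Python BLAKE3 (default hash mode, 32-byte output; no keyed mode, key derivation, or extended output). It follows the compression function, flags, and tree layout of the BLAKE3 specification, but it builds the tree recursively (the left subtree takes the largest power-of-two number of chunks that leaves at least one chunk on the right) instead of incrementally. The function name blake3_ref is hypothetical.

```python
import struct

# Constants from the BLAKE3 specification.
IV = (0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
      0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19)
MSG_PERMUTATION = (2, 6, 3, 10, 7, 0, 4, 13, 1, 11, 12, 5, 9, 14, 15, 8)
CHUNK_START, CHUNK_END, PARENT, ROOT = 1, 2, 4, 8  # domain-separation flags
BLOCK_LEN, CHUNK_LEN = 64, 1024
MASK = 0xFFFFFFFF


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def _g(s, a, b, c, d, mx, my):
    """Mixing function: 32-bit adds, XORs and right-rotations by 16, 12, 8, 7."""
    s[a] = (s[a] + s[b] + mx) & MASK
    s[d] = _rotr(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = _rotr(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b] + my) & MASK
    s[d] = _rotr(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = _rotr(s[b] ^ s[c], 7)


def _compress(cv, m, counter, block_len, flags):
    """7 rounds over a 16-word state; returns all 16 output words."""
    s = list(cv) + list(IV[:4]) + [counter & MASK, counter >> 32, block_len, flags]
    m = list(m)
    for _ in range(7):
        _g(s, 0, 4, 8, 12, m[0], m[1])      # columns
        _g(s, 1, 5, 9, 13, m[2], m[3])
        _g(s, 2, 6, 10, 14, m[4], m[5])
        _g(s, 3, 7, 11, 15, m[6], m[7])
        _g(s, 0, 5, 10, 15, m[8], m[9])     # diagonals
        _g(s, 1, 6, 11, 12, m[10], m[11])
        _g(s, 2, 7, 8, 13, m[12], m[13])
        _g(s, 3, 4, 9, 14, m[14], m[15])
        m = [m[i] for i in MSG_PERMUTATION]  # the permute after round 7 is unused
    for i in range(8):
        s[i] ^= s[i + 8]
        s[i + 8] ^= cv[i]
    return s


def _words(block):
    """Zero-pad a block to 64 bytes and read 16 little-endian 32-bit words."""
    return struct.unpack("<16I", block.ljust(BLOCK_LEN, b"\0"))


def _chunk_node(chunk, counter):
    """Compress all but the last block of a chunk; return the last block uncompressed."""
    cv = IV
    blocks = [chunk[i:i + BLOCK_LEN] for i in range(0, len(chunk), BLOCK_LEN)] or [b""]
    for i, block in enumerate(blocks[:-1]):
        cv = _compress(cv, _words(block), counter, BLOCK_LEN, CHUNK_START if i == 0 else 0)[:8]
    flags = CHUNK_END | (CHUNK_START if len(blocks) == 1 else 0)
    return cv, _words(blocks[-1]), counter, len(blocks[-1]), flags


def _subtree_node(chunks, first_counter):
    """Pending (not yet compressed) top node of the subtree over `chunks`."""
    if len(chunks) == 1:
        return _chunk_node(chunks[0], first_counter)
    left_n = 1 << ((len(chunks) - 1).bit_length() - 1)  # largest power of two < len(chunks)
    left_cv = _compress(*_subtree_node(chunks[:left_n], first_counter))[:8]
    right_cv = _compress(*_subtree_node(chunks[left_n:], first_counter + left_n))[:8]
    return IV, left_cv + right_cv, 0, BLOCK_LEN, PARENT


def blake3_ref(data: bytes) -> bytes:
    """32-byte BLAKE3 digest in default hash mode (no key, no extended output)."""
    chunks = [data[i:i + CHUNK_LEN] for i in range(0, len(data), CHUNK_LEN)] or [b""]
    cv, m, _counter, block_len, flags = _subtree_node(chunks, 0)
    out = _compress(cv, m, 0, block_len, flags | ROOT)  # root output block counter is 0
    return struct.pack("<8I", *out[:8])
```

Hashing b"" with this sketch should reproduce the af1349b9... constant used by the spec-sanity check. Like the audit's reference, it is slow (every 64-byte block costs seven rounds of Python integer arithmetic), which is acceptable for an oracle.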
2. Input sizes tested
| Bucket | Inputs | Why |
|---|---|---|
| Edge sizes | 0, 1, 31, 32, 33, 63, 64, 65, 80, 127, 128, 129, 255, 256, 257, 511, 512, 513, 1023, 1024, 1025, 2047, 2048, 2049, 4095, 4096, 4097, 8191, 8192, 8193, 16383, 16384, 16385, 65535, 65536, 65537, 100000 | Catch bugs at chunk / block / lane boundaries (BLAKE3 chunk = 1024 bytes, block = 64 bytes). |
| Random fuzz | 1 000 inputs, sizes uniform in [0, 100 000] | Catch bugs that don't sit on neat boundaries; a deterministic seed keeps runs reproducible. |
| Spec sanity | BLAKE3("") = af1349b9... | Anchor against the spec constant. |
| Determinism | Re-hash the same input twice | Catch non-deterministic SIMD intrinsics (rare but real). |
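A sketch of how the four buckets might be produced. Two choices here are assumptions, not confirmed by the script: fuzz inputs come from Python's random.Random seeded with the 0xb3c0010d value printed in the expected output, and edge inputs use the repeating 0..250 byte pattern of the official BLAKE3 test vectors. The helper names are hypothetical, and random.Random.randbytes needs Python 3.9 or newer.

```python
import random
from typing import Callable, Iterator

# Edge sizes from the table: values around the 64-byte block and the
# 1024-byte chunk, plus larger multiples of the chunk size.
EDGE_SIZES = [0, 1, 31, 32, 33, 63, 64, 65, 80, 127, 128, 129,
              255, 256, 257, 511, 512, 513, 1023, 1024, 1025,
              2047, 2048, 2049, 4095, 4096, 4097, 8191, 8192, 8193,
              16383, 16384, 16385, 65535, 65536, 65537, 100000]

FUZZ_SEED = 0xB3C0010D  # seed shown in the expected output (assumed usage)
FUZZ_COUNT = 1000
FUZZ_MAX_SIZE = 100_000


def edge_input(size: int) -> bytes:
    """Deterministic content: the repeating 0..250 pattern of the BLAKE3 test vectors."""
    return bytes(i % 251 for i in range(size))


def fuzz_inputs(seed: int = FUZZ_SEED, count: int = FUZZ_COUNT,
                max_size: int = FUZZ_MAX_SIZE) -> Iterator[bytes]:
    """Yield `count` random inputs; the same seed always yields the same inputs."""
    rng = random.Random(seed)
    for _ in range(count):
        size = rng.randint(0, max_size)  # randint is inclusive: [0, max_size]
        yield rng.randbytes(size)


def is_deterministic(hash_fn: Callable[[bytes], bytes], data: bytes) -> bool:
    """Determinism bucket: hashing the same bytes twice must give the same digest."""
    return hash_fn(data) == hash_fn(data)
```

Because fuzz_inputs is a generator, the full 1 000-input run never holds more than one input (at most 100 000 bytes) in memory at a time.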
3. How to run
cd b3chain
pip3 install blake3
python3 contrib/testing/audit/audit-simd-blake3.py
Runtime is dominated by the pure-Python reference; about 40 seconds on a modern laptop.
4. Expected output
[B-1] SIMD BLAKE3 vs portable-C reference (differential test)
========================================================================
PASS [B-1] BLAKE3 of empty input matches the spec constant
PASS [B-1] 37 edge sizes match between SIMD and portable reference
Running 1000 random fuzz inputs (seed=0xb3c0010d)...
200/1000
400/1000
600/1000
800/1000
1000/1000
PASS [B-1] 1000 random fuzz inputs match between SIMD and portable reference
PASS [B-1] SIMD library is deterministic across repeated calls
------------------------------------------------------------------------
4/4 checks passed in 38.8s
AUDIT RESULT: PASS [B-1]
5. Limitations
- The audit runs only on the CPU it is launched on, so it tests whichever implementation the BLAKE3 library picks for that host. Covering SSE2, SSE4.1, AVX2, AVX-512, and NEON requires running it across a CI matrix of machines, which we plan to wire up eventually.
- It does not test the C build flags themselves (e.g. -mavx2). For reproducible-build verification, see the upcoming reproducible-build documentation.
6. Source files
- contrib/testing/audit/audit-simd-blake3.py
- src/crypto/blake3/ — BLAKE3 reference + SIMD code
- BLAKE3-team/BLAKE3-specs — the upstream spec
The problem in one sentence
SIMD code paths and portable C code paths for the same algorithm must produce byte-identical output, and a single off-by-one in lane loading silently corrupts the chain forever.
The theory
The official BLAKE3 C library has multiple compression-loop implementations:
- Portable C (always built; reference behaviour)
- SSE2 (Intel/AMD baseline x86_64, 4 lanes)
- SSE4.1 (4 lanes, faster)
- AVX2 (8 lanes)
- AVX-512 (16 lanes)
- NEON (ARM, 4 lanes)
A runtime dispatcher picks the best implementation for the host CPU. If any of these has a bug — a lane swap, a misaligned load, a missing final round — some nodes will produce different hashes, and the network will fork along CPU type. This has happened in real chains (notably an Ethereum SECP256K1 SIMD bug in 2017 that nearly forked the network).
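The lane-swap failure mode, and why the edge sizes climb through several multiples of the chunk size, can be shown with a toy model. This is deliberately not BLAKE3: hashlib.blake2s stands in for the per-chunk compression, chunks are processed in batches of lanes, and the hypothetical swap_bug flag stores lanes 0 and 1 in the wrong order only when a batch is full. Under those assumptions a 4-lane bug stays invisible until an input spans at least four chunks.

```python
import hashlib
from typing import List, Optional

CHUNK_LEN = 1024  # BLAKE3 chunk size in bytes (reused by the toy)


def toy_hash(data: bytes, lanes: int = 1, swap_bug: bool = False) -> bytes:
    """Toy chunk-parallel hash (NOT BLAKE3). lanes=1 plays the portable path."""
    chunks = [data[i:i + CHUNK_LEN] for i in range(0, len(data), CHUNK_LEN)] or [b""]
    digests = []
    for start in range(0, len(chunks), lanes):
        batch = [hashlib.blake2s(c).digest() for c in chunks[start:start + lanes]]
        if swap_bug and lanes >= 2 and len(batch) == lanes:
            # Simulated SIMD bug: lanes 0 and 1 are written back swapped,
            # but only when every lane of the batch is in use.
            batch[0], batch[1] = batch[1], batch[0]
        digests.extend(batch)
    # Combine per-chunk digests in order, so any reordering changes the result.
    return hashlib.blake2s(b"".join(digests)).digest()


def first_failing_size(sizes: List[int], lanes: int) -> Optional[int]:
    """Differential check: portable (1 lane) vs buggy SIMD (`lanes` lanes)."""
    for n in sizes:
        data = bytes(i % 251 for i in range(n))
        if toy_hash(data) != toy_hash(data, lanes=lanes, swap_bug=True):
            return n
    return None


SIZES = [0, 1, 63, 64, 65, 1023, 1024, 1025, 2047, 2048, 2049,
         4095, 4096, 4097, 8191, 8192, 8193]
print(first_failing_size(SIZES, lanes=4))  # 4095: the first size with 4 chunks
print(first_failing_size(SIZES, lanes=8))  # 8191: the first size with 8 chunks
```

Sizes that stop short of a full batch (2049 bytes is only three chunks) pass even with the bug present, which is why a test suite needs inputs on both sides of each multiple of lanes times 1024 rather than one "large" input.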
Hands-on demo
python3 contrib/testing/audit/audit-simd-blake3.py
The script:
- Picks a list of edge-case input sizes, including 0, 1, 31, 32, 33, 63, 64, 65, 80 (block-header), 127, 128, 129, 255, 256, 257, 1023, 1024, 1025, 4096, 8192, 65535, 65536, 100000.
- Generates 1000 additional random-sized inputs from a fixed seed (so the run is reproducible).
- Hashes every input with BLAKE3 in:
  - default mode (whatever SIMD implementation the dispatcher picks)
  - portable mode (BLAKE3_NO_SIMD=1)
- Compares byte-for-byte. Any mismatch is an immediate FAIL.
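If the SIMD choice is made once per process (an assumption; dispatchers commonly cache their CPU-feature check), one way to get both modes is to hash the same inputs in two child interpreters with different environments and compare the results. In this sketch hashlib.blake2b stands in for the BLAKE3 binding, so both modes trivially agree; the function names are hypothetical, and BLAKE3_NO_SIMD appears only because the text above names it.

```python
import os
import subprocess
import sys
from typing import Dict, List

# Child program: hash the i % 251 pattern for each size given on the command
# line and print one hex digest per line. blake2b stands in for BLAKE3 here.
CHILD = """
import hashlib, sys
for arg in sys.argv[1:]:
    data = bytes(i % 251 for i in range(int(arg)))
    print(hashlib.blake2b(data).hexdigest())
"""


def digests_in_env(sizes: List[int], extra_env: Dict[str, str]) -> List[str]:
    """Run the child in a fresh interpreter with extra environment variables set."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run([sys.executable, "-c", CHILD, *map(str, sizes)],
                            env=env, capture_output=True, text=True, check=True)
    return result.stdout.split()


def compare_modes(sizes: List[int]) -> List[int]:
    """Return the sizes whose digests differ between default and portable mode."""
    default = digests_in_env(sizes, {})
    portable = digests_in_env(sizes, {"BLAKE3_NO_SIMD": "1"})
    if len(default) != len(sizes) or len(portable) != len(sizes):
        raise RuntimeError("child process did not return one digest per input")
    return [n for n, a, b in zip(sizes, default, portable) if a != b]
```

Calling compare_modes([0, 1, 64, 1024, 1025, 4096]) returns the list of sizes that disagree; an empty list corresponds to a PASS.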
Exercise
A true end-to-end check of this audit means breaking the BLAKE3 C library, which lives under src/crypto/blake3/c/. As a non-destructive demonstration, you can instead temporarily set the environment variable that disables SIMD and confirm the audit still passes (because it is then comparing portable to portable):
BLAKE3_NO_SIMD=1 python3 contrib/testing/audit/audit-simd-blake3.py
# Should still PASS: both sides are now portable.
To actually demonstrate SIMD bug detection, the more realistic exercise is to build with -DBLAKE3_DEBUG_FORCE_SIMD_OUTPUT_MISMATCH (if the library is patched to support it) and watch the differential fail. In production, this same differential run is what catches real upstream regressions when we sync the vendored BLAKE3 library to a newer upstream release.
Why we re-test on every release
The BLAKE3 C library is vendored at a specific commit. Every time we bump that commit, we re-run this audit before tagging a release. A SIMD regression upstream (rare but real — see CVE-2023-28447 for a historical example) must be caught by us, not by the network.
Further reading
- BLAKE3 specification: github.com/BLAKE3-team/BLAKE3-specs
- BLAKE3 C library: github.com/BLAKE3-team/BLAKE3/tree/master/c
- CVE-2023-28447: historical AVX-512 buffer-handling bug, fixed in BLAKE3 v1.4.1.
- Ethereum 2017 SECP256K1 fork-near-miss writeup: ethereum.org/en/history/