A heterogeneous computing experiment on $40 of hobbyist hardware: a working physical-noise random source with end-to-end NIST-style characterization (SP 800-90B + 90A), feeding a probabilistic-bit emulator and an Ising-style annealer for combinatorial optimization. Real noise — mostly mechanical, from the Brownian motion of a MEMS accelerometer's proof mass; partly thermal, from a TMP36 LSB — is debiased per-channel via von Neumann, used to seed an HMAC_DRBG for memory-rate output, then drives Multi-Try Metropolis annealing on Ising spin-glass and TSP QUBO problems with three pluggable matmul backends (NumPy / TFLite-CPU / Edge TPU). Same algorithmic class as commercial quantum-annealing and probabilistic-compute silicon.
Long-form writeup, with measured numbers and the architectural arguments: xylem-group.org/research/lopt/.
```
┌──────────────┐  CSV @ 100Hz   ┌────────────────┐  bytes/p-bits   ┌─────────┐
│   RedBoard   │ ─────────────▶ │ entropy_bridge │ ──────────────▶ │ stdout  │
│  (sensors)   │   USB serial   │  (this repo)   │    iter API     │  Coral  │
└──────────────┘ ◀───────────── └────────────────┘                 └─────────┘
                  LED commands
```
| Measurement | Value | Source |
|---|---|---|
| Per-channel min-entropy (accel raw LSB) | 0.94 bits/sample | make min-entropy, NIST SP 800-90B subset |
| Per-channel min-entropy (TMP36 LSB) | 0.15 bits/sample | same |
| Live VN clean-byte rate (multi-channel) | ~10 B/s | make calibrate-bridge |
| Von Neumann yield (live, vs i.i.d. ideal 25%) | 19.6% | autocorrelation tax |
| HMAC_DRBG SHA-256 throughput | memory-speed | make seed / make password |
| NIST SP 800-22 (simulator, 800 streams) | all 15 tests pass | mean p=0.4999, lowest pass-rate 98.62% |
| Edge TPU matmul win at N=1024 | 7.4× over V2, 10.3× over V1 | make coral-bench (spatial-conv) |
| Edge TPU matmul win at N=2048 | 28.9× over V2, 31.5× over V1 | same |
| End-to-end annealer at N=1024 | 1.94× faster wall-clock post spatial fix | make anneal-bench |
| Tests | 56/56 | make test |
Live Ising annealer — single-spin SA (blue) vs Multi-Try Metropolis (orange) on a 64-spin glass. MTM crashes to the ground state in ~300 steps while single-spin SA grinds along; both eventually arrive at E = −242. Yellow flashes mark cells that just flipped, stars on the energy trace mark new-best moments, the bottom strip shows rolling-window acceptance rate collapsing as the system cools.
Live TSP solver — 7-city Travelling Salesperson, encoded as a QUBO Hamiltonian, solved by Metropolis annealing on the entropy bridge's debiased bytes. Watch the polyline rearrange itself as constraints get satisfied (red dashed = invalid state, blue solid = valid permutation). Best tour found is overlaid on the gray ghost of the brute-force optimum; the run lands on optimal.
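The QUBO behind a demo like this follows the textbook one-hot encoding: a binary variable `x[c, t]` is 1 iff city `c` is visited at step `t`, penalty terms enforce the two one-hot constraints, and a distance term pays for each consecutive edge. A minimal sketch of that construction (our own simplification; not necessarily the exact form `tsp.py` uses):

```python
import numpy as np

def tsp_qubo(D, A=None):
    """Textbook TSP -> QUBO. x[c, t] = 1 iff city c is visited at step t
    (flattened index c*n + t). Penalty A enforces "each city once" and
    "one city per step"; the distance term pays D[c1, c2] when c1 at step
    t is followed by c2 at step t+1 (cyclic). Energy = x @ Q @ x, up to a
    dropped constant 2*A*n."""
    n = len(D)
    A = A if A is not None else 2.0 * float(D.max())  # must dominate any edge
    idx = lambda c, t: c * n + t
    Q = np.zeros((n * n, n * n))
    for c in range(n):                       # (sum_t x[c,t] - 1)^2, expanded
        for t1 in range(n):
            Q[idx(c, t1), idx(c, t1)] -= 2 * A
            for t2 in range(n):
                Q[idx(c, t1), idx(c, t2)] += A
    for t in range(n):                       # (sum_c x[c,t] - 1)^2, expanded
        for c1 in range(n):
            Q[idx(c1, t), idx(c1, t)] -= 2 * A
            for c2 in range(n):
                Q[idx(c1, t), idx(c2, t)] += A
    for t in range(n):                       # tour-length objective
        for c1 in range(n):
            for c2 in range(n):
                if c1 != c2:
                    Q[idx(c1, t), idx(c2, (t + 1) % n)] += D[c1, c2]
    return Q
```

For a valid permutation x, `x @ Q @ x + 2*A*n` recovers the cyclic tour length; any constraint violation costs on the order of A, which is why invalid states (the red dashed tours) sit high in the energy landscape.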
Conventional computers are deterministic sequencers: read instructions, mutate registers, branch. Every operation has a definite answer; randomness is something software simulates. The whole stack — transistors, gates, ALUs, caches, OS schedulers — is engineered to suppress noise so the answer is bit-exact.
An Ising machine inverts that contract. Its primitive isn't a logic gate, it's a probabilistic spin: a node whose state (±1) is sampled from a Boltzmann distribution governed by its couplings to its neighbors. You don't program an Ising machine, you encode your problem as a coupling matrix J, let the system relax toward thermal equilibrium, and read out the low-energy configuration.
A huge class of optimization problems can be written as Ising or QUBO Hamiltonians. Once your problem is in that form, an Ising machine doesn't search the solution space — it physically relaxes into a low-energy state, exploiting parallelism that's intrinsic to the physics rather than orchestrated by software.
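The rewrite between the two forms is mechanical: any QUBO objective x^T Q x over x in {0,1} becomes an Ising energy over s in {-1,+1} via the change of variables s = 2x - 1. A small sketch of that standard transform (helper name and conventions are ours, not repo code):

```python
import numpy as np

def qubo_to_ising(Q):
    """Rewrite the QUBO objective x^T Q x (x_i in {0,1}) in Ising variables
    s = 2x - 1 (s_i in {-1,+1}). Returns (J2, h, c) such that
    x^T Q x == s^T J2 s + h @ s + c, with J2 zero-diagonal."""
    Q = 0.5 * (Q + Q.T)              # symmetrizing leaves x^T Q x unchanged
    J2 = Q / 4.0                     # pairwise couplings
    h = Q.sum(axis=1) / 2.0          # per-spin fields
    c = Q.sum() / 4.0                # constant offset
    c += np.trace(J2)                # s_i^2 == 1: fold diagonal into constant
    J2 = J2 - np.diag(np.diag(J2))
    return J2, h, c
```

The couplings J2 are exactly what would be loaded into an Ising machine; sign and scaling conventions vary between solvers.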
| Conventional computer | Ising machine |
|---|---|
| Deterministic, bit-exact | Probabilistic, samples a distribution |
| Logic gates as primitive | Coupled probabilistic bits as primitive |
| Programmed with instructions | "Programmed" by setting couplings (J) |
| Runs an algorithm to search | Relaxes to a ground state |
| Suppresses thermal noise | Uses thermal noise as the search engine |
| Sequential at the core; parallelism added on top | Parallelism is the substrate |
| Energy is a side-effect | Energy is the objective |
The Multi-Try Metropolis algorithm in parallel_ising.py is a software
emulation of this physics. Each MTM step computes v = J @ s once — a
single matmul that simultaneously yields ΔE for every possible single-spin
flip — then samples a move from those N candidates. That's the operation
that maps cleanly onto custom silicon: a parallel matrix-vector product
over short integers, exactly what the Edge TPU, Extropic's TSU, D-Wave's
QPU, and Fujitsu's Digital Annealer all accelerate. Swap in real Ising
silicon and the rest of the pipeline is unchanged.
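The identity behind the one-matmul trick is easy to verify directly; a self-contained sketch (ours, mirroring the idea rather than copying `parallel_ising.py`):

```python
import numpy as np

def energy(J, s):
    """E(s) = -1/2 s^T J s for a zero-field Ising glass."""
    return -0.5 * s @ J @ s

def all_flip_dE(J, s):
    """ΔE for every possible single-spin flip from one matmul: flipping
    s_i -> -s_i changes E by 2 * s_i * (J @ s)_i when J is symmetric with
    zero diagonal. v = J @ s is the step MTM scores its N candidates with."""
    return 2.0 * s * (J @ s)

# sanity check against brute force
rng = np.random.default_rng(0)
n = 8
A = rng.normal(size=(n, n))
J = 0.5 * (A + A.T)
np.fill_diagonal(J, 0.0)
s = rng.choice([-1.0, 1.0], size=n)
dE = all_flip_dE(J, s)
for i in range(n):
    s2 = s.copy()
    s2[i] = -s2[i]
    assert np.isclose(energy(J, s2) - energy(J, s), dE[i])
```

On an accelerator the single `J @ s` is the entire per-step device workload; candidate selection and accept/reject stay on the host.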
Ising-class hardware (quantum annealers, p-bit machines, optical Ising machines, FPGA-based digital annealers) is being commercially deployed or seriously researched in:
- Logistics and routing — vehicle routing, last-mile delivery, container loading, airline crew rostering. Volkswagen ran a public taxi-routing trial in Lisbon on a D-Wave; DENSO has done factory AGV routing; Recruit uses Fujitsu's Digital Annealer for warehouse picking-route optimization.
- Drug discovery and protein folding — conformation search reduces to finding low-energy states in a discrete configuration space. Menten AI, ProteinQure, and Polaris Quantum Biotech have published QUBO-formulated drug-target work.
- Portfolio optimization in finance — Markowitz portfolio selection with cardinality constraints is a hard combinatorial problem that maps to QUBO. Multiverse Computing, Goldman Sachs's quant research group, and Mitsubishi UFJ have run Ising-backed portfolio experiments.
- Lattice problems in cryptography — shortest vector / closest vector problems in lattices (the foundation of post-quantum crypto) have natural QUBO encodings. Used both offensively (analyzing PQC schemes' hardness) and defensively (validating parameter choices).
- Graph problems — max-cut, graph coloring, community detection in social networks, chip floor-planning (graph partitioning + placement), VLSI routing, FPGA congestion. EDA tooling experiments at Synopsys and Cadence.
- Machine learning — restricted Boltzmann machines were the historical motivator. Sparse-network training, hyperparameter search, and feature selection have all been formulated as QUBOs. Extropic's TSU and the resurgent stochastic-computing field target this directly.
- Materials science and chemistry — computing ground states of physical Hamiltonians (the original use case — these machines are simulating the systems they're inspired by). Lattice gauge theory, frustrated magnets, high-Tc superconductor candidates.
- Operations research generally — scheduling (job-shop, project, resource-constrained), set cover, knapsack variants, MAX-SAT — all reducible to QUBO. NEC, Hitachi, and Toshiba sell annealer-as-a-service products to enterprise OR teams.
The ceiling is much higher than current deployment because Ising-class hardware is still 5–10 years from commodity. Credible directions:
- Real-time embedded optimization. When the annealer is a small chip rather than a cloud service, you can put one in a robot, a drone, an autonomous vehicle, an AR/VR headset, a power-grid controller. Low-power, deterministic-latency optimization at the edge.
- Probabilistic AI accelerators. Sampling — drawing from learned distributions — is the bottleneck in diffusion models, Bayesian inference, and the kind of probabilistic reasoning that LLMs currently approximate token-by-token. A p-bit chip could do this natively.
- Combinatorial co-processors. The way GPUs became the de-facto matmul co-processor for ML, p-bit / Ising chips could become the de-facto combinatorial co-processor for scheduling, planning, and discrete decision-making in conventional software stacks.
- Climate and energy. Power-grid dispatch, EV charging coordination, district-heating scheduling, supply-chain decarbonization — all combinatorial, all increasingly time-critical.
- Personalized medicine. Treatment-plan optimization (which combination of drugs, dosages, timing) under patient-specific constraints is QUBO-shaped and currently solved with heuristics that don't scale.
A hands-on emulator that demonstrates the full pipeline end-to-end on $40 of hobbyist parts: real thermal-noise harvesting, real von-Neumann debiasing, a real Metropolis annealer solving a real (small) optimization problem, and a real heterogeneous-compute architecture where the matmul step is offloaded to an accelerator over TCP. Swap in actual Ising silicon (Extropic, D-Wave, Fujitsu DA) and the host code is unchanged — it's a backend change.
The point isn't to compete with industrial annealers. It's to make the architecture legible by building it yourself.
Working today:

- Hardware: TMP36, MMA8452Q, RGB LED, button, pot — all wired on a SparkFun RedBoard.
- Bridge: multi-channel entropy from the TMP36 LSB (thermal/electronic noise) plus three MMA8452Q raw-int LSBs (mechanical noise — Brownian motion of the MEMS proof mass). On healthy hardware, marginal H ≈ 1.0 bits/sample on each accel axis; the mechanical noise floor dominates the thermal one at room temperature. See `docs/NIST_SP800_22.md` for the detailed per-channel breakdown and the live-hardware autocorrelation finding.
- DRBG layer (`orchestrator/drbg.py`): HMAC_DRBG SHA-256 (NIST SP 800-90A, stdlib-only, CAVP-vector-verified) seeded from the bridge. Applications consume application-rate bytes while the seed retains its hardware physical-provenance guarantee. `make seed` / `make password` use it directly; the `parallel_ising` / `tsp` / `ising` solvers use it by default (`--no-drbg` for raw-bridge testing).
- p-bit emulator with live knob bias and LED feedback.
- Annealing demos: Ising spin-glass (`ising.py`), TSP via QUBO (`tsp.py`), and Multi-Try Metropolis (`parallel_ising.py`) — each with brute-force optimum + side-by-side python-vs-bridge comparison plots.
- Coral Dev Board running Mendel, on the network, Edge TPU detected by PyCoral. All three heterogeneous backends shipped behind the same wire protocol — the host annealer offloads `J @ s` matmuls to a TCP server on the Coral. Backend chosen at server startup. V1 and V2 produce bit-identical output; V3's int8 quantization introduces small differences (see the quantization-noise note below).

Measured per-matmul compute time across problem size N (median of 200 matmuls, on-Coral, network excluded):
| N | V1 NumPy | V2 TFLite-CPU | V3 Edge TPU (spatial) | V3 speedup vs V2 | Winner |
|---|---|---|---|---|---|
| 32 | 0.030 ms | 0.048 ms | 0.411 ms | 0.12× | V1 |
| 64 | 0.046 ms | 0.036 ms | 0.414 ms | 0.09× | V2 |
| 128 | 0.105 ms | 0.052 ms | 0.444 ms | 0.12× | V2 |
| 256 | 0.339 ms | 0.119 ms | 0.462 ms | 0.26× | V2 |
| 512 | 1.36 ms | 0.665 ms | 0.493 ms | 1.35× | V3 |
| 1024 | 5.45 ms | 3.92 ms | 0.528 ms | 7.43× | V3 |
| 2048 | 22.08 ms | 20.20 ms | 0.700 ms | 28.9× | V3 |

Three regimes visible in the curves:
- Small N (≤256): the per-step compute is so cheap that any framework overhead dominates. V1 NumPy wins at N=32 because it has no framework — it's just a plain matmul. V2 wins at N=64–256.
- Crossover at N≈512: V3 overtakes V2. The Edge TPU's flat ~0.5 ms overhead stops dominating once the matmul itself is large enough.
- Large N (≥1024): V3 dominates by a widening margin. V3 scales near-flat with N (the matmul fits in on-chip SRAM and bandwidth doesn't bottleneck); V2 scales O(N²) on the CPU. By N=2048 the gap is 29×.
Full benchmark JSON in `docs/scaling_spatial.json`; reproduce with `make coral-bench` after deploying spatial-variant models (see `docs/RESEARCH_LOG.md`).

Acceleration history. Until 2026-05-11 the V3 line was 4–8× slower than what's shown above: `edgetpu_compiler` v16 was leaving the Conv2D matmul on the ARM CPU and only mapping the surrounding Reshape ops. The "spatial" variant in `build_tflite.py` reshapes s into an H×W×C feature map (H, W ≤ 16) instead of the degenerate (1,1,N) tensor the old "pointwise" variant used. With a real H×W convolution to compile, the partitioner places the entire graph on the Edge TPU — single subgraph, all on-chip. See `docs/RESEARCH_LOG.md` for the full story.

End-to-end annealing wall-clock. Matmul-ms is what the silicon cares about; ms/Metropolis-step is what a user cares about.
`make anneal-bench` runs full MTM annealing on the workstation host with each backend wired in, and times the whole loop including network round-trip, J caching, and accept/reject overhead. Median ms/step at 200 steps:

| N | Host NumPy | Coral V1 | Coral V2 | Coral V3 (spatial) |
|---|---|---|---|---|
| 32 | 0.009 | 56.3 | 55.1 | 56.1 |
| 128 | 0.012 | 56.4 | 55.5 | 55.6 |
| 512 | 0.087 | 76.2 | 64.3 | 64.1 |
| 1024 | 0.315 | 121.5 | 90.7 | 73.6 |
| 2048 | 1.220 | 103.7 | 88.4 | 84.2 |

Two takeaways, both honest:
- A workstation host beats the Coral by ~233× end-to-end at N=1024 (0.32 ms/step vs 73.6 ms/step). The heterogeneous-compute story only makes sense when the host is itself slow — a microcontroller, a sensor node, an embedded SoC. On a Mac, the M-series CPU dwarfs anything the Coral can offer. The lopt architecture is built for the embedded case (RedBoard host, Coral accelerator); the workstation numbers exist as the upper bound.
- Network round-trip dominates compute below N=512. The SSH-tunneled wire takes ~55 ms regardless of backend, so V1/V2/V3 all collapse to a flat ~55 ms floor at small N. Above that the matmul-only ranking re-emerges: V3 takes the lead at N=512 and extends through N≥1024 — the same crossovers as the matmul-only plot, shifted up by network overhead (see the dotted V3-matmul-only reference line). Closing the gap with the dotted line is a question of moving J upload + matmul to a leaner protocol (gRPC, shared-memory, or batching k spin vectors per round-trip). The spatial-conv V3 update halved the V3 wall-clock at N=1024 (142.6 → 73.6 ms/step) by collapsing the matmul portion from 2.13 → 0.53 ms — the network floor is still there, but the compute on top of it is now flat.
A quantization-noise dividend. V3's int8 matmul introduces small approximation errors at the matmul boundary. In a deterministic linear algebra context that's a bug; in a stochastic optimizer it can be a feature. At N=512 V3 reached e_best=−5036 vs V1/V2's −5030; at N=1024 V3 reached e_best=−10799 vs V1/V2's −10637 — a 1.5% better solution. The quantization acts as an extra dithering term on top of the thermal noise, occasionally letting MTM escape a local minimum the bit-identical V1/V2 trajectories settle into. We're not claiming this is reliable across instances (n=4 paired runs is not statistics), but it's a striking-enough effect to flag — quantized accelerators may be especially well-suited to stochastic Ising solvers for reasons that have nothing to do with throughput.
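The mechanism is easy to see in isolation. Under a simple symmetric per-tensor int8 model (our simplification; the repo's V3 path uses TFLite's adversarial calibration, not this), the matvec comes back with a small sign-varying perturbation on every candidate ΔE:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
A = rng.normal(size=(n, n))
J = 0.5 * (A + A.T)                 # symmetric spin-glass couplings
np.fill_diagonal(J, 0.0)
s = rng.choice([-1.0, 1.0], size=n)

# symmetric per-tensor int8 quantization of J (illustrative scheme)
scale = np.abs(J).max() / 127.0
Jq = np.clip(np.round(J / scale), -127, 127)

v_exact = J @ s                     # what the float backends compute
v_quant = (Jq @ s) * scale          # what an int8 backend effectively computes
err = v_quant - v_exact             # small dither riding on every local field
```

Each rounding error is bounded by scale/2 per matrix entry, so the summed error on each local field is a small zero-mean perturbation: a built-in dithering term, which is the effect the paragraph above describes.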
Reproduce with `make anneal-bench` (assumes `make coral-tunnel` is up).

- Live matplotlib dashboards (`make sim-dashboard`, `make sim-tsp-dashboard`) with flip-flash highlights, new-best stars, rolling-window acceptance strips, and an anneal-progress gauge.
- Recording mode (`make record-all`) renders the dashboards to GIF (and MP4 if `ffmpeg` is on PATH) for sharing.
- NIST SP 800-22 conformance harness (`tools/nist/`, vendored sts-2.1.2 with a non-interactive driver). Parallel runner shards the capture across `cpu_count - 1` workers. On simulator bytes, all 15 tests pass at the proportion threshold across 800 streams × 1 Mbit (mean p-value 0.4999, lowest pass-rate 98.62% on Linear Complexity vs the 97.94% NIST threshold). Live-hardware capture in progress — see "Live capture infrastructure" below. Writeup in `docs/NIST_SP800_22.md`.
- NIST SP 800-90B min-entropy estimator (`orchestrator/min_entropy.py`). Implements three of the ten 90B non-i.i.d. estimators — MCV (§6.3.1), binary Markov (§6.3.3), Compression / Maurer (§6.3.4). Per-channel run on healthy hardware: each accelerometer axis carries H_min = 0.94 bits/sample (94% of theoretical max); the TMP36 LSB sits at 0.15. The seven 90B estimators not yet implemented can only lower the reported H_min further, so the result is a conservative subset of full 90B compliance, not a certification claim. `make min-entropy`.
- Live-hardware capture infrastructure for the multi-day SP 800-22 hardware run. Three layers of self-healing:

  | Layer | Role | When it kicks in |
  |---|---|---|
  | L1 — `nist_runner.py --capture-only` | streams clean bytes to disk | always |
  | L2 — `lopt_coral_capture.sh` watchdog | restarts L1 on rate drop or process death | every 2 min |
  | L3 — `lopt_coral_supervisor.sh` (cron) | restarts L1+L2 if both died (e.g. Coral reboot) | every 5 min |

  Plus firmware v1.4: when the I²C bus shorts to GND or `accel.begin()` fails, the firmware retries every 30 s and emits a `# accel recovered` log line on down→up transitions. The MMA8452Q breakout has shown intermittent solder-bridge issues on this specific hardware; this layer papers over them without manual intervention. `make coral-nist-deploy && make coral-nist-supervisor-install && make coral-nist-capture`.
- 46 unit tests, all hardware-free. Includes a NIST CAVP test vector for HMAC_DRBG SHA-256 — the implementation passes the reference output bit-for-bit.
- `firmware/entropy_streamer/` — RedBoard firmware. Streams a 7-channel CSV over USB serial and accepts `LED r g b` commands from the host. I²C bus health checks, accel address fallback, button debouncing.
- `orchestrator/verify_stream.py` — sanity-checks the stream and reports per-channel entropy quality.
- `orchestrator/entropy_bridge.py` — von-Neumann debiasing to extract uniform random bits, p-bit sampling driven by the live knob position, RGB LED state feedback. `SimulatedBridge` produces realistic synthetic samples so the whole pipeline runs without hardware.
- `orchestrator/ising.py` — Metropolis-Hastings annealer for a random Ising spin-glass; consumes clean bits from `entropy_bridge`, knob = annealing temperature, button = re-anneal spark, LED = state. Brute-force ground state at startup so you can see how close the annealing got.
- `orchestrator/tsp.py` — Travelling Salesperson encoded as a QUBO Hamiltonian and solved by Metropolis annealing on the bridge's bits. Brute-force optimal tour at startup. Includes `--compare` mode (python RNG vs bridge bits, side-by-side plot) and `--live` mode (animated city map with current and best tours).
- `orchestrator/parallel_ising.py` — single-spin SA vs Multi-Try Metropolis on a random spin-glass. MTM uses one matmul per step to evaluate ΔE for all N possible single-spin flips simultaneously. With `--backend coral`, the matmul ships to the Coral over TCP. `--live` opens an animated matplotlib window with energy trace, spin-grid heatmaps, acceptance strip, and anneal gauge.
- `coral/coral_server.py` — runs on the Coral Dev Board. Listens on TCP port 5005, accepts `(J, s)` and returns `J @ s`. Three pluggable backends behind one wire protocol: `NumpyBackend` (V1, default), `TFLiteBackend` (V2, `--model PATH`), and Edge TPU (V3, `--model PATH --edgetpu`). The wire J is verified against the baked-in J on first load — deployment mistakes (wrong seed, stale model) print a warning.
- `coral/coral_client.py` — host-side client for `parallel_ising.py`'s `--backend coral`. Caches J on the server (uploaded once per problem instance), then ships only `s` per Metropolis step.
- `coral/build_tflite.py` — builds the V2/V3 `.tflite` model with J baked in as a 1×1 Conv2D kernel. Float32 (`--quantize none`, default) for V2; int8 with adversarial calibration (`--quantize int8`) for V3.
- `coral/Dockerfile.tflite` and `coral/Dockerfile.edgetpu` — Python 3.11 + TF 2.18 build environment, and a Debian-bullseye image with the Coral apt-installed `edgetpu_compiler`. Used by `make tflite-build` and `make edgetpu-compile` respectively.
- `tools/coral_bench.py` — runs on the Coral, times all three backends (NumPy / TFLite-CPU / Edge TPU) at N ∈ {32 … 1024}, writes results to JSON. `make coral-bench` deploys, runs, and pulls the JSON back for plotting.
- `orchestrator/drbg.py` — HMAC_DRBG SHA-256 (NIST SP 800-90A §10.1.2), stdlib-only, CAVP-test-vector-verified bit-for-bit. `BridgeDRBG` wraps any entropy source (live or simulator) and auto-reseeds every 1 MiB. Used by `parallel_ising` / `tsp` / `ising` (default) and by `make seed` / `make password`.
- `orchestrator/min_entropy.py` — three SP 800-90B §6.3 estimators (MCV, binary Markov, Compression / Maurer). `make min-entropy` for per-channel live measurement; `make sim-min-entropy` for the simulator. Reports the min over enabled estimators per NIST convention.
- `tools/nist_runner.py` — drives the vendored `tools/nist/` C suite (sts-2.1.2 with a non-interactive driver). Three modes: live capture + test in one shot (`make nist`), capture-only (`--capture-only PATH`, for unattended multi-day runs), and test-only (`--bin PATH`, for running on a pre-captured .bin). Splits work across worker processes.
- `tools/calibrate_bridge.py` — measures the live RedBoard's actual clean-bits/s rate over a fixed window, projects expected capture sizes for various wall-clock durations. `make calibrate-bridge`.
- `tools/lopt_coral_capture.sh` — backgrounded launcher with built-in rate-watchdog (Layer 2). Runs on the Coral.
- `tools/lopt_coral_supervisor.sh` — cron-installed restart layer (Layer 3). Runs on the Coral every 5 min. Restarts L1+L2 if both died (typical cause: Coral reboot from a USB power transient).
See docs/WIRING.md for pin map and breadboard diagrams.
| Part | Used as |
|---|---|
| SparkFun RedBoard | Primary entropy MCU |
| TMP36 temp sensor | Thermal entropy source |
| 10 kΩ potentiometer | Live bias-field control |
| MMA8452Q breakout (3.3V) | Kinetic entropy / shake spark |
| Common-cathode RGB LED + 3× 220Ω | Live state visualization |
| Momentary push button | Manual re-anneal trigger |
| Adafruit KB2040 | Future: 2nd entropy channel |
| Google Coral Dev Board | Heterogeneous matmul backend |
The bridge ships with a --simulate mode that synthesizes statistically
realistic sensor samples so you can play with the p-bit emulator and the
Ising annealer before wiring anything:
```
python3 -m venv .venv
.venv/bin/pip install pyserial numpy matplotlib pillow

make sim               # p-bit emulator on the simulator
make sim-ising         # Ising annealer on the simulator
make sim-tsp           # 5-city TSP via QUBO annealing on the simulator
make sim-tsp-compare   # same TSP, both python RNG and bridge bits + plot
make sim-dashboard     # live Ising dashboard (energy + spin grids)
make sim-tsp-dashboard # live TSP dashboard (city map + tour + energy)
make sim-seed          # 32 simulator-seeded HMAC_DRBG bytes (hex)
make sim-password      # 24-char simulator-seeded password (a-zA-Z0-9)
make sim-min-entropy   # per-channel SP 800-90B min-entropy on the simulator
make nist-quick        # 12.5 MB NIST 800-22 run on simulator bytes (~30s)
make nist              # full 100 MB NIST run on simulator bytes (~2 min)
```
```
make test              # run the unit tests (no hardware needed)

make record-dashboard  # render docs/dashboard.gif (Ising)
make record-tsp        # render docs/tsp.gif (TSP)
make record-all        # both
```

GIFs default to ~2.5 MB at 18 fps / 80 dpi. For higher-quality MP4s, install ffmpeg (`brew install ffmpeg`) and pass `--save-mp4 path.mp4` to either dashboard module directly.
```
brew install arduino-cli
arduino-cli core install arduino:avr
arduino-cli lib install "SparkFun MMA8452Q Accelerometer"

make upload            # flash the RedBoard
make verify            # capture 500 samples, print per-channel entropy stats
make pbit              # p-bit emulator (turn the knob, watch the LED)
                       #   add --drbg for memory-speed p-bits
make ising             # Ising-glass annealer (knob = anneal temperature)
make tsp               # 5-city TSP via QUBO on physical noise
make seed              # 32 hardware-seeded HMAC_DRBG bytes (hex)
make password          # 24-char hardware-seeded password
make min-entropy       # per-channel SP 800-90B min-entropy, 60-second window
make calibrate-bridge  # 5-min run, projects expected capture sizes
```

Healthy `make verify` output (firmware v1.4, MMA8452Q in good order):
```
# lopt entropy_streamer v1.4 accel=1
# rows=500 window=4.99s rate=100.2Hz
# accel_g |a|_mean=1.005 (gravity, sensor live)
# temp    p(1)=0.50 H=0.95 bits/sample VN_yield=46.6%
# ax_raw  p(1)=0.48 H=1.00 bits/sample VN_yield=49.9%
# ay_raw  p(1)=0.53 H=0.99 bits/sample VN_yield=49.8%
# az_raw  p(1)=0.51 H=1.00 bits/sample VN_yield=49.9%
# combined: ~196 clean bits/s (theoretical, marginal-derived)
```
The "combined" line is the i.i.d. ceiling. Live yield runs lower because
of sample-to-sample autocorrelation — 78–80 clean bits/s on healthy
hardware. See docs/NIST_SP800_22.md for the autocorrelation breakdown
and the per-channel SP 800-90B numbers.
Run on the Coral so the laptop's free, with a three-layer self-healing supervisor:
```
# RedBoard's USB plugged into the Coral's USB-A port, /dev/ttyUSB0.
# `coral` resolves via /etc/hosts or DNS. mendel must be in dialout group.
make coral-nist-deploy               # one-time: push entropy_bridge / nist_runner
                                     # / launcher / supervisor to Coral
make coral-nist-supervisor-install   # */5 cron — relaunches stack on Coral reboot
make coral-nist-capture              # NIST_BYTES=1500000 (= 12 streams), ~38h

# any time:
make coral-nist-status               # pid + log tail + .bin size
make coral-nist-pull                 # scp the .bin home to ./capture.bin
make nist-test NIST_BIN=capture.bin  # run the 800-22 C suite
```

The host annealer offloads `J @ s` matmuls to a NumPy-backed TCP server on the Coral. Works with simulator entropy too — no RedBoard needed for this demo.
```
# one-time: ensure `coral` resolves (e.g. /etc/hosts entry to the Coral's IP)
make coral-deploy           # scp coral_server.py to ~mendel/

# in one shell:
make coral-server           # starts the matmul server on the Coral

# in another shell:
make coral-parallel-ising   # MTM annealer, n=32, 500 steps, matmul on Coral
```

macOS gotcha. On macOS Sequoia/Tahoe, Homebrew Python doesn't have the Local Network privacy entitlement, so direct TCP connections to RFC1918 IPs get rejected with `OSError: [Errno 65] No route to host` even though `ssh` and `nc` work fine. Workaround — tunnel the port over SSH so Python connects to loopback (which isn't gated):

```
make coral-tunnel           # 127.0.0.1:5005 -> coral:5005
.venv/bin/python -m orchestrator.parallel_ising \
    --simulate --sim-seed 11 -n 32 --steps 500 --algorithm mtm \
    --backend coral --coral-host 127.0.0.1
make coral-tunnel-down
```

The tunnel adds ~50 ms per round-trip vs ~1–2 ms on direct LAN; the matmul itself is ~0.1 ms on the Coral's ARM CPU at n=32.
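The offload pattern, J uploaded once and cached server-side with only s crossing the wire per Metropolis step, can be sketched with a toy length-prefixed protocol. The opcodes and framing below are our invention for illustration; the repo's actual `coral_server.py` wire format may differ:

```python
import socket
import struct
import numpy as np

# Toy wire format (ours, not the repo's): 1-byte opcode, 4-byte big-endian
# payload length, then raw float32 data. Opcode b"J" uploads and caches the
# coupling matrix; b"S" ships a spin vector and gets J @ s back. The point
# is the shape of the exchange: J crosses the wire once, s every step.

def _recv_exact(conn, n):
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return buf

def serve_once(port):
    """Accept one client; cache its J, answer J @ s until it disconnects."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    conn, _ = srv.accept()
    J = None
    try:
        while True:
            op = conn.recv(1)
            if not op:
                break
            (nbytes,) = struct.unpack("!I", _recv_exact(conn, 4))
            vec = np.frombuffer(_recv_exact(conn, nbytes), dtype=np.float32)
            if op == b"J":
                n = int(round(np.sqrt(vec.size)))
                J = vec.reshape(n, n)          # cached for the whole run
            else:
                out = (J @ vec).astype(np.float32)
                conn.sendall(struct.pack("!I", out.nbytes) + out.tobytes())
    finally:
        conn.close()
        srv.close()

class MatmulClient:
    """Host side: upload J once, then ship only s per step."""
    def __init__(self, host, port, J):
        self.sock = socket.create_connection((host, port))
        self._send(b"J", J.astype(np.float32).ravel())

    def _send(self, op, arr):
        self.sock.sendall(op + struct.pack("!I", arr.nbytes) + arr.tobytes())

    def matmul(self, s):
        self._send(b"S", s.astype(np.float32))
        (nbytes,) = struct.unpack("!I", _recv_exact(self.sock, 4))
        return np.frombuffer(_recv_exact(self.sock, nbytes), dtype=np.float32)
```

Batching k spin vectors per round-trip would amortize the ~55 ms tunnel floor the same way the J cache already amortizes the upload.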
Four pieces:
- Entropy sources. Four channels per sample at 100 Hz: TMP36 LSB (thermal/electronic, low entropy at room temperature) plus the raw 12-bit MMA8452Q register LSBs on each of three axes (mechanical noise, ~0.94 bits H_min/sample on each axis under SP 800-90B).
- Whitening. Per-channel von-Neumann debiasing: (0,1) → 0, (1,0) → 1, drop matches. Output is provably uniform given i.i.d. input. Per-channel pairing matters — cross-channel pairs would give non-uniform output because temp's `p` is wildly different from accel's.
- Conditioning + expansion. HMAC_DRBG SHA-256 (NIST SP 800-90A §10.1.2). Pulls 64 bytes of seed material from the bridge once, then expands to memory-rate output via HMAC chains. Reseeds every 1 MiB.
- Biased sampling. Knob → sigmoid `p1`. Sample `bit = 1 if u01 < p1 else 0` where `u01` comes from 16 DRBG-output bits.
Result: knob left → mostly 0s, knob right → mostly 1s, knob centered → fair
coin. The first byte's lineage traces all the way back to Brownian motion
of a MEMS proof mass; subsequent bytes are HMAC-SHA256 expansions of that
seed. make pbit --drbg runs at memory speed; make pbit (raw) runs at
the bridge's ~10 B/s rate.
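The whitening and sampling stages are small enough to sketch end to end. A minimal version (function names and the sigmoid gain constant are our choices, not the repo's):

```python
import math

def von_neumann(bits):
    """Per-channel von Neumann extractor: (0,1) -> emit 0, (1,0) -> emit 1,
    drop matched pairs. Exactly unbiased if the input pairs are i.i.d."""
    return [a for a, b in zip(bits[::2], bits[1::2]) if a != b]

def pbit(knob, random_bits, gain=4.0):
    """One biased p-bit sample: knob in [-1, 1] maps through a sigmoid to
    p1, and 16 clean bits form the uniform u01 it is compared against."""
    p1 = 1.0 / (1.0 + math.exp(-gain * knob))
    u01 = sum(b << i for i, b in enumerate(random_bits[:16])) / 65536.0
    return 1 if u01 < p1 else 0
```

Knob centered gives p1 = 0.5 (a fair coin); hard left or right pushes p1 toward 0 or 1, which is the bias sweep the LED visualizes.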
The deferred items below have a clear path. Each is independent — pick any order.
Previously, edgetpu_compiler v16 left the Conv2D matmul on the ARM
CPU and only mapped the surrounding Reshape ops. The fix was a model
reshape: lay s out as an HxWxC feature map (H,W ≤ 16) instead of a
degenerate (1,1,N) tensor, so the compiler sees a real spatial
convolution. Single subgraph, full on-chip placement at every N from
32 to 2048. See the benchmark table in the Three-backend stack
section above and docs/RESEARCH_LOG.md for the experimental
narrative.
Build chain:

```
make tflite-image                       # one-time
make tflite-build-int8 TFLITE_VARIANT=spatial \
    TFLITE_N=256                        # build N=256 int8
make edgetpu-image                      # one-time
make edgetpu-compile TFLITE_N=256       # compile for TPU
```

`edgetpu_compiler` is x86_64 Linux only; on Apple Silicon it runs under Rosetta emulation (`--platform linux/amd64`).
The pointwise variant is preserved (TFLITE_VARIANT=pointwise) for
reproducing the historical baseline.
The KB2040 is a 133 MHz RP2040 currently unused. With a second TMP36 wired to it, it can sustain a parallel ~1 kHz entropy stream on its own USB serial port. The host orchestrator opens both ports, XORs the streams together, and feeds the merged result to the existing von-Neumann debiaser. Result: 10× the bandwidth, and provably better uniformity (the Piling-up Lemma reduces bias quadratically when XORing independent streams).
Sketch in docs/WIRING.md §6. Roughly 60 lines of Arduino sketch on the
KB2040 side and a small XorBridge class on the host side.
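The "quadratically better" claim is just the Piling-up Lemma arithmetic: writing each stream's bias as ε = p(1) − 1/2, the XOR of two independent streams has bias −2·ε₁·ε₂. A two-line check:

```python
def bias(p1):
    """Deviation of p(1) from a fair coin."""
    return p1 - 0.5

def xor_p1(p, q):
    """p(1) of the XOR of two independent Bernoulli bit streams."""
    return p * (1.0 - q) + q * (1.0 - p)

# Piling-up Lemma: bias(xor) == -2 * bias(p) * bias(q).
# Two mildly biased streams XOR into a much less biased one:
p, q = 0.58, 0.55
eps = bias(xor_p1(p, q))   # about -0.008, vs raw biases 0.08 and 0.05
```

So even before von Neumann runs, the merged KB2040 + RedBoard stream starts an order of magnitude closer to fair than either source alone (given independence, which separate sensors on separate boards make plausible).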
Three of the ten 90B non-i.i.d. estimators are implemented (MCV, binary Markov, Compression / Maurer). The reported H_min on healthy hardware (0.94 bits/sample on each accel axis) is the min over those three; the seven not yet implemented (collision, t-tuple, LRS, multi-MCW family, lag, LZ78Y) can only ever lower H_min further. Adding them would either confirm the conservative bound or expose a structure the current subset misses. Plus IID-vs-non-IID detection per §3.1.2 to formally classify the source. NIST has reference C implementations; porting takes a focused day or two.
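Of the three implemented estimators, MCV is the simplest to state; a from-the-spec sketch (our own, not the repo's `min_entropy.py`):

```python
import math
from collections import Counter

def mcv_min_entropy(samples):
    """SP 800-90B §6.3.1 Most Common Value estimator: take the most common
    value's observed frequency, widen it to an upper confidence bound
    (z = 2.576 per the spec), and report H_min = -log2 of that bound."""
    L = len(samples)
    p_hat = Counter(samples).most_common(1)[0][1] / L
    p_u = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (L - 1)))
    return -math.log2(p_u)
```

Per NIST convention the reported H_min is the minimum over all enabled estimators, which is why each additional estimator can only confirm or lower the current 0.94 bits/sample figure.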
The 800-stream / 100 MB pass is on simulator bytes; the live capture
on the Coral is in flight at the time of writing. When it
completes (~1.8 days, multi-day target), drop the result into
docs/NIST_SP800_22.md caveat 1 and remove
the hedge. The work to set this up — capture/test split, three-layer
supervisor, calibration tool — is shipped; we just need the bytes.
Historical note: before the spatial-conv fix, the V1/V2/V3 crossover was measured with a partial Edge TPU mapping in which only the Reshape ops ran on silicon and the Conv2D matmul fell back to the ARM CPU, so V3 only won at N=1024. Getting the Conv2D onto the silicon pushed the V3 line down by roughly an order of magnitude and moved the crossover to N≈512 (see the benchmark table and the acceleration-history note above).
The current bench tops out at N=2048 because building the float32 .tflite for N=4096 would be ~64 MB, and the on-chip Edge TPU memory caps the model size that fits without streaming. A few options to explore: enabling the off-chip parameter streaming path, sharding the matmul into blocks, or using int8-only weights throughout (which V3 already does) so the model footprint scales as N² bytes rather than 4N² bytes.
MIT — see LICENSE.



