A heterogeneous computing experiment on $40 of hobbyist hardware: a working physical-noise random source with end-to-end NIST-style characterization (SP 800-90B + 90A), feeding a probabilistic-bit emulator and an Ising-style annealer for combinatorial optimization. Real noise — mostly mechanical, from the Brownian motion of a MEMS accelerometer's proof mass; partly thermal, from a TMP36 LSB — is debiased per-channel via von Neumann, used to seed an HMAC_DRBG for memory-rate output, then drives Multi-Try Metropolis annealing on Ising spin-glass and TSP QUBO problems with three pluggable matmul backends (NumPy / TFLite-CPU / Edge TPU). Same algorithmic class as commercial quantum-annealing and probabilistic-compute silicon.
Long-form writeup, with measured numbers and the architectural arguments: xylem-group.org/research/lopt/.
```
┌──────────────┐  CSV @ 100Hz   ┌────────────────┐  bytes/p-bits   ┌─────────┐
│   RedBoard   │ ─────────────▶ │ entropy_bridge │ ──────────────▶ │ stdout  │
│  (sensors)   │   USB serial   │  (this repo)   │    iter API     │  Coral  │
└──────────────┘ ◀───────────── └────────────────┘                 └─────────┘
                  LED commands
```
| Measurement | Value | Source |
|---|---|---|
| Per-channel min-entropy (accel raw LSB) | 0.94 bits/sample | make min-entropy, NIST SP 800-90B subset |
| Per-channel min-entropy (TMP36 LSB) | 0.15 bits/sample | same |
| Live VN clean-byte rate (multi-channel) | ~10 B/s | make calibrate-bridge |
| Von Neumann yield (live, vs i.i.d. ideal 25%) | 19.6% | autocorrelation tax |
| HMAC_DRBG SHA-256 throughput | memory-speed | make seed / make password |
| NIST SP 800-22 (simulator, 800 streams) | all 15 tests pass | mean p=0.4999, lowest pass-rate 98.62% |
| Edge TPU matmul win at N=1024 | 7.4× over V2, 10.3× over V1 | make coral-bench (spatial-conv) |
| Edge TPU matmul win at N=2048 | 28.9× over V2, 31.5× over V1 | same |
| End-to-end annealer at N=1024 | 1.94× faster wall-clock post spatial fix | make anneal-bench |
| Tests | 56/56 | make test |
Live Ising annealer — single-spin SA (blue) vs Multi-Try Metropolis (orange) on a 64-spin glass. MTM crashes to the ground state in ~300 steps while single-spin SA grinds along; both eventually arrive at E = −242. Yellow flashes mark cells that just flipped, stars on the energy trace mark new-best moments, the bottom strip shows rolling-window acceptance rate collapsing as the system cools.
Live TSP solver — 7-city Travelling Salesperson, encoded as a QUBO Hamiltonian, solved by Metropolis annealing on the entropy bridge's debiased bytes. Watch the polyline rearrange itself as constraints get satisfied (red dashed = invalid state, blue solid = valid permutation). Best tour found is overlaid on the gray ghost of the brute-force optimum; the run lands on optimal.
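The QUBO behind a demo like this follows the textbook one-hot encoding: a binary variable `x[c, t]` is 1 iff city `c` is visited at step `t`, penalty terms enforce the two one-hot constraints, and a distance term pays for each consecutive edge. A minimal sketch of that construction (our own simplification; not necessarily the exact form `tsp.py` uses):

```python
import numpy as np

def tsp_qubo(D, A=None):
    """Textbook TSP -> QUBO. x[c, t] = 1 iff city c is visited at step t
    (flattened index c*n + t). Penalty A enforces "each city once" and
    "one city per step"; the distance term pays D[c1, c2] when c1 at step
    t is followed by c2 at step t+1 (cyclic). Energy = x @ Q @ x, up to a
    dropped constant 2*A*n."""
    n = len(D)
    A = A if A is not None else 2.0 * float(D.max())  # must dominate any edge
    idx = lambda c, t: c * n + t
    Q = np.zeros((n * n, n * n))
    for c in range(n):                       # (sum_t x[c,t] - 1)^2, expanded
        for t1 in range(n):
            Q[idx(c, t1), idx(c, t1)] -= 2 * A
            for t2 in range(n):
                Q[idx(c, t1), idx(c, t2)] += A
    for t in range(n):                       # (sum_c x[c,t] - 1)^2, expanded
        for c1 in range(n):
            Q[idx(c1, t), idx(c1, t)] -= 2 * A
            for c2 in range(n):
                Q[idx(c1, t), idx(c2, t)] += A
    for t in range(n):                       # tour-length objective
        for c1 in range(n):
            for c2 in range(n):
                if c1 != c2:
                    Q[idx(c1, t), idx(c2, (t + 1) % n)] += D[c1, c2]
    return Q
```

For a valid permutation x, `x @ Q @ x + 2*A*n` recovers the cyclic tour length; any constraint violation costs on the order of A, which is why invalid states (the red dashed tours) sit high in the energy landscape.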
Conventional computers are deterministic sequencers: read instructions, mutate registers, branch. Every operation has a definite answer; randomness is something software simulates. The whole stack — transistors, gates, ALUs, caches, OS schedulers — is engineered to suppress noise so the answer is bit-exact.
An Ising machine inverts that contract. Its primitive isn't a logic gate, it's a probabilistic spin: a node whose state (±1) is sampled from a Boltzmann distribution governed by its couplings to its neighbors. You don't program an Ising machine, you encode your problem as a coupling matrix J, let the system relax toward thermal equilibrium, and read out the low-energy configuration.
A huge class of optimization problems can be written as Ising or QUBO Hamiltonians. Once your problem is in that form, an Ising machine doesn't search the solution space — it physically relaxes into a low-energy state, exploiting parallelism that's intrinsic to the physics rather than orchestrated by software.
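The rewrite between the two forms is mechanical: any QUBO objective x^T Q x over x in {0,1} becomes an Ising energy over s in {-1,+1} via the change of variables s = 2x - 1. A small sketch of that standard transform (helper name and conventions are ours, not repo code):

```python
import numpy as np

def qubo_to_ising(Q):
    """Rewrite the QUBO objective x^T Q x (x_i in {0,1}) in Ising variables
    s = 2x - 1 (s_i in {-1,+1}). Returns (J2, h, c) such that
    x^T Q x == s^T J2 s + h @ s + c, with J2 zero-diagonal."""
    Q = 0.5 * (Q + Q.T)              # symmetrizing leaves x^T Q x unchanged
    J2 = Q / 4.0                     # pairwise couplings
    h = Q.sum(axis=1) / 2.0          # per-spin fields
    c = Q.sum() / 4.0                # constant offset
    c += np.trace(J2)                # s_i^2 == 1: fold diagonal into constant
    J2 = J2 - np.diag(np.diag(J2))
    return J2, h, c
```

The couplings J2 are exactly what would be loaded into an Ising machine; sign and scaling conventions vary between solvers.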
| Conventional computer | Ising machine |
|---|---|
| Deterministic, bit-exact | Probabilistic, samples a distribution |
| Logic gates as primitive | Coupled probabilistic bits as primitive |
| Programmed with instructions | "Programmed" by setting couplings (J) |
| Runs an algorithm to search | Relaxes to a ground state |
| Suppresses thermal noise | Uses thermal noise as the search engine |
| Sequential at the core; parallelism added on top | Parallelism is the substrate |
| Energy is a side-effect | Energy is the objective |
The Multi-Try Metropolis algorithm in parallel_ising.py is a software
emulation of this physics. Each MTM step computes v = J @ s once — a
single matmul that simultaneously yields ΔE for every possible single-spin
flip — then samples a move from those N candidates. That's the operation
that maps cleanly onto custom silicon: a parallel matrix-vector product
over short integers, exactly what the Edge TPU, Extropic's TSU, D-Wave's
QPU, and Fujitsu's Digital Annealer all accelerate. Swap in real Ising
silicon and the rest of the pipeline is unchanged.
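The identity behind the one-matmul trick is easy to verify directly; a self-contained sketch (ours, mirroring the idea rather than copying `parallel_ising.py`):

```python
import numpy as np

def energy(J, s):
    """E(s) = -1/2 s^T J s for a zero-field Ising glass."""
    return -0.5 * s @ J @ s

def all_flip_dE(J, s):
    """ΔE for every possible single-spin flip from one matmul: flipping
    s_i -> -s_i changes E by 2 * s_i * (J @ s)_i when J is symmetric with
    zero diagonal. v = J @ s is the step MTM scores its N candidates with."""
    return 2.0 * s * (J @ s)

# sanity check against brute force
rng = np.random.default_rng(0)
n = 8
A = rng.normal(size=(n, n))
J = 0.5 * (A + A.T)
np.fill_diagonal(J, 0.0)
s = rng.choice([-1.0, 1.0], size=n)
dE = all_flip_dE(J, s)
for i in range(n):
    s2 = s.copy()
    s2[i] = -s2[i]
    assert np.isclose(energy(J, s2) - energy(J, s), dE[i])
```

On an accelerator the single `J @ s` is the entire per-step device workload; candidate selection and accept/reject stay on the host.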
Ising-class hardware (quantum annealers, p-bit machines, optical Ising machines, FPGA-based digital annealers) is being commercially deployed or seriously researched in:
- Logistics and routing — vehicle routing, last-mile delivery, container loading, airline crew rostering. Volkswagen ran a public taxi-routing trial in Lisbon on a D-Wave; DENSO has done factory AGV routing; Recruit uses Fujitsu's Digital Annealer for warehouse picking-route optimization.
- Drug discovery and protein folding — conformation search reduces to finding low-energy states in a discrete configuration space. Menten AI, ProteinQure, and Polaris Quantum Biotech have published QUBO-formulated drug-target work.
- Portfolio optimization in finance — Markowitz portfolio selection with cardinality constraints is a hard combinatorial problem that maps to QUBO. Multiverse Computing, Goldman Sachs's quant research group, and Mitsubishi UFJ have run Ising-backed portfolio experiments.
- Lattice problems in cryptography — shortest vector / closest vector problems in lattices (the foundation of post-quantum crypto) have natural QUBO encodings. Used both offensively (analyzing PQC schemes' hardness) and defensively (validating parameter choices).
- Graph problems — max-cut, graph coloring, community detection in social networks, chip floor-planning (graph partitioning + placement), VLSI routing, FPGA congestion. EDA tooling experiments at Synopsys and Cadence.
- Machine learning — restricted Boltzmann machines were the historical motivator. Sparse-network training, hyperparameter search, and feature selection have all been formulated as QUBOs. Extropic's TSU and the resurgent stochastic-computing field target this directly.
- Materials science and chemistry — computing ground states of physical Hamiltonians (the original use case — these machines are simulating the systems they're inspired by). Lattice gauge theory, frustrated magnets, high-Tc superconductor candidates.
- Operations research generally — scheduling (job-shop, project, resource-constrained), set cover, knapsack variants, MAX-SAT — all reducible to QUBO. NEC, Hitachi, and Toshiba sell annealer-as-a-service products to enterprise OR teams.
The ceiling is much higher than current deployment because Ising-class hardware is still 5–10 years from commodity. Credible directions:
- Real-time embedded optimization. When the annealer is a small chip rather than a cloud service, you can put one in a robot, a drone, an autonomous vehicle, an AR/VR headset, a power-grid controller. Low-power, deterministic-latency optimization at the edge.
- Probabilistic AI accelerators. Sampling — drawing from learned distributions — is the bottleneck in diffusion models, Bayesian inference, and the kind of probabilistic reasoning that LLMs currently approximate token-by-token. A p-bit chip could do this natively.
- Combinatorial co-processors. The way GPUs became the de-facto matmul co-processor for ML, p-bit / Ising chips could become the de-facto combinatorial co-processor for scheduling, planning, and discrete decision-making in conventional software stacks.
- Climate and energy. Power-grid dispatch, EV charging coordination, district-heating scheduling, supply-chain decarbonization — all combinatorial, all increasingly time-critical.
- Personalized medicine. Treatment-plan optimization (which combination of drugs, dosages, timing) under patient-specific constraints is QUBO-shaped and currently solved with heuristics that don't scale.
A hands-on emulator that demonstrates the full pipeline end-to-end on $40 of hobbyist parts: real thermal-noise harvesting, real von-Neumann debiasing, a real Metropolis annealer solving a real (small) optimization problem, and a real heterogeneous-compute architecture where the matmul step is offloaded to an accelerator over TCP. Swap in actual Ising silicon (Extropic, D-Wave, Fujitsu DA) and the host code is unchanged — it's a backend change.
The point isn't to compete with industrial annealers. It's to make the architecture legible by building it yourself.
Working today:

- Hardware: TMP36, MMA8452Q, RGB LED, button, pot — all wired on a SparkFun RedBoard.
- Bridge: multi-channel entropy from the TMP36 LSB (thermal/electronic noise) plus three MMA8452Q raw-int LSBs (mechanical noise — Brownian motion of the MEMS proof mass). On healthy hardware, marginal H ≈ 1.0 bits/sample on each accel axis; the mechanical noise floor dominates the thermal one at room temperature. See `docs/NIST_SP800_22.md` for the detailed per-channel breakdown and the live-hardware autocorrelation finding.
- DRBG layer (`orchestrator/drbg.py`): HMAC_DRBG SHA-256 (NIST SP 800-90A, stdlib-only, CAVP-vector-verified) seeded from the bridge. Applications consume application-rate bytes while the seed retains its hardware physical-provenance guarantee. `make seed` / `make password` use it directly; the `parallel_ising` / `tsp` / `ising` solvers use it by default (`--no-drbg` for raw-bridge testing).
- p-bit emulator with live knob bias and LED feedback.
- Annealing demos: Ising spin-glass (`ising.py`), TSP via QUBO (`tsp.py`), and Multi-Try Metropolis (`parallel_ising.py`) — each with brute-force optimum + side-by-side python-vs-bridge comparison plots.
- Coral Dev Board running Mendel, on the network, Edge TPU detected by PyCoral. All three heterogeneous backends shipped behind the same wire protocol — the host annealer offloads `J @ s` matmuls to a TCP server on the Coral. Backend chosen at server startup. V1 and V2 produce bit-identical output; V3's int8 quantization introduces small differences (see the quantization-noise note below).

Measured per-matmul compute time across problem size N (median of 200 matmuls, on-Coral, network excluded):
| N | V1 NumPy | V2 TFLite-CPU | V3 Edge TPU (spatial) | V3 speedup vs V2 | Winner |
|---|---|---|---|---|---|
| 32 | 0.030 ms | 0.048 ms | 0.411 ms | 0.12× | V1 |
| 64 | 0.046 ms | 0.036 ms | 0.414 ms | 0.09× | V2 |
| 128 | 0.105 ms | 0.052 ms | 0.444 ms | 0.12× | V2 |
| 256 | 0.339 ms | 0.119 ms | 0.462 ms | 0.26× | V2 |
| 512 | 1.36 ms | 0.665 ms | 0.493 ms | 1.35× | V3 |
| 1024 | 5.45 ms | 3.92 ms | 0.528 ms | 7.43× | V3 |
| 2048 | 22.08 ms | 20.20 ms | 0.700 ms | 28.9× | V3 |

Three regimes visible in the curves:
- Small N (≤256): the per-step compute is so cheap that any framework overhead dominates. V1 NumPy wins at N=32 because it has no framework — it's just a plain matmul. V2 wins at N=64–256.
- Crossover at N≈512: V3 overtakes V2. The Edge TPU's flat ~0.5 ms overhead stops dominating once the matmul itself is large enough.
- Large N (≥1024): V3 dominates by a widening margin. V3 scales near-flat with N (the matmul fits in on-chip SRAM and bandwidth doesn't bottleneck); V2 scales O(N²) on the CPU. By N=2048 the gap is 29×.
Full benchmark JSON in `docs/scaling_spatial.json`; reproduce with `make coral-bench` after deploying spatial-variant models (see `docs/RESEARCH_LOG.md`).

Acceleration history. Until 2026-05-11 the V3 line was 4–8× slower than what's shown above: `edgetpu_compiler` v16 was leaving the Conv2D matmul on the ARM CPU and only mapping the surrounding Reshape ops. The "spatial" variant in `build_tflite.py` reshapes s into an H×W×C feature map (H, W ≤ 16) instead of the degenerate (1,1,N) tensor the old "pointwise" variant used. With a real H×W convolution to compile, the partitioner places the entire graph on the Edge TPU — single subgraph, all on-chip. See `docs/RESEARCH_LOG.md` for the full story.

End-to-end annealing wall-clock. Matmul-ms is what the silicon cares about; ms/Metropolis-step is what a user cares about.
`make anneal-bench` runs full MTM annealing on the workstation host with each backend wired in, and times the whole loop including network round-trip, J caching, and accept/reject overhead. Median ms/step at 200 steps:

| N | Host NumPy | Coral V1 | Coral V2 | Coral V3 (spatial) |
|---|---|---|---|---|
| 32 | 0.009 | 56.3 | 55.1 | 56.1 |
| 128 | 0.012 | 56.4 | 55.5 | 55.6 |
| 512 | 0.087 | 76.2 | 64.3 | 64.1 |
| 1024 | 0.315 | 121.5 | 90.7 | 73.6 |
| 2048 | 1.220 | 103.7 | 88.4 | 84.2 |

Two takeaways, both honest:
- A workstation host beats the Coral by ~233× end-to-end at N=1024 (0.32 ms/step vs 73.6 ms/step). The heterogeneous-compute story only makes sense when the host is itself slow — a microcontroller, a sensor node, an embedded SoC. On a Mac, the M-series CPU dwarfs anything the Coral can offer. The lopt architecture is built for the embedded case (RedBoard host, Coral accelerator); the workstation numbers exist as the upper bound.
- Network round-trip dominates compute below N=512. The SSH-tunneled wire takes ~55 ms regardless of backend, so V1/V2/V3 all collapse to a flat ~55 ms floor at small N. Above that the matmul-only ranking re-emerges: V3 takes the lead at N=512 and extends through N≥1024 — the same crossovers as the matmul-only plot, shifted up by network overhead (see the dotted V3-matmul-only reference line). Closing the gap with the dotted line is a question of moving J upload + matmul to a leaner protocol (gRPC, shared-memory, or batching k spin vectors per round-trip). The spatial-conv V3 update halved the V3 wall-clock at N=1024 (142.6 → 73.6 ms/step) by collapsing the matmul portion from 2.13 → 0.53 ms — the network floor is still there, but the compute on top of it is now flat.
A quantization-noise dividend. V3's int8 matmul introduces small approximation errors at the matmul boundary. In a deterministic linear algebra context that's a bug; in a stochastic optimizer it can be a feature. At N=512 V3 reached e_best=−5036 vs V1/V2's −5030; at N=1024 V3 reached e_best=−10799 vs V1/V2's −10637 — a 1.5% better solution. The quantization acts as an extra dithering term on top of the thermal noise, occasionally letting MTM escape a local minimum the bit-identical V1/V2 trajectories settle into. We're not claiming this is reliable across instances (n=4 paired runs is not statistics), but it's a striking-enough effect to flag — quantized accelerators may be especially well-suited to stochastic Ising solvers for reasons that have nothing to do with throughput.
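The mechanism is easy to see in isolation. Under a simple symmetric per-tensor int8 model (our simplification; the repo's V3 path uses TFLite's adversarial calibration, not this), the matvec comes back with a small sign-varying perturbation on every candidate ΔE:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
A = rng.normal(size=(n, n))
J = 0.5 * (A + A.T)                 # symmetric spin-glass couplings
np.fill_diagonal(J, 0.0)
s = rng.choice([-1.0, 1.0], size=n)

# symmetric per-tensor int8 quantization of J (illustrative scheme)
scale = np.abs(J).max() / 127.0
Jq = np.clip(np.round(J / scale), -127, 127)

v_exact = J @ s                     # what the float backends compute
v_quant = (Jq @ s) * scale          # what an int8 backend effectively computes
err = v_quant - v_exact             # small dither riding on every local field
```

Each rounding error is bounded by scale/2 per matrix entry, so the summed error on each local field is a small zero-mean perturbation: a built-in dithering term, which is the effect the paragraph above describes.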
Reproduce with `make anneal-bench` (assumes `make coral-tunnel` is up).

- Live matplotlib dashboards (`make sim-dashboard`, `make sim-tsp-dashboard`) with flip-flash highlights, new-best stars, rolling-window acceptance strips, and an anneal-progress gauge.
- Recording mode (`make record-all`) renders the dashboards to GIF (and MP4 if `ffmpeg` is on PATH) for sharing.
- NIST SP 800-22 conformance harness (`tools/nist/`, vendored sts-2.1.2 with a non-interactive driver). Parallel runner shards the capture across `cpu_count - 1` workers. On simulator bytes, all 15 tests pass at the proportion threshold across 800 streams × 1 Mbit (mean p-value 0.4999, lowest pass-rate 98.62% on Linear Complexity vs the 97.94% NIST threshold). Live-hardware capture in progress — see "Live capture infrastructure" below. Writeup in `docs/NIST_SP800_22.md`.
- NIST SP 800-90B min-entropy estimator (`orchestrator/min_entropy.py`). Implements three of the ten 90B non-i.i.d. estimators — MCV (§6.3.1), binary Markov (§6.3.3), Compression / Maurer (§6.3.4). Per-channel run on healthy hardware: each accelerometer axis carries H_min = 0.94 bits/sample (94% of theoretical max); the TMP36 LSB sits at 0.15. The seven 90B estimators not yet implemented can only lower the reported H_min further, so the result is a conservative subset of full 90B compliance, not a certification claim. `make min-entropy`.
- Live-hardware capture infrastructure for the multi-day SP 800-22 hardware run. Three layers of self-healing:

  | Layer | Role | When it kicks in |
  |---|---|---|
  | L1 — `nist_runner.py --capture-only` | streams clean bytes to disk | always |
  | L2 — `lopt_coral_capture.sh` watchdog | restarts L1 on rate drop or process death | every 2 min |
  | L3 — `lopt_coral_supervisor.sh` (cron) | restarts L1+L2 if both died (e.g. Coral reboot) | every 5 min |

  Plus firmware v1.4: when the I²C bus shorts to GND or `accel.begin()` fails, the firmware retries every 30 s and emits a `# accel recovered` log line on down→up transitions. The MMA8452Q breakout has shown intermittent solder-bridge issues on this specific hardware; this layer papers over them without manual intervention. `make coral-nist-deploy && make coral-nist-supervisor-install && make coral-nist-capture`.
- 46 unit tests, all hardware-free. Includes a NIST CAVP test vector for HMAC_DRBG SHA-256 — the implementation passes the reference output bit-for-bit.
- `firmware/entropy_streamer/` — RedBoard firmware. Streams a 7-channel CSV over USB serial and accepts `LED r g b` commands from the host. I²C bus health checks, accel address fallback, button debouncing.
- `orchestrator/verify_stream.py` — sanity-checks the stream and reports per-channel entropy quality.
- `orchestrator/entropy_bridge.py` — von-Neumann debiasing to extract uniform random bits, p-bit sampling driven by the live knob position, RGB LED state feedback. `SimulatedBridge` produces realistic synthetic samples so the whole pipeline runs without hardware.
- `orchestrator/ising.py` — Metropolis-Hastings annealer for a random Ising spin-glass; consumes clean bits from `entropy_bridge`, knob = annealing temperature, button = re-anneal spark, LED = state. Brute-force ground state at startup so you can see how close the annealing got.
- `orchestrator/tsp.py` — Travelling Salesperson encoded as a QUBO Hamiltonian and solved by Metropolis annealing on the bridge's bits. Brute-force optimal tour at startup. Includes `--compare` mode (python RNG vs bridge bits, side-by-side plot) and `--live` mode (animated city map with current and best tours).
- `orchestrator/parallel_ising.py` — single-spin SA vs Multi-Try Metropolis on a random spin-glass. MTM uses one matmul per step to evaluate ΔE for all N possible single-spin flips simultaneously. With `--backend coral`, the matmul ships to the Coral over TCP. `--live` opens an animated matplotlib window with energy trace, spin-grid heatmaps, acceptance strip, and anneal gauge.
- `coral/coral_server.py` — runs on the Coral Dev Board. Listens on TCP port 5005, accepts `(J, s)` and returns `J @ s`. Three pluggable backends behind one wire protocol: `NumpyBackend` (V1, default), `TFLiteBackend` (V2, `--model PATH`), and Edge TPU (V3, `--model PATH --edgetpu`). The wire J is verified against the baked-in J on first load — deployment mistakes (wrong seed, stale model) print a warning.
- `coral/coral_client.py` — host-side client for `parallel_ising.py`'s `--backend coral`. Caches J on the server (uploaded once per problem instance), then ships only `s` per Metropolis step.
- `coral/build_tflite.py` — builds the V2/V3 `.tflite` model with J baked in as a 1×1 Conv2D kernel. Float32 (`--quantize none`, default) for V2; int8 with adversarial calibration (`--quantize int8`) for V3.
- `coral/Dockerfile.tflite` and `coral/Dockerfile.edgetpu` — Python 3.11 + TF 2.18 build environment, and a Debian-bullseye image with the Coral apt-installed `edgetpu_compiler`. Used by `make tflite-build` and `make edgetpu-compile` respectively.
- `tools/coral_bench.py` — runs on the Coral, times all three backends (NumPy / TFLite-CPU / Edge TPU) at N ∈ {32 … 1024}, writes results to JSON. `make coral-bench` deploys, runs, and pulls the JSON back for plotting.
- `orchestrator/drbg.py` — HMAC_DRBG SHA-256 (NIST SP 800-90A §10.1.2), stdlib-only, CAVP-test-vector-verified bit-for-bit. `BridgeDRBG` wraps any entropy source (live or simulator) and auto-reseeds every 1 MiB. Used by `parallel_ising` / `tsp` / `ising` (default) and by `make seed` / `make password`.
- `orchestrator/min_entropy.py` — three SP 800-90B §6.3 estimators (MCV, binary Markov, Compression / Maurer). `make min-entropy` for per-channel live measurement; `make sim-min-entropy` for the simulator. Reports the min over enabled estimators per NIST convention.
- `tools/nist_runner.py` — drives the vendored `tools/nist/` C suite (sts-2.1.2 with a non-interactive driver). Three modes: live capture + test in one shot (`make nist`), capture-only (`--capture-only PATH`, for unattended multi-day runs), and test-only (`--bin PATH`, for running on a pre-captured .bin). Splits work across worker processes.
- `tools/calibrate_bridge.py` — measures the live RedBoard's actual clean-bits/s rate over a fixed window, projects expected capture sizes for various wall-clock durations. `make calibrate-bridge`.
- `tools/lopt_coral_capture.sh` — backgrounded launcher with built-in rate-watchdog (Layer 2). Runs on the Coral.
- `tools/lopt_coral_supervisor.sh` — cron-installed restart layer (Layer 3). Runs on the Coral every 5 min. Restarts L1+L2 if both died (typical cause: Coral reboot from a USB power transient).
See docs/WIRING.md for pin map and breadboard diagrams.
| Part | Used as |
|---|---|
| SparkFun RedBoard | Primary entropy MCU |
| TMP36 temp sensor | Thermal entropy source |
| 10 kΩ potentiometer | Live bias-field control |
| MMA8452Q breakout (3.3V) | Kinetic entropy / shake spark |
| Common-cathode RGB LED + 3× 220Ω | Live state visualization |
| Momentary push button | Manual re-anneal trigger |
| Adafruit KB2040 | Future: 2nd entropy channel |
| Google Coral Dev Board | Heterogeneous matmul backend |
The bridge ships with a --simulate mode that synthesizes statistically
realistic sensor samples so you can play with the p-bit emulator and the
Ising annealer before wiring anything:
```
python3 -m venv .venv
.venv/bin/pip install pyserial numpy matplotlib pillow

make sim               # p-bit emulator on the simulator
make sim-ising         # Ising annealer on the simulator
make sim-tsp           # 5-city TSP via QUBO annealing on the simulator
make sim-tsp-compare   # same TSP, both python RNG and bridge bits + plot
make sim-dashboard     # live Ising dashboard (energy + spin grids)
make sim-tsp-dashboard # live TSP dashboard (city map + tour + energy)
make sim-seed          # 32 simulator-seeded HMAC_DRBG bytes (hex)
make sim-password      # 24-char simulator-seeded password (a-zA-Z0-9)
make sim-min-entropy   # per-channel SP 800-90B min-entropy on the simulator
make nist-quick        # 12.5 MB NIST 800-22 run on simulator bytes (~30s)
make nist              # full 100 MB NIST run on simulator bytes (~2 min)
```
```
make test              # run the unit tests (no hardware needed)

make record-dashboard  # render docs/dashboard.gif (Ising)
make record-tsp        # render docs/tsp.gif (TSP)
make record-all        # both
```

GIFs default to ~2.5 MB at 18 fps / 80 dpi. For higher-quality MP4s, install ffmpeg (`brew install ffmpeg`) and pass `--save-mp4 path.mp4` to either dashboard module directly.
```
brew install arduino-cli
arduino-cli core install arduino:avr
arduino-cli lib install "SparkFun MMA8452Q Accelerometer"

make upload            # flash the RedBoard
make verify            # capture 500 samples, print per-channel entropy stats
make pbit              # p-bit emulator (turn the knob, watch the LED)
                       #   add --drbg for memory-speed p-bits
make ising             # Ising-glass annealer (knob = anneal temperature)
make tsp               # 5-city TSP via QUBO on physical noise
make seed              # 32 hardware-seeded HMAC_DRBG bytes (hex)
make password          # 24-char hardware-seeded password
make min-entropy       # per-channel SP 800-90B min-entropy, 60-second window
make calibrate-bridge  # 5-min run, projects expected capture sizes
```

Healthy `make verify` output (firmware v1.4, MMA8452Q in good order):
```
# lopt entropy_streamer v1.4 accel=1
# rows=500 window=4.99s rate=100.2Hz
# accel_g |a|_mean=1.005 (gravity, sensor live)
# temp    p(1)=0.50 H=0.95 bits/sample VN_yield=46.6%
# ax_raw  p(1)=0.48 H=1.00 bits/sample VN_yield=49.9%
# ay_raw  p(1)=0.53 H=0.99 bits/sample VN_yield=49.8%
# az_raw  p(1)=0.51 H=1.00 bits/sample VN_yield=49.9%
# combined: ~196 clean bits/s (theoretical, marginal-derived)
```
The "combined" line is the i.i.d. ceiling. Live yield runs lower because
of sample-to-sample autocorrelation — 78–80 clean bits/s on healthy
hardware. See docs/NIST_SP800_22.md for the autocorrelation breakdown
and the per-channel SP 800-90B numbers.
Run on the Coral so the laptop's free, with a three-layer self-healing supervisor:
```
# RedBoard's USB plugged into the Coral's USB-A port, /dev/ttyUSB0.
# `coral` resolves via /etc/hosts or DNS. mendel must be in dialout group.
make coral-nist-deploy               # one-time: push entropy_bridge / nist_runner
                                     # / launcher / supervisor to Coral
make coral-nist-supervisor-install   # */5 cron — relaunches stack on Coral reboot
make coral-nist-capture              # NIST_BYTES=1500000 (= 12 streams), ~38h

# any time:
make coral-nist-status               # pid + log tail + .bin size
make coral-nist-pull                 # scp the .bin home to ./capture.bin
make nist-test NIST_BIN=capture.bin  # run the 800-22 C suite
```

The host annealer offloads `J @ s` matmuls to a NumPy-backed TCP server on the Coral. Works with simulator entropy too — no RedBoard needed for this demo.
```
# one-time: ensure `coral` resolves (e.g. /etc/hosts entry to the Coral's IP)
make coral-deploy           # scp coral_server.py to ~mendel/

# in one shell:
make coral-server           # starts the matmul server on the Coral

# in another shell:
make coral-parallel-ising   # MTM annealer, n=32, 500 steps, matmul on Coral
```

macOS gotcha. On macOS Sequoia/Tahoe, Homebrew Python doesn't have the Local Network privacy entitlement, so direct TCP connections to RFC1918 IPs get rejected with `OSError: [Errno 65] No route to host` even though `ssh` and `nc` work fine. Workaround — tunnel the port over SSH so Python connects to loopback (which isn't gated):

```
make coral-tunnel           # 127.0.0.1:5005 -> coral:5005
.venv/bin/python -m orchestrator.parallel_ising \
    --simulate --sim-seed 11 -n 32 --steps 500 --algorithm mtm \
    --backend coral --coral-host 127.0.0.1
make coral-tunnel-down
```

The tunnel adds ~50 ms per round-trip vs ~1–2 ms on direct LAN; the matmul itself is ~0.1 ms on the Coral's ARM CPU at n=32.
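The offload pattern, J uploaded once and cached server-side with only s crossing the wire per Metropolis step, can be sketched with a toy length-prefixed protocol. The opcodes and framing below are our invention for illustration; the repo's actual `coral_server.py` wire format may differ:

```python
import socket
import struct
import numpy as np

# Toy wire format (ours, not the repo's): 1-byte opcode, 4-byte big-endian
# payload length, then raw float32 data. Opcode b"J" uploads and caches the
# coupling matrix; b"S" ships a spin vector and gets J @ s back. The point
# is the shape of the exchange: J crosses the wire once, s every step.

def _recv_exact(conn, n):
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return buf

def serve_once(port):
    """Accept one client; cache its J, answer J @ s until it disconnects."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    conn, _ = srv.accept()
    J = None
    try:
        while True:
            op = conn.recv(1)
            if not op:
                break
            (nbytes,) = struct.unpack("!I", _recv_exact(conn, 4))
            vec = np.frombuffer(_recv_exact(conn, nbytes), dtype=np.float32)
            if op == b"J":
                n = int(round(np.sqrt(vec.size)))
                J = vec.reshape(n, n)          # cached for the whole run
            else:
                out = (J @ vec).astype(np.float32)
                conn.sendall(struct.pack("!I", out.nbytes) + out.tobytes())
    finally:
        conn.close()
        srv.close()

class MatmulClient:
    """Host side: upload J once, then ship only s per step."""
    def __init__(self, host, port, J):
        self.sock = socket.create_connection((host, port))
        self._send(b"J", J.astype(np.float32).ravel())

    def _send(self, op, arr):
        self.sock.sendall(op + struct.pack("!I", arr.nbytes) + arr.tobytes())

    def matmul(self, s):
        self._send(b"S", s.astype(np.float32))
        (nbytes,) = struct.unpack("!I", _recv_exact(self.sock, 4))
        return np.frombuffer(_recv_exact(self.sock, nbytes), dtype=np.float32)
```

Batching k spin vectors per round-trip would amortize the ~55 ms tunnel floor the same way the J cache already amortizes the upload.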
Four pieces:
- Entropy sources. Four channels per sample at 100 Hz: TMP36 LSB (thermal/electronic, low entropy at room temperature) plus the raw 12-bit MMA8452Q register LSBs on each of three axes (mechanical noise, ~0.94 bits H_min/sample on each axis under SP 800-90B).
- Whitening. Per-channel von-Neumann debiasing: (0,1) → 0, (1,0) → 1, drop matches. Output is provably uniform given i.i.d. input. Per-channel pairing matters — cross-channel pairs would give non-uniform output because temp's `p` is wildly different from accel's.
- Conditioning + expansion. HMAC_DRBG SHA-256 (NIST SP 800-90A §10.1.2). Pulls 64 bytes of seed material from the bridge once, then expands to memory-rate output via HMAC chains. Reseeds every 1 MiB.
- Biased sampling. Knob → sigmoid `p1`. Sample `bit = 1 if u01 < p1 else 0` where `u01` comes from 16 DRBG-output bits.
Result: knob left → mostly 0s, knob right → mostly 1s, knob centered → fair
coin. The first byte's lineage traces all the way back to Brownian motion
of a MEMS proof mass; subsequent bytes are HMAC-SHA256 expansions of that
seed. make pbit --drbg runs at memory speed; make pbit (raw) runs at
the bridge's ~10 B/s rate.
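The whitening and sampling stages are small enough to sketch end to end. A minimal version (function names and the sigmoid gain constant are our choices, not the repo's):

```python
import math

def von_neumann(bits):
    """Per-channel von Neumann extractor: (0,1) -> emit 0, (1,0) -> emit 1,
    drop matched pairs. Exactly unbiased if the input pairs are i.i.d."""
    return [a for a, b in zip(bits[::2], bits[1::2]) if a != b]

def pbit(knob, random_bits, gain=4.0):
    """One biased p-bit sample: knob in [-1, 1] maps through a sigmoid to
    p1, and 16 clean bits form the uniform u01 it is compared against."""
    p1 = 1.0 / (1.0 + math.exp(-gain * knob))
    u01 = sum(b << i for i, b in enumerate(random_bits[:16])) / 65536.0
    return 1 if u01 < p1 else 0
```

Knob centered gives p1 = 0.5 (a fair coin); hard left or right pushes p1 toward 0 or 1, which is the bias sweep the LED visualizes.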
The deferred items below have a clear path. Each is independent — pick any order.
Previously, edgetpu_compiler v16 left the Conv2D matmul on the ARM
CPU and only mapped the surrounding Reshape ops. The fix was a model
reshape: lay s out as an HxWxC feature map (H,W ≤ 16) instead of a
degenerate (1,1,N) tensor, so the compiler sees a real spatial
convolution. Single subgraph, full on-chip placement at every N from
32 to 2048. See the benchmark table in the Three-backend stack
section above and docs/RESEARCH_LOG.md for the experimental
narrative.
Build chain:

```
make tflite-image                       # one-time
make tflite-build-int8 TFLITE_VARIANT=spatial \
    TFLITE_N=256                        # build N=256 int8
make edgetpu-image                      # one-time
make edgetpu-compile TFLITE_N=256       # compile for TPU
```

`edgetpu_compiler` is x86_64 Linux only; on Apple Silicon it runs under Rosetta emulation (`--platform linux/amd64`).
The pointwise variant is preserved (TFLITE_VARIANT=pointwise) for
reproducing the historical baseline.
The KB2040 is a 133 MHz RP2040 currently unused. With a second TMP36 wired to it, it can sustain a parallel ~1 kHz entropy stream on its own USB serial port. The host orchestrator opens both ports, XORs the streams together, and feeds the merged result to the existing von-Neumann debiaser. Result: 10× the bandwidth, and provably better uniformity (the Piling-up Lemma reduces bias quadratically when XORing independent streams).
Sketch in docs/WIRING.md §6. Roughly 60 lines of Arduino sketch on the
KB2040 side and a small XorBridge class on the host side.
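The "quadratically better" claim is just the Piling-up Lemma arithmetic: writing each stream's bias as ε = p(1) − 1/2, the XOR of two independent streams has bias −2·ε₁·ε₂. A two-line check:

```python
def bias(p1):
    """Deviation of p(1) from a fair coin."""
    return p1 - 0.5

def xor_p1(p, q):
    """p(1) of the XOR of two independent Bernoulli bit streams."""
    return p * (1.0 - q) + q * (1.0 - p)

# Piling-up Lemma: bias(xor) == -2 * bias(p) * bias(q).
# Two mildly biased streams XOR into a much less biased one:
p, q = 0.58, 0.55
eps = bias(xor_p1(p, q))   # about -0.008, vs raw biases 0.08 and 0.05
```

So even before von Neumann runs, the merged KB2040 + RedBoard stream starts an order of magnitude closer to fair than either source alone (given independence, which separate sensors on separate boards make plausible).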
Three of the ten 90B non-i.i.d. estimators are implemented (MCV, binary Markov, Compression / Maurer). The reported H_min on healthy hardware (0.94 bits/sample on each accel axis) is the min over those three; the seven not yet implemented (collision, t-tuple, LRS, multi-MCW family, lag, LZ78Y) can only ever lower H_min further. Adding them would either confirm the conservative bound or expose a structure the current subset misses. Plus IID-vs-non-IID detection per §3.1.2 to formally classify the source. NIST has reference C implementations; porting takes a focused day or two.
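Of the three implemented estimators, MCV is the simplest to state; a from-the-spec sketch (our own, not the repo's `min_entropy.py`):

```python
import math
from collections import Counter

def mcv_min_entropy(samples):
    """SP 800-90B §6.3.1 Most Common Value estimator: take the most common
    value's observed frequency, widen it to an upper confidence bound
    (z = 2.576 per the spec), and report H_min = -log2 of that bound."""
    L = len(samples)
    p_hat = Counter(samples).most_common(1)[0][1] / L
    p_u = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (L - 1)))
    return -math.log2(p_u)
```

Per NIST convention the reported H_min is the minimum over all enabled estimators, which is why each additional estimator can only confirm or lower the current 0.94 bits/sample figure.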
The 800-stream / 100 MB pass is on simulator bytes; the live capture
on the Coral is in flight at the time of writing. When it
completes (~1.8 days, multi-day target), drop the result into
docs/NIST_SP800_22.md caveat 1 and remove
the hedge. The work to set this up — capture/test split, three-layer
supervisor, calibration tool — is shipped; we just need the bytes.
Historical note: before the spatial-conv fix, the V1/V2/V3 crossover was measured with a partial Edge TPU mapping in which only the Reshape ops ran on silicon and the Conv2D matmul fell back to the ARM CPU, so V3 only won at N=1024. Getting the Conv2D onto the silicon pushed the V3 line down by roughly an order of magnitude and moved the crossover to N≈512 (see the benchmark table and the acceleration-history note above).
The current bench tops out at N=2048 because building the float32 .tflite for N=4096 would be ~64 MB, and the on-chip Edge TPU memory caps the model size that fits without streaming. A few options to explore: enabling the off-chip parameter streaming path, sharding the matmul into blocks, or using int8-only weights throughout (which V3 already does) so the model footprint scales as N² bytes rather than 4N² bytes.
MIT — see LICENSE.



