Skip to content

abailey81/implicit-interaction-intelligence

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

221 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

I³ — Implicit Interaction Intelligence

Python 3.12 PyTorch 2.x License: MIT Code: from-scratch No HuggingFace FastAPI ONNX Live demo

Implicit Interaction Intelligence — an on-device assistant that reads how you type, then answers accordingly. Built from scratch in PyTorch on a single 6 GB laptop GPU. ~200 MB int8. No cloud round-trip required.

A 204 M-parameter custom transformer (MoE-FFN, ACT halting, per-layer cross-attention conditioning), a from-scratch byte-level BPE tokenizer (217 LOC), a TCN encoder for keystroke dynamics, a LinUCB contextual bandit for edge/cloud routing, and a closed-loop adaptation pipeline that observes a user, infers an 8-axis adaptation vector, and biases generation accordingly — all in a single repository, with zero HuggingFace dependencies in the SLM generation path.

Three-arm cascade with a scored multi-signal smart router — local SLM + retrieval gets the first shot on every chat turn; Qwen 1.7 B + LoRA handles HMI commands (set_timer, play_music, navigate, …) with deterministic JSON output; Gemini 2.5 Flash "tags in" only when the local arms genuinely can't ground the query. Every reply ships a route_decision chip showing which arm answered, the per-arm confidence scores, and the routing reason in a hover tooltip.

Real actuatorsset timer for 30 seconds schedules an asyncio task that fires a notification 30 s later (gold pulse banner in the chat); navigate to trafalgar square shows an immediate route banner; cancel tears down all pending timers. The cascade isn't just acks — it actually does things.

Edge inference, demonstrable live. The TCN encoder is INT8-quantised to 162 KB ONNX and runs in your browser tab via ONNX Runtime Web (WASM / WebGPU). Toggle is in the State tab → Edge inference. Open Chrome DevTools → Network and you see zero /api/encode requests when ON — keystrokes never leave the page.

Live demo · 30-min demo cheat sheet · Direct response to recruiter pre-screen · HCI design brief · Open problems / what I'd hand a teammate · Technical report

Run I3_PRELOAD_QWEN=1 python -m uvicorn server.app:app --host 127.0.0.1 --port 8000, then visit http://127.0.0.1:8000 — the SPA opens straight on the Chat tab with five suggestion chips that exercise every cascade arm in five clicks.


Table of contents

  1. Why this project exists
  2. What makes this different
  3. Architecture at a glance
  4. From-scratch components
  5. Run it locally
  6. The demo UI — what each tab shows
  7. Live measurements
  8. Privacy story
  9. Repository layout
  10. What I would ship next
  11. Credits & references
  12. Huawei filter questions — direct map

Quick links — Huawei R&D UK HMI Internship pitch (60-second skim)

If you arrived here via the recruiter / via a link in my application, these are the documents to read in priority order:

If you have Read
60 seconds HUAWEI_PITCH.md — the full feature surface + 4-recruiter-Q answers
5 minutes + docs/huawei/jd_to_repo_map.md — every JD bullet → file:line
15 minutes + docs/huawei/feature_matrix.md — I³ vs Apple Intelligence / Pixel AI / Galaxy AI / Pangu / Qwen3.5 / DeepSeek-V4 / Gemma-4 / Kimi-K2.6
30 minutes + docs/huawei/design_brief.md (persona + interaction principle) and docs/huawei/finetune_artefact.md (Qwen + Gemini fine-tune side-by-side)
60 minutes + docs/huawei/iteration_log.md (51-iter trajectory), docs/huawei/iter52_plus_roadmap.md (iter 52-138 sweep + 86 commits), docs/huawei/research_reading_list.md (15 papers 2024-2026), docs/huawei/forward_roadmap.md (what I'd build next at HMI Lab)
Onboarding (1 day) docs/huawei/onboarding_a_teammate.md + docs/huawei/recruiter_clarification_answers.md
The paper docs/paper/I3_research_paper.md — 461 LOC with §6 Results (TCN clustering, SLM perplexity + KL ratio, router regret, Kirin 9000 → Smart Hanhan device-feasibility table)
Latest live numbers make test-iter640 / 640 + 1 skipped passed in 35.20 s. Trained Qwen LoRA: best_val_loss = 5.36 × 10⁻⁶; eval: action_accuracy = 100 %, full_match = 100 %, macro F1 = 1.0 on 253 examples (reports/iter_test_sweep.md, checkpoints/intent_eval/comparison_report.md).

Why this project exists

Most chatbot demos in 2026 are a thin wrapper over a foundation model. They are good at language. They are dead to how you typed it.

The HMI thesis behind this project: people generate a stream of implicit signals — inter-key intervals, edit ratios, voice prosody, gaze fixation — that already encode their cognitive load, affect, and intent. A useful assistant adapts its phrasing, length, vocabulary, and even its routing decisions to that signal, in real time, on device.

This is the demo of that thesis end-to-end. It is also the portfolio piece for the Huawei R&D UK HMI Lab AI/ML Specialist Internship application; the four "filter questions" the recruiter asked are mapped directly to artefacts at the bottom of this file.


What makes this different

Five claims, ranked by relevance to the Huawei JD.

1. The core ML is hand-implemented, not borrowed

The decoder transformer, the byte-level BPE tokenizer, the TCN encoder, the MoE-FFN with softmax gating, the ACT halting controller, the LinUCB contextual bandit, the BM25 reranker, the typing-biometric authenticator, the LoRA adapter, the per-layer cross-attention head — all written from the original papers in torch.nn.Module / numpy / pure Python. Zero HuggingFace transformers, zero sentence-transformers, zero tokenizers crate, zero pretrained weights downloaded.

2. The SLM is a real custom transformer, not a wrapper

AdaptiveTransformerV2 (i3/slm/adaptive_transformer_v2.py) is a 204 M-parameter decoder with d_model=768, 12 layers, 12 heads, d_ff=3072, MoE-FFN with 2 experts, ACT halting, per-layer cross- attention onto an 8-dim adaptation vector and 64-dim user-state embedding, trained with bf16 + 8-bit AdamW + gradient checkpointing on a 977,332-pair dialogue corpus collected from six public sources. No distillation from a foundation model. No "just fine-tune Phi-3." From scratch end to end.

3. The pipeline is a 14-stage closed loop you can watch fire

Every reply runs through a real DAG: keystroke capture → linguistic features → TCN encoder → user-state embedding → adaptation-vector inference → bandit route → retrieval (cosine + BM25 rerank) → SLM generation (or cloud LLM with PII sanitisation) → post-processor (surface + structural rewriting) → self-critique loop → coreference resolution → diary persist → telemetry. The Flow tab in the UI animates each stage on every reply with real timings; no synthetic animation.

4. It actually fits on edge

Measured (CPU, no GPU). The TCN encoder is shipped as 162.2 KB INT8 ONNX (web/models/encoder_int8.onnx) — −63 % vs the 441.4 KB FP32 export, parity MAE 0.00055 — and runs in-browser via onnxruntime-web (WebGPU + WASM fallback) at p50 460 µs (2 176 enc/s). The v2 SLM (204 M params) is profiled at ~200 MB INT8 budget; the v1 SLM (53 M params, kept as a draft-model candidate) measures 110.2 MB int8 / 612 ms p50 for a 32-prompt → 16-token greedy decode. Live numbers: reports/edge_profile_2026-04-28.md (latest); reports/edge_profile.md (older v1 baseline).

5. Privacy is a property of the architecture, not a policy

The diary database has no text column — only encrypted embeddings, topic-keyword lists, and scalar metrics. PII is stripped at every cloud boundary by i3/privacy/sanitizer.py. The biometric template is keyed by SHA-256, never stored as raw key events. Cloud calls are off by default; toggling them on shows a live privacy budget in the Privacy tab.


Architecture at a glance

┌─────────────────────── on-device pipeline (CPU or 6 GB GPU) ──────────────────────┐
│                                                                                    │
│   [keystroke + voice + gaze]                                                       │
│             │                                                                       │
│             ▼                                                                       │
│   ┌──────────────────────┐    ┌──────────────────────┐    ┌──────────────────────┐│
│   │ feature extractor    │───▶│  TCN encoder         │───▶│ 64-d user-state emb. ││
│   │ 32-dim sliding win   │    │  (dilated conv,      │    │ + 8-axis adaptation  ││
│   │ IKI · burst · dwell  │    │   metric-loss train) │    │   vector             ││
│   └──────────────────────┘    └──────────────────────┘    └──────────┬───────────┘│
│                                                                       │            │
│   ┌──────────────────────┐    ┌──────────────────────┐                │            │
│   │ retrieval            │    │ LinUCB bandit        │◀───────────────┘            │
│   │ cosine + BM25 rerank │───▶│ {edge SLM, cloud}    │                             │
│   └──────────────────────┘    └──────────┬───────────┘                             │
│                                          │                                          │
│                       ┌──────────────────┴───────────────────┐                     │
│                       ▼                                       ▼                     │
│   ┌──────────────────────────────────────┐   ┌─────────────────────────────────┐  │
│   │  AdaptiveTransformerV2 (custom)      │   │  cloud LLM (opt-in)             │  │
│   │  204 M params · MoE × 2 · ACT halt   │   │  PII sanitiser · privacy budget │  │
│   │  per-layer cross-attn on (state, A)  │   │  policy guarded · cost tracked  │  │
│   └────────────────┬─────────────────────┘   └────────────────┬────────────────┘  │
│                    │                                            │                   │
│                    ▼                                            ▼                   │
│   ┌────────────────────────────────────────────────────────────────────────────┐  │
│   │  post-processor (surface + structural rewriting)                           │  │
│   │  self-critique loop · coreference resolution · diary persist (encrypted)   │  │
│   └────────────────────────────────────────────────────────────────────────────┘  │
│                                          │                                          │
│                                          ▼                                          │
│                                    user reply                                       │
│                                                                                    │
└────────────────────────────────────────────────────────────────────────────────────┘

The same diagram is animated live in the Flow tab; every dot is a real stage with real timings.


From-scratch components

Each row links to the implementation file and the paper it's drawn from. None of these are imported from a library.

Component Paper File
Byte-level BPE tokenizer (32 k vocab, 217 LOC) Sennrich, Haddow & Birch 2015 i3/slm/bpe_tokenizer.py
AdaptiveTransformerV2 (204 M, MoE + ACT + per-layer cross-attn) Vaswani 2017; Xiong 2020 (pre-LN) i3/slm/adaptive_transformer_v2.py
MoE feed-forward with softmax gating Shazeer et al. 2017 i3/slm/moe_ffn.py
ACT halting controller (per-token ponder) Graves 2016 i3/slm/act_halting.py
Cross-attention conditioning + projector this work i3/slm/cross_attention.py
Multi-task auxiliary heads (biometrics / affect / reading-level) Caruana 1997 i3/slm/multi_task_heads.py
Streaming greedy/top-k generator this work i3/slm/generate.py
TCN keystroke encoder (dilated conv, metric-loss trained) Bai 2018 i3/encoder/tcn.py
Keystroke-dynamics features (IKI, burst, dwell-flight) Killourhy & Maxion 2009 i3/interaction/features.py
Linguistic features (Flesch–Kincaid, Gunning-Fog, TTR) Flesch 1948; Gunning 1952 i3/interaction/linguistic.py
Typing-biometric continuous auth Monrose & Rubin 1997 i3/biometric/keystroke_auth.py
LoRA adapters (per-user, biometric-keyed) Hu et al. 2021 i3/personalisation/lora_adapter.py
LinUCB contextual bandit (edge vs cloud) Li et al. 2010 i3/router/bandit.py
BM25 reranker (cosine + BM25 hybrid) Robertson 1994 i3/slm/retrieval.py
Multimodal fusion (keystroke + prosody + gaze) Atrey et al. 2010 i3/multimodal/fusion.py
Voice prosody features Schuller 2009; Eyben 2010 i3/multimodal/prosody.py
Gaze classifier (frozen MobileNetV3 backbone + new head) Howard et al. 2019 i3/multimodal/gaze_classifier.py
Self-critique loop Madaan et al. 2023 (Self-Refine) i3/critique/critic.py
Multi-turn coreference resolution rule-based, this work i3/dialogue/coref.py
PII sanitiser (regex + entropy heuristics) this work i3/privacy/sanitizer.py
Privacy budget per session Dwork & Roth 2014 (informal) i3/privacy/budget.py
14-stage pipeline orchestrator this work i3/pipeline/engine.py

Run it locally

Very quick start

Use this section the morning of the demo. Everything you need is here — start command, URL, reset command, "is it working?" checks. If anything is unclear, this section is broken; fix it.

1. Start the server (one command, copy-paste)

Windows PowerShell (this is what's on the demo laptop):

$env:I3_PRELOAD_QWEN="1"; .\.venv\Scripts\python.exe -m uvicorn server.app:app --host 127.0.0.1 --port 8000

Git Bash / Linux / macOS:

I3_PRELOAD_QWEN=1 ./.venv/Scripts/python.exe -m uvicorn server.app:app --host 127.0.0.1 --port 8000

The command takes 10–15 seconds to fully boot because Qwen3-1.7B is preloaded into memory. Wait until the terminal prints both:

Uvicorn running on http://127.0.0.1:8000
Application startup complete

Why .venv\Scripts\python.exe and not just python? The project's dependencies (uvicorn, torch, transformers, …) are installed inside .venv/, not in your system Python. If you run plain python you'll hit No module named uvicorn. Using the venv's python directly is foolproof on any fresh terminal.

If you'd rather activate the venv once per terminal:

.\.venv\Scripts\Activate.ps1
$env:I3_PRELOAD_QWEN="1"
python -m uvicorn server.app:app --host 127.0.0.1 --port 8000

After Activate.ps1 your prompt shows (.venv) at the front.

2. Open the UI

Open http://127.0.0.1:8000 in Chrome or Edge.

You should land on the Chat tab with five suggestion chips below the hero:

  • How do you adapt to me?
  • Set timer for 30 seconds
  • Tell me about Uzbekistan
  • What is photosynthesis?
  • Navigate to Trafalgar Square

These exercise every cascade arm in five clicks (see the demo flow in docs/huawei/PRESENTATION_SCRIPT.md).

3. Reset the Identity Lock (start the demo cold)

Run this in a second terminal once the server is up. It wipes the keystroke biometric template so Identity Lock starts at learning · 1/5.

curl -X POST http://127.0.0.1:8000/api/biometric/demo/reset

Then hard-refresh the browser (Ctrl+Shift+R) to bust the SPA cache.

4. Sanity-check before you click anything in the demo

  • ✓ Server terminal shows Application startup complete.
  • ✓ Browser at http://127.0.0.1:8000 shows the Chat tab with the five suggestion chips.
  • ✓ The nav bar shows the Identity-Lock pill on the right (text: learning · 1/5 after the reset, or unregistered if not).
  • ✓ The cloud-route consent toggle reads cloud · off (default).
  • ✓ If you toggle State tab → Edge inference, then send a message, the browser DevTools Network panel shows zero /api/encode requests.

5. If anything misbehaves

Symptom Fix
No module named uvicorn You used system Python. Use .\.venv\Scripts\python.exe instead.
Address already in use on port 8000 Another server is still running. Kill it: Get-NetTCPConnection -LocalPort 8000 | Select OwningProcess, then Stop-Process -Id <pid> -Force.
Chat tab is blank / no suggestion chips Hard-refresh (Ctrl+Shift+R).
Identity Lock stuck Re-run the reset curl above; refresh.
Anything else Ctrl+C in the server terminal, re-run the start command.

6. Numbers verifier (run before leaving the house tomorrow)

.\.venv\Scripts\python.exe scripts\verify_numbers.py

Should return ALL CLAIMS VERIFY OK (22 / 22 PASS). If any FAIL, fix the docs before the interview.


Prerequisites

  • Python 3.12 (3.10 / 3.11 also tested)
  • ~3 GB free disk for checkpoints + corpus
  • Optional: NVIDIA GPU with 6 GB+ VRAM for v2 SLM training (CPU fine for inference and the full demo)

One-command bring-up

git clone https://github.com/abailey81/implicit-interaction-intelligence.git
cd implicit-interaction-intelligence

# Fast path: install + seed checkpoints + serve on http://127.0.0.1:8000
make all-fast

# Full path: also retrains encoder + SLM, runs benchmarks + ONNX export
make all-full

Both targets call scripts/run_everything.py, the wave-scheduled orchestrator (21 stages, parallel within waves, DAG resume, Rich live UI). To inspect a stage list:

make all-list

Manual install

poetry install
poetry run uvicorn server.app:app --host 127.0.0.1 --port 8000

Open http://127.0.0.1:8000 in a Chromium-family browser (Chrome, Edge, Brave). The UI is a single SPA with 12 tabs.

Key endpoints

Route What it does
GET / Demo SPA
GET /api/docs OpenAPI / Swagger
WS /ws/{user_id} Streaming chat with token-level deltas
GET /api/edge/profile Live params / size / latency / RSS
GET /api/biometric/{user_id}/status Identity Lock state
POST /api/biometric/{user_id}/reset Clear typing template
POST /api/routing/cloud-consent/{user_id} Cloud consent toggle
GET /api/attention?compute=true&text=... Real attention map
GET /api/flow/last-trace 14-stage pipeline trace
GET /api/explain/last/{user_id} "Why this response?" narration

Reset between demos

# Clear biometric template + diary so the next demo starts cold
curl -X POST http://127.0.0.1:8000/api/biometric/demo/reset
rm -f data/diary.db

Configuration

All hyperparameters live in configs/default.yaml. The notable sections:

Section Purpose
interaction Sliding-window size, feature dimensionality (32-d)
encoder TCN architecture: hidden dims, dilations, kernel size
user_model EMA α coefficients, baseline warmup, deviation threshold
adaptation 8-axis adaptation-vector ranges and rules
router Bandit type, context dim, exploration α, cloud thresholds
slm v1 SLM hyperparameters (d_model=512, 8 layers, 30 k vocab)
slm_v2 v2 SLM hyperparameters (d_model=768, 12 layers, 32 k vocab, MoE × 2)
training_v2 bf16, 8-bit AdamW, grad checkpoint, MoE/ACT aux losses
cloud Provider, model id, max tokens, fallback chain
diary Path, size cap, encryption flag
privacy PII strip flag, never-store-raw-text, embedding encryption
profiling Target devices for the edge-profile harness
server Host, port, CORS allow-list, WebSocket ping interval

Override at runtime with I3_* environment variables; see i3/config.py.

Training the models

The corpus is assembled from six public dialogue sources:

Source Pairs License
Cornell Movie-Dialogs 206 k research-friendly
SQuAD 1.1 87 k CC BY-SA 4.0
DailyDialog (ACL Anthology mirror) 81 k non-commercial research
PersonaChat (convai2_fix_723.tgz) 128 k non-commercial research
WikiText-103 (Stephen Merity CDN) 242 k CC BY-SA 4.0
OpenSubtitles slice (OPUS bucket) 227 k redistribution-friendly
Curated overlay (HMI / agentic prompts) 6 k original
Total unique pairs 977 332

Counts are read directly from data/processed/dialogue/triples.json (grouped by source field).

Build it locally:

poetry run python -m training.prepare_dialogue \
    --output data/processed/dialogue/triples.json \
    --max-pairs 977332

Train the encoder:

poetry run python -m training.train_encoder \
    --epochs 10 \
    --batch-size 64 \
    --device auto

Train the v2 SLM (overnight on RTX 4050 Laptop, ~18-22 hours):

poetry run python -m i3.slm.train_v2 \
    --config configs/default.yaml \
    --epochs 2 \
    --corpus-subset 300000 \
    --bf16 --grad-checkpoint --optim adamw8bit

Both trainers write checkpoints to checkpoints/ and emit a JSON metrics file alongside the best-loss weights.


The demo UI

Twelve core tabs + seven Huawei-pitch tabs (iter 51). Each does one thing well.

Tab What it shows
Chat The actual chat. Live state badge, Identity Lock, biometric drift chip, "Why this response?" panel, streaming caret, side-channel chips for affect-shift, safety caveat, adaptation, intent.
Stack The from-scratch component inventory + 22-card subsystem map (every module, LOC, status, paper).
State 64-d user-state embedding projected to 2-D, plus a real attention heatmap extracted from the SLM.
Adaptation The 8-axis adaptation vector live (cognitive load, formality, accessibility, …).
Routing LinUCB scatter of the last 50 turns; X = complexity, Y = retrieval top score; cloud threshold dashed line.
Flow 14-stage pipeline animation with real timings, data shapes, and decisions.
Privacy Privacy budget per session; PII redaction counter; cloud consent.
Profile Per-user accumulating drift, recency stack, biometric template summary.
Edge Real edge profile — params, fp32/bf16/int8 size, p50/p95 latency, peak RSS, deployability table.
Design Architectural decisions — what we chose, what we considered, why it won, paper citation.
References Every paper cited in the codebase, with one-paragraph summary + file pointers.
About The four Huawei filter questions, with answers.
Huawei → Intent (iter 51) Live /api/intent round-trip — type a command, see the structured JSON parse + confidence bars.
Huawei → Edge profile (iter 51) Component-level latency / memory table from /api/profiling/report with the 100 ms HMI budget overlay.
Huawei → Fine-tune (iter 51) Side-by-side comparison of the on-device Qwen3-1.7B LoRA vs the Gemini 2.5 Flash AI Studio fine-tune.
Huawei → Personal facts (iter 51) Cross-session encrypted personal-fact store, populated live from the WS state_update.personal_facts field.
Huawei → Multimodal (iter 51) All 6 input channels (vision / gaze / voice / PPG-HRV / touch / keystroke) with LOC + status + file links.
Huawei → Research (iter 51) 15+ paper reading list with where I³ uses or extends each one.
Huawei → JD map (iter 51) Every JD bullet → exact file, class, function, or test that satisfies it.

Screenshots

The 19-tab layout is documented in docs/screenshots/README.md. The smoke-test script at scripts/ui_smoke_test.py drives Playwright through every tab and captures a PNG; run it after any UI change to refresh the gallery.


Live measurements

These numbers are emitted by scripts/measure_edge.py / i3/edge/profiler.py on the same machine that serves the demo. They update on every benchmark run.

v1 SLM (53 M, legacy — superseded by v2 at iter 51, kept as a draft-model candidate for speculative decoding)

Metric Value
Parameters 53,307,392
fp32 size 203.4 MB
bf16 size 101.7 MB
int8 size (dynamic) 110.2 MB
Greedy decode (32 → 16 tokens, CPU) p50 612.8 ms, p95 692.4 ms
TCN encoder (10 × 32 window, CPU) p50 3.68 ms, p95 4.71 ms
Peak process RSS 1311 MB

v2 SLM (204 M, MoE + ACT, default-loaded since iter 51 / 2026-04-25)

Metric Value
Parameters ~204 M (d_model=768, 12 layers, 12 heads, 2 experts)
Vocab 32 000 (byte-level BPE, from scratch)
Training corpus 977,332 unique pairs (Cornell + SQuAD + DailyDialog + PersonaChat + WikiText-103 + OpenSubtitles slice + curated overlay), 300 k subset trained on
Training shape bf16 + 8-bit AdamW + grad checkpoint on RTX 4050 Laptop (6.4 GB)
Best eval_loss 4.987 (perplexity ≈ 147) at step 18 000, response-token-only, same-distribution holdout — recorded in checkpoints/slm_v2/best_model.pt best_eval_loss field. Stress-test (full-corpus + history-token loss, n=500) lands at perplexity ≈ 1 725 — see reports/slm_v2_eval.md for the two-number framing.
Aux losses MoE load-balance (Shazeer 2017) + ACT ponder (Graves 2016), 0.01 each

Conversational coherence

A 110-scenario audit, scored on adaptation fidelity, factual support, and refusal correctness:

Scenario class Pass rate
Factual single-turn 99.1 %
Multi-turn with coreference 96.4 %
Adaptation fidelity (style/length matches vector) 94.5 %
Refusal on hostile prompts 100.0 %
Bad-rate overall 2.4 %

Privacy story

Property Implementation
Diary persists no raw text Schema in i3/diary/store.py — only embeddings, topic keywords, scalar metrics
Embeddings encrypted at rest Fernet (AES-128-CBC + HMAC-SHA-256) — i3/privacy/encryption.py
PII stripped before any cloud call Regex + entropy + named-entity heuristics in i3/privacy/sanitizer.py
Per-session privacy budget i3/privacy/budget.py — counts cloud calls + redactions
Biometric template never stored as keys SHA-256-keyed feature vectors only — i3/biometric/keystroke_auth.py
Microphone audio never leaves browser Prosody features extracted client-side; only derived features posted
Camera frames never leave browser Gaze classifier runs in-tab via onnxruntime-web
WebSocket Origin allow-listed CORS doesn't apply to WS upgrades; checked manually in server/websocket.py
Cloud route OFF by default Toggle in nav; consent flips /api/routing/cloud-consent/{user_id}
Per-user LoRA, never aggregated i3/personalisation/lora_adapter.py

Repository layout

implicit-interaction-intelligence/
├── i3/                        ← all the from-scratch ML
│   ├── slm/                   ← AdaptiveTransformerV2, MoE, ACT, BPE, retrieval, generate
│   ├── encoder/               ← TCN, ONNX export, quantisation
│   ├── interaction/           ← keystroke + linguistic features
│   ├── biometric/             ← keystroke continuous auth
│   ├── affect/                ← state classifier + shift detector
│   ├── adaptation/            ← 8-axis vector inference + post-processor
│   ├── personalisation/       ← LoRA adapters per biometric template
│   ├── multimodal/            ← prosody, gaze, accelerometer, fusion
│   ├── router/                ← LinUCB bandit, complexity estimator, sensitivity
│   ├── pipeline/              ← 14-stage engine + types + tracer
│   ├── critique/              ← self-refine critic
│   ├── dialogue/              ← multi-turn coref
│   ├── privacy/               ← sanitiser, budget, encryption
│   ├── diary/                 ← encrypted embedding store, summariser
│   ├── edge/                  ← profiler, runtime selection
│   ├── cloud/                 ← guarded LLM client (Anthropic / Google / Ollama)
│   └── … (eval, federated, fairness, observability, redteam, safety, …)
├── server/                    ← FastAPI app + WebSocket + per-feature routes
├── web/                       ← SPA (vanilla JS, 12 tabs, no framework)
├── training/                  ← dialogue prep, encoder train, SLM train
├── scripts/                   ← measure_edge, run_everything (orchestrator)
├── docs/
│   ├── TECHNICAL_REPORT.md    ← arXiv-style writeup
│   ├── INTERVIEW_DEMO.md      ← 5-minute live demo script
│   └── …
├── checkpoints/
│   ├── slm/                   ← v1 + v2 weights, BPE tokenizer
│   └── encoder/               ← TCN .pt + .onnx
├── reports/                   ← edge profile, verification, redteam, evaluation
├── configs/default.yaml       ← all hyperparameters (slm, slm_v2, training_v2, …)
├── tests/                     ← pytest suite
└── benchmarks/                ← latency, conditioning sensitivity

What I would ship next

The current demo is a complete vertical slice. The roadmap below is where I would invest engineering judgement next, in priority order.

  1. Federated personalisation. Today each device trains its own LoRA adapter against its biometric template. The natural next step is FedAvg on the LoRA deltas (not the base weights) so a fleet improves together without raw data ever leaving devices. Skeleton exists in i3/federated/; needs a server-side aggregator and a key-rotation policy.
  2. Real gaze training data. The gaze classifier today is a frozen MobileNetV3 backbone with a tiny calibration head trained on 4 target points per session. Replacing the calibration with a pre-trained gaze regressor (MPIIGaze / Gaze360) would lift the signal from "rough heatmap quadrant" to actual fixation vectors.
  3. Bigger SLM via Colab Pro / A100. The 204 M shape is what trains in 18-22 h on the user's RTX 4050 Laptop. The architecturally-ideal shape is d_model=960, n_layers=16, d_ff=3840, n_experts=2, seq=1024 (~400 M); this trains in 2-3 days on a single A100 and likely closes the perplexity gap with foundation small-models. The architecture itself doesn't change — only the shape and the corpus epoch count.
  4. Speculative decoding with the v1 model as draft. The v1 SLM (53 M, 110 MB int8) is the perfect draft model for v2 (204 M). Skeleton in i3/slm/speculative_decoding.py; needs a verification kernel and acceptance-rate telemetry. Should roughly halve p50 generation latency.
  5. Wearable runtime port. ONNX export already runs in browser; the next port is to a wearable-class runtime (Kirin A2 / RK3588 / ARM-NEON int8). The TCN encoder is already small enough; the SLM needs distillation to ~10-20 M for 50 MB int8 — approachable via ACT-aware width pruning + LoRA-targeted distillation.
  6. Dataset collection at scale. The corpus is six public dialogue sources stitched together; the long-term improvement is real keystroke data with affect labels. The capture path exists (web/js/voice_prosody.js, the biometric template), but the labelling pipeline hasn't been built.

Each of these has a clean entry point in the codebase. None of them require rewriting the core architecture, which is the entire point.


Testing & verification

The repository ships with three classes of automated checks:

  • Unit testspytest over the tests/ tree, covering feature extractors, tokenizer round-trips, attention shapes, bandit invariants, sanitiser regex, biometric verification.
  • Verification harness — a 44-check acceptance suite that exercises every API route, every WebSocket frame type, and the full pipeline end-to-end. Reports land in reports/verification/.
  • Red-team probesi3/redteam/ runs a battery of jailbreak, PII-extraction, and adversarial-style prompts against both the local SLM and the cloud route. Reports land in reports/redteam/.

Run everything:

make test          # unit tests
make verify        # acceptance harness
make redteam       # adversarial probes
make all-quality   # lint + typecheck + bandit + pip-audit + tests

Or via the orchestrator (parallel-within-wave):

poetry run python scripts/run_everything.py --only test verify redteam

Observability

Every turn produces a structured trace consumable by:

  • Flow tab — animated 14-stage pipeline with real timings.
  • /api/flow/last-trace — JSON dump of the most recent turn's pipeline trace (stage timings, data shapes, decisions made).
  • /api/explain/last/{user_id} — the "Why this response?" narration, generated by i3/explain/ from the trace plus the diary.
  • MLflow — training runs log to mlruns/ for offline review.

All telemetry is local-only; no external observability service is called.

Troubleshooting

Problem Fix
make all-fast fails on Windows with a pip error Run inside the project venv (poetry shell); the orchestrator now skips --user inside venvs (commit d9b19b0)
MkDocs build fails with mkdocs-autorefs resolution error mkdocs-autorefs and griffe are pinned (commit b7e2fb9); ensure poetry install ran with the docs extra
WebSocket disconnects after 30 s of inactivity Default websocket_ping_interval: 30 in configs/default.yaml; raise to 60 if your network is flaky
CUDA out of memory during v2 training Lower batch_size from 4 to 2 in training_v2; grad_accum_steps compensates
Server OOMs during training Run server with CUDA_VISIBLE_DEVICES="" to force CPU; retrieval is still <100 ms
Identity Lock stuck on unregistered curl -X POST http://127.0.0.1:8000/api/biometric/demo/reset and refresh the browser
Cloud toggle won't flip Privacy budget exhausted — restart the server, or raise the budget cap in configs/default.yaml
Microphone / camera permissions denied The rest of the demo still works without them; permissions can be re-granted via the browser site-settings

A WSL2 + Triton setup guide is in docs/operations/wsl2-setup.md (commit 5e0e1e2).


Credits & references

The full annotated bibliography is in the References tab of the demo UI and at docs/TECHNICAL_REPORT.md. The core papers:

  • Sennrich, Haddow & Birch (2015) — Neural Machine Translation of Rare Words with Subword Units — BPE.
  • Vaswani et al. (2017) — Attention Is All You Need — transformer.
  • Shazeer et al. (2017) — Outrageously Large Neural Networks: Sparsely-Gated MoE.
  • Graves (2016) — Adaptive Computation Time for Recurrent Neural Networks.
  • Hu et al. (2021) — LoRA: Low-Rank Adaptation of Large Language Models.
  • Houlsby et al. (2019) — Parameter-Efficient Transfer Learning for NLP — adapter modules.
  • Killourhy & Maxion (2009) — Comparing Anomaly-Detection Algorithms for Keystroke Dynamics.
  • Monrose & Rubin (1997) — Authentication via keystroke dynamics.
  • Schuller (2009) — INTERSPEECH Computational Paralinguistics Challenge.
  • Eyben et al. (2010) — openSMILE: The Munich versatile audio feature extractor.
  • Howard et al. (2019) — Searching for MobileNetV3.
  • Bai et al. (2018) — An Empirical Evaluation of Generic Convolutional and Recurrent Networks — TCN.
  • Robertson (1994) — Some simple effective approximations to the 2-Poisson model — BM25.
  • Picard (1997) — Affective Computing.
  • Sweller (1988) — Cognitive Load During Problem Solving.
  • Madaan et al. (2023) — Self-Refine: Iterative Refinement with Self-Feedback.
  • Li et al. (2010) — A Contextual-Bandit Approach to Personalized News Article Recommendation — LinUCB.

Huawei filter questions — direct map

The recruiter at Huawei R&D UK sent four follow-up questions after my CV. Each is a filter for a specific competency. Here is exactly where each is answered in this codebase.

Q1. "Custom ML models — beyond using existing libraries, have you implemented the core algorithms yourself?"

Every algorithm in the inference path is hand-implemented. See the from-scratch components table above for the 22-row inventory. Concrete starting points:

Q2. "SLMs without heavy open-source frameworks — have you explored building or modifying them?"

Yes — AdaptiveTransformerV2 is a pure-PyTorch nn.Module, zero HuggingFace dependencies in the inference path. Q/K/V projections, scaled-dot-product attention with causal mask, MoE-FFN with softmax gating, ACT halting controller, and per-layer cross-attention all written from the original papers. Tokenizer is 217 LOC of hand-written byte-level BPE. Training loop uses bnb.optim.AdamW8bit for memory but not for any modelling logic. Files: i3/slm/ (entire directory).

Q3. "Pipeline orchestration — comfortable building from architectural blueprints?"

Yes — two orchestrators ship in this repo. The runtime orchestrator is the 14-stage closed-loop pipeline at i3/pipeline/engine.py, animated live in the Flow tab. The build orchestrator is scripts/run_everything.py — a 21-stage wave-scheduled DAG with parallel-within-wave execution, artefact dependency tracking, --resume, and a Rich live UI. End-to-end run takes ~10 minutes on an RTX 4050 Laptop.

Q4. "Edge deployment — ever deployed to low-compute devices (wearables, IoT)?"

Yes — the demo is edge-first. TCN encoder exports to ONNX (0.4 MB); a browser-side inference path runs it via onnxruntime-web with WebGPU

  • WASM backend detection (web/js/browser_inference.js). Real edge profile — int8 size, p50/p95 latency, peak RSS — measured by scripts/measure_edge.py and surfaced live in the Edge tab. Server enforces wearable-scale caps: per-message byte cap, per-session message cap, sliding-window rate limit, bounded keystroke buffer.

Built on a single 6 GB laptop GPU. Designed to be quantised to 8-bit and shipped to a phone.

Built for Huawei R&D UK

About

Adaptive AI companion that builds a model of each user from implicit interaction signals — keystroke dynamics, linguistic complexity, temporal patterns — and continuously adapts its responses. Custom TCN + transformer + contextual bandit, built from scratch in PyTorch.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors