Personal self-hosted web app for cataloguing physical objects and retrieving them by pointing your phone camera at them. Snap a photo → find what you catalogued. See Design.md for the full design doc.
- Visual search — snap or upload a photo and get ranked matches from your catalogue
- Text search — query by description, or combine image + text for better results
- Hierarchical groups — organise items into an arbitrary-depth folder tree; search scopes to a subtree
- Metro tile navigation — group browser with mosaic tiles and crossfading thumbnails
- Widgets — attach notes, counters, date reminders, and location coordinates to any item
- Multimodal embeddings — image and text live in the same embedding space (Gemini), enabling cross-modal retrieval
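Because image and text embeddings share one space, ranking matches can be sketched as a similarity search over stored vectors. The snippet below is a minimal illustration, not the app's actual code: `cosine_similarity`, `combine_queries`, and `rank_matches` are hypothetical helper names, the catalogue is assumed to be a plain dict of item id to vector, and the weighted average used to merge an image query with a text query is just one plausible choice.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combine_queries(image_vec, text_vec, image_weight=0.5):
    """Blend an image embedding and a text embedding into one query vector.

    Assumption: a simple weighted average; the real app may combine them differently.
    """
    text_weight = 1.0 - image_weight
    return [image_weight * i + text_weight * t for i, t in zip(image_vec, text_vec)]


def rank_matches(query_vec, catalogue, top_k=5):
    """Return the top_k (item_id, score) pairs, best match first.

    `catalogue` is a hypothetical in-memory mapping of item id -> embedding.
    """
    scored = [
        (item_id, cosine_similarity(query_vec, vec))
        for item_id, vec in catalogue.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 2-D vectors stand in for real embeddings.
    catalogue = {"mug": [1.0, 0.0], "key": [0.0, 1.0], "lamp": [1.0, 1.0]}
    for item_id, score in rank_matches([1.0, 0.1], catalogue, top_k=2):
        print(item_id, round(score, 3))
```

A linear scan like this is usually adequate for a personal catalogue of a few thousand items; a larger collection would likely want a vector index instead.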
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# edit .env and set GEMINI_API_KEY
./run.sh

Open http://localhost:8080 on your phone (same LAN) or via Tailscale.
You need a Gemini API key (free tier is sufficient).
./deploy.sh

Handles first-time setup and incremental deploys: syncs files, installs system deps (libopenblas0, libopenjp2-7), sets up the venv, registers the systemd service, and restarts the app. Target: alessio@raspino.local:/home/alessio/visual-memory.
The database is backed up nightly to Google Drive via rclone (gdrive:backups/visual-memory/). deploy.sh registers the cron job automatically.
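Copying a live SQLite file while the app is writing to it can produce an inconsistent backup, so one common approach is to take a snapshot first and upload that. The sketch below shows the idea with the standard library's `sqlite3.Connection.backup`; whether the actual backup job does this is an assumption, and `snapshot_database` and the file paths are hypothetical names.

```python
import sqlite3


def snapshot_database(source_path, snapshot_path):
    """Write a consistent copy of the SQLite database at source_path to snapshot_path.

    Uses SQLite's online backup API, which is safe to run while the app
    holds the source database open.
    """
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(snapshot_path)
    try:
        source.backup(target)  # copies all pages into the target database
    finally:
        target.close()
        source.close()


if __name__ == "__main__":
    # Hypothetical paths; the snapshot file is what a tool like rclone would upload.
    snapshot_database("visual-memory.db", "visual-memory.snapshot.db")
```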
| Layer | Choice |
|---|---|
| Backend | Python + FastAPI, single uvicorn worker |
| Frontend | Vue 3 via CDN ESM (no build step), custom CSS |
| Storage | SQLite (stdlib sqlite3), images on filesystem |
| AI | Gemini (gemini-3.1-flash-lite-preview for captions, gemini-embedding-2-preview for embeddings) |
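With SQLite as the store, scoping a search to a subtree of the group hierarchy can be expressed as a recursive common table expression. The following is a minimal sketch under an assumed schema: the `item_groups` and `items` tables, their columns, and the helper names are all hypothetical and may not match the real database.

```python
import sqlite3

# Assumed schema: each group points at its parent; each item belongs to one group.
SCHEMA_SQL = """
CREATE TABLE item_groups (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_id INTEGER REFERENCES item_groups(id)
);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    group_id INTEGER NOT NULL REFERENCES item_groups(id)
);
"""

# Walk down from the root group, collecting every descendant group id,
# then select the items that live in any of those groups.
SUBTREE_SQL = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM item_groups WHERE id = ?
    UNION ALL
    SELECT g.id FROM item_groups AS g JOIN subtree AS s ON g.parent_id = s.id
)
SELECT items.id, items.name
FROM items
WHERE items.group_id IN (SELECT id FROM subtree)
ORDER BY items.id
"""


def create_schema(conn):
    """Create the hypothetical tables used by this sketch."""
    conn.executescript(SCHEMA_SQL)


def items_in_subtree(conn, root_group_id):
    """Return (id, name) rows for items in the root group or any group below it."""
    return conn.execute(SUBTREE_SQL, (root_group_id,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany(
        "INSERT INTO item_groups (id, name, parent_id) VALUES (?, ?, ?)",
        [(1, "Garage", None), (2, "Toolbox", 1), (3, "Drawer", 2), (4, "Office", None)],
    )
    conn.executemany(
        "INSERT INTO items (id, name, group_id) VALUES (?, ?, ?)",
        [(1, "hammer", 2), (2, "drill", 3), (3, "passport", 4)],
    )
    print(items_in_subtree(conn, 1))
```

In a setup like this, the returned item ids would be used to restrict which embeddings are compared against the query, so the tree depth never has to be known in advance.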