
AlessioTonioni/Visual_memory


Visual Memory

Personal self-hosted web app for cataloguing physical objects and retrieving them by pointing your phone camera at them. Snap a photo → find what you catalogued. See Design.md for the full design doc.

Features

  • Visual search — snap or upload a photo and get ranked matches from your catalogue
  • Text search — query by description, or combine image + text for better results
  • Hierarchical groups — organise items into an arbitrary-depth folder tree; search scopes to a subtree
  • Metro tile navigation — group browser with mosaic tiles and crossfading thumbnails
  • Widgets — attach notes, counters, date reminders, and location coordinates to any item
  • Multimodal embeddings — image and text live in the same embedding space (Gemini), enabling cross-modal retrieval
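Because image and text embeddings share one space, visual search, text search, and the combined mode can all reduce to the same nearest-neighbour ranking. The sketch below illustrates that idea with toy 3-dimensional vectors; the function names, the averaging used to combine image and text queries, and the in-memory catalogue are assumptions for illustration, not the app's actual code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combine(vectors):
    """Average several query vectors (e.g. image + text) into one.

    Averaging is an assumed strategy; the app may combine them differently.
    """
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank(query, catalogue, top_k=3):
    """Return the top_k (item_id, score) pairs, best match first."""
    scored = [(cosine(query, vec), item_id) for item_id, vec in catalogue.items()]
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:top_k]]


# Toy stand-ins for real embeddings of catalogued items.
catalogue = {
    "drill": [0.9, 0.1, 0.0],
    "passport": [0.0, 0.2, 0.9],
    "screwdriver": [0.7, 0.6, 0.1],
}
image_query = [1.0, 0.0, 0.0]  # hypothetical embedding of the snapped photo
text_query = [0.5, 0.5, 0.0]   # hypothetical embedding of the typed description

results = rank(combine([image_query, text_query]), catalogue, top_k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

In a real deployment the vectors would come from the embedding model and be far longer, but the ranking step would look much the same.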

Quick start

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# edit .env and set GEMINI_API_KEY
./run.sh

Open http://localhost:8080 on your phone (same LAN) or via Tailscale.

You need a Gemini API key (free tier is sufficient).

Deployment (Raspberry Pi)

./deploy.sh

Handles first-time setup and incremental deploys: syncs files, installs system deps (libopenblas0, libopenjp2-7), sets up the venv, registers the systemd service, and restarts the app. Target: alessio@raspino.local:/home/alessio/visual-memory.

The database is backed up nightly to Google Drive via rclone (gdrive:backups/visual-memory/). deploy.sh registers the cron job automatically.
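The README does not say how the nightly job copies the database file. One possible approach, sketched here under that assumption, is to take a consistent snapshot with the standard library's `sqlite3` backup API before handing the file to rclone, so a write in progress cannot leave a half-copied file. The `snapshot` helper and the file names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def snapshot(src_path, dest_path):
    """Copy a live SQLite database to dest_path using the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # copies all pages consistently, even if src is in use
    finally:
        dest.close()
        src.close()


# Demo with a throwaway database (paths are placeholders, not the app's).
workdir = Path(tempfile.mkdtemp())
live_db = workdir / "live.db"
snap_db = workdir / "snapshot.db"

conn = sqlite3.connect(live_db)
conn.execute("CREATE TABLE items (name TEXT)")
conn.execute("INSERT INTO items VALUES ('drill')")
conn.commit()
conn.close()

snapshot(live_db, snap_db)

check = sqlite3.connect(snap_db)
copied = check.execute("SELECT name FROM items").fetchall()
check.close()
print(copied)
```

The resulting snapshot file is what a cron job would then upload.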

Stack

Layer     Choice
Backend   Python + FastAPI, single uvicorn worker
Frontend  Vue 3 via CDN ESM (no build step), custom CSS
Storage   SQLite (stdlib sqlite3), images on filesystem
AI        Gemini (gemini-3.1-flash-lite-preview for captions, gemini-embedding-2-preview for embeddings)
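With SQLite as the storage layer, scoping a search to a subtree of the group hierarchy can be done with a recursive common table expression. The following is a minimal sketch of that technique; the `group_nodes` and `items` tables, their columns, and the `items_in_subtree` helper are assumptions made for illustration and may not match the real schema (see Design.md for that).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE group_nodes (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES group_nodes(id),  -- NULL for a root group
    name TEXT NOT NULL
);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    group_id INTEGER NOT NULL REFERENCES group_nodes(id),
    name TEXT NOT NULL
);
INSERT INTO group_nodes VALUES
    (1, NULL, 'Home'), (2, 1, 'Garage'), (3, 2, 'Toolbox'), (4, 1, 'Office');
INSERT INTO items VALUES
    (1, 3, 'drill'), (2, 2, 'bike pump'), (3, 4, 'passport');
""")


def items_in_subtree(conn, root_id):
    """Return names of items in root_id or any of its descendant groups."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM group_nodes WHERE id = ?
            UNION ALL
            SELECT g.id FROM group_nodes g JOIN subtree s ON g.parent_id = s.id
        )
        SELECT i.name FROM items i JOIN subtree s ON i.group_id = s.id
        ORDER BY i.name
        """,
        (root_id,),
    ).fetchall()
    return [row[0] for row in rows]


garage_items = items_in_subtree(conn, 2)  # Garage and its Toolbox child
home_items = items_in_subtree(conn, 1)    # the whole tree
print(garage_items)
print(home_items)
```

The candidate set returned this way could then be passed to the embedding ranking step, so only items under the chosen group are scored.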

About

Keep track of things visually
