feat(library): add statistics dashboard panel #158
Add a lazy-loaded statistics panel to the Library view that shows aggregate metrics across the entire manuscript collection.

- new component library_stats.py with render_library_stats()
- metrics: downloaded pages (count + %), transcribed pages, OCR pages, disk usage (human-readable bytes)
- provider distribution via pure-CSS percentage bars, top 8 libraries
- transcription and OCR coverage computed by scanning per-manuscript transcription.json files (engine + is_manual fields)
- disk usage computed by summing file sizes under each local_path
- loaded lazily via hx-get="/api/library/stats" hx-trigger="load" so the main Library page is not blocked
- new GET /api/library/stats route; stats always reflect the full collection regardless of active filters

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #158 +/- ##
==========================================
- Coverage 70.95% 69.90% -1.06%
==========================================
Files 153 156 +3
Lines 13229 13437 +208
==========================================
+ Hits 9387 9393 +6
- Misses 3842 4044 +202
Removes inline stats from the Library page and introduces:

- Compact DB-only widget in sidebar footer (mss count, pages, % local) loaded lazily via /api/stats/sidebar; hidden when sidebar is collapsed
- Dedicated /stats route with fast DB metrics (manuscript count, pages, provider distribution, recent activity) plus lazy /api/stats/detail panel for slow disk + transcription scans
- 📊 Statistiche nav item added to sidebar

Closes #124

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
Adds a new Statistics area to the Studio UI, including a dedicated /stats page and a lazy-loaded sidebar “nerd stats” widget, to surface collection-wide library metrics without blocking initial page render.
Changes:
- Introduce /stats page with fast (DB-only) metrics and a lazy-loaded “detail” panel.
- Add /api/stats/sidebar (sidebar widget) and /api/stats/detail (disk/transcription scan) endpoints.
- Add a “Statistiche” entry to the main sidebar nav and register stats routes in the app.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/studio_ui/routes/stats_handlers.py | Implements handlers for stats page, sidebar widget fragment, and detail fragment. |
| src/studio_ui/routes/stats.py | Registers /stats and /api/stats/* endpoints. |
| src/studio_ui/components/library_stats.py | Implements UI components + filesystem/JSON scanning helpers for stats metrics. |
| src/studio_ui/components/layout.py | Adds “Statistiche” nav item and an HTMX placeholder to load the sidebar stats widget. |
| src/studio_app.py | Registers stats routes during app startup. |

            dt = datetime.fromisoformat(str(ts_str).replace("Z", "+00:00"))
            delta = datetime.now(timezone.utc) - dt.astimezone(timezone.utc)
            days = delta.days
            if days == 0:
                hours = delta.seconds // 3600
                return "poco fa" if hours == 0 else f"{hours}h fa"
            if days == 1:
                return "ieri"
            if days < 7:
                return f"{days}g fa"
            if days < 30:
                return f"{days // 7}sett fa"
            if days < 365:
                return f"{days // 30}m fa"
            return f"{days // 365}a fa"
        except Exception:
            return "—"

_time_ago() treats SQLite updated_at values like YYYY-MM-DD HH:MM:SS as naive datetimes, then converts them with astimezone(timezone.utc), which interprets the naive value as local time and can shift the relative time display. Consider detecting naive timestamps and explicitly treating them as UTC (e.g., attach timezone.utc before computing the delta).
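One possible shape for the suggested fix, as a minimal sketch: the helper name parse_as_utc is hypothetical and is not part of this PR. It assumes that naive values coming from SQLite are stored in UTC, which is the premise of the comment above.

```python
from datetime import datetime, timezone


def parse_as_utc(ts_str: str) -> datetime:
    """Parse an ISO-like timestamp and return an aware UTC datetime.

    Hypothetical helper: naive values (no offset) are assumed to be UTC
    rather than local time.
    """
    dt = datetime.fromisoformat(ts_str.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # Naive timestamp: attach UTC instead of letting astimezone()
        # interpret it in the machine's local zone.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```

With this in place, the delta could be computed as datetime.now(timezone.utc) - parse_as_utc(ts_str), independent of the server's local timezone.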

    def _dir_size(path: Path) -> int:
        total = 0
        with suppress(OSError):
            total = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
        return total

_dir_size() suppresses a single OSError for the whole directory scan; if any file stat fails (transient delete, permission, etc.), the function returns 0 and disk usage becomes wildly inaccurate. It would be more robust to handle OSError per-file (skip unreadable files) so partial failures don't zero-out the entire directory size.
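A sketch of the per-file handling suggested above. The function name dir_size_tolerant is illustrative only, and the choice to silently skip unreadable files is an assumption about the desired behavior.

```python
from pathlib import Path


def dir_size_tolerant(path: Path) -> int:
    """Sum file sizes under path, skipping entries that cannot be read."""
    total = 0
    try:
        for entry in path.rglob("*"):
            try:
                if entry.is_file():
                    total += entry.stat().st_size
            except OSError:
                # File vanished or is unreadable: skip it, keep the rest.
                continue
    except OSError:
        # The directory walk itself failed; return what was counted so far.
        pass
    return total
```

A failure on one file now costs only that file's bytes, so the reported total degrades gradually instead of dropping to 0.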

    def _scan_disk_usage(manuscripts: list[dict]) -> int:
        """Return total bytes used across all local manuscript directories."""
        total = 0
        seen: set[str] = set()
        for m in manuscripts:
            lp = m.get("local_path")
            if not lp or lp in seen:
                continue
            seen.add(lp)
            p = Path(lp)
            if p.exists():
                total += _dir_size(p)
        return total

_scan_disk_usage() trusts local_path from the DB and will rglob() any existing path. If the DB becomes corrupted (or a future migration writes unexpected values), this endpoint could end up scanning outside the downloads directory (very slow and potentially leaking information via aggregated sizes). VaultManager’s delete_manuscript() explicitly guards local_path to be under the configured downloads dir; it would be good to apply the same safety check here (resolve path and skip anything outside downloads base).
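The containment check could look roughly like the sketch below. The name is_under and the rule that only paths strictly inside the base are accepted are assumptions; the actual guard in VaultManager.delete_manuscript() is not shown in this thread and may differ.

```python
from pathlib import Path


def is_under(base: Path, candidate: Path) -> bool:
    """Return True only if candidate resolves to a path strictly inside base."""
    base_resolved = base.resolve()
    candidate_resolved = candidate.resolve()
    # resolve() collapses ".." segments and symlinks, so traversal tricks
    # such as "downloads/../other" are compared by their real location.
    return base_resolved in candidate_resolved.parents
```

In _scan_disk_usage() the loop could then skip any local_path for which is_under(downloads_base, Path(lp)) is false, where downloads_base stands for the configured downloads directory.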

    def setup_stats_routes(app):
        """Register statistics page and API routes."""
        app.get("/stats")(stats_handlers.stats_page)
        app.get("/api/stats/sidebar")(stats_handlers.stats_sidebar_widget)
        app.get("/api/stats/detail")(stats_handlers.stats_detail_content)

The PR description/Issue #124 describe adding a Library view stats panel lazy-loaded from /api/library/stats (or /library/stats), but this change introduces a separate /stats page and /api/stats/* endpoints instead. Please reconcile the implementation with the stated route(s)/placement (either update the PR description/issue references, or add the Library panel + routes as described).

    def stats_detail_content():
        """Return the lazy-loaded detail metrics panel (disk + transcription scan)."""
        manuscripts = VaultManager().get_all_manuscripts()
        return render_library_stats(manuscripts)

/api/stats/detail performs filesystem scans (disk usage via rglob, plus per-manuscript JSON reads) on every load. Even though it's lazy-loaded, this can still be expensive for large libraries and can regress the <500ms target mentioned in Issue #124. Consider caching the computed stats with a short TTL (in-memory or persisted) and/or incrementally updating from DB events to avoid repeated full scans.
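A short-TTL in-memory cache along these lines might be sketched as follows. TTLCache is a hypothetical class, not something in this codebase; the injectable clock is there only to make expiry testable.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Hold one computed value and recompute it after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[T] = None
        # None means "never computed", so the first call always computes.
        self._expires_at: Optional[float] = None

    def get_or_compute(self, compute: Callable[[], T]) -> T:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = compute()
            self._expires_at = now + self._ttl
        return self._value
```

A handler could keep a module-level TTLCache(300.0) and wrap the scan in get_or_compute(), so repeated loads within five minutes reuse the previous result. This sketch is not thread-safe; concurrent requests may each trigger a scan.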

    def setup_stats_routes(app):
        """Register statistics page and API routes."""
        app.get("/stats")(stats_handlers.stats_page)
        app.get("/api/stats/sidebar")(stats_handlers.stats_sidebar_widget)
        app.get("/api/stats/detail")(stats_handlers.stats_detail_content)

There’s good test coverage for other route modules (Discovery/Export/Library/etc.), but the new /stats page and /api/stats/sidebar + /api/stats/detail endpoints don’t appear to have tests. Adding route/handler tests would help prevent regressions (e.g., HX vs full-page behavior, and that sidebar/detail endpoints return expected fragments).

        top = sorted(provider_counts.items(), key=lambda x: -x[1])[:10]
        provider_panel = Div(
            P("Distribuzione per biblioteca", cls=_SECTION_LABEL_CLS),
            Div(*[_provider_bar_row(n, c, total) for n, c in top], cls="flex flex-col gap-2"),
            cls=_CARD_CLS + " mb-6",

The PR description says the provider breakdown shows the “top 8”, but the implementation slices [:10]. Please align the code with the intended number (or update the description) so the UI/expectations match.
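One way to keep the limit and the description in sync is a single named constant, sketched here. TOP_PROVIDERS and top_providers are illustrative names, and the value 8 is taken from the PR description rather than from the code.

```python
from collections import Counter

# Assumed limit, matching the "top 8" wording in the PR description.
TOP_PROVIDERS = 8


def top_providers(provider_counts: dict[str, int],
                  limit: int = TOP_PROVIDERS) -> list[tuple[str, int]]:
    """Return the limit most frequent providers, largest count first."""
    return Counter(provider_counts).most_common(limit)
```

Counter.most_common() orders by count descending and keeps insertion order for ties, which matches the stable sorted(...) call in the excerpt.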
- _time_ago: treat naive SQLite timestamps as UTC (not local time)
- _dir_size: handle OSError per-file so partial failures don't zero total
- _scan_disk_usage: guard local_path against paths outside downloads dir
- stats_detail_content: add 5-min in-memory TTL cache to avoid repeated scans
- tests: add 33 unit tests covering helpers, components, and route handlers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On fresh CI containers the monotonic clock can read less than 300 s, so a seed timestamp of 0.0 falls within the TTL window and the empty cache looks valid. Seed with (now - TTL - 1) instead to guarantee expiry regardless of uptime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Adds a lazy-loaded statistics panel to the Library view, loaded via hx-get="/api/library/stats" hx-trigger="load" so the main Library page is not blocked.

Metrics shown
| Metric | Source |
|---|---|
| Downloaded pages | downloaded_canvases / total_canvases from vault |
| Transcribed pages | transcription.json, full_text field |
| OCR pages | is_manual: false |
| Disk usage | local_path |

Implementation
- src/studio_ui/components/library_stats.py: new component (175 LOC)
- GET /api/library/stats: new route, registered in library.py
- library_handlers.library_stats_panel(): new handler
- library.py component: HTMX placeholder div added between KPI strip and filters

Test plan
- ruff check: clean
- ruff check --select C901: all functions under complexity 10

Closes #124