feat(storage): IndexAnalyzer — tier-aware index health analysis with cron scheduling and AI/ML advisor hook#4802

Merged
makr-code merged 2 commits into develop from copilot/add-analysis-function-for-index on Apr 22, 2026

Copilot AI commented Apr 22, 2026

Description

Adds IndexAnalyzer, a per-index health analysis engine that computes fragmentation from RocksDB SST metrics, classifies it against hot/warm/cold tier thresholds, and produces a maintenance recommendation. Fully driven by YAML config; supports cron-based scheduling and an opt-in AI/ML override hook.

Core components

  • include/storage/index_analyzer.h: TierThresholds (3-tier defaults), IndexAnalysisReport, IndexRecommendation (NONE → UPDATE_STATS → REORGANIZE → PARTIAL_REBUILD → FULL_REBUILD), IIndexAnalysisAdvisor interface, IndexAnalyzer class
  • src/storage/index_analyzer.cpp: fragmentation is computed as (total_sst - live_sst) / total_sst + L0_files × 2%; heuristic constants are named (kFragPctPerL0File, kBytesPerMB, etc.); the cron loop waits via cv_.wait_until(next_fire) and re-evaluates on setConfig(); AI advisor exceptions are caught and the rule-based result is preserved
  • config/index_analyze.yaml: reference YAML with per-tier threshold blocks, per-index overrides, and the ai_advisor toggle

Tier threshold defaults

Tier   REORGANIZE   PARTIAL_REBUILD   FULL_REBUILD   stats_stale
hot    10 %         20 %              35 %           1 h
warm   18 %         32 %              50 %           6 h
cold   30 %         50 %              70 %           24 h

Thresholds are looser for the warm and cold tiers because rebuild cost dominates on SATA and object storage.
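
To make the numbers concrete, here is a rough sketch of how the fragmentation heuristic and the tier defaults could combine into a recommendation. Only the formula, the enum values, the kFragPctPerL0File name, and the threshold numbers come from this PR description. The struct field names, the byte-based inputs, the clamp to 100 %, the inclusive comparisons, and the rule that stale statistics only matter below the REORGANIZE threshold are all assumptions.

```cpp
#include <algorithm>
#include <cstdint>

enum class IndexRecommendation { NONE, UPDATE_STATS, REORGANIZE, PARTIAL_REBUILD, FULL_REBUILD };

// Field names are hypothetical; the values mirror the defaults table.
struct TierThresholds {
    double reorganize_pct;
    double partial_rebuild_pct;
    double full_rebuild_pct;
    double stats_stale_hours;
};

constexpr TierThresholds kHot{10.0, 20.0, 35.0, 1.0};
constexpr TierThresholds kWarm{18.0, 32.0, 50.0, 6.0};
constexpr TierThresholds kCold{30.0, 50.0, 70.0, 24.0};

constexpr double kFragPctPerL0File = 2.0;

// (total_sst - live_sst) / total_sst as a percentage, plus 2 points per L0 file.
// Assumes both sizes are in bytes and clamps the result to [0, 100].
double fragmentationPct(std::uint64_t total_sst_bytes, std::uint64_t live_sst_bytes, int l0_files) {
    double dead_pct = 0.0;
    if (total_sst_bytes > 0 && live_sst_bytes < total_sst_bytes) {
        dead_pct = 100.0 * static_cast<double>(total_sst_bytes - live_sst_bytes) /
                   static_cast<double>(total_sst_bytes);
    }
    return std::clamp(dead_pct + kFragPctPerL0File * l0_files, 0.0, 100.0);
}

// Checks the most severe action first; inclusive thresholds are an assumption.
IndexRecommendation classify(double frag_pct, double stats_age_hours, const TierThresholds& t) {
    if (frag_pct >= t.full_rebuild_pct) return IndexRecommendation::FULL_REBUILD;
    if (frag_pct >= t.partial_rebuild_pct) return IndexRecommendation::PARTIAL_REBUILD;
    if (frag_pct >= t.reorganize_pct) return IndexRecommendation::REORGANIZE;
    if (stats_age_hours >= t.stats_stale_hours) return IndexRecommendation::UPDATE_STATS;
    return IndexRecommendation::NONE;
}
```

Under these assumptions, an index with 1000 bytes of SST data, 900 of them live, and two L0 files scores 14 %. That yields REORGANIZE on the hot tier, while the warm tier leaves it alone as long as its statistics are younger than 6 h.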

AI/ML intervention hook

// Opt-in; leave disabled for rule-based-only deployments
class IIndexAnalysisAdvisor {
public:
    // Return nullopt → keep rule-based recommendation unchanged
    virtual std::optional<std::pair<IndexRecommendation, std::string>>
    advise(const IndexAnalysisReport& preliminary) = 0;
};

analyzer.setAdvisor(std::make_shared<MyMLAdvisor>(model_path));
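
The override semantics described here (nullopt keeps the rule-based result, advisor exceptions are caught) might look roughly like the sketch below on the analyzer side. The reduced IndexAnalysisReport fields, the applyAdvisor() helper, and both example advisors are hypothetical. The real report layout and call site are not shown in this PR description, and the virtual destructor is added here only to keep the sketch self-contained.

```cpp
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

enum class IndexRecommendation { NONE, UPDATE_STATS, REORGANIZE, PARTIAL_REBUILD, FULL_REBUILD };

// Hypothetical, reduced report; the real struct likely carries more fields.
struct IndexAnalysisReport {
    std::string index_name;
    double fragmentation_pct = 0.0;
    IndexRecommendation recommendation = IndexRecommendation::NONE;
    std::string reason;
};

class IIndexAnalysisAdvisor {
public:
    virtual ~IIndexAnalysisAdvisor() = default;
    virtual std::optional<std::pair<IndexRecommendation, std::string>>
    advise(const IndexAnalysisReport& preliminary) = 0;
};

// Hypothetical helper: apply an optional advisor on top of a rule-based report.
IndexAnalysisReport applyAdvisor(IndexAnalysisReport report,
                                 const std::shared_ptr<IIndexAnalysisAdvisor>& advisor) {
    if (!advisor) return report;  // advisor disabled: rule-based only
    try {
        if (auto override_rec = advisor->advise(report)) {
            report.recommendation = override_rec->first;
            report.reason = override_rec->second;
        }
        // nullopt: leave the rule-based recommendation unchanged
    } catch (...) {
        // Advisor failure must not lose the rule-based result.
    }
    return report;
}

// Example advisor: softens FULL_REBUILD to PARTIAL_REBUILD, otherwise abstains.
class SoftenRebuildAdvisor : public IIndexAnalysisAdvisor {
public:
    std::optional<std::pair<IndexRecommendation, std::string>>
    advise(const IndexAnalysisReport& preliminary) override {
        if (preliminary.recommendation == IndexRecommendation::FULL_REBUILD) {
            return std::make_pair(IndexRecommendation::PARTIAL_REBUILD,
                                  std::string("advisor: prefer partial rebuild"));
        }
        return std::nullopt;
    }
};

// Example advisor that always fails, to exercise the exception path.
class FailingAdvisor : public IIndexAnalysisAdvisor {
public:
    std::optional<std::pair<IndexRecommendation, std::string>>
    advise(const IndexAnalysisReport&) override {
        throw std::runtime_error("model unavailable");
    }
};
```

Passing the report by value means the advisor only ever sees a preliminary copy, so a half-finished override cannot leak into the result if the advisor throws.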

Known limitation

stats_age_hours is currently fixed at kPlaceholderStatsAgeHours = 2. Real staleness tracking requires a stats_last_updated key in a dedicated metadata column family (CF); this is tracked in FUTURE_ENHANCEMENTS.md.

Build wiring

index_analyzer.cpp added to cmake/StorageEnhancements.cmake, cmake/CMakeLists.txt, cmake/ModularBuild.cmake. Test target IndexAnalyzerFocusedTests (IA-01…IA-15) registered in tests/CMakeLists.txt.

Linked Issues

Type of Change

  • Bug fix (non-breaking)
  • New feature (non-breaking)
  • Refactoring (non-breaking)
  • Documentation
  • Breaking change (requires MAJOR version bump — see VERSIONING.md)
  • Security fix
  • Other:

Breaking Change Checklist

  • MAJOR version bump planned in VERSION and CMakeLists.txt
  • Migration guide added in docs/migration/
  • Announcement prepared for GitHub Discussions (≥ 2 weeks before release)
  • CHANGELOG ### Removed / ### Changed section updated

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • Benchmarks run (if performance-sensitive change)

Tests cover: tier threshold defaults, YAML load (valid + missing-file error path), null db_wrapper guard, setConfig/setAdvisor thread safety, all five classify() branches, lastReports() initial state. Pure-logic tests (IA-01…IA-09, IA-10…IA-15) require no live RocksDB instance.

📚 Research & Knowledge (if applicable)

  • Is this PR based on scientific paper(s) or best practices?
    • If YES: research files created in /docs/research/?
    • If YES: linked in the module README under "Wissenschaftliche Grundlagen" (Scientific Foundations)?
    • If YES: entered in /docs/research/implementation_influence/?

Relevant sources:

  • Paper:
  • Best Practice:
  • Architecture Decision:

Checklist

  • Code follows project style guidelines (clang-format / clang-tidy)
  • Self-review completed
  • Documentation updated (if needed)
  • CHANGELOG.md updated under [Unreleased]
  • No new warnings introduced
  • Security-sensitive paths reviewed by security maintainer (if applicable)

Copilot AI and others added 2 commits April 22, 2026 12:16
…, cron scheduling, AI/ML advisor hook, and YAML config (IA-01…IA-15)

Agent-Logs-Url: https://github.com/makr-code/ThemisDB/sessions/12704eda-57a9-409b-ac6f-53424ae92021

Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
…cs, unique tmp file in tests

Agent-Logs-Url: https://github.com/makr-code/ThemisDB/sessions/12704eda-57a9-409b-ac6f-53424ae92021

Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
@makr-code makr-code marked this pull request as ready for review April 22, 2026 14:34
@makr-code makr-code self-requested a review as a code owner April 22, 2026 14:34
@makr-code makr-code merged commit 44c57f3 into develop Apr 22, 2026
11 checks passed