docs(research): define learned-model artifact contracts #439
Reviewer's Guide
Defines the initial N.0 learned-model experiment artifact contract and threads it into the docs/roadmap and run-artifact contract, while updating overall product positioning toward a local-first quantitative research and supervised execution system instead of a 'web3 app'.
Class diagram for N.0 learned-model artifact schemas
classDiagram
class ModelRunRoot {
+string model_run_id
+string path
}
class DatasetManifest {
+string schema_version
+string artifact_type
+string model_run_id
+string created_at
+string dataset_id
+Source source
+string[] universe
+TimeRange time_range
+int rows
+Target target
+Split split
+string data_hash
+string generation_command
}
class Source {
+string kind
+string path
}
class TimeRange {
+string start
+string end
+string timezone
}
class Target {
+string name
+string horizon
+string definition
}
class Split {
+string method
+SplitWindow train
+SplitWindow validation
+SplitWindow test
}
class SplitWindow {
+string start
+string end
}
class FeatureManifest {
+string schema_version
+string artifact_type
+string model_run_id
+string dataset_id
+string feature_set_id
+Feature[] features
+Lookback lookback
+Normalization normalization
+string[] leakage_guards
+string feature_hash
+string generation_command
}
class Feature {
+string name
+string source
+map parameters
}
class Lookback {
+int max_bars
+bool uses_future_data
}
class Normalization {
+string method
+string fit_scope
}
class ModelConfig {
+string schema_version
+string artifact_type
+string model_run_id
+string model_family
+string model_name
+Library library
+map hyperparameters
+int random_seed
+string training_objective
+InputShape input_shape
+OutputSpec output
}
class Library {
+string name
+string version
}
class InputShape {
+int features
+int window
}
class OutputSpec {
+string kind
+string target
}
class TrainingSummary {
+string schema_version
+string artifact_type
+string model_run_id
+string status
+string started_at
+string finished_at
+int duration_seconds
+string dataset_manifest_path
+string feature_manifest_path
+string model_config_path
+Metrics metrics
+BaselineComparison baseline_comparison
+Reproducibility reproducibility
+PromotionAssessment promotion_assessment
}
class Metrics {
+map train
+map validation
+map test
+map downstream_market
}
class BaselineComparison {
+bool required
+map rule_based_baseline
+map classical_ml_baseline
+string result
}
class Reproducibility {
+int random_seed
+string code_version
+string data_hash
+string feature_hash
}
class PromotionAssessment {
+bool eligible_for_paper
+string[] blocking_reasons
}
ModelRunRoot "1" o-- "1" DatasetManifest : contains
ModelRunRoot "1" o-- "1" FeatureManifest : contains
ModelRunRoot "1" o-- "1" ModelConfig : contains
ModelRunRoot "1" o-- "1" TrainingSummary : contains
DatasetManifest "1" --> "1" Source
DatasetManifest "1" --> "1" TimeRange
DatasetManifest "1" --> "1" Target
DatasetManifest "1" --> "1" Split
Split "1" --> "1" SplitWindow : train
Split "1" --> "1" SplitWindow : validation
Split "1" --> "1" SplitWindow : test
FeatureManifest "1" --> "*" Feature
FeatureManifest "1" --> "1" Lookback
FeatureManifest "1" --> "1" Normalization
ModelConfig "1" --> "1" Library
ModelConfig "1" --> "1" InputShape
ModelConfig "1" --> "1" OutputSpec
TrainingSummary "1" --> "1" Metrics
TrainingSummary "1" --> "1" BaselineComparison
TrainingSummary "1" --> "1" Reproducibility
TrainingSummary "1" --> "1" PromotionAssessment
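As a rough illustration of how one branch of the class diagram could map onto code, the sketch below mirrors `DatasetManifest` and its nested types as Python dataclasses and serializes one instance to JSON. The field names and types come from the diagram; every value (the run id, dates, hash string, and the generation command) is hypothetical and chosen only to be visibly non-empty, not a recommended default.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Source:
    kind: str
    path: str


@dataclass
class TimeRange:
    start: str
    end: str
    timezone: str


@dataclass
class Target:
    name: str
    horizon: str
    definition: str


@dataclass
class SplitWindow:
    start: str
    end: str


@dataclass
class Split:
    method: str
    train: SplitWindow
    validation: SplitWindow
    test: SplitWindow


@dataclass
class DatasetManifest:
    schema_version: str
    artifact_type: str
    model_run_id: str
    created_at: str
    dataset_id: str
    source: Source
    universe: List[str]
    time_range: TimeRange
    rows: int
    target: Target
    split: Split
    data_hash: str
    generation_command: str


# Every value here is illustrative only (assumed for this sketch).
example = DatasetManifest(
    schema_version="0.1",
    artifact_type="dataset_manifest",
    model_run_id="example_run",
    created_at="2024-01-02T00:00:00Z",
    dataset_id="example_dataset",
    source=Source(kind="local_file", path="data/example.parquet"),
    universe=["EXAMPLE-A", "EXAMPLE-B"],
    time_range=TimeRange(start="2020-01-01", end="2023-12-31", timezone="UTC"),
    rows=1000,
    target=Target(name="forward_return", horizon="1d", definition="illustrative"),
    split=Split(
        method="temporal",
        train=SplitWindow(start="2020-01-01", end="2021-12-31"),
        validation=SplitWindow(start="2022-01-01", end="2022-12-31"),
        test=SplitWindow(start="2023-01-01", end="2023-12-31"),
    ),
    data_hash="example-hash-not-a-real-digest",
    generation_command="hypothetical command that produced the dataset",
)

# asdict() recurses into nested dataclasses, so the result is plain JSON data.
manifest_json = json.dumps(asdict(example), indent=2)
```

The other three artifacts (`FeatureManifest`, `ModelConfig`, `TrainingSummary`) could presumably follow the same pattern; whether the project uses dataclasses, plain dicts, or a schema validator is not stated in the PR.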
Flow diagram for N.0 learned-model experiment artifact lifecycle
flowchart TD
A_define_dataset["Define dataset and temporal splits"]
B_emit_dataset_manifest["Emit dataset_manifest.json"]
C_define_features["Define and generate features"]
D_emit_feature_manifest["Emit feature_manifest.json"]
E_configure_model["Configure model and hyperparameters"]
F_emit_model_config["Emit model_config.json"]
G_train_model["Train model with fixed random_seed"]
H_emit_training_summary["Emit training_summary.json"]
I_optional_downstream["Optional downstream backtest producing report.json"]
J_promotion_blocked["Enforce non-promotion rules at N.0"]
subgraph ModelRunRootDir["outputs/model_runs/<model_run_id>/"]
B_emit_dataset_manifest --> D_emit_feature_manifest
D_emit_feature_manifest --> F_emit_model_config
F_emit_model_config --> H_emit_training_summary
end
A_define_dataset --> B_emit_dataset_manifest
C_define_features --> D_emit_feature_manifest
E_configure_model --> F_emit_model_config
G_train_model --> H_emit_training_summary
H_emit_training_summary --> I_optional_downstream
H_emit_training_summary --> J_promotion_blocked
I_optional_downstream --> J_promotion_blocked
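The emit order in the flowchart can be sketched as a small helper that writes the four JSON files under `model_runs/<model_run_id>/` and has the training summary point back at the three earlier artifacts. This is a minimal sketch, not the project's implementation: the `emit_model_run` function, the use of file names relative to the run directory for the `*_path` fields, and the exact shape of the non-promotion rule (always `eligible_for_paper: false` with one blocking reason) are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

# Order taken from the flowchart's ModelRunRootDir subgraph.
ARTIFACT_ORDER = [
    "dataset_manifest",
    "feature_manifest",
    "model_config",
    "training_summary",
]


def emit_model_run(outputs_root, model_run_id, payloads):
    """Write the four artifacts in flowchart order and return their paths."""
    run_dir = Path(outputs_root) / "model_runs" / model_run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for artifact_type in ARTIFACT_ORDER:
        body = dict(payloads[artifact_type])
        body["artifact_type"] = artifact_type
        body["model_run_id"] = model_run_id
        if artifact_type == "training_summary":
            # The summary references the three artifacts written before it.
            # Storing bare file names is an assumption of this sketch.
            for earlier in ARTIFACT_ORDER[:3]:
                body[f"{earlier}_path"] = paths[earlier].name
            # Assumed reading of "non-promotion at N.0": never eligible.
            body["promotion_assessment"] = {
                "eligible_for_paper": False,
                "blocking_reasons": ["N.0 run (assumed wording)"],
            }
        path = run_dir / f"{artifact_type}.json"
        path.write_text(json.dumps(body, indent=2))
        paths[artifact_type] = path
    return paths


# Usage: write an empty example run into a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    paths = emit_model_run(tmp, "example_run", {name: {} for name in ARTIFACT_ORDER})
    summary = json.loads(paths["training_summary"].read_text())
```

Writing the summary last means it can only reference files that already exist, which matches the arrows into `H_emit_training_summary` in the diagram.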
Hey - I've found 1 issue, and left some high level feedback:
- The JSON example payloads currently use placeholder values like empty strings and zero counts (e.g., `rows: 0`, empty `data_hash`); consider either marking these explicitly as placeholders or providing realistic example values so readers don’t misinterpret them as recommended defaults.
- The N.0 artifact list and rules are now described in `roadmap.md`, `run-artifact-contract.md`, and `learned-model-artifact-contract.md`; you might reduce duplication by keeping the detailed contract only in the dedicated doc and linking to it from the roadmap and run-artifact contract to avoid divergence over time.
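One way to act on the placeholder concern would be a small check that lists fields whose example values are empty strings or zero, so they can be either labeled or replaced. The sketch below is a heuristic under stated assumptions: `find_placeholders` is a hypothetical helper, the sample payload is invented, and a zero can be legitimate in real artifacts (for example a `random_seed` of 0), so hits are candidates for review rather than errors.

```python
def find_placeholders(payload, prefix=""):
    """Return dotted paths of fields holding an empty string or a zero."""
    hits = []
    for key, value in payload.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects such as "split" or "source".
            hits.extend(find_placeholders(value, prefix=f"{path}."))
        elif isinstance(value, bool):
            # False is a meaningful value (e.g. uses_future_data), not a placeholder.
            continue
        elif value == "" or value == 0:
            hits.append(path)
    return hits


# Invented sample resembling the placeholder style the review describes.
sample = {
    "rows": 0,
    "data_hash": "",
    "dataset_id": "demo",
    "split": {"method": "", "train": {"start": "2020-01-01"}},
    "lookback": {"uses_future_data": False},
}
flagged = find_placeholders(sample)
```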
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The JSON example payloads currently use placeholder values like empty strings and zero counts (e.g., `rows: 0`, empty `data_hash`); consider either marking these explicitly as placeholders or providing realistic example values so readers don’t misinterpret them as recommended defaults.
- The N.0 artifact list and rules are now described in `roadmap.md`, `run-artifact-contract.md`, and `learned-model-artifact-contract.md`; you might reduce duplication by keeping the detailed contract only in the dedicated doc and linking to it from the roadmap and run-artifact contract to avoid divergence over time.
## Individual Comments
### Comment 1
<location path="docs/roadmap.md" line_range="528" />
<code_context>
+Authority rule:
+
+- QuantLab owns dataset definition, feature definition, model validation, artifact contracts, and promotion criteria
+- Stepbit may later orchestrate learned-model workflows, but must not own modeling authority
+- Quant Pulse may later provide upstream hypotheses or signal context, but must not certify learned-model validity
+
</code_context>
<issue_to_address>
**suggestion (typo):** Consider adding an explicit subject after "but" for grammatical completeness and consistency.
For consistency with existing docs (e.g., `learned-model-artifact-contract.md`) and improved readability, consider: `Stepbit may later orchestrate learned-model workflows, but it must not own modeling authority.`
```suggestion
- Stepbit may later orchestrate learned-model workflows, but it must not own modeling authority.
```
</issue_to_address>
Summary
`web3 app` public framing with a more professional execution-venue positioning
Scope
Docs-only implementation for #438.
Out of scope:
Validation
`git diff --cached --check`
Closes #438
Summary by Sourcery
Define a proposed learned-model artifact contract and integrate a parallel neural research track into the roadmap and docs without changing runtime behavior.
Documentation: