Cancer Assertion Extraction

Code accompanying an in-progress study on heterogeneous supervision and evaluation validity for cancer-focused biomedical relation extraction.

This repository contains source code only. Generated artifacts (training data shards, model checkpoints, evaluation results, run logs) live on the cluster filesystem and are not version-controlled. The full design document (research questions, statistical plan, hypothesis registry) is maintained locally and is not part of this repository.

Directory layout

data_pipeline/                        legacy; superseded by dataset_inventory + oncology_projection
dataset_inventory/                    raw-data audits, leakage detection, corpus statistics
oncology_projection/                  MeSH C04 oncology subset derivation, cancer lexicon
schema_exploration/                   schema definitions (S_flat / S_pair / S_mech), package remapping
training_data_generation/             T1/T2/T3/T4 data preparation (output JSONL on cluster)
fine_tuning_experiments/              training pipeline + Phase A scripts + Phase A-eval
external_evaluation/                  benchmark loaders and inference helpers
knowledge_grounded_evidence_audit/    CIViC-anchored downstream KB audit
report/, reports/                     legacy intermediate reports (kept locally; not uploaded)
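
For readers unfamiliar with the leakage detection mentioned under dataset_inventory/, the basic idea can be sketched as a split-overlap check. This is an illustrative sketch only, not the code in this repository: the function name `find_leaked_ids` and the `PMID:`-style identifiers are assumptions made for the example, and the actual audits may compare text or entities rather than document IDs.

```python
from typing import Iterable, Set


def find_leaked_ids(train_ids: Iterable[str], test_ids: Iterable[str]) -> Set[str]:
    """Return document IDs that appear in both the training and test splits.

    Hypothetical helper: assumes each document carries a stable string ID.
    """
    return set(train_ids) & set(test_ids)


# Toy identifiers, invented for illustration.
train = ["PMID:1001", "PMID:1002", "PMID:1003"]
test = ["PMID:1003", "PMID:2001"]

leaked = find_leaked_ids(train, test)
print(sorted(leaked))
```

A non-empty result means at least one evaluation document was also seen during training, so scores on that split would overstate generalization.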

The currently active code paths are:

fine_tuning_experiments/schema_exp/: Phase A configs, sbatch scripts, and eval pipeline.
fine_tuning_experiments/schema_exp/eval/: three-pass inference (BioRED test, BC5CDR test, KB-surface) and aggregation.
schema_exploration/: schema label functions and data package remapping.
oncology_projection/: keyword- and MeSH-based projection of the T2 oncology subset (MeSH C04).
dataset_inventory/audit/: raw data audits and leakage validation.
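
As a rough illustration of what a keyword/MeSH projection onto C04 can look like, the sketch below keeps a record if any of its MeSH tree numbers falls under C04 (the Neoplasms branch) or if its text contains a lexicon term. Everything named here is an assumption for the example: the record fields `mesh_tree_numbers` and `text`, the tiny `CANCER_LEXICON`, and the function names are hypothetical and do not reflect the actual code in oncology_projection/.

```python
from typing import Dict, Iterable, List

C04_PREFIX = "C04"  # MeSH tree branch for Neoplasms

# Hypothetical stand-in for the cancer lexicon; the real one is not shown here.
CANCER_LEXICON = {"carcinoma", "tumor", "neoplasm"}


def has_c04_tree_number(tree_numbers: Iterable[str]) -> bool:
    """True if any tree number is C04 itself or a descendant such as C04.588.180."""
    return any(
        tn == C04_PREFIX or tn.startswith(C04_PREFIX + ".") for tn in tree_numbers
    )


def mentions_cancer_term(text: str) -> bool:
    """True if any lexicon term occurs in the lowercased text (simple substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in CANCER_LEXICON)


def project_oncology(records: Iterable[Dict]) -> List[Dict]:
    """Keep records that match by MeSH tree number or by keyword."""
    return [
        r
        for r in records
        if has_c04_tree_number(r.get("mesh_tree_numbers", []))
        or mentions_cancer_term(r.get("text", ""))
    ]


# Toy records, invented for illustration.
records = [
    {"id": "a", "mesh_tree_numbers": ["C04.588.180"], "text": "BRCA1 variants."},
    {"id": "b", "mesh_tree_numbers": ["C14.280"], "text": "Heart failure outcomes."},
    {"id": "c", "mesh_tree_numbers": [], "text": "Hepatocellular carcinoma risk."},
]

subset = project_oncology(records)
print([r["id"] for r in subset])
```

Matching on the prefix followed by a dot avoids accidentally accepting an unrelated branch whose code merely begins with the same characters.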
