LEAF is a lightweight and fully reproducible framework for carbon‑intensity forecasting and carbon‑aware scheduling of lab activities and compute jobs.
Summary
- Data: SMARD (TransnetBW, 15‑min resolution, 2024‑03‑02 – 2026‑03‑08), production‑based CO₂ intensity using UBA 2023 factors.
- Forecast: LightGBM model achieves MAE ≈ 43 g/kWh, R² ≈ 0.85 vs multiple time‑series baselines on the last week of data.
- Scheduling result: Carbon‑Aware with predicted CO₂ reduces emissions by ≈17.3% vs FIFO, reaching ≈88% of the theoretical maximum (19.8% with perfect CO₂ knowledge), with no deadline violations.
- Trade‑off: Emission savings come at the cost of higher average wait (e.g. ~150 min vs 8 min for FIFO), controlled via a shift horizon and delay penalty.
- Reproduce (from `leaf-scheduler/`):

  ```bash
  python scripts/process_raw_data.py
  python scripts/train_forecast.py
  python scripts/run_scheduler_with_forecast.py
  ```
It is built entirely on open data and open‑source tools, and is designed to:
- use transparent, lightweight ML instead of heavy black‑box models,
- support energy‑ and CO₂‑aware planning of jobs under practical constraints,
- run fully on‑premise, in line with digital sovereignty and Green AI goals.
From the `leaf-scheduler/` directory:

```bash
conda create -n leaf python=3.10 -y
conda activate leaf
pip install -r requirements.txt

# 1) Data preprocessing
python scripts/process_raw_data.py

# 2) Train CO₂ forecast model (LightGBM + baselines)
python scripts/train_forecast.py

# 3) Run scheduling demo using actual + predicted CO₂
python scripts/run_scheduler_with_forecast.py
```

This will:
- create `data/processed/energy_data_full.csv` with production-based CO₂ intensity,
- train and evaluate the LightGBM forecaster and baselines,
- run FIFO and Carbon-Aware scheduling on a synthetic job set in early March 2026,
- write comparison metrics to `data/processed/schedule_comparison_with_forecast.csv`,
- generate figures under `figures/` (for the README) and `docs/` (for the report).
High-level pipeline:

```text
SMARD raw data (TransnetBW, 15 min)
        │
        ▼
Preprocessing (UBA emission factors)
        │   └── Production-based CO₂ intensity (g/kWh)
        ▼
Forecasting (LightGBM + baselines)
        │   └── Predicted CO₂ intensity (15 min)
        ▼
Scheduling (FIFO / EDF / Carbon-Aware)
        │   └── Lab & compute job schedules
        ▼
Evaluation + Streamlit dashboard
```
Main components:
- `leaf/data`: SMARD parsing, CO₂ calculation, feature engineering
- `leaf/forecast`: baseline models and LightGBM regressor
- `leaf/scheduler`: task model, FIFO/EDF/Carbon-Aware strategies, evaluation
- `scripts/`: end-to-end pipelines (preprocess, train, schedule)
- `app/`: Streamlit dashboard and smart suggestions
For detailed preprocessing and experiment notes, see:
- `docs/data_preprocessing_report.md`
- `docs/experiment_log.md`
Data source
- [2] SMARD (Strommarktdaten), Bundesnetzagentur
- Control area: [3] TransnetBW (Germany)
- Resolution: 15‑minute intervals
- Time range: 2024‑03‑02 – 2026‑03‑08
CO₂ intensity
- Production‑based carbon intensity for each 15‑minute slot
- Technology‑specific emission factors from [1] Umweltbundesamt (UBA, 2023) for:
- hard coal, natural gas, other conventional sources
- renewables (wind, solar, hydro, biomass) treated as zero operational emissions
This step is implemented in:
- `leaf/data/preprocessor.py`
- `scripts/process_raw_data.py`

and documented in `docs/data_preprocessing_report.md`.
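The production-based intensity of a slot is the emission-factor-weighted average over the generation mix. A minimal sketch of this step (the emission factors below are illustrative round numbers, not the exact UBA 2023 values used in `leaf/data/preprocessor.py`):

```python
# Sketch of production-based CO2 intensity for one 15-minute slot.
# Factors are in g CO2 per kWh generated; the values here are illustrative
# placeholders, not the exact UBA 2023 factors used by the pipeline.
EMISSION_FACTORS_G_PER_KWH = {
    "hard_coal": 830.0,
    "natural_gas": 400.0,
    "other_conventional": 700.0,
    # Renewables are treated as zero operational emissions.
    "wind": 0.0, "solar": 0.0, "hydro": 0.0, "biomass": 0.0,
}

def co2_intensity_g_per_kwh(generation_kwh: dict) -> float:
    """Weighted-average intensity of the generation mix in one slot."""
    total_kwh = sum(generation_kwh.values())
    if total_kwh == 0:
        raise ValueError("empty generation mix")
    emissions_g = sum(
        kwh * EMISSION_FACTORS_G_PER_KWH.get(tech, 0.0)
        for tech, kwh in generation_kwh.items()
    )
    return emissions_g / total_kwh

# Example slot: half gas, half wind -> intensity is half the gas factor.
slot = {"natural_gas": 500.0, "wind": 500.0}
print(co2_intensity_g_per_kwh(slot))  # 200.0
```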
Target
`CO2_Intensity_gkWh` at 15-minute resolution
Features
- calendar features (hour, day of week, month, weekend)
- cyclic encodings (sin/cos)
- lagged values (1–168 hours)
- rolling statistics (6, 12, 24, 168 hours)
- simple differences (1h, 24h)
Implemented in:
leaf/data/features.py
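The feature groups above can be sketched with plain pandas. This is a simplified illustration with hypothetical column names, assuming a DatetimeIndex at 15-minute resolution; the real implementation is `leaf/data/features.py`:

```python
import numpy as np
import pandas as pd

def make_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the feature set: calendar, cyclic, lag, rolling, difference.

    Assumes a DatetimeIndex at 15-minute resolution and a
    'CO2_Intensity_gkWh' target column (hypothetical helper, not the
    project's exact code).
    """
    out = df.copy()
    y = out["CO2_Intensity_gkWh"]
    steps_per_hour = 4  # 15-minute data

    # Calendar features
    out["hour"] = out.index.hour
    out["dayofweek"] = out.index.dayofweek
    out["month"] = out.index.month
    out["is_weekend"] = (out.index.dayofweek >= 5).astype(int)

    # Cyclic encodings of the daily pattern
    out["hour_sin"] = np.sin(2 * np.pi * out["hour"] / 24)
    out["hour_cos"] = np.cos(2 * np.pi * out["hour"] / 24)

    # Lags, expressed in 15-minute steps (1 h ... 168 h)
    for lag_h in (1, 24, 168):
        out[f"lag_{lag_h}h"] = y.shift(lag_h * steps_per_hour)

    # Rolling statistics and simple differences
    for win_h in (6, 24):
        out[f"roll_mean_{win_h}h"] = y.rolling(win_h * steps_per_hour).mean()
    out["diff_1h"] = y.diff(1 * steps_per_hour)
    out["diff_24h"] = y.diff(24 * steps_per_hour)
    return out

# Synthetic 8-day series just to exercise the function
idx = pd.date_range("2026-03-01", periods=4 * 24 * 8, freq="15min")
df = pd.DataFrame(
    {"CO2_Intensity_gkWh": np.random.default_rng(0).uniform(100, 500, len(idx))},
    index=idx,
)
feats = make_features(df)
```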
Models
- baselines: Naive (t-1h), Seasonal Naive (t-24h), Moving Average (24h), Hourly Mean
- main model: LightGBM regressor (small, tree‑based, easily interpretable)
On the 2026‑03‑02 – 2026‑03‑08 test window, the LightGBM model significantly outperforms all baselines. Example metrics (test set):
| Model | MAE (g/kWh) | RMSE (g/kWh) | MAPE (%) | R² |
|---|---|---|---|---|
| Naive (t‑1h) | 65.33 | 99.83 | 21.14 | 0.48 |
| Seasonal Naive (t‑24h) | 91.99 | 134.32 | 24.97 | 0.06 |
| Moving Average (24h) | 106.70 | 132.32 | 32.26 | 0.09 |
| Hourly Mean | 101.64 | 139.69 | 28.82 | −0.02 |
| LightGBM | 42.83 | 52.80 | 11.25 | 0.85 |
Generated by scripts/train_forecast.py; full logs in docs/experiment_log.md.
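The naive baselines in the table are easy to reproduce. A minimal sketch with synthetic data (pure NumPy; the metric values will of course differ from those on the real series):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, as reported in the table (g/kWh)."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def naive_forecast(y, steps_back):
    """Predict y[t] with y[t - steps_back].

    With 15-minute data, t-1h corresponds to steps_back=4 and
    t-24h to steps_back=96. Returns (prediction, aligned truth).
    """
    y = np.asarray(y)
    return y[:-steps_back], y[steps_back:]

# Synthetic two-week intensity series with a daily cycle plus noise
rng = np.random.default_rng(42)
hours = np.arange(4 * 24 * 14) / 4.0
y = 300 + 100 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 20, hours.size)

pred_1h, truth_1h = naive_forecast(y, steps_back=4)    # Naive (t-1h)
pred_24h, truth_24h = naive_forecast(y, steps_back=96)  # Seasonal Naive (t-24h)
print(f"Naive (t-1h) MAE:          {mae(truth_1h, pred_1h):.1f} g/kWh")
print(f"Seasonal Naive (t-24h) MAE: {mae(truth_24h, pred_24h):.1f} g/kWh")
```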
Forecast vs actual (test period)
LightGBM predicted CO₂ intensity vs actual on the test set. Generated by scripts/train_forecast.py.
Feature importance
Top features for the LightGBM model. Generated by scripts/train_forecast.py.
Training and evaluation pipeline:
- `scripts/train_forecast.py`
- logs and exact metrics in `docs/experiment_log.md`
Job model
- defined in `data/sample/jobs_pro_2026.csv` and `leaf/scheduler/task.py`
- covers:
  - `type`: `Lab_Activity`, `AI_Training`, `Data_Process`
  - `resource`: `Lab_Bench`, `GPU`, `CPU_Pool`
  - `arrival`, `deadline`, `duration` (15-minute multiples)
  - `power_avg` (kW), `demand` (capacity units), `priority`
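The job model can be pictured as a small record per task. A hypothetical sketch (field names mirror the CSV columns; the real class lives in `leaf/scheduler/task.py`):

```python
from dataclasses import dataclass

@dataclass
class Job:
    """Illustrative job record; not the project's exact class."""
    job_id: str
    job_type: str        # Lab_Activity, AI_Training, Data_Process
    resource: str        # Lab_Bench, GPU, CPU_Pool
    arrival: int         # slot index at which the job becomes available
    deadline: int        # latest slot by which the job must finish
    duration: int        # length in 15-minute slots
    power_avg_kw: float  # average power draw (kW)
    demand: int          # capacity units consumed on the resource
    priority: int

    def energy_kwh(self) -> float:
        # Each slot is 0.25 h, so energy = power * duration * 0.25.
        return self.power_avg_kw * self.duration * 0.25

job = Job("J1", "AI_Training", "GPU",
          arrival=0, deadline=96, duration=8,
          power_avg_kw=2.0, demand=1, priority=1)
print(job.energy_kwh())  # 4.0
```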
Strategies (leaf/scheduler/strategies.py)
- FIFO: schedule in arrival order at earliest feasible time
- EDF: earliest deadline first
- Carbon‑Aware (two‑phase):
- build a feasible FIFO schedule,
- locally shift tasks to lower‑carbon slots within their slack, with:
- maximum shift horizon,
- capacity and deadline constraints,
- a delay penalty to balance CO₂ reduction vs waiting time.
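The second phase of the heuristic can be sketched for a single job: scan every feasible start within the slack and shift horizon, and pick the one minimizing emissions plus a delay penalty. This is a deliberately simplified, single-job sketch without the capacity model; the real strategy lives in `leaf/scheduler/strategies.py`:

```python
# Sketch of the local-shift phase of the Carbon-Aware heuristic.
# All time quantities are 15-minute slot indices; co2 is the (actual or
# predicted) intensity per slot in g/kWh.

def job_emissions(co2, start, duration, energy_per_slot_kwh):
    """Emissions (g) of a job running in [start, start + duration)."""
    return sum(co2[t] * energy_per_slot_kwh
               for t in range(start, start + duration))

def shift_to_low_carbon(co2, start, duration, deadline, max_shift,
                        energy_per_slot_kwh=1.0, delay_penalty=0.0):
    """Return the start slot minimizing emissions + delay_penalty * shift,
    subject to the deadline and a maximum shift horizon."""
    latest_start = min(deadline - duration, start + max_shift)
    best_start = start
    best_cost = job_emissions(co2, start, duration, energy_per_slot_kwh)
    for s in range(start + 1, latest_start + 1):
        cost = (job_emissions(co2, s, duration, energy_per_slot_kwh)
                + delay_penalty * (s - start))
        if cost < best_cost:
            best_start, best_cost = s, cost
    return best_start

# Intensity per slot: high in the morning, a low-carbon dip at midday.
co2 = [400, 400, 380, 200, 150, 160, 300, 400]
print(shift_to_low_carbon(co2, start=0, duration=2, deadline=8, max_shift=6))  # 4
```

A large `delay_penalty` keeps the job at its FIFO start, which is exactly the knob that trades emission savings against waiting time.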
Evaluation (leaf/scheduler/evaluator.py)
- total energy (kWh)
- total emissions (g CO₂)
- average CO₂ intensity (g/kWh)
- average renewable share (%)
- average wait and tardiness (minutes)
- deadline violation rate
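These metrics all follow from the schedule and the actual CO₂ curve. A minimal sketch with hypothetical inputs (the real evaluator is `leaf/scheduler/evaluator.py`):

```python
# Sketch of schedule-level metrics. Each scheduled job is represented as
# (start_slot, duration_slots, power_kw, arrival_slot); slots are 15 minutes.

def evaluate(schedule, co2, slot_hours=0.25):
    """Compute energy, emissions, average intensity, and average wait."""
    total_kwh = total_g = total_wait_min = 0.0
    for start, duration, power_kw, arrival in schedule:
        for t in range(start, start + duration):
            kwh = power_kw * slot_hours
            total_kwh += kwh
            total_g += kwh * co2[t]          # evaluated against actual CO2
        total_wait_min += (start - arrival) * slot_hours * 60
    return {
        "energy_kwh": total_kwh,
        "emissions_g": total_g,
        "avg_intensity_g_per_kwh": total_g / total_kwh,
        "avg_wait_min": total_wait_min / len(schedule),
    }

# One 4 kW job shifted by two slots into a low-carbon window.
co2 = [400, 200, 100, 100]
metrics = evaluate([(2, 2, 4.0, 0)], co2)
print(metrics)
```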
Definitions
- FIFO: baseline scheduling policy using the actual CO₂ curve for evaluation.
- Carbon‑Aware (actual CO₂): carbon‑aware heuristic that has access to the true future CO₂ at decision time (oracle / theoretical upper bound for emission savings).
- Carbon‑Aware (predicted CO₂): same heuristic but using the LightGBM forecast to decide when to run jobs; emissions are still evaluated against actual CO₂ (realistic, deployable setting).
Key result (example configuration)
For the job set in early March 2026, evaluated against actual CO₂ intensity:
| Strategy | Total CO₂ (kg) | Avg CO₂ (g/kWh) | Renewable share (%) | CO₂ saved vs FIFO |
|---|---|---|---|---|
| FIFO | 270.81 | 413.5 | 41.2 | — |
| Carbon‑Aware (actual CO₂) | 217.27 | 331.7 | 46.8 | 19.8% |
| Carbon‑Aware (predicted CO₂) | 223.89 | 341.8 | 49.2 | 17.3% (~88% of max) |
Source: data/processed/schedule_comparison_with_forecast.csv (after running scripts/run_scheduler_with_forecast.py).
Schedule comparison (emissions by strategy)
Total CO₂ emissions and savings vs FIFO. Generated by scripts/run_scheduler_with_forecast.py.
Scheduling demo pipeline:
`scripts/run_scheduler_with_forecast.py`
If you want to regenerate the synthetic job set used in the demo
(`data/sample/jobs_pro_2026.csv`), run:

```bash
python scripts/generate_jobs_pro.py
```

The dashboard in `app/app.py` provides:
- Dashboard: daily CO₂ intensity and renewable share plots, key metrics
- Suggestions box:
- on the main page, a compact box shows the best low‑carbon window for the selected day and one example high‑energy task (e.g. “15:00–17:00 — autoclave cycle”),
- a “View details” button opens a dialog with more suggestions (low‑carbon windows and high‑carbon periods to avoid).
- Task Manager: overview of jobs by type and energy, task table, demo “add task” form
- Results: comparison of FIFO vs Carbon‑Aware schedules and exportable metrics
Run `streamlit run app/app.py` and open the Dashboard to see carbon intensity curves and the Suggestions box; the Results page shows the same comparison metrics as in the table above.
```bash
conda create -n leaf python=3.10 -y
conda activate leaf
pip install -r requirements.txt
```

Process SMARD raw data into a cleaned time series with CO₂ intensity:

```bash
python scripts/process_raw_data.py
```

This reads from `data/raw/Actual_generation_202403020000_202603090000_Quarterhour.csv`
and writes `data/processed/energy_data_full.csv`.

```bash
python scripts/train_forecast.py
```

Outputs:
- `models/lightgbm_co2_forecast.txt`
- `models/feature_importance.csv`
- `models/evaluation_results.csv`
- `figures/forecast_comparison.png`, `figures/feature_importance.png`, `figures/shap_summary.png`

To have the result figures appear in this README on GitHub, keep the `figures/` directory in the repo (e.g. commit the generated PNGs).

```bash
python scripts/run_scheduler_with_forecast.py
```

This computes FIFO and Carbon-Aware schedules using both actual and predicted CO₂, and writes comparison metrics to `data/processed/schedule_comparison_with_forecast.csv`.

```bash
streamlit run app/app.py
```

Then open the provided URL (typically http://localhost:8501) in your browser.
LEAF follows a few deliberate design choices:
- **Open and reproducible:** data sources ([2] SMARD, [1] UBA) and all code paths are transparent; no external closed APIs or proprietary ML models are required.
- **Lightweight Green AI:** the main model is a compact LightGBM regressor, sufficient to capture most of the CO₂ dynamics while keeping training and inference costs low.
- **Digital sovereignty:** the full pipeline runs locally and can be integrated into university IT and lab infrastructures without vendor lock-in.
Current limitations and potential extensions:
- use official SMARD day‑ahead / intraday forecasts as additional baselines and as extra features,
- implement receding‑horizon scheduling that periodically updates forecasts and re‑optimizes remaining tasks,
- compare the heuristic two‑phase scheduler to MILP formulations on smaller instances.
[1] Umweltbundesamt (2023). Entwicklung der spezifischen Treibhausgas‑Emissionen des deutschen Strommix in den Jahren 1990–2022.
[2] SMARD – Strommarktdaten, Bundesnetzagentur. https://www.smard.de
[3] TransnetBW GmbH – Control Area Data. https://www.transnetbw.de



