Current situation
- The mapping of country codes and NUTS2/NUTS3 region codes is hard-coded in utils.py.
- Changes to NUTS2 or NUTS3 regions are not detected automatically and must be applied manually.
Goal
- Use the eurostat package to build the mapping of countries, NUTS2 regions, and NUTS3 regions, so that it adapts dynamically to such changes.
- The following example shows one way to build the code-to-name mapping for NUTS2 and NUTS3 regions:
import re

import eurostat


def get_nuts_mapping(level: int, country_prefix: str | None = None) -> dict[str, str]:
    """Return a sorted {NUTS code: region name} mapping for NUTS level 2 or 3."""
    if level not in (2, 3):
        raise ValueError(f"level must be 2 or 3, got {level!r}")
    # Use a regional dataset so Eurostat exposes a geo dictionary with NUTS codes
    dataset = "nama_10r_2gdp" if level == 2 else "nama_10r_3gdp"
    geo_dic = eurostat.get_dic(dataset, par="geo", frmt="dict", lang="en")
    # NUTS2 codes have 4 characters, NUTS3 codes have 5 (2-letter country prefix first)
    pattern = r"^[A-Z]{2}[A-Z0-9]{2}$" if level == 2 else r"^[A-Z]{2}[A-Z0-9]{3}$"
    out = {
        code: name
        for code, name in geo_dic.items()
        if re.match(pattern, code)
        and (country_prefix is None or code.startswith(country_prefix))
    }
    return dict(sorted(out.items()))


NUTS_2_REGIONS = get_nuts_mapping(level=2, country_prefix="AT")
NUTS_3_REGIONS = get_nuts_mapping(level=3, country_prefix="AT")
Plan: Dynamic Eurostat Region Mapping
Replace hard-coded country/NUTS mappings with a Eurostat-backed mapping pipeline that fetches country + EU-wide NUTS2/NUTS3 names at runtime, merges model-specific overrides, and falls back to deterministic local defaults when Eurostat is unavailable. Keep current consumer behavior intact (REGION_MAPPING.get(code, code)), then fully renew mapping tests to cover structure, precedence, fallback, and integration output behavior.
Steps
Phase 1 - Dependency and environment readiness
- Add eurostat to the Pixi environment in /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/pixi.toml (via pixi add eurostat) and sync the lock state.
- Resolve any solver conflicts by narrowing constraints only where required (prefer minimal pinning and avoid touching unrelated dependency constraints).
- Verify that the environment can import eurostat in the Pixi context and that pixi run test still starts.
Phase 2 - Mapping architecture refactor (no behavior break)
- Introduce a dedicated mapping module in /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/pypsa_validation_processing/utils.py (or a new internal helper module if preferred by maintainers) with clear separations:
  - Base country mapping from Eurostat ISO-like country codes.
  - NUTS2 mapping loader.
  - NUTS3 mapping loader.
  - Model-specific override mapping (COUNTRIES_SPECIAL_CASES).
- Implement Eurostat loaders as pure functions with explicit signatures, for example:
  - get_country_mapping() -> dict[str, str]
  - get_nuts_mapping(level: int, country_prefix: str | None = None) -> dict[str, str]
- Enforce code filtering rules with regex for valid keys:
  - Countries: 2-letter uppercase codes.
  - NUTS2: 4-character NUTS pattern.
  - NUTS3: 5-character NUTS pattern.
- Build one comprehensive mapping dict by merging in deterministic precedence order:
  - countries -> NUTS2 -> NUTS3 -> COUNTRIES_SPECIAL_CASES.
- Export REGION_MAPPING exactly as current consumers expect (plain dict[str, str]) so Network_Processor.structure_pyam_from_pandas() in /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/pypsa_validation_processing/class_definitions.py remains unchanged.
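The key-filtering rules for Phase 2 could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: filter_geo_codes, CODE_PATTERNS and the sample geo_dic are hypothetical names, and the dictionary is passed in as an argument so the example runs without network access. In the real loaders it would come from eurostat.get_dic(..., par="geo", frmt="dict"), as in the example at the top of this issue.

```python
import re

# One pattern per code family named in the plan (matched against the whole key).
CODE_PATTERNS = {
    "country": re.compile(r"[A-Z]{2}"),
    "nuts2": re.compile(r"[A-Z]{2}[A-Z0-9]{2}"),
    "nuts3": re.compile(r"[A-Z]{2}[A-Z0-9]{3}"),
}


def filter_geo_codes(geo_dic: dict[str, str], kind: str) -> dict[str, str]:
    """Keep only entries whose key fully matches the pattern for `kind`."""
    pattern = CODE_PATTERNS[kind]
    return {
        code: name
        for code, name in sorted(geo_dic.items())
        if pattern.fullmatch(code)
    }


# Hypothetical sample of a Eurostat geo dictionary, including entries
# (an aggregate and a NUTS1 code) that none of the three patterns accept.
geo_dic = {
    "EU27_2020": "European Union - 27 countries (from 2020)",
    "AT": "Austria",
    "AT1": "Ostösterreich",
    "AT13": "Wien",
    "AT130": "Wien",
}

countries = filter_geo_codes(geo_dic, "country")  # {"AT": "Austria"}
nuts2 = filter_geo_codes(geo_dic, "nuts2")        # {"AT13": "Wien"}
nuts3 = filter_geo_codes(geo_dic, "nuts3")        # {"AT130": "Wien"}
```

Using fullmatch keeps the patterns free of anchors, and filtering purely on key length and shape means aggregates and NUTS1 codes drop out without a separate exclusion list.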
Phase 3 - Resilience and fallback behavior
- Add robust runtime fallback handling around Eurostat access:
  - Catch network/API/data-shape exceptions.
  - Return the last known local/static mapping if fetching fails.
  - Keep unknown codes pass-through behavior intact downstream.
- Add lightweight caching policy decision in implementation notes:
  - v1 default: in-process fetch with fallback only (no persisted cache), unless CI/runtime cost proves high.
- Keep name language policy explicit: use Eurostat English names as default canonical names.
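A minimal sketch of the fallback behavior described in Phase 3, under a few assumptions: load_mapping_with_fallback, broken_fetch and STATIC_FALLBACK are hypothetical names, the fetcher is injected as a callable so the example runs offline, and the broad except clause stands in for whichever network/API/data-shape exceptions the real implementation decides to catch.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def load_mapping_with_fallback(
    fetch: Callable[[], dict[str, str]],
    fallback: dict[str, str],
) -> dict[str, str]:
    """Return fetch() if it yields a non-empty str->str dict, else a copy of fallback."""
    try:
        mapping = fetch()
        # Treat unexpected data shapes the same way as a failed fetch.
        if not isinstance(mapping, dict) or not mapping:
            raise ValueError("empty or non-dict result")
        if not all(isinstance(k, str) and isinstance(v, str) for k, v in mapping.items()):
            raise ValueError("non-string keys or values")
    except Exception as exc:  # broad on purpose in this sketch; narrow in real code
        logger.warning("Eurostat lookup failed (%s); using static fallback mapping", exc)
        return dict(fallback)
    return mapping


def broken_fetch() -> dict[str, str]:
    """Simulate an outage instead of calling Eurostat."""
    raise ConnectionError("Eurostat unreachable")


STATIC_FALLBACK = {"AT": "Austria"}  # hypothetical local default
result = load_mapping_with_fallback(broken_fetch, STATIC_FALLBACK)
ok = load_mapping_with_fallback(lambda: {"DE": "Germany"}, STATIC_FALLBACK)
```

Returning a copy of the fallback keeps callers from mutating the static default, and the single warning makes the data-source mode visible during workflow runs.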
Phase 4 - REGION_MAPPING composition with special cases
- Keep /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/pypsa_validation_processing/utils.py as the integration surface where REGION_MAPPING is assembled.
- Merge Eurostat-derived mapping with COUNTRIES_SPECIAL_CASES so model-specific pseudo-regions continue to resolve.
- Add explicit precedence tests ensuring special-case keys override Eurostat values when overlaps exist.
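The composition and precedence rule from Phase 4 could be sketched as follows. build_region_mapping and all sample values are hypothetical; the real special cases live in COUNTRIES_SPECIAL_CASES, whose contents are not shown in this issue.

```python
def build_region_mapping(
    countries: dict[str, str],
    nuts2: dict[str, str],
    nuts3: dict[str, str],
    special_cases: dict[str, str],
) -> dict[str, str]:
    """Merge in precedence order; later sources override earlier ones."""
    mapping: dict[str, str] = {}
    for source in (countries, nuts2, nuts3, special_cases):
        mapping.update(source)
    return mapping


# Hypothetical inputs, chosen so that one special-case key overlaps a country key.
REGION_MAPPING = build_region_mapping(
    countries={"AT": "Austria", "EL": "Greece"},
    nuts2={"AT13": "Wien"},
    nuts3={"AT130": "Wien"},
    special_cases={"EL": "Greece (model label)", "EU": "Europe"},
)

overridden = REGION_MAPPING.get("EL", "EL")   # special case wins over Eurostat
unknown = REGION_MAPPING.get("XX99", "XX99")  # unknown codes pass through unchanged
```

Because the result is a plain dict, existing consumers that call REGION_MAPPING.get(code, code) need no changes.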
Phase 5 - Test suite renewal (mapping-focused)
- Replace and expand /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/tests/test_utils.py to validate:
  - REGION_MAPPING is a dict with string keys/values.
  - EU country coverage includes required known codes.
  - NUTS2/NUTS3 entries are present and correctly shaped.
  - Merge precedence for overlapping keys.
- Add Eurostat integration unit tests with mocking (no live network in CI):
  - Success path: mocked Eurostat dictionaries produce expected merged output.
  - Failure path: Eurostat call errors trigger fallback mapping.
  - Partial-data path: missing datasets still produce valid mapping with fallback.
- Extend integration behavior tests in /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/tests/test_network_processor.py and /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/tests/test_aggregation.py to verify mapped location names flow through when map_country_codes_to_names=True and pass-through when disabled.
- Ensure tests that cover statistics_functions.py continue to satisfy repository rule: outputs must be pandas.Series with a MultiIndex containing at least country and unit.
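The mocked success and failure paths could be tested along these lines. This is a sketch only: get_mapping is a hypothetical stand-in for the loader under test, and the Eurostat call is injected as an argument so the example runs without the eurostat package or a network connection. In the real suite the same idea would more likely be expressed by patching eurostat.get_dic from a shared fixture in conftest.py.

```python
from unittest.mock import Mock


def get_mapping(get_dic, fallback: dict[str, str]) -> dict[str, str]:
    """Hypothetical loader under test: query the injected get_dic, else fall back."""
    try:
        return dict(get_dic("nama_10r_2gdp", par="geo", frmt="dict", lang="en"))
    except Exception:
        return dict(fallback)


def test_success_path() -> None:
    # The mock plays the role of eurostat.get_dic returning a geo dictionary.
    fake_get_dic = Mock(return_value={"AT13": "Wien"})
    assert get_mapping(fake_get_dic, {"AT13": "Vienna"}) == {"AT13": "Wien"}
    fake_get_dic.assert_called_once_with(
        "nama_10r_2gdp", par="geo", frmt="dict", lang="en"
    )


def test_failure_path() -> None:
    # A raised error must lead to the static fallback, not to a crash.
    failing_get_dic = Mock(side_effect=ConnectionError("no network"))
    assert get_mapping(failing_get_dic, {"AT13": "Vienna"}) == {"AT13": "Vienna"}


# Run directly here; a test runner would collect the functions by name.
test_success_path()
test_failure_path()
```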
Phase 6 - Validation and regression checks
- Run focused tests first:
  - pixi run test tests/test_utils.py
  - pixi run test tests/test_network_processor.py
  - pixi run test tests/test_aggregation.py
- Run full suite:
  - pixi run test
- Run workflow smoke test for one existing config (no config edits required):
  - pixi run workflow with an existing baseline setup, confirming no regression in location-name mapping output columns.
Relevant files
/home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/pixi.toml - add eurostat, resolve environment constraints.
/home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/pypsa_validation_processing/utils.py - replace hard-coded static mapping construction with Eurostat-backed composition + fallback.
/home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/pypsa_validation_processing/class_definitions.py - verify existing REGION_MAPPING consumption contract remains unchanged.
/home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/tests/test_utils.py - fully renew mapping tests.
/home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/tests/test_network_processor.py - add mapping behavior integration assertions.
/home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/tests/test_aggregation.py - verify mapped-name behavior under aggregation flows.
/home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/tests/conftest.py - shared fixtures/mocks for Eurostat responses and fallback scenarios.
Verification
- Dependency verification: confirm eurostat import works under the Pixi environment and lock resolution is stable.
- Unit verification: mocked Eurostat success/failure tests pass without network.
- Integration verification: mapping is applied in Network_Processor output only when the config flag is enabled.
- Regression verification: full test suite passes with no changes to forbidden config/definitions areas.
- Functional verification: representative workflow output contains readable region names for country + NUTS2 + NUTS3 codes and still resolves model-specific pseudo-regions.
Decisions
- Runtime strategy: fetch from Eurostat at runtime with graceful fallback.
- Scope: include all country names and EU-wide NUTS2/NUTS3 mappings (not AT-only).
- Naming policy: use Eurostat English names as canonical output labels.
- Preserve compatibility: keep existing consumer interface (REGION_MAPPING dict) and pass-through behavior for unknown codes.
- Out of scope: changes to definitions/ and arbitrary edits to configs/.
Further Considerations
- Performance guardrail recommendation: if runtime fetch latency becomes noticeable, add optional persisted cache in a follow-up ticket.
- Reproducibility recommendation: optionally pin Eurostat data snapshot/version in a later enhancement if strict run-to-run name stability is required.
- Observability recommendation: add one concise warning log when fallback is used, to make data-source mode visible during workflow runs.