Create country-codes and region-codes dynamically #39

@maxnutz

Current situation

  • The mapping of country codes and NUTS2/NUTS3 region codes is hard-coded in utils.py.
  • Changes to NUTS2 or NUTS3 regions are not detected automatically; the mapping has to be updated by hand.

Goal

  • Use the eurostat package to build the mapping of countries, NUTS2 regions and NUTS3 regions, so that it adapts to such changes automatically.
  • The following example shows one way to create the code-to-name mapping for NUTS2 and NUTS3 regions:
import re
import eurostat


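def get_nuts_mapping(level: int, country_prefix: str | None = None) -> dict[str, str]:
    """Map NUTS codes of the given level (2 or 3) to their Eurostat English names."""
    # Reject other levels explicitly instead of silently treating them as NUTS3
    if level not in (2, 3):
        raise ValueError(f"level must be 2 or 3, got {level!r}")

    # Use a regional dataset so Eurostat exposes a geo dictionary with NUTS codes
    dataset = "nama_10r_2gdp" if level == 2 else "nama_10r_3gdp"
    geo_dic = eurostat.get_dic(dataset, par="geo", frmt="dict", lang="en")

    # A NUTS code is a 2-letter country prefix followed by `level` alphanumeric characters
    pattern = r"^[A-Z]{2}[A-Z0-9]{2}$" if level == 2 else r"^[A-Z]{2}[A-Z0-9]{3}$"

    out = {
        code: name
        for code, name in geo_dic.items()
        if re.match(pattern, code)
        and (country_prefix is None or code.startswith(country_prefix))
    }
    # Sort by code so the resulting mapping is deterministic
    return dict(sorted(out.items()))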


NUTS_2_REGIONS = get_nuts_mapping(level=2, country_prefix="AT")
NUTS_3_REGIONS = get_nuts_mapping(level=3, country_prefix="AT")

Plan: Dynamic Eurostat Region Mapping

Replace the hard-coded country/NUTS mappings with a Eurostat-backed pipeline that fetches country names and EU-wide NUTS2/NUTS3 names at runtime, merges model-specific overrides, and falls back to deterministic local defaults when Eurostat is unavailable. Keep the current consumer behavior intact (REGION_MAPPING.get(code, code)), then rewrite the mapping tests to cover structure, precedence, fallback, and integration output behavior.
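The consumer contract mentioned above (REGION_MAPPING.get(code, code)) can be illustrated with a minimal sketch. The mapping contents and the helper name to_region_name are made up for illustration; the real mapping is assembled in utils.py.

```python
# Illustrative values only; the real REGION_MAPPING is built in utils.py.
REGION_MAPPING: dict[str, str] = {"AT": "Austria", "AT13": "Wien"}


def to_region_name(code: str) -> str:
    """Return the readable name for a code, or the code itself if it is unmapped."""
    # dict.get with the code as default gives the pass-through behavior
    return REGION_MAPPING.get(code, code)


print(to_region_name("AT13"))  # known code: mapped to its name
print(to_region_name("ZZ99"))  # unknown code: returned unchanged
```

Because unknown codes are returned unchanged, swapping the source of REGION_MAPPING should not require any change on the consumer side.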

Steps

  Phase 1 - Dependency and environment readiness

  1. Add eurostat to the Pixi environment in /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/pixi.toml (via pixi add eurostat) and sync lock state.

  2. Resolve any solver conflicts by narrowing only where required (prefer minimal pinning and avoid touching unrelated dependency constraints).

  3. Verify the environment can import eurostat in the Pixi context and that pixi run test still starts.
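One possible way to script the import check from this phase is sketched below. The helper name is_importable is hypothetical; the script is meant to be run with the environment's interpreter (for instance through pixi run python, assuming Python is part of the Pixi environment).

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Report whether the eurostat package is visible to the active interpreter
    status = "available" if is_importable("eurostat") else "missing"
    print(f"eurostat is {status}")
```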

  Phase 2 - Mapping architecture refactor (no behavior break)

  1. Introduce a dedicated mapping module in /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/pypsa_validation_processing/utils.py (or a new internal helper module if preferred by maintainers) with clearly separated parts:

     • Base country mapping from Eurostat ISO-like country codes.
     • NUTS2 mapping loader.
     • NUTS3 mapping loader.
     • Model-specific override mapping (COUNTRIES_SPECIAL_CASES).

  2. Implement Eurostat loaders as pure functions with explicit signatures, for example:

     • get_country_mapping() -> dict[str, str]
     • get_nuts_mapping(level: int, country_prefix: str | None = None) -> dict[str, str]

  3. Enforce code filtering rules with regex for valid keys:

     • Countries: 2-letter uppercase codes.
     • NUTS2: 4-character NUTS pattern.
     • NUTS3: 5-character NUTS pattern.

  4. Build one comprehensive mapping dict by merging in deterministic precedence order (later sources override earlier ones):

     • countries -> NUTS2 -> NUTS3 -> COUNTRIES_SPECIAL_CASES.

  5. Export REGION_MAPPING exactly as current consumers expect (plain dict[str, str]) so Network_Processor.structure_pyam_from_pandas() in /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/pypsa_validation_processing/class_definitions.py remains unchanged.
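The filtering and merge rules of this phase could look roughly like the sketch below. The names CODE_PATTERNS, filter_codes and merge_region_mappings are hypothetical, and the sample dictionary is made-up data standing in for a Eurostat geo dictionary; nothing here calls Eurostat.

```python
import re

# One pattern per code kind; fullmatch() is used, so no ^/$ anchors are needed.
CODE_PATTERNS = {
    "country": re.compile(r"[A-Z]{2}"),
    "nuts2": re.compile(r"[A-Z]{2}[A-Z0-9]{2}"),
    "nuts3": re.compile(r"[A-Z]{2}[A-Z0-9]{3}"),
}


def filter_codes(geo_dic: dict[str, str], kind: str) -> dict[str, str]:
    """Keep only the entries whose code matches the pattern for `kind`."""
    pattern = CODE_PATTERNS[kind]
    return {code: name for code, name in geo_dic.items() if pattern.fullmatch(code)}


def merge_region_mappings(
    countries: dict[str, str],
    nuts2: dict[str, str],
    nuts3: dict[str, str],
    special_cases: dict[str, str],
) -> dict[str, str]:
    """Merge the four sources; on key collisions the later source wins."""
    return {**countries, **nuts2, **nuts3, **special_cases}


# Made-up sample standing in for a Eurostat geo dictionary
sample = {
    "EU27_2020": "European Union - 27 countries (from 2020)",
    "AT": "Austria",
    "AT1": "Ostösterreich",
    "AT13": "Wien",
    "AT130": "Wien",
}
region_mapping = merge_region_mappings(
    filter_codes(sample, "country"),
    filter_codes(sample, "nuts2"),
    filter_codes(sample, "nuts3"),
    {"AT": "Austria (model override)"},  # hypothetical special case
)
```

Placing the special cases last in the merge is what lets model-specific entries override Eurostat names when keys overlap; aggregates such as EU27_2020 and NUTS1 codes drop out because they match none of the three patterns.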

  Phase 3 - Resilience and fallback behavior

  1. Add robust runtime fallback handling around Eurostat access:

     • Catch network/API/data-shape exceptions.
     • Return the last known local/static mapping if fetching fails.
     • Keep the pass-through behavior for unknown codes intact downstream.

  2. Record a lightweight caching policy in the implementation notes:

     • v1 default: in-process fetch with fallback only (no persisted cache), unless CI/runtime cost proves high.

  3. Keep the name language policy explicit: use Eurostat English names as default canonical names.
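A minimal sketch of the fallback handling described in this phase follows. The function name load_mapping_with_fallback and the injected fetch callable are assumptions made for illustration; passing the fetcher in as an argument keeps the sketch independent of the eurostat package and easy to test.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger(__name__)


def load_mapping_with_fallback(
    fetch: Callable[[], dict[str, str]],
    fallback: dict[str, str],
) -> dict[str, str]:
    """Return fetch() if it yields a non-empty str->str dict, else a copy of fallback."""
    try:
        mapping = fetch()
        # Data-shape checks: treat unexpected results like a failed fetch
        if not isinstance(mapping, dict) or not mapping:
            raise ValueError("empty or non-dict result")
        if not all(isinstance(k, str) and isinstance(v, str) for k, v in mapping.items()):
            raise ValueError("non-string codes or names")
        return mapping
    except Exception as exc:  # broad on purpose: network, API and data-shape errors
        logger.warning("Eurostat lookup failed (%s); using static fallback mapping", exc)
        return dict(fallback)
```

In real use, fetch would wrap the Eurostat call (for example a lambda around get_nuts_mapping), and fallback would be the static mapping kept in the repository.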

  Phase 4 - REGION_MAPPING composition with special cases

  1. Keep /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/pypsa_validation_processing/utils.py as the integration surface where REGION_MAPPING is assembled.

  2. Merge Eurostat-derived mapping with COUNTRIES_SPECIAL_CASES so model-specific pseudo-regions continue to resolve.

  3. Add explicit precedence tests ensuring special-case keys override Eurostat values when overlaps exist.

  Phase 5 - Test suite renewal (mapping-focused)

  1. Replace and expand /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/tests/test_utils.py to validate:

     • REGION_MAPPING is a dict with string keys/values.
     • EU country coverage includes required known codes.
     • NUTS2/NUTS3 entries are present and correctly shaped.
     • Merge precedence for overlapping keys.

  2. Add Eurostat integration unit tests with mocking (no live network in CI):

     • Success path: mocked Eurostat dictionaries produce expected merged output.
     • Failure path: Eurostat call errors trigger fallback mapping.
     • Partial-data path: missing datasets still produce valid mapping with fallback.

  3. Extend integration behavior tests in /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/tests/test_network_processor.py and /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/tests/test_aggregation.py to verify mapped location names flow through when map_country_codes_to_names=True and pass-through when disabled.

  4. Ensure tests that cover statistics_functions.py continue to satisfy repository rule: outputs must be pandas.Series with a MultiIndex containing at least country and unit.
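The mocked success and failure paths listed in this phase could be sketched as follows. The loader fetch_geo_names and the helper make_fake_eurostat are hypothetical; the idea is to place a fake eurostat module in sys.modules so that no live network call (and no installed eurostat package) is needed while the tests run.

```python
import sys
import types
from unittest import mock


def fetch_geo_names(dataset: str, fallback: dict[str, str]) -> dict[str, str]:
    """Hypothetical loader: Eurostat geo names, or a copy of fallback on any error."""
    try:
        import eurostat  # imported lazily so a test can substitute a fake module

        return eurostat.get_dic(dataset, par="geo", frmt="dict", lang="en")
    except Exception:
        return dict(fallback)


def make_fake_eurostat(geo_dic=None, error=None) -> types.ModuleType:
    """Build a stand-in 'eurostat' module whose get_dic returns geo_dic or raises error."""
    fake = types.ModuleType("eurostat")
    fake.get_dic = mock.Mock(return_value=geo_dic, side_effect=error)
    return fake


def test_success_path() -> None:
    fake = make_fake_eurostat(geo_dic={"AT11": "Burgenland"})  # made-up sample data
    with mock.patch.dict(sys.modules, {"eurostat": fake}):
        result = fetch_geo_names("nama_10r_2gdp", fallback={})
    assert result == {"AT11": "Burgenland"}
    fake.get_dic.assert_called_once_with(
        "nama_10r_2gdp", par="geo", frmt="dict", lang="en"
    )


def test_failure_path() -> None:
    fake = make_fake_eurostat(error=ConnectionError("offline"))
    with mock.patch.dict(sys.modules, {"eurostat": fake}):
        result = fetch_geo_names("nama_10r_2gdp", fallback={"AT": "Austria"})
    assert result == {"AT": "Austria"}
```

mock.patch.dict restores sys.modules when the with block exits, so the fake module does not leak into other tests. Shared variants of make_fake_eurostat would fit naturally as fixtures in tests/conftest.py.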

  Phase 6 - Validation and regression checks

  1. Run focused tests first:

     • pixi run test tests/test_utils.py
     • pixi run test tests/test_network_processor.py
     • pixi run test tests/test_aggregation.py

  2. Run full suite:

     • pixi run test

  3. Run workflow smoke test for one existing config (no config edits required):

     • pixi run workflow with an existing baseline setup, confirming no regression in location-name mapping output columns.

Relevant files

  • /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/pixi.toml - add eurostat, resolve environment constraints.
  • /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/pypsa_validation_processing/utils.py - replace hard-coded static mapping construction with Eurostat-backed composition + fallback.
  • /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/pypsa_validation_processing/class_definitions.py - verify existing REGION_MAPPING consumption contract remains unchanged.
  • /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/tests/test_utils.py - fully renew mapping tests.
  • /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/tests/test_network_processor.py - add mapping behavior integration assertions.
  • /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/tests/test_aggregation.py - verify mapped-name behavior under aggregation flows.
  • /home/maxnutz/Documents/pypsa-at-validation/pypsa-validation_processing/tests/conftest.py - shared fixtures/mocks for Eurostat responses and fallback scenarios.

Verification

  1. Dependency verification: confirm eurostat import works under Pixi environment and lock resolution is stable.
  2. Unit verification: mocked Eurostat success/failure tests pass without network.
  3. Integration verification: mapping is applied in Network_Processor output only when config flag is enabled.
  4. Regression verification: full test suite passes with no changes to forbidden config/definitions areas.
  5. Functional verification: representative workflow output contains readable region names for country + NUTS2 + NUTS3 codes and still resolves model-specific pseudo-regions.

Decisions

  • Runtime strategy: fetch from Eurostat at runtime with graceful fallback.
  • Scope: include all country names and EU-wide NUTS2/NUTS3 mappings (not AT-only).
  • Naming policy: use Eurostat English names as canonical output labels.
  • Preserve compatibility: keep existing consumer interface (REGION_MAPPING dict) and pass-through behavior for unknown codes.
  • Out of scope: changes to definitions/ and arbitrary edits to configs/.

Further Considerations

  1. Performance guardrail recommendation: if runtime fetch latency becomes noticeable, add optional persisted cache in a follow-up ticket.
  2. Reproducibility recommendation: optionally pin Eurostat data snapshot/version in a later enhancement if strict run-to-run name stability is required.
  3. Observability recommendation: add one concise warning log when fallback is used, to make data-source mode visible during workflow runs.
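If the optional persisted cache or a pinned snapshot is picked up in a follow-up, one simple form would be a JSON file next to the code. The sketch below is only an assumption about how that could look; the function names and the file name are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def save_snapshot(mapping: dict[str, str], path: Path) -> None:
    """Write the mapping as sorted, human-readable JSON for stable diffs."""
    text = json.dumps(mapping, ensure_ascii=False, indent=2, sort_keys=True)
    path.write_text(text, encoding="utf-8")


def load_snapshot(path: Path) -> Optional[dict[str, str]]:
    """Return the stored mapping, or None if the file is missing or unreadable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, bad encoding or invalid JSON
        return None
    return data if isinstance(data, dict) else None
```

A snapshot like this could serve both as the static fallback and as a way to keep region names stable between runs, at the cost of having to refresh it deliberately.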
