Skip to content

New variable statistics final energy by carrier oil#51

Open
maxnutz wants to merge 9 commits intomainfrom
46-new-variable-statistics-final-energy-by-carrieroil
Open

New variable statistics final energy by carrier oil#51
maxnutz wants to merge 9 commits intomainfrom
46-new-variable-statistics-final-energy-by-carrieroil

Conversation

@maxnutz
Copy link
Copy Markdown
Owner

@maxnutz maxnutz commented Apr 23, 2026

Add variable function for IAMC-variable Final Energy [by Carrier]|Oil

Open To Dos:

  • add a comprehensive docstring to your function
  • add renewable-oil fraction to statistics-output
  • Add a testing routine for your Function to tests/ - stick to the testing-README
  • make sure, that the newest version of main is merged into your feature Branch

Validation

variable PyPSA-output IAMC Energy Balance 🟢 🟡 🟠 🔴
Final Energy [by Carrier]|Oil 343701 354445 🟡 excludes naphtha for industry
  • energy balance evaluation does NOT include biofuel parts for oil
  • calculation of renewable-oil-fraction includes naphtha-for-industry as carrier.

Concerning Naphtha for industry

  • in our IAMC-variable definition, energy consumption is directly included, excluding non-energy-demands.
  • energy balance FE_E does NOT include naphtha for industrial processes in Final Energy (there is an extra non-energy-use accounting)
  • in pypsa-eur, demand from naphtha for industry is integrated from JRC-IDEES-Database, including Naphtha as Fuel for "Chemical industry > Basic chemicals" and in their report, they include following statement concerning that subsector:

    Process-related non-energy use, which is modelled for the Basic chemicals subsector to
    distinguish the consumption of chemical feedstocks, given that this consumption may be
    subject to significantly different dynamics than energy use.

  • and, Naphtha is an assimilation of different fuel oils and other liquids used in chemical industry (see function chemicals_industry() in build_industry_sector_ratios.py)
  • for the fraction factor fossil-oil - non-fossil-oil, naphtha is included, as it covers part of the demand, non-fossil-oil production is used for.

Concerning fraction of non-fossil oil

Calculating the fraction of non-fossil oil is a first-guess code to likely be refactored in the future. For excluding non-fossil oil parts from statistics, a fraction of fossil-oil to non-fossil-oil is calculated using 2 meta-variables:

  • demand of oil per region
  • production of renewable oil per region

mixing demand and supply variables. Renewable oil flows are not included in this statistics and so, for regions with low oil demand and high production rates of renewable oil, this can lead to a renewable-oil-factor greater 1. Currently, these factors are clipped to 1, additional renewable oil production is therefore "lost".

Problem: renewable oil production exceeding the oil consumption of the respective region is lost, statistics are not fully consistent. This applies especially for country-wise aggregation, where renewable-oil-fraction is therefore underestimated. Problem will be covered in Issue #53


Summary by Sourcery

Add support for computing IAMC variable 'Final Energy [by Carrier]|Oil' and map it into the default statistics configuration.

New Features:

  • Introduce Final_Energy_by_Carrier__Oil statistics function aggregating oil-based final energy demand across sectors.
  • Expose the new oil final energy variable in the default IAMC mapping configuration.

Enhancements:

  • Add utility helper to remap copperplate-based location indices to regional location labels in statistics outputs.

Summary by Sourcery

Add statistics support for fossil final energy oil demand and ensure it is robust and region-aware.

New Features:

  • Introduce Final_Energy_by_Carrier__Oil statistics function to compute fossil final-energy oil demand by region with renewable share adjustment.

Enhancements:

  • Add create_location_index_from_cupperplate utility to remap copperplate-based results to regional location indices.
  • Extend unit mapping to cover plain MWh entries.

Tests:

  • Add a dedicated test suite for Final_Energy_by_Carrier__Oil covering edge cases such as empty demand, zero denominators, and timeseries output.

@maxnutz maxnutz linked an issue Apr 23, 2026 that may be closed by this pull request
7 tasks
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Apr 23, 2026

Reviewer's Guide

Implements a new statistics function Final_Energy_by_Carrier__Oil that computes fossil final-energy oil demand by combining sectoral oil loads with a regionalized renewable-oil share, adds a helper to remap copperplate locations to regional labels, extends unit mapping to support MWh, and wires the new oil variable into the default IAMC mapping; accompanied by targeted tests that validate edge cases and timeseries behavior.

Class diagram for new statistics and utils functions

classDiagram
    class StatisticsFunctionsModule {
        +Final_Energy_by_Carrier__Oil(n, aggregate_per_year)
    }

    class UtilsModule {
        +create_location_index_from_cupperplate(raw_input, usage_location_list)
        +UNITS_MAPPING
    }

    class PypsaNetwork {
        +statistics.withdrawal(...)
        +statistics.supply(...)
    }

    StatisticsFunctionsModule ..> UtilsModule : uses
    StatisticsFunctionsModule ..> PypsaNetwork : uses
    UtilsModule ..> PypsaNetwork : index location based on bus names
Loading

File-Level Changes

Change Details Files
Add Final_Energy_by_Carrier__Oil statistics function to compute fossil final-energy oil demand by region.
  • Implement aggregation of agriculture, residential/commercial, and land-transport oil withdrawals into a total regional oil final-energy metric using statistics accessors.
  • Regionalize copperplate-based residential/commercial oil loads by deriving region codes from bus names, remapping the location index, and regrouping to the configured groupby index.
  • Compute regional renewable-oil parts from selected renewable supply links, derive a renewable-oil fraction using total oil withdrawals as denominator, clip the fraction to 1, and apply it to obtain fossil-only oil final energy.
  • Reuse UNITS_MAPPING to normalize units, regroup results by configured groupby levels, and support both yearly aggregates and snapshot-resolved outputs via aggregate_per_year.
pypsa_validation_processing/statistics_functions.py
Introduce a reusable helper to rewrite the location level of MultiIndex results based on copperplate bus-derived usage locations.
  • Add create_location_index_from_cupperplate utility that reconstructs a MultiIndex from its index frame, replaces the location column with supplied usage locations, and reapplies it to Series/DataFrame indices.
  • Document the helper’s behavior, constraints, and error conditions in a detailed docstring.
  • Extend UNITS_MAPPING to include an explicit MWh-to-MWh identity mapping to support unit normalization in new statistics functions.
pypsa_validation_processing/utils.py
Add targeted tests for Final_Energy_by_Carrier__Oil covering regionalization, clipping, and edge-case handling.
  • Create a minimal _OilNetwork and _OilStatisticsAccessor test harness that returns deterministic withdrawal and supply results for agriculture, transport, and residential/commercial oil uses as well as renewable oil production and total oil denominator.
  • Test that excessive renewable oil production clips the non-fossil share to 1, yielding zero fossil oil, and that empty residential/commercial demand does not cause failures.
  • Verify safe handling of zero total oil demand denominators and correct return types/index structures for both aggregated (Series) and non-aggregated (DataFrame timeseries) modes.
tests/test_statistics_functions.py
Expose the new Final Energy [by Carrier] Oil IAMC variable in the default mapping configuration.
  • Wire the Final_Energy_by_Carrier__Oil statistics function into the default IAMC mapping so that the newly computed oil final energy metric appears in statistics outputs alongside existing variables.

Assessment against linked issues

Issue Objective Addressed Explanation
#46 Implement a new statistics function in statistics_functions.py that computes the IAMC variable `Final Energy [by Carrier] Oil`, following the project’s naming and structural conventions and including appropriate logic (e.g. renewable oil fraction handling).
#46 Expose the new variable by adding the mapping from `Final Energy [by Carrier] Oil` to the new function name in configs/mapping.default.yaml (and corresponding mapping files).
#46 Add an automated test routine under tests/ that covers the new Final_Energy_by_Carrier__Oil function in line with the testing README.

Possibly linked issues

  • #(not specified): The PR implements the requested Final Energy [by Carrier]|Oil statistics function, mapping, helper, and corresponding tests from the issue.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@maxnutz maxnutz marked this pull request as ready for review April 27, 2026 09:31
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 5 issues, and left some high level feedback:

  • In Final_Energy_by_Carrier__Oil, kwargs and kwargs_filtering are used but never defined or passed into the function, which will fail at runtime unless they are globals; consider making them explicit parameters or deriving the needed groupby/filters locally.
  • In create_location_index_from_cupperplate, the standalone idx_df line has no effect and can be removed to avoid confusion and keep the helper minimal.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `Final_Energy_by_Carrier__Oil`, `kwargs` and `kwargs_filtering` are used but never defined or passed into the function, which will fail at runtime unless they are globals; consider making them explicit parameters or deriving the needed `groupby`/filters locally.
- In `create_location_index_from_cupperplate`, the standalone `idx_df` line has no effect and can be removed to avoid confusion and keep the helper minimal.

## Individual Comments

### Comment 1
<location path="pypsa_validation_processing/statistics_functions.py" line_range="239-240" />
<code_context>
+
+    total = pd.concat(series_list).groupby(kwargs["groupby"]).sum()
+
+    # non-fossil parts from renewable-gas production per location
+    # renewable oil production
+    non_fossil_parts = n.statistics.supply(
</code_context>
<issue_to_address>
**suggestion (typo):** Adjust the comment to refer to renewable oil instead of gas

This still refers to "renewable-gas" even though the logic handles renewable oil. Please update the wording (e.g. "renewable-oil production per location") to keep terminology consistent and avoid confusion.

```suggestion
    # non-fossil parts from renewable-oil production per location
    # renewable oil production
```
</issue_to_address>

### Comment 2
<location path="pypsa_validation_processing/statistics_functions.py" line_range="290" />
<code_context>
+    all_oil = create_location_index_from_cupperplate(all_oil, home_location)
+    all_oil = all_oil.groupby(kwargs["groupby"]).sum()
+
+    non_fossil_fraction = non_fossil_parts / all_oil
+    non_fossil_fraction = non_fossil_fraction.clip(upper=1)  # TODO: Issue #53
+    non_fossil_fraction = non_fossil_fraction.rename(index=UNITS_MAPPING)
</code_context>
<issue_to_address>
**issue:** Guard against empty or zero `all_oil` when computing the non-fossil fraction

If `all_oil` is zero/empty for some locations, this division will produce `NaN`/`inf` in `non_fossil_fraction`. These can survive `clip`/`groupby().mean()` and skew fossil shares in cases like regions with only renewables or no oil demand. Please handle this explicitly, e.g. by dropping locations with `all_oil == 0` before division or normalizing those results to a defined fallback (such as 0).
</issue_to_address>

### Comment 3
<location path="pypsa_validation_processing/utils.py" line_range="210-212" />
<code_context>
+        If ``usage_location_list`` length does not match the number of rows, or
+        if the index cannot be reconstructed with the existing index names.
+    """
+    idx_df = raw_input.index.to_frame(index=False)
+    idx_df["location"] = pd.Index(usage_location_list).to_numpy()
+    idx_df
+    new_index = pd.MultiIndex.from_frame(idx_df, names=raw_input.index.names)
+    output = raw_input.copy()
</code_context>
<issue_to_address>
**issue:** Remove the stray `idx_df` expression or replace it with a useful check

This line is a no-op and will be reported as dead code. If it was only for debugging, please remove it; if you intended a check, replace it with an explicit validation (e.g., checking shape or columns) so the intent is clear and enforced.
</issue_to_address>

### Comment 4
<location path="tests/test_statistics_functions.py" line_range="228" />
<code_context>
+# ---------------------------------------------------------------------------
+
+
+class TestFinalEnergyByCarrierOil:
+    """Test suite for Final_Energy_by_Carrier__Oil function."""
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test case where there is no renewable oil production to ensure fossil oil equals total oil demand.

Currently `_OilStatisticsAccessor.supply` always returns a positive renewable-oil flow, so tests only cover the `non_fossil_fraction > 0` path. Please add a case where `non_fossil_parts` is empty (e.g. by toggling an `_OilNetwork` flag or making `supply` return an empty Series/DataFrame) and assert that `Final_Energy_by_Carrier__Oil` returns the unadjusted total oil demand (fossil share equals total). This will catch regressions where an empty `non_fossil_parts` could yield NaNs or zero fossil demand.

Suggested implementation:

```python
# ---------------------------------------------------------------------------
# Tests for Final_Energy_by_Carrier__Oil
# ---------------------------------------------------------------------------


class TestFinalEnergyByCarrierOil:
    """Test suite for Final_Energy_by_Carrier__Oil function."""

    def test_no_renewable_oil_production_fossil_equals_total(
        self,
        oil_statistics_accessor,
        oil_network,
        monkeypatch,
    ):
        """
        When there is no renewable oil production (non_fossil_parts is empty),
        the fossil share should equal the total oil demand and no NaNs should appear.
        """

        import pandas as pd

        # Force the accessor to return an empty non-fossil (renewable) supply
        # so that non_fossil_parts becomes empty inside Final_Energy_by_Carrier__Oil.
        def empty_supply(self, *args, **kwargs):
            # Returning an empty Series/DataFrame ensures non_fossil_fraction == 0
            return pd.Series(dtype=float)

        # Patch the supply method used by Final_Energy_by_Carrier__Oil
        monkeypatch.setattr(type(oil_statistics_accessor), "supply", empty_supply)

        # Act
        result = Final_Energy_by_Carrier__Oil(oil_statistics_accessor, oil_network)

        # Assert that fossil oil equals total oil demand when there is no renewable share.
        # This prevents regressions where an empty non_fossil_parts would yield NaNs or zeros.
        fossil = result["fossil"]
        total = result["total"]

        # Fossil share should be identical to total demand
        assert (fossil == total).all()
        # No NaNs should be present in fossil or total
        assert not fossil.isna().any()
        assert not total.isna().any()

```

1. Ensure this file defines or imports fixtures named `oil_statistics_accessor` and `oil_network` (or adjust the test parameters to match the existing fixtures used for `Final_Energy_by_Carrier__Oil` tests).
2. Adjust the `result["fossil"]` and `result["total"]` accessors to match the actual structure returned by `Final_Energy_by_Carrier__Oil` (e.g. if it returns a DataFrame with different column names or a Series with different indices, update the keys accordingly).
3. If `pandas` (`pd`) is already imported at the top of the file, remove the inline `import pandas as pd` from the test and rely on the shared import instead.
4. If `Final_Energy_by_Carrier__Oil` has a different call signature (e.g. only takes a statistics accessor or a network), update the call in the test accordingly.
5. If `_OilStatisticsAccessor.supply` is a module-level function or located elsewhere, adjust the `monkeypatch.setattr` target (e.g. patch the module-level function or the concrete accessor class actually used in the existing tests).
</issue_to_address>

### Comment 5
<location path="tests/test_statistics_functions.py" line_range="420-408" />
<code_context>
+        # non-fossil fraction is clipped to 1, so fossil share remains zero.
+        assert result.loc[("AT1", "MWh")] == pytest.approx(0.0)
+
+    def test_handles_zero_total_oil_demand_denominator(self):
+        """Division by zero in non-fossil share denominator should not crash."""
+        result = Final_Energy_by_Carrier__Oil(self._OilNetwork(all_oil_value=0.0))
+
+        assert isinstance(result, pd.Series)
+        assert result.loc[("AT1", "MWh")] == pytest.approx(0.0)
+
+    def test_returns_dataframe_for_aggregate_per_year_false(self):
</code_context>
<issue_to_address>
**suggestion (testing):** Extend coverage to a case where the "all_oil" withdrawal result is completely empty, not just zero-valued.

The current test covers the zero-denominator case via `all_oil_value=0.0`, but there’s another branch where `all_oil` can be an empty Series/DataFrame (e.g. no oil-using components). Please add a test where `_OilStatisticsAccessor.withdrawal` returns an empty result for `all_oil` and confirm that `Final_Energy_by_Carrier__Oil` still returns a valid (likely all-zero) Series/DataFrame with a well-defined index and no errors or NaNs.

Suggested implementation:

```python
    def test_handles_empty_rescom_without_failing(self):
        """Function should work even when residential/commercial oil demand is empty."""
        result = Final_Energy_by_Carrier__Oil(self._OilNetwork(rescom_empty=True))

        assert isinstance(result, pd.Series)
        assert isinstance(result.index, pd.MultiIndex)
        assert result.index.names == ["location", "unit"]
        # non-fossil fraction is clipped to 1, so fossil share remains zero.
        assert result.loc[("AT1", "MWh")] == pytest.approx(0.0)

    def test_handles_zero_total_oil_demand_denominator(self):
        """Division by zero in non-fossil share denominator should not crash."""
        result = Final_Energy_by_Carrier__Oil(self._OilNetwork(all_oil_value=0.0))

        assert isinstance(result, pd.Series)
        assert result.loc[("AT1", "MWh")] == pytest.approx(0.0)

    def test_handles_empty_all_oil_without_failing(self):
        """Function should work when total oil withdrawal result is completely empty.

        This simulates a situation where there are no oil-using components, so the
        `_OilStatisticsAccessor.withdrawal` call returns an empty Series/DataFrame.
        """
        result = Final_Energy_by_Carrier__Oil(self._OilNetwork(all_oil_empty=True))

        assert isinstance(result, pd.Series)
        # We still expect a well-defined MultiIndex on the output
        assert isinstance(result.index, pd.MultiIndex)
        assert result.index.names == ["location", "unit"]
        # Result should not contain NaNs and should be effectively all-zero
        assert not result.isna().any()
        assert (result == 0.0).all()

    def test_returns_dataframe_for_aggregate_per_year_false(self):
        """Function should return a timeseries DataFrame for aggregate_per_year=False."""
        result = Final_Energy_by_Carrier__Oil(
            self._OilNetwork(),
            aggregate_per_year=False,
        )

        assert isinstance(result, pd.DataFrame)
        assert isinstance(result.index, pd.MultiIndex)
        assert result.index.names == ["location", "unit"]
        assert isinstance(result.columns, pd.DatetimeIndex)

```

To make this test pass, you will need to extend the `_OilNetwork` helper used in this test module so that when called with `all_oil_empty=True` it constructs a network (or statistics accessor) where `_OilStatisticsAccessor.withdrawal` returns an empty result for `all_oil` (e.g. an empty `pd.Series` with the appropriate MultiIndex and unit level, or an empty `pd.DataFrame` for the time series case). Ensure that this flag is mutually compatible with the existing `all_oil_value` parameter and that the empty result flows into `Final_Energy_by_Carrier__Oil` in the same way as the non-empty case.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread pypsa_validation_processing/statistics_functions.py Outdated
Comment thread pypsa_validation_processing/statistics_functions.py Outdated
Comment thread pypsa_validation_processing/utils.py Outdated
Comment thread tests/test_statistics_functions.py
Comment thread tests/test_statistics_functions.py
maxnutz and others added 4 commits April 27, 2026 11:45
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
…of github.com:maxnutz/pypsa_validation_processing into 46-new-variable-statistics-final-energy-by-carrieroil
@maxnutz maxnutz self-assigned this Apr 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

New Variable Statistics: Final Energy [by Carrier]|Oil

1 participant