New variable statistics final energy by carrier oil#51
Open
Conversation
7 tasks
Contributor
Reviewer's GuideImplements a new statistics function Final_Energy_by_Carrier__Oil that computes fossil final-energy oil demand by combining sectoral oil loads with a regionalized renewable-oil share, adds a helper to remap copperplate locations to regional labels, extends unit mapping to support MWh, and wires the new oil variable into the default IAMC mapping; accompanied by targeted tests that validate edge cases and timeseries behavior. Class diagram for new statistics and utils functionsclassDiagram
class StatisticsFunctionsModule {
+Final_Energy_by_Carrier__Oil(n, aggregate_per_year)
}
class UtilsModule {
+create_location_index_from_cupperplate(raw_input, usage_location_list)
+UNITS_MAPPING
}
class PypsaNetwork {
+statistics.withdrawal(...)
+statistics.supply(...)
}
StatisticsFunctionsModule ..> UtilsModule : uses
StatisticsFunctionsModule ..> PypsaNetwork : uses
UtilsModule ..> PypsaNetwork : index location based on bus names
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Contributor
There was a problem hiding this comment.
Hey - I've found 5 issues, and left some high level feedback:
- In
Final_Energy_by_Carrier__Oil,kwargsandkwargs_filteringare used but never defined or passed into the function, which will fail at runtime unless they are globals; consider making them explicit parameters or deriving the neededgroupby/filters locally. - In
create_location_index_from_cupperplate, the standaloneidx_dfline has no effect and can be removed to avoid confusion and keep the helper minimal.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `Final_Energy_by_Carrier__Oil`, `kwargs` and `kwargs_filtering` are used but never defined or passed into the function, which will fail at runtime unless they are globals; consider making them explicit parameters or deriving the needed `groupby`/filters locally.
- In `create_location_index_from_cupperplate`, the standalone `idx_df` line has no effect and can be removed to avoid confusion and keep the helper minimal.
## Individual Comments
### Comment 1
<location path="pypsa_validation_processing/statistics_functions.py" line_range="239-240" />
<code_context>
+
+ total = pd.concat(series_list).groupby(kwargs["groupby"]).sum()
+
+ # non-fossil parts from renewable-gas production per location
+ # renewable oil production
+ non_fossil_parts = n.statistics.supply(
</code_context>
<issue_to_address>
**suggestion (typo):** Adjust the comment to refer to renewable oil instead of gas
This still refers to "renewable-gas" even though the logic handles renewable oil. Please update the wording (e.g. "renewable-oil production per location") to keep terminology consistent and avoid confusion.
```suggestion
# non-fossil parts from renewable-oil production per location
# renewable oil production
```
</issue_to_address>
### Comment 2
<location path="pypsa_validation_processing/statistics_functions.py" line_range="290" />
<code_context>
+ all_oil = create_location_index_from_cupperplate(all_oil, home_location)
+ all_oil = all_oil.groupby(kwargs["groupby"]).sum()
+
+ non_fossil_fraction = non_fossil_parts / all_oil
+ non_fossil_fraction = non_fossil_fraction.clip(upper=1) # TODO: Issue #53
+ non_fossil_fraction = non_fossil_fraction.rename(index=UNITS_MAPPING)
</code_context>
<issue_to_address>
**issue:** Guard against empty or zero `all_oil` when computing the non-fossil fraction
If `all_oil` is zero/empty for some locations, this division will produce `NaN`/`inf` in `non_fossil_fraction`. These can survive `clip`/`groupby().mean()` and skew fossil shares in cases like regions with only renewables or no oil demand. Please handle this explicitly, e.g. by dropping locations with `all_oil == 0` before division or normalizing those results to a defined fallback (such as 0).
</issue_to_address>
### Comment 3
<location path="pypsa_validation_processing/utils.py" line_range="210-212" />
<code_context>
+ If ``usage_location_list`` length does not match the number of rows, or
+ if the index cannot be reconstructed with the existing index names.
+ """
+ idx_df = raw_input.index.to_frame(index=False)
+ idx_df["location"] = pd.Index(usage_location_list).to_numpy()
+ idx_df
+ new_index = pd.MultiIndex.from_frame(idx_df, names=raw_input.index.names)
+ output = raw_input.copy()
</code_context>
<issue_to_address>
**issue:** Remove the stray `idx_df` expression or replace it with a useful check
This line is a no-op and will be reported as dead code. If it was only for debugging, please remove it; if you intended a check, replace it with an explicit validation (e.g., checking shape or columns) so the intent is clear and enforced.
</issue_to_address>
### Comment 4
<location path="tests/test_statistics_functions.py" line_range="228" />
<code_context>
+# ---------------------------------------------------------------------------
+
+
+class TestFinalEnergyByCarrierOil:
+ """Test suite for Final_Energy_by_Carrier__Oil function."""
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test case where there is no renewable oil production to ensure fossil oil equals total oil demand.
Currently `_OilStatisticsAccessor.supply` always returns a positive renewable-oil flow, so tests only cover the `non_fossil_fraction > 0` path. Please add a case where `non_fossil_parts` is empty (e.g. by toggling an `_OilNetwork` flag or making `supply` return an empty Series/DataFrame) and assert that `Final_Energy_by_Carrier__Oil` returns the unadjusted total oil demand (fossil share equals total). This will catch regressions where an empty `non_fossil_parts` could yield NaNs or zero fossil demand.
Suggested implementation:
```python
# ---------------------------------------------------------------------------
# Tests for Final_Energy_by_Carrier__Oil
# ---------------------------------------------------------------------------
class TestFinalEnergyByCarrierOil:
"""Test suite for Final_Energy_by_Carrier__Oil function."""
def test_no_renewable_oil_production_fossil_equals_total(
self,
oil_statistics_accessor,
oil_network,
monkeypatch,
):
"""
When there is no renewable oil production (non_fossil_parts is empty),
the fossil share should equal the total oil demand and no NaNs should appear.
"""
import pandas as pd
# Force the accessor to return an empty non-fossil (renewable) supply
# so that non_fossil_parts becomes empty inside Final_Energy_by_Carrier__Oil.
def empty_supply(self, *args, **kwargs):
# Returning an empty Series/DataFrame ensures non_fossil_fraction == 0
return pd.Series(dtype=float)
# Patch the supply method used by Final_Energy_by_Carrier__Oil
monkeypatch.setattr(type(oil_statistics_accessor), "supply", empty_supply)
# Act
result = Final_Energy_by_Carrier__Oil(oil_statistics_accessor, oil_network)
# Assert that fossil oil equals total oil demand when there is no renewable share.
# This prevents regressions where an empty non_fossil_parts would yield NaNs or zeros.
fossil = result["fossil"]
total = result["total"]
# Fossil share should be identical to total demand
assert (fossil == total).all()
# No NaNs should be present in fossil or total
assert not fossil.isna().any()
assert not total.isna().any()
```
1. Ensure this file defines or imports fixtures named `oil_statistics_accessor` and `oil_network` (or adjust the test parameters to match the existing fixtures used for `Final_Energy_by_Carrier__Oil` tests).
2. Adjust the `result["fossil"]` and `result["total"]` accessors to match the actual structure returned by `Final_Energy_by_Carrier__Oil` (e.g. if it returns a DataFrame with different column names or a Series with different indices, update the keys accordingly).
3. If `pandas` (`pd`) is already imported at the top of the file, remove the inline `import pandas as pd` from the test and rely on the shared import instead.
4. If `Final_Energy_by_Carrier__Oil` has a different call signature (e.g. only takes a statistics accessor or a network), update the call in the test accordingly.
5. If `_OilStatisticsAccessor.supply` is a module-level function or located elsewhere, adjust the `monkeypatch.setattr` target (e.g. patch the module-level function or the concrete accessor class actually used in the existing tests).
</issue_to_address>
### Comment 5
<location path="tests/test_statistics_functions.py" line_range="420-408" />
<code_context>
+ # non-fossil fraction is clipped to 1, so fossil share remains zero.
+ assert result.loc[("AT1", "MWh")] == pytest.approx(0.0)
+
+ def test_handles_zero_total_oil_demand_denominator(self):
+ """Division by zero in non-fossil share denominator should not crash."""
+ result = Final_Energy_by_Carrier__Oil(self._OilNetwork(all_oil_value=0.0))
+
+ assert isinstance(result, pd.Series)
+ assert result.loc[("AT1", "MWh")] == pytest.approx(0.0)
+
+ def test_returns_dataframe_for_aggregate_per_year_false(self):
</code_context>
<issue_to_address>
**suggestion (testing):** Extend coverage to a case where the "all_oil" withdrawal result is completely empty, not just zero-valued.
The current test covers the zero-denominator case via `all_oil_value=0.0`, but there’s another branch where `all_oil` can be an empty Series/DataFrame (e.g. no oil-using components). Please add a test where `_OilStatisticsAccessor.withdrawal` returns an empty result for `all_oil` and confirm that `Final_Energy_by_Carrier__Oil` still returns a valid (likely all-zero) Series/DataFrame with a well-defined index and no errors or NaNs.
Suggested implementation:
```python
def test_handles_empty_rescom_without_failing(self):
"""Function should work even when residential/commercial oil demand is empty."""
result = Final_Energy_by_Carrier__Oil(self._OilNetwork(rescom_empty=True))
assert isinstance(result, pd.Series)
assert isinstance(result.index, pd.MultiIndex)
assert result.index.names == ["location", "unit"]
# non-fossil fraction is clipped to 1, so fossil share remains zero.
assert result.loc[("AT1", "MWh")] == pytest.approx(0.0)
def test_handles_zero_total_oil_demand_denominator(self):
"""Division by zero in non-fossil share denominator should not crash."""
result = Final_Energy_by_Carrier__Oil(self._OilNetwork(all_oil_value=0.0))
assert isinstance(result, pd.Series)
assert result.loc[("AT1", "MWh")] == pytest.approx(0.0)
def test_handles_empty_all_oil_without_failing(self):
"""Function should work when total oil withdrawal result is completely empty.
This simulates a situation where there are no oil-using components, so the
`_OilStatisticsAccessor.withdrawal` call returns an empty Series/DataFrame.
"""
result = Final_Energy_by_Carrier__Oil(self._OilNetwork(all_oil_empty=True))
assert isinstance(result, pd.Series)
# We still expect a well-defined MultiIndex on the output
assert isinstance(result.index, pd.MultiIndex)
assert result.index.names == ["location", "unit"]
# Result should not contain NaNs and should be effectively all-zero
assert not result.isna().any()
assert (result == 0.0).all()
def test_returns_dataframe_for_aggregate_per_year_false(self):
"""Function should return a timeseries DataFrame for aggregate_per_year=False."""
result = Final_Energy_by_Carrier__Oil(
self._OilNetwork(),
aggregate_per_year=False,
)
assert isinstance(result, pd.DataFrame)
assert isinstance(result.index, pd.MultiIndex)
assert result.index.names == ["location", "unit"]
assert isinstance(result.columns, pd.DatetimeIndex)
```
To make this test pass, you will need to extend the `_OilNetwork` helper used in this test module so that when called with `all_oil_empty=True` it constructs a network (or statistics accessor) where `_OilStatisticsAccessor.withdrawal` returns an empty result for `all_oil` (e.g. an empty `pd.Series` with the appropriate MultiIndex and unit level, or an empty `pd.DataFrame` for the time series case). Ensure that this flag is mutually compatible with the existing `all_oil_value` parameter and that the empty result flows into `Final_Energy_by_Carrier__Oil` in the same way as the non-empty case.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
…of github.com:maxnutz/pypsa_validation_processing into 46-new-variable-statistics-final-energy-by-carrieroil
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Add variable function for IAMC-variable Final Energy [by Carrier]|Oil
Open To Dos:
tests/- stick to the testing-READMEValidation
Concerning Naphtha for industry
chemicals_industry()inbuild_industry_sector_ratios.py)Concerning fraction of non-fossil oil
Calculating the fraction of non-fossil oil is a first-guess code to likely be refactored in the future. For excluding non-fossil oil parts from statistics, a fraction of fossil-oil to non-fossil-oil is calculated using 2 meta-variables:
mixing demand and supply variables. Renewable oil flows are not included in this statistics and so, for regions with low oil demand and high production rates of renewable oil, this can lead to a renewable-oil-factor greater 1. Currently, these factors are clipped to 1, additional renewable oil production is therefore "lost".
Problem: renewable oil production exceeding the oil consumption of the respective region is lost, statistics are not fully consistent. This applies especially for country-wise aggregation, where renewable-oil-fraction is therefore underestimated. Problem will be covered in Issue #53
Summary by Sourcery
Add support for computing IAMC variable 'Final Energy [by Carrier]|Oil' and map it into the default statistics configuration.
New Features:
Enhancements:
Summary by Sourcery
Add statistics support for fossil final energy oil demand and ensure it is robust and region-aware.
New Features:
Enhancements:
Tests: