SIPNET workflow for restarting with events #3919

Open

ashiklom wants to merge 77 commits into PecanProject:develop from ashiklom:sipnet-restart-workflow

Conversation


@ashiklom ashiklom commented Apr 5, 2026

Prototype of running SIPNET with event files that include changes in crops. A few implementation notes:

  • This supports PEcAn ensemble inputs (including for events)

  • event.json files for multiple ensemble members are stored in run$inputs$events$source. SIPNET-specific event.in files are stored in run$inputs$events$path (like other inputs). This is because the current functionality for finding segments to split the runs uses the JSON files. @infotroph's subset_paths function from an earlier draft has been adapted to subset both path and source paths (as long as they have the same lengths).

  • This circumvents runModule.start.model.runs and uses a direct execution loop instead. However, the output is PEcAn standard and follows PEcAn configuration conventions, so downstream analyses/workflows should work out of the box.

It's a bit hacky, but I think it does what it's supposed to and should be enough to unblock other CCMMF modeling tasks (@dlebauer).
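To illustrate the path/source pairing described above, here is a minimal base-R sketch of the recycling behavior being assumed; the helper names below are invented for illustration and are not the PR's actual `subset_paths`:

```r
# Illustrative only: recycle an ensemble index over a list of input paths,
# so ensemble member i picks a path even when there are fewer distinct
# input files than ensemble members.
subset_paths_demo <- function(paths, i) {
  stopifnot(length(paths) > 0)
  paths[[((i - 1) %% length(paths)) + 1]]
}

# Both `path` and `source` must have the same length so that index i
# selects the event.in and event.json for the same ensemble member.
pick_member_inputs <- function(path, source, i) {
  stopifnot(length(path) == length(source))
  list(
    path = subset_paths_demo(path, i),
    source = subset_paths_demo(source, i)
  )
}
```

For example, `pick_member_inputs(c("e1.in", "e2.in"), c("e1.json", "e2.json"), 2)` returns the matched pair for member 2.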

@ashiklom ashiklom force-pushed the sipnet-restart-workflow branch from c40f98a to f061e09 on April 17, 2026 at 19:50
@ashiklom ashiklom marked this pull request as ready for review April 17, 2026 19:57
@ashiklom ashiklom changed the title from "WIP SIPNET events workflow" to "SIPNET workflow for restarting with events" on Apr 17, 2026

ashiklom commented Apr 17, 2026

OK, I think I've resolved all of @divine7022's and @infotroph's comments. I also fixed a significant bug in write_events.SIPNET (the events were always written out in the order they appeared in events.json rather than sorted by date as intended), and did some other miscellaneous cleanup.
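The sorting bug can be illustrated with a toy example (field names here are hypothetical stand-ins; the actual fix is inside write_events.SIPNET):

```r
# Toy stand-in for parsed events.json; the `type`/`date` fields are
# invented for illustration.
events <- data.frame(
  type = c("harvest", "planting", "fertilization"),
  date = c("2025-09-01", "2025-04-15", "2025-05-01"),
  stringsAsFactors = FALSE
)

# The bug: events were emitted in file order (harvest first here).
# The fix: order by date before writing the SIPNET event.in lines.
events_sorted <- events[order(as.Date(events$date)), ]
```

After sorting, planting (April) precedes fertilization (May) and harvest (September), as SIPNET expects.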

Note that this PR sits on top of #3836 and #3828; if we merge this, we can probably just close those.

@dlebauer:

Great work! Excited to have this working! 🥳

Before merging, please create a ticket to cover the refactoring plan we discussed last week.

From what I recall, the plan was approximately:

Goal: refactor segmented runs so that they integrate more cleanly with standard start_model_runs workflows.

Specifically:

* write.configs generates config files

* sites and ensembles run in parallel via start_model_runs

* This will require modifying or replacing the top-level job.sh so that:
  
  * Segments run sequentially within a site, and
  * Sites still run in parallel via start_model_runs.

All of this already works in this PR. The new write_segmented_configs.SIPNET does a bunch of config preparation behind the scenes and then modifies the job.sh files associated with each ensemble member such that, instead of running SIPNET directly, they run the segments in sequence and then post-process the results. This means that the actual interface here is very "PEcAn-ic":

```r
settings <- PEcAn.workflow::runModule.run.write.configs(
  settings_raw,
  input_design = sens_design$X
)

source("workflows/sipnet-restart-workflow/utils.R")
jobfiles <- write_segmented_configs.SIPNET(settings, sens_design$X, force_rerun = TRUE)

PEcAn.workflow::runModule_start_model_runs(settings)
```

Whatever method the user has specified for running job.sh files in settings will continue to work here without modification.
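A hedged sketch of what the job.sh rewriting might look like; the directory layout, helper name, and post-processing placeholder below are assumptions for illustration, not the PR's exact code:

```r
# Sketch only: write a top-level job.sh for one ensemble member that runs
# each segment's job.sh in sequence, then post-processes. Paths invented.
write_sequential_job <- function(run_dir, segment_dirs) {
  segment_jobs <- file.path(segment_dirs, "job.sh")
  lines <- c(
    "#!/bin/bash",
    "set -e",                         # stop if any segment fails
    paste("bash", segment_jobs),      # run segments in order
    "# post-process segment outputs into PEcAn standard output here"
  )
  jobfile <- file.path(run_dir, "job.sh")
  writeLines(lines, jobfile)
  Sys.chmod(jobfile, "0755")
  jobfile
}

run_dir <- tempfile("run_")
dir.create(run_dir)
jobfile <- write_sequential_job(run_dir, c("segment_1", "segment_2"))
```

Because the result is still a single job.sh per run directory, any launcher that knows how to execute job.sh files (local, qsub, etc.) works unchanged.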

@infotroph infotroph left a comment


This looks great! I was able to run through 3 ensembled sites in a lightly modified MAGiC ensemble workflow using just utils.R with the changes noted below. I think this is ready to merge. Next iteration can be to decide which package(s) to drop the functions into.

```r
cls == "F" ~ "annual_crop",
cls == "G" ~ "grass",
cls == "P" ~ "grass",
cls == "R" ~ "grass",
```


Yes, this table is still temporary, but I needed this one for the sites I grabbed.

Suggested change

```diff
 cls == "R" ~ "grass",
+cls == "T" ~ "annual_crop",
```
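The mapping above could also live in a named lookup vector rather than a case_when; a hypothetical base-R sketch (the real function in this PR is crop2pft, and the table is acknowledged as temporary):

```r
# Hypothetical named-vector version of the temporary crop-class -> PFT table.
crop_class_to_pft <- c(
  F = "annual_crop",
  G = "grass",
  P = "grass",
  R = "grass",
  T = "annual_crop"
)

# Vectorized lookup: unknown classes come back as NA, which makes
# missing mappings easy to spot.
crop2pft_demo <- function(cls) unname(crop_class_to_pft[cls])
```

A named vector keeps the table data-like, so it could later be replaced by a CSV shared with the LandIQ code without touching control flow.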

```r
if (!file.exists(manifest_file)) {
  PEcAn.logger::logger.severe("Could not find manifest file: ", manifest_file)
}
inputs_runs <- read.csv(manifest_file)
```


Suggested change

```diff
-inputs_runs <- read.csv(manifest_file)
+inputs_runs <- read.csv(manifest_file) |>
+  dplyr::filter(.data$site_id == settings$run$site$id) |>
+  # TODO the manifest should probably report these already...
+  dplyr::mutate(
+    ens_num = .data$run_id |>
+      stringr::str_extract("ENS-(\\d+)", group = 1) |>
+      as.integer()
+  )
```

```r
}
inputs_runs <- read.csv(manifest_file)
if (!is.null(input_design)) {
  inputs_runs <- cbind.data.frame(inputs_runs, input_design)
```


In a multi-site run the cbind winds up becoming a cross join; we need to align explicitly by ensemble number.

Suggested change

```diff
-inputs_runs <- cbind.data.frame(inputs_runs, input_design)
+inputs_runs <- inputs_runs |>
+  dplyr::left_join(
+    input_design |> tibble::rowid_to_column("ens_num"),
+    by = "ens_num",
+    relationship = "many-to-one")
```

```r
stopifnot(file.exists(events_json))

crop_cycles <- PEcAn.data.land::events_to_crop_cycle_starts(events_json) |>
  dplyr::ungroup()
```


Suggested change

```diff
-  dplyr::ungroup()
+  dplyr::filter(.data$site_id == run_settings$run$site$id) |>
+  dplyr::ungroup()
```

```r
  pft = crop2pft(.data$crop_code),
  segment_dir = file.path(segment_rootdir, sprintf("segment_%s", .data$segment_id))
)
```



Seems useful to retain this information for diagnostics. I didn't think carefully about format or location, though; counterproposals welcome.

Suggested change

```diff
+write.csv(segments, file = file.path(run_dir, "segments.csv"), row.names = FALSE)
```

```r
)

source("workflows/sipnet-restart-workflow/utils.R")
jobfiles <- write_segmented_configs.SIPNET(settings, sens_design$X)
```


Just leaving a breadcrumb hint for the next person

Suggested change

```diff
 jobfiles <- write_segmented_configs.SIPNET(settings, sens_design$X)
+# Note: If running a multi-site workflow, use:
+# jobfiles <- papply(settings, \(s) write_segmented_configs.SIPNET(s, sens_design$X))
```

```r
  settings
}

# TODO: We need a better, consistent implementation of this. However, this is
```


I thought this table was with the LandIQ code. If not, I think @infotroph, @sarahkanee, and I have each implemented one or more versions of this mapping.
