SIPNET workflow for restarting with events #3919

Open

ashiklom wants to merge 77 commits into PecanProject:develop from ashiklom:sipnet-restart-workflow

Conversation


@ashiklom ashiklom commented Apr 5, 2026

Prototype of running SIPNET with event files that include changes in crops. A few implementation notes:

  • This supports PEcAn ensemble inputs (including for events)

  • event.json files for multiple ensemble members are stored in run$inputs$events$source. SIPNET-specific event.in files are stored in run$inputs$events$path (like other inputs). This is because the current functionality for finding segments to split the runs uses the JSON files. @infotroph's subset_paths function from an earlier draft has been adapted to subset both path and source paths (as long as they have the same lengths).

  • This circumvents runModule.start.model.runs and uses a direct execution loop instead. However, the output is PEcAn standard and follows PEcAn configuration conventions, so downstream analyses/workflows should work out of the box.

It's a bit hacky, but I think it does what it's supposed to and should be enough to unblock other CCMMF modeling tasks (@dlebauer).
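To illustrate the path/source pairing described above, here is a minimal base-R sketch of the recycling behavior being assumed; the helper names below are invented for illustration and are not the PR's actual `subset_paths`:

```r
# Illustrative only: recycle an ensemble index over a list of input paths,
# so ensemble member i picks a path even when there are fewer distinct
# input files than ensemble members.
subset_paths_demo <- function(paths, i) {
  stopifnot(length(paths) > 0)
  paths[[((i - 1) %% length(paths)) + 1]]
}

# Both `path` and `source` must have the same length so that index i
# selects the event.in and event.json for the same ensemble member.
pick_member_inputs <- function(path, source, i) {
  stopifnot(length(path) == length(source))
  list(
    path = subset_paths_demo(path, i),
    source = subset_paths_demo(source, i)
  )
}
```

For example, `pick_member_inputs(c("e1.in", "e2.in"), c("e1.json", "e2.json"), 2)` returns the matched pair for member 2.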

@ashiklom ashiklom force-pushed the sipnet-restart-workflow branch from c40f98a to f061e09 on April 17, 2026 at 19:50
@ashiklom ashiklom marked this pull request as ready for review April 17, 2026 19:57
@ashiklom ashiklom changed the title from "WIP SIPNET events workflow" to "SIPNET workflow for restarting with events" on Apr 17, 2026

ashiklom commented Apr 17, 2026

OK, I think I've resolved all of @divine7022's and @infotroph's comments. I also fixed a significant bug in write_events.SIPNET (the events were always written out in the order they appeared in events.json rather than sorted by date as intended), and did some other miscellaneous cleanup.
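The sorting bug can be illustrated with a toy example (field names here are hypothetical stand-ins; the actual fix is inside write_events.SIPNET):

```r
# Toy stand-in for parsed events.json; the `type`/`date` fields are
# invented for illustration.
events <- data.frame(
  type = c("harvest", "planting", "fertilization"),
  date = c("2025-09-01", "2025-04-15", "2025-05-01"),
  stringsAsFactors = FALSE
)

# The bug: events were emitted in file order (harvest first here).
# The fix: order by date before writing the SIPNET event.in lines.
events_sorted <- events[order(as.Date(events$date)), ]
```

After sorting, planting (April) precedes fertilization (May) and harvest (September), as SIPNET expects.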

Note that this PR sits on top of #3836 and #3828; if we merge this, we can probably just close those.

@dlebauer:

Great work! Excited to have this working! 🥳

Before merging, please create a ticket to cover the refactoring plan we discussed last week.

From what I recall, the plan was approximately:

Goal: refactor segmented runs so that they integrate more cleanly with standard start_model_runs workflows.

Specifically:

* write.configs generates config files

* sites and ensembles run in parallel via start_model_runs

* This will require modifying or replacing the top-level job.sh so that:
  
  * Segments run sequentially within a site, and
  * Sites still run in parallel via start_model_runs.

All of this already works in this PR. The new write_segmented_configs.SIPNET does a bunch of config preparation behind the scenes and then modifies the job.sh files associated with each ensemble member such that, instead of running SIPNET directly, they run the segments in sequence and then post-process the results. This means that the actual interface here is very "PEcAn-ic":

```r
settings <- PEcAn.workflow::runModule.run.write.configs(
  settings_raw,
  input_design = sens_design$X
)

source("workflows/sipnet-restart-workflow/utils.R")
jobfiles <- write_segmented_configs.SIPNET(settings, sens_design$X, force_rerun = TRUE)

PEcAn.workflow::runModule_start_model_runs(settings)
```

Whatever method the user has specified for running job.sh files in settings will continue to work here without modification.
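A hedged sketch of what the job.sh rewriting might look like; the directory layout, helper name, and post-processing placeholder below are assumptions for illustration, not the PR's exact code:

```r
# Sketch only: write a top-level job.sh for one ensemble member that runs
# each segment's job.sh in sequence, then post-processes. Paths invented.
write_sequential_job <- function(run_dir, segment_dirs) {
  segment_jobs <- file.path(segment_dirs, "job.sh")
  lines <- c(
    "#!/bin/bash",
    "set -e",                         # stop if any segment fails
    paste("bash", segment_jobs),      # run segments in order
    "# post-process segment outputs into PEcAn standard output here"
  )
  jobfile <- file.path(run_dir, "job.sh")
  writeLines(lines, jobfile)
  Sys.chmod(jobfile, "0755")
  jobfile
}

run_dir <- tempfile("run_")
dir.create(run_dir)
jobfile <- write_sequential_job(run_dir, c("segment_1", "segment_2"))
```

Because the result is still a single job.sh per run directory, any launcher that knows how to execute job.sh files (local, qsub, etc.) works unchanged.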

@infotroph infotroph left a comment


This looks great! I was able to run through 3 ensembled sites in a lightly modified MAGiC ensemble workflow using just utils.R with the changes noted below. I think this is ready to merge. Next iteration can be to decide which package(s) to drop the functions into.

```r
cls == "F" ~ "annual_crop",
cls == "G" ~ "grass",
cls == "P" ~ "grass",
cls == "R" ~ "grass",
```


Yes, this table is still temporary, but I needed this one for the sites I grabbed.

Suggested change

```diff
 cls == "R" ~ "grass",
+cls == "T" ~ "annual_crop",
```
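The mapping above could also live in a named lookup vector rather than a case_when; a hypothetical base-R sketch (the real function in this PR is crop2pft, and the table is acknowledged as temporary):

```r
# Hypothetical named-vector version of the temporary crop-class -> PFT table.
crop_class_to_pft <- c(
  F = "annual_crop",
  G = "grass",
  P = "grass",
  R = "grass",
  T = "annual_crop"
)

# Vectorized lookup: unknown classes come back as NA, which makes
# missing mappings easy to spot.
crop2pft_demo <- function(cls) unname(crop_class_to_pft[cls])
```

A named vector keeps the table data-like, so it could later be replaced by a CSV shared with the LandIQ code without touching control flow.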

```r
if (!file.exists(manifest_file)) {
  PEcAn.logger::logger.severe("Could not find manifest file: ", manifest_file)
}
inputs_runs <- read.csv(manifest_file)
```


Suggested change

```diff
-inputs_runs <- read.csv(manifest_file)
+inputs_runs <- read.csv(manifest_file) |>
+  dplyr::filter(.data$site_id == settings$run$site$id) |>
+  # TODO the manifest should probably report these already...
+  dplyr::mutate(
+    ens_num = .data$run_id |>
+      stringr::str_extract("ENS-(\\d+)", group = 1) |>
+      as.integer()
+  )
```

```r
}
inputs_runs <- read.csv(manifest_file)
if (!is.null(input_design)) {
  inputs_runs <- cbind.data.frame(inputs_runs, input_design)
```


In a multi-site run the cbind winds up becoming a cross join; we need to align explicitly by ensemble number.

Suggested change

```diff
-inputs_runs <- cbind.data.frame(inputs_runs, input_design)
+inputs_runs <- inputs_runs |>
+  dplyr::left_join(
+    input_design |> tibble::rowid_to_column("ens_num"),
+    by = "ens_num",
+    relationship = "many-to-one")
```

```r
stopifnot(file.exists(events_json))

crop_cycles <- PEcAn.data.land::events_to_crop_cycle_starts(events_json) |>
  dplyr::ungroup()
```


Suggested change

```diff
-  dplyr::ungroup()
+  dplyr::filter(.data$site_id == run_settings$run$site$id) |>
+  dplyr::ungroup()
```

```r
  pft = crop2pft(.data$crop_code),
  segment_dir = file.path(segment_rootdir, sprintf("segment_%s", .data$segment_id))
)
```



Seems useful to retain this information for diagnostics. I didn't think carefully about format or location, though; counterproposals welcome.

Suggested change

```diff
+write.csv(segments, file = file.path(run_dir, "segments.csv"), row.names = FALSE)
```

```r
)

source("workflows/sipnet-restart-workflow/utils.R")
jobfiles <- write_segmented_configs.SIPNET(settings, sens_design$X)
```


Just leaving a breadcrumb hint for the next person

Suggested change

```diff
 jobfiles <- write_segmented_configs.SIPNET(settings, sens_design$X)
+# Note: If running a multi-site workflow, use:
+# jobfiles <- papply(settings, \(s) write_segmented_configs.SIPNET(s, sens_design$X))
```

```r
  settings
}

# TODO: We need a better, consistent implementation of this. However, this is
```


I thought this table was with the LandIQ code. If not, I think @infotroph, @sarahkanee, and I have each implemented one or more versions of this mapping.
