Conversation
There was a problem hiding this comment.
Pull request overview
Updates BIGr’s pedigree QC and imputation concordance utilities, primarily by expanding output/reporting options (including optional plotting) and refreshing the check_ped documentation to match new behaviors.
Changes:
- Expanded
check_pedreporting (exact duplicates, conflicting IDs) and added interactive/automatic saving behavior. - Added
plotandprint_resultoptions toimputation_concordanceand updated its roxygen docs. - Misc repo metadata changes (
RoxygenNote,.Rproj) and an unintended.DS_Storeaddition.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
R/check_ped.R |
Adds new checks and introduces interactive prompt + GlobalEnv assignment behavior. |
man/check_ped.Rd |
Updates generated Rd docs for check_ped to reflect new outputs. |
R/imputation_concordance.R |
Adds optional plotting/printing and updates roxygen documentation. |
DESCRIPTION |
Bumps RoxygenNote. |
BIGr.Rproj |
Adds a ProjectId entry. |
.DS_Store |
Adds a macOS Finder artifact (should not be committed). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -1,4 +1,5 @@ | |||
| Version: 1.0 | |||
| ProjectId: 0eeaab63-2615-4da7-b10a-927160fc78a3 | |||
There was a problem hiding this comment.
Adding a ProjectId to the .Rproj file is usually machine/user-specific and tends to create noisy diffs. Unless there’s a clear need for it, consider omitting this line from version control.
| ProjectId: 0eeaab63-2615-4da7-b10a-927160fc78a3 |
| assign(corrected_name, data, envir = .GlobalEnv) | ||
| assign("input_ped_report", input_ped_report, envir = .GlobalEnv) | ||
| cat(paste0("Saved corrected pedigree as `", corrected_name, "` and report as `input_ped_report`.\n")) |
There was a problem hiding this comment.
In the interactive save path, the report is always assigned as input_ped_report rather than using the computed report_name. This makes the saved object naming inconsistent with the non-verbose path (which uses report_name). Please make the naming consistent and document it.
| #' @examples | ||
| #' result <- imputation_concordance( | ||
| #' reference_genos = ref, | ||
| #' imputed_genos = test, | ||
| #' snps_2_exclude = snps, |
There was a problem hiding this comment.
The @examples block uses ref, test, and snps objects that are not defined anywhere in the example. This will fail during package checks when examples are executed. Please either define them in the example (e.g., via system.file() + read.table()) or wrap the block in \dontrun{}.
| ped_errors <- check_ped(ped.file = ped_file, | ||
| seed = 101919) | ||
| ped_file <- system.file("check_ped_test.txt", package = "BIGr") | ||
| ped_errors <- check_ped(ped.file = ped_file, seed = 101919) |
There was a problem hiding this comment.
The example runs check_ped() with default verbose = TRUE, but the function now prompts via readline(). Package examples should not require interactive input; please adjust the example to use verbose = FALSE (or ensure prompting is gated behind interactive()).
| ped_errors <- check_ped(ped.file = ped_file, seed = 101919) | |
| ped_errors <- check_ped(ped.file = ped_file, seed = 101919, verbose = FALSE) |
| #' | ||
| #' @export | ||
| #' @import dplyr | ||
| #' @import ggplot2 |
There was a problem hiding this comment.
The function calls ggplot(), geom_bar(), etc. but the package DESCRIPTION does not list ggplot2 in Imports, and NAMESPACE currently has no ggplot2 imports. As-is, plot = TRUE will error unless ggplot2 happens to be attached by the user. Please add ggplot2 to DESCRIPTION Imports and regenerate NAMESPACE (or fully qualify calls with ggplot2::...).
| #' @import ggplot2 |
| if (sire != "0" && sire != id && !sire %in% data$id) { | ||
| missing_parents <- rbind(missing_parents, data.frame(id = sire, sire = "0", dam = "0", stringsAsFactors = FALSE)) | ||
| } | ||
| if (dam != "0" && dam != id && !dam %in% data$id) { | ||
| missing_parents <- rbind(missing_parents, data.frame(id = dam, sire = "0", dam = "0", stringsAsFactors = FALSE)) |
There was a problem hiding this comment.
Building missing_parents via repeated rbind() inside the row loop is quadratic and can be very slow for large pedigrees. Prefer collecting rows in a list and binding once (or use vectorized set operations) to keep runtime reasonable at scale.
| #### file names #### | ||
| file_base <- tools::file_path_sans_ext(basename(ped.file)) | ||
| corrected_name <- paste0(file_base, "_corrected") | ||
| report_name <- paste0(file_base, "_report") |
There was a problem hiding this comment.
report_name is computed but never used in the verbose = TRUE branch. Either use it for the report object name in both branches or remove it to avoid confusion.
| #' ped_errors <- check_ped(ped.file = ped_file, | ||
| #' seed = 101919) | ||
| #' ped_file <- system.file("check_ped_test.txt", package = "BIGr") | ||
| #' ped_errors <- check_ped(ped.file = ped_file, seed = 101919) |
There was a problem hiding this comment.
The documented example calls check_ped(...) with default verbose = TRUE, but the function now prompts via readline(). Running examples during R CMD check (non-interactive) can error or hang waiting for input. Please update the example to set verbose = FALSE and/or wrap interactive behavior in \dontrun{}/\donttest{}.
| #' ped_errors <- check_ped(ped.file = ped_file, seed = 101919) | |
| #' ped_errors <- check_ped(ped.file = ped_file, seed = 101919, verbose = FALSE) |
| repeated_ids_report <- conflicting_ids | ||
|
|
||
| #### check 3: missing parents #### | ||
| for (i in 1:nrow(data)) { |
There was a problem hiding this comment.
The loop uses for (i in 1:nrow(data)), which behaves incorrectly when nrow(data) == 0 (it iterates over c(1, 0)). Please use seq_len(nrow(data)) to avoid edge-case errors with empty input pedigrees.
| for (i in 1:nrow(data)) { | |
| for (i in seq_len(nrow(data))) { |
| missing_parents <- data.frame(id = character(), sire = character(), dam = character(), stringsAsFactors = FALSE) | ||
| ) | ||
|
|
||
| original_data <- data |
There was a problem hiding this comment.
original_data <- data is never used and can be removed to avoid confusion about whether the function intends to compare/restore against the original input.
| original_data <- data |
Added options to output graph or list. This makes ready for BIG integration