WIP: Add pkg/rdsdiff: reusable library for RDS version diffing#271
Tal-or wants to merge 1 commit into openshift:main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Note: Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the

Use the following commands to manage reviews:
Walkthrough

Implements an rdsdiff package: downloads/extracts archives, runs PolicyGenerator, extracts embedded Policy CRs into deterministic per-CR YAMLs, correlates keys, normalizes YAML, computes unified diffs, and emits a text report plus a JSON comparison; adds session, download, policy-gen, file-IO, storage, and constants utilities and tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant RDSDiff
    participant GitHub
    participant PolicyGen
    participant FS as Filesystem
    User->>RDSDiff: RunCompare(oldURL, newURL, sessionDir)
    RDSDiff->>GitHub: Download/extract old archive
    GitHub-->>FS: extracted old files
    RDSDiff->>GitHub: Download/extract new archive
    GitHub-->>FS: extracted new files
    RDSDiff->>PolicyGen: RunPolicyGen on extracted dirs
    PolicyGen-->>FS: generated multi-doc YAMLs
    RDSDiff->>RDSDiff: ExtractCRsFromPolicyDoc -> write per-CR YAMLs to extractedDir
    RDSDiff->>RDSDiff: Correlate keys -> onlyOld/onlyNew/inBoth
    loop for each key in inBoth
        RDSDiff->>RDSDiff: NormalizeYAML(old) & NormalizeYAML(new)
        RDSDiff->>RDSDiff: ComputeDiff -> produce diff text
    end
    RDSDiff->>FS: Write diff-report.txt and comparison.json
    RDSDiff-->>User: Return report path / summary
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 9 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (9 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
Actionable comments posted: 3
🧹 Nitpick comments (4)
pkg/rdsdiff/policygen.go (1)
41-46: External command execution lacks timeout.
`exec.Command` runs without a timeout, which could cause the function to hang indefinitely if PolicyGenerator stalls or enters an infinite loop. Consider using `exec.CommandContext` with a timeout context.

🛡️ Suggested fix
```diff
+import (
+	"context"
+	// ... other imports
+)
+
+// Add timeout constant at package level
+const PolicyGenTimeout = 5 * time.Minute

 func RunPolicyGen(configRoot, policyGeneratorPath, generatedDir string) error {
 	// ... existing code ...
 	for _, policyFile := range ListOfCRsForSNO {
 		// ... existing stat check ...
-		cmd := exec.Command(policyGeneratorPath, policyPath) // #nosec G204 -- paths from config root
+		ctx, cancel := context.WithTimeout(context.Background(), PolicyGenTimeout)
+		cmd := exec.CommandContext(ctx, policyGeneratorPath, policyPath) // #nosec G204 -- paths from config root
 		cmd.Dir = crsPathDir
 		out, err := cmd.CombinedOutput()
+		cancel()
 		if err != nil {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/rdsdiff/policygen.go` around lines 41 - 46: The external command creation in the policy generation path uses exec.Command without a timeout (see policyGeneratorPath, policyPath, cmd, crsPathDir and policyFile) which can hang; replace exec.Command with exec.CommandContext using a context.WithTimeout (pick a sensible timeout), defer cancel the context, then run cmd.CombinedOutput() (or Start/Wait) so the child is killed when the context expires and return an error that includes the timeout/cancellation details and the command output.

pkg/rdsdiff/compare.go (1)
140-163: Multi-document YAML splitting may miss edge cases.

Splitting on `"\n---"` (Line 142) won't correctly handle:

- Documents starting with `---\n` at the beginning of the file (no preceding newline)
- `---` with trailing content on the same line (valid YAML directive marker)

The fallback on Lines 156-161 partially mitigates this for single-document files, but multi-document files with a leading `---` marker may still have issues.

♻️ Consider a more robust approach
```diff
 func unmarshalMultiDocYAML(data []byte) []map[string]any {
 	var docs []map[string]any
-	parts := strings.Split(string(data), "\n---")
+	// Handle leading --- and split on document separators
+	content := string(data)
+	content = strings.TrimPrefix(content, "---\n")
+	content = strings.TrimPrefix(content, "---\r\n")
+	parts := strings.Split(content, "\n---")
 	for _, part := range parts {
```

Alternatively, consider using a proper YAML streaming decoder that handles multi-document files natively.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/rdsdiff/compare.go` around lines 140 - 163: The current unmarshalMultiDocYAML function incorrectly splits multi-document YAML by "\n---"; replace the ad-hoc splitting with a proper YAML streaming decoder (use a yaml Decoder on the input bytes and call Decode in a loop) so it correctly handles leading document markers (---), markers with trailing content, and other edge cases; update unmarshalMultiDocYAML to iterate Decode into map[string]any until io.EOF and append each non-nil map to docs, falling back only if no documents were produced.

pkg/rdsdiff/storage.go (1)
49-54: Path validation may reject valid session IDs or miss edge cases.

The condition `cleaned != sessionID` on Line 49 can reject valid session IDs if `filepath.Clean` normalizes them (e.g., trailing spaces that `TrimSpace` already removed). Additionally, Line 52 only checks for `filepath.Separator` when it's not `/`, but the sessionID could still contain forward slashes on Windows, which would pass this check but create subdirectories.

Consider simplifying by checking for any path separator character:
♻️ Suggested improvement
```diff
 cleaned := filepath.Clean(sessionID)
-if cleaned != sessionID || strings.Contains(sessionID, "..") || filepath.IsAbs(sessionID) {
+if cleaned != sessionID || strings.Contains(sessionID, "..") || filepath.IsAbs(sessionID) || strings.ContainsAny(sessionID, `/\`) {
 	return "", ErrInvalidSessionID
 }
-if filepath.Separator != '/' && strings.ContainsRune(sessionID, filepath.Separator) {
-	return "", ErrInvalidSessionID
-}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/rdsdiff/storage.go` around lines 49 - 54: Replace the brittle cleaned != sessionID check and the OS-only separator test with a simpler, robust validation: first trim whitespace (sessionID = strings.TrimSpace(sessionID)); then reject if filepath.IsAbs(sessionID) or strings.Contains(sessionID, ".."); and reject if sessionID contains any path separator by checking both '/' and filepath.Separator (e.g., strings.ContainsAny(sessionID, "/"+string(filepath.Separator))). Remove the reliance on filepath.Clean comparison so valid but normalized IDs aren't incorrectly rejected.

pkg/rdsdiff/download.go (1)
254-256: Comment rationale for `#nosec G110` is incorrect.

The comment states "size bounded by client timeout" but HTTP client timeout bounds time, not size. A fast connection could download a very large file within the timeout. Consider using `io.LimitReader` to bound the extracted file size, or document the expected maximum archive size as an operational constraint.
Verify each finding against the current code and only fix it if needed. In `@pkg/rdsdiff/download.go` around lines 254 - 256, The `#nosec` G110 justification is wrong—timeout bounds time not bytes—so remove that comment and enforce a size limit when extracting: introduce a constant like MaxExtractedFileSize and wrap the source reader rc with io.LimitReader(rc, MaxExtractedFileSize) before calling io.Copy(dst, ...), then detect if the copy hit the limit (bytes == MaxExtractedFileSize) and return a clear error; update the io.Copy(dst, rc) call site and any surrounding error handling in this download/extract code to use the new limited reader and fail if the extracted file exceeds the configured maximum, or alternatively document the operational maximum if opting not to enforce it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/rdsdiff/compare.go`:
- Around line 432-433: Replace the ignored returns from os.ReadFile by capturing
the error for both oldContent and newContent (e.g. oldContent, err :=
os.ReadFile(oldPath) and newContent, err := os.ReadFile(newPath)), and handle
errors instead of discarding them: if err != nil return fmt.Errorf("reading %s:
%w", oldPath, err) (and similarly for newPath) or log the error with the package
logger and return; update the surrounding function (the compare code that calls
os.ReadFile) to propagate the error up the stack so failures reading the files
aren’t silently ignored.
In `@pkg/rdsdiff/download.go`:
- Around line 343-348: The tar path detection uses OS-specific
filepath.Separator when splitting entries (the code around variables first and
topLevel), which breaks on Windows; change the split to use the tar-standard
forward slash (e.g. use strings.SplitN(name, "/", 2) or otherwise split on '/')
so first is derived from the tar path correctly and topLevel detection (the
topLevel variable logic) works on all platforms.
In `@pkg/rdsdiff/storage.go`:
- Around line 56-57: The calls to filepath.Abs assigned to absWork and
absSession ignore errors; capture the returned errors (e.g., absWork, err :=
filepath.Abs(workDir) and absSession, err := filepath.Abs(sessionPath)) and
handle them rather than discarding—either return a wrapped error from the
current function (include context like "failed to get absolute path for
workDir/sessionPath") or log and fail appropriately; update the surrounding
function in pkg/rdsdiff/storage.go to check these err values before proceeding
to use filepath.Rel or other path operations.
---
Nitpick comments:
In `@pkg/rdsdiff/compare.go`:
- Around line 140-163: The current unmarshalMultiDocYAML function incorrectly
splits multi-document YAML by "\n---"; replace the ad-hoc splitting with a
proper YAML streaming decoder (use a yaml Decoder on the input bytes and call
Decode in a loop) so it correctly handles leading document markers (---),
markers with trailing content, and other edge cases; update
unmarshalMultiDocYAML to iterate Decode into map[string]any until io.EOF and
append each non-nil map to docs, falling back only if no documents were
produced.
In `@pkg/rdsdiff/download.go`:
- Around line 254-256: The `#nosec` G110 justification is wrong—timeout bounds
time not bytes—so remove that comment and enforce a size limit when extracting:
introduce a constant like MaxExtractedFileSize and wrap the source reader rc
with io.LimitReader(rc, MaxExtractedFileSize) before calling io.Copy(dst, ...),
then detect if the copy hit the limit (bytes == MaxExtractedFileSize) and return
a clear error; update the io.Copy(dst, rc) call site and any surrounding error
handling in this download/extract code to use the new limited reader and fail if
the extracted file exceeds the configured maximum, or alternatively document the
operational maximum if opting not to enforce it.
In `@pkg/rdsdiff/policygen.go`:
- Around line 41-46: The external command creation in the policy generation path
uses exec.Command without a timeout (see policyGeneratorPath, policyPath, cmd,
crsPathDir and policyFile) which can hang; replace exec.Command with
exec.CommandContext using a context.WithTimeout (pick a sensible timeout), defer
cancel the context, then run cmd.CombinedOutput() (or Start/Wait) so the child
is killed when the context expires and return an error that includes the
timeout/cancellation details and the command output.
In `@pkg/rdsdiff/storage.go`:
- Around line 49-54: Replace the brittle cleaned != sessionID check and the
OS-only separator test with a simpler, robust validation: first trim whitespace
(sessionID = strings.TrimSpace(sessionID)); then reject if
filepath.IsAbs(sessionID) or strings.Contains(sessionID, ".."); and reject if
sessionID contains any path separator by checking both '/' and
filepath.Separator (e.g., strings.ContainsAny(sessionID,
"/"+string(filepath.Separator))). Remove the reliance on filepath.Clean
comparison so valid but normalized IDs aren't incorrectly rejected.
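The storage.go validation proposed above can be sketched as a standalone check; `validateSessionID` and `errInvalidSessionID` are illustrative names, not the PR's actual identifiers:

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

var errInvalidSessionID = errors.New("invalid session ID")

// validateSessionID sketches the simplified rule the comment suggests:
// reject empty IDs, absolute paths, "..", and any path separator,
// without comparing against filepath.Clean output.
func validateSessionID(sessionID string) error {
	sessionID = strings.TrimSpace(sessionID)
	if sessionID == "" ||
		filepath.IsAbs(sessionID) ||
		strings.Contains(sessionID, "..") ||
		strings.ContainsAny(sessionID, "/"+string(filepath.Separator)) {
		return errInvalidSessionID
	}
	return nil
}

func main() {
	for _, id := range []string{"session-123", "../etc", "a/b", "/abs"} {
		fmt.Printf("%-14q -> %v\n", id, validateSessionID(id))
	}
}
```

Checking both `'/'` and `filepath.Separator` makes the rejection platform-independent, which is the gap the original OS-only check left open.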
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 791f272e-b11d-4437-8677-02096735b34d
📒 Files selected for processing (7)
- pkg/rdsdiff/compare.go
- pkg/rdsdiff/compare_test.go
- pkg/rdsdiff/constants.go
- pkg/rdsdiff/download.go
- pkg/rdsdiff/download_test.go
- pkg/rdsdiff/policygen.go
- pkg/rdsdiff/storage.go
Actionable comments posted: 2
♻️ Duplicate comments (1)
pkg/rdsdiff/download.go (1)
337-350: ⚠️ Potential issue | 🟠 Major

Tar root detection can still return the wrong directory.

`filepath.Clean(header.Name)` normalizes tar paths to OS separators before Line 346 splits on `/`, so the Windows case from the earlier review is still broken. Also, once a second top-level entry is seen, setting `topLevel = ""` is not sticky: a later entry can overwrite it and make `extractTarGz` return a bogus root.

Suggested fix
```diff
+import "path"
+
-	name := strings.TrimPrefix(filepath.Clean(header.Name), ".")
+	name := strings.TrimPrefix(path.Clean(header.Name), ".")
 	name = strings.TrimPrefix(name, "/")
@@
-	var topLevel string
+	var topLevel string
+	multipleTopLevels := false
@@
-	first := strings.SplitN(name, "/", 2)[0]
-	if topLevel == "" {
-		topLevel = first
-	} else if topLevel != first {
-		topLevel = ""
-	}
+	first := strings.SplitN(name, "/", 2)[0]
+	if topLevel == "" {
+		topLevel = first
+	} else if topLevel != first {
+		multipleTopLevels = true
+	}
@@
-	if topLevel != "" {
+	if topLevel != "" && !multipleTopLevels {
 		return filepath.Join(destDir, topLevel), nil
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/rdsdiff/download.go` around lines 337 - 350, The tar root detection is broken because filepath.Clean normalizes to OS separators before splitting on '/' and because resetting topLevel to "" on a conflict can be overwritten later; change the normalization to use forward-slash-aware cleaning (e.g. use path.Clean or filepath.ToSlash on header.Name and then split on '/') so splitting uses '/' consistently across platforms, and make the "ambiguous" state sticky by tracking a boolean (e.g. topLevelAmbiguous) or only allowing topLevel to be set when it hasn't been marked ambiguous; update the logic around header.Name, filepath.Clean, safeJoinPath, the topLevel variable and the extractTarGz root-detection block so once a conflict sets ambiguity it cannot be later overwritten.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/rdsdiff/compare.go`:
- Around line 305-317: The YAML unmarshal logic in ComputeDiff currently
replaces failed parses with empty maps which can hide errors and produce a false
"no diff"; update ComputeDiff in pkg/rdsdiff/compare.go so that if either
sigsyaml.Unmarshal on oldData or newData fails (capture the error returned by
each call), you do not proceed with normalized map-based diffing but instead
fall back to computing a raw text diff of the original oldData/newData byte
slices (or their string forms) and return that result (preserving any error from
the diff operation) so callers see a meaningful diff when YAML normalization is
impossible; reference the existing local vars oldObj, newObj and the
sigsyaml.Unmarshal calls to implement the detection and fall-back behavior.
In `@pkg/rdsdiff/download.go`:
- Around line 257-259: Reject oversized archive entries instead of truncating or
streaming them unbounded: in the ZIP extraction path check f.UncompressedSize64
against MaxExtractedFileSize and return an error if it exceeds the limit (do not
rely on io.LimitReader + io.Copy to truncate), and in the TAR extraction path
(tar.TypeReg case) check header.Size before reading from rc and return an error
if header.Size > MaxExtractedFileSize; update the error messages where
io.Copy(dst, io.LimitReader(rc, MaxExtractedFileSize)) is used to reflect the
pre-check and ensure no partial writes occur to dst when sizes are exceeded.
---
Duplicate comments:
In `@pkg/rdsdiff/download.go`:
- Around line 337-350: The tar root detection is broken because filepath.Clean
normalizes to OS separators before splitting on '/' and because resetting
topLevel to "" on a conflict can be overwritten later; change the normalization
to use forward-slash-aware cleaning (e.g. use path.Clean or filepath.ToSlash on
header.Name and then split on '/') so splitting uses '/' consistently across
platforms, and make the "ambiguous" state sticky by tracking a boolean (e.g.
topLevelAmbiguous) or only allowing topLevel to be set when it hasn't been
marked ambiguous; update the logic around header.Name, filepath.Clean,
safeJoinPath, the topLevel variable and the extractTarGz root-detection block so
once a conflict sets ambiguity it cannot be later overwritten.
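The sticky top-level detection described above can be sketched against a plain list of entry names; `topLevelDir` is an illustrative helper over tar entry paths, not the PR's `extractTarGz`:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// topLevelDir sketches sticky tar-root detection: entries are normalized
// with path.Clean (always '/'-separated, unlike filepath.Clean on Windows),
// and once two distinct top-level names are seen the result stays
// ambiguous ("") no matter what comes later.
func topLevelDir(entries []string) string {
	top := ""
	ambiguous := false
	for _, name := range entries {
		name = strings.TrimPrefix(path.Clean(name), "/")
		if name == "" || name == "." {
			continue
		}
		first := strings.SplitN(name, "/", 2)[0]
		switch {
		case top == "":
			top = first
		case top != first:
			ambiguous = true // sticky: never reset by later entries
		}
	}
	if ambiguous {
		return ""
	}
	return top
}

func main() {
	fmt.Println(topLevelDir([]string{"repo-1.0/a.yaml", "repo-1.0/sub/b.yaml"}))
	fmt.Println(topLevelDir([]string{"a/x", "b/y", "a/z"}))
}
```

The key difference from the flagged code is the boolean flag: with the original reset-to-empty logic, the input `a/x, b/y, a/z` would incorrectly report `a` as the root.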
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 2a90bf8d-f12c-4c39-845e-2d5f5e16efb8
📒 Files selected for processing (4)
- pkg/rdsdiff/compare.go
- pkg/rdsdiff/download.go
- pkg/rdsdiff/policygen.go
- pkg/rdsdiff/storage.go
✅ Files skipped from review due to trivial changes (1)
- pkg/rdsdiff/storage.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/rdsdiff/policygen.go
```go
var oldObj, newObj map[string]any
if err := sigsyaml.Unmarshal(oldData, &oldObj); err != nil {
	oldObj = nil
}
if err := sigsyaml.Unmarshal(newData, &newObj); err != nil {
	newObj = nil
}
if oldObj == nil {
	oldObj = make(map[string]any)
}
if newObj == nil {
	newObj = make(map[string]any)
}
```
🧩 Analysis chain
🏁 Script executed:
# First, let's find the file and examine the compute.go to understand the function context
find . -name "compare.go" -path "*/rdsdiff/*" | head -20

Repository: openshift/kube-compare
Length of output: 90
🏁 Script executed:
# Get the file structure and size
wc -l pkg/rdsdiff/compare.go 2>/dev/null || find . -name "compare.go" -path "*rdsdiff*" -exec wc -l {} \;

Repository: openshift/kube-compare
Length of output: 92
🏁 Script executed:
# Read the function containing lines 305-317 to understand context
sed -n '280,330p' pkg/rdsdiff/compare.go

Repository: openshift/kube-compare
Length of output: 1512
🏁 Script executed:
# Find the computeDiff function signature and return type
rg "func.*computeDiff|func.*ComputeDiff" pkg/rdsdiff/ -A 5

Repository: openshift/kube-compare
Length of output: 1859
🏁 Script executed:
# Check how unmarshal failures are actually handled - look at the full function
sed -n '1,350p' pkg/rdsdiff/compare.go | tail -100

Repository: openshift/kube-compare
Length of output: 2564
Fall back to raw diff when YAML parsing fails to prevent false "no diff" results.
If both sigsyaml.Unmarshal calls fail, both sides become empty maps and ComputeDiff returns an empty diff, falsely indicating equality for malformed YAML files. Since ComputeDiff is exported and returns (string, error) without signaling parse errors, callers cannot detect this silent failure. Fall back to diffing the raw file contents when YAML normalization is impossible.
Suggested fix
var oldObj, newObj map[string]any
- if err := sigsyaml.Unmarshal(oldData, &oldObj); err != nil {
- oldObj = nil
- }
- if err := sigsyaml.Unmarshal(newData, &newObj); err != nil {
- newObj = nil
+ oldErr := sigsyaml.Unmarshal(oldData, &oldObj)
+ newErr := sigsyaml.Unmarshal(newData, &newObj)
+ if oldErr != nil || newErr != nil {
+ return unifiedDiff(string(oldData), string(newData), filepath.Base(oldPath), filepath.Base(newPath)), nil
}
if oldObj == nil {
oldObj = make(map[string]any)
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/rdsdiff/compare.go` around lines 305 - 317, The YAML unmarshal logic in
ComputeDiff currently replaces failed parses with empty maps which can hide
errors and produce a false "no diff"; update ComputeDiff in
pkg/rdsdiff/compare.go so that if either sigsyaml.Unmarshal on oldData or
newData fails (capture the error returned by each call), you do not proceed with
normalized map-based diffing but instead fall back to computing a raw text diff
of the original oldData/newData byte slices (or their string forms) and return
that result (preserving any error from the diff operation) so callers see a
meaningful diff when YAML normalization is impossible; reference the existing
local vars oldObj, newObj and the sigsyaml.Unmarshal calls to implement the
detection and fall-back behavior.
```go
if _, err = io.Copy(dst, io.LimitReader(rc, MaxExtractedFileSize)); err != nil { // #nosec G110 -- size bounded by MaxExtractedFileSize
	return fmt.Errorf("copy %s: %w", f.Name, err)
}
```
🧩 Analysis chain
🏁 Script executed:
# First, find and read the download.go file to see the current implementation
find . -name "download.go" -path "*/rdsdiff/*" | head -5

Repository: openshift/kube-compare
Length of output: 91
🏁 Script executed:
# Get the file size to decide how to read it
wc -l pkg/rdsdiff/download.go

Repository: openshift/kube-compare
Length of output: 93
🏁 Script executed:
# Read the relevant sections of the file
sed -n '250,275p' pkg/rdsdiff/download.go

Repository: openshift/kube-compare
Length of output: 979
🏁 Script executed:
# Also read the tar extraction section around line 366-369
sed -n '360,380p' pkg/rdsdiff/download.go

Repository: openshift/kube-compare
Length of output: 633
🏁 Script executed:
# Search for MaxExtractedFileSize definition
rg "MaxExtractedFileSize\s*=" pkg/rdsdiff/

Repository: openshift/kube-compare
Length of output: 136
🏁 Script executed:
# Find the extractZipFile function
ast-grep --pattern $'func extractZipFile($_, $_) $_ {
$$$
}'

Repository: openshift/kube-compare
Length of output: 1470
🏁 Script executed:
# Find tar extraction function and search for tar.TypeReg handling
ast-grep --pattern $'case tar.TypeReg:
$$$'

Repository: openshift/kube-compare
Length of output: 48
🏁 Script executed:
# Search for the tar extraction context to see the full picture around line 366-369
sed -n '340,380p' pkg/rdsdiff/download.go

Repository: openshift/kube-compare
Length of output: 1129
🏁 Script executed:
# Look for the tar.TypeReg case handling
sed -n '355,375p' pkg/rdsdiff/download.go

Repository: openshift/kube-compare
Length of output: 762
The extraction size cap is not actually enforced.
On line 257, io.LimitReader truncates oversized zip entries and io.Copy still succeeds, so the library can diff partial files as if extraction was successful. On line 366, tar entries have no equivalent cap at all, so a crafted tarball can still fill the session volume. Reject entries larger than MaxExtractedFileSize instead of truncating or streaming them unbounded.
For zip files, pre-check f.UncompressedSize64 and return an error if it exceeds MaxExtractedFileSize. For tar files, check header.Size in the tar.TypeReg case before streaming.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/rdsdiff/download.go` around lines 257 - 259, Reject oversized archive
entries instead of truncating or streaming them unbounded: in the ZIP extraction
path check f.UncompressedSize64 against MaxExtractedFileSize and return an error
if it exceeds the limit (do not rely on io.LimitReader + io.Copy to truncate),
and in the TAR extraction path (tar.TypeReg case) check header.Size before
reading from rc and return an error if header.Size > MaxExtractedFileSize;
update the error messages where io.Copy(dst, io.LimitReader(rc,
MaxExtractedFileSize)) is used to reflect the pre-check and ensure no partial
writes occur to dst when sizes are exceeded.
a083dd6 to f2c4478 (Compare)
Actionable comments posted: 3
♻️ Duplicate comments (2)
pkg/rdsdiff/download.go (1)
247-259: ⚠️ Potential issue | 🟠 Major

Reject oversized archive members instead of truncating them.
Line 257 and Line 366 use `io.LimitReader`, which turns an oversized ZIP/TAR member into a partial write that still looks successful. That means truncated YAML can enter the diff pipeline as if extraction succeeded. Pre-check `f.UncompressedSize64` and `header.Size`, and fail before creating the output file.

Proposed fix
```diff
 func extractZipFile(f *zip.File, destDirClean string) error {
+	if f.UncompressedSize64 > uint64(MaxExtractedFileSize) {
+		return fmt.Errorf("zip entry %s exceeds max size %d bytes", f.Name, MaxExtractedFileSize)
+	}
 	dstPath, err := safeJoinPath(destDirClean, f.Name)
 	if err != nil {
 		return err
 	}
@@
 	if err != nil {
 		return fmt.Errorf("create %s: %w", dstPath, err)
 	}
 	defer dst.Close()
-	if _, err = io.Copy(dst, io.LimitReader(rc, MaxExtractedFileSize)); err != nil { // #nosec G110 -- size bounded by MaxExtractedFileSize
+	if _, err = io.Copy(dst, rc); err != nil {
 		return fmt.Errorf("copy %s: %w", f.Name, err)
 	}
 	return nil
 }
@@
 	case tar.TypeReg:
+		if header.Size > MaxExtractedFileSize {
+			return "", fmt.Errorf("tar entry %s exceeds max size %d bytes", name, MaxExtractedFileSize)
+		}
 		if err := os.MkdirAll(filepath.Dir(dstPath), 0o750); err != nil {
 			return "", fmt.Errorf("mkdir parent of %s: %w", dstPath, err)
 		}
 		outFile, err := os.Create(dstPath) // #nosec G304 -- dstPath from safeJoinPath
 		if err != nil {
 			return "", fmt.Errorf("create %s: %w", dstPath, err)
 		}
-		if _, err = io.Copy(outFile, io.LimitReader(tr, MaxExtractedFileSize)); err != nil { // #nosec G110 -- size bounded by MaxExtractedFileSize
+		if _, err = io.Copy(outFile, tr); err != nil {
 			outFile.Close()
 			return "", fmt.Errorf("copy %s: %w", name, err)
 		}
```

Also applies to: 358-369
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/rdsdiff/download.go` around lines 247 - 259: Before creating dstPath and copying bytes, validate the archive member size against MaxExtractedFileSize and return an error if it exceeds the limit instead of using io.LimitReader; specifically, check f.UncompressedSize64 for ZIP entries and header.Size for TAR entries (the places that currently call io.LimitReader when copying to dst in pkg/rdsdiff/download.go), and fail early prior to os.Create(dstPath) so truncated writes cannot occur.

pkg/rdsdiff/compare.go (1)
305-317: ⚠️ Potential issue | 🟠 Major

Fall back to a raw diff when YAML normalization cannot parse one side.
If both unmarshals fail, `oldObj` and `newObj` become empty maps and `ComputeDiff` returns an empty diff for two different invalid files. When either unmarshal fails, return a raw text diff of `oldData` vs `newData` instead of normalizing empty objects.

Proposed fix
```diff
 var oldObj, newObj map[string]any
-if err := sigsyaml.Unmarshal(oldData, &oldObj); err != nil {
-	oldObj = nil
-}
-if err := sigsyaml.Unmarshal(newData, &newObj); err != nil {
-	newObj = nil
-}
+oldErr := sigsyaml.Unmarshal(oldData, &oldObj)
+newErr := sigsyaml.Unmarshal(newData, &newObj)
+if oldErr != nil || newErr != nil {
+	return unifiedDiff(string(oldData), string(newData), filepath.Base(oldPath), filepath.Base(newPath)), nil
+}
 if oldObj == nil {
 	oldObj = make(map[string]any)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/rdsdiff/compare.go` around lines 305 - 317, The YAML unmarshal currently silences errors and replaces failed parses with empty maps (oldObj/newObj), causing ComputeDiff to report no changes for two invalid files; instead, capture the unmarshal errors from sigsyaml.Unmarshal for oldData and newData (e.g., errOld, errNew) and if either err is non-nil, bypass normalization and return a raw text diff of oldData vs newData (use the existing raw diff helper or create one) rather than proceeding with empty maps and calling ComputeDiff. Ensure you reference and update the logic around oldObj, newObj and sigsyaml.Unmarshal in compare.go so the raw-diff path is taken whenever an unmarshal fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/rdsdiff/compare.go`:
- Around line 140-166: Change unmarshalMultiDocYAML to return ([]map[string]any,
error) and stop silently skipping parse failures: inside unmarshalMultiDocYAML
(function name) return a descriptive error when sigsyaml.Unmarshal fails for any
non-empty document part instead of continue; keep the existing single-document
fallback but return its error if that also fails. Update the caller(s) such as
ExtractCRs to handle the new error return (propagate or wrap and return) so
malformed YAML causes a visible error rather than producing incomplete results.
In `@pkg/rdsdiff/download.go`:
- Around line 381-397: ValidateConfigurationRoot currently only checks existence
of sourceCRS and crsPath; update it to verify they are directories. After
calling os.Stat for sourceCRS and crsPath (variables sourceCRS and crsPath in
function ValidateConfigurationRoot), inspect the returned FileInfo.IsDir() and
return a clear error (mentioning SourceCRSPath or CRSPath and the full path) if
either is not a directory; keep existing error handling for os.IsNotExist and
other stat errors.
- Around line 118-129: The code currently calls io.ReadAll(resp.Body) after
client.Get(archiveURL), which can OOM on large/untrusted archives; replace
buffering with a bounded stream or disk-backed approach: either wrap resp.Body
with io.LimitReader using a defined maxArchiveSize constant and return an error
if the limit is hit, or stream the response directly to a temporary file via
os.CreateTemp and io.Copy (then use that file for extraction), and remove the
io.ReadAll usage; apply the same change to the second occurrence where
io.ReadAll(resp.Body) is used so both download paths (the resp, client.Get and
resp.Body usages) enforce the total archive-size cap before
buffering/extraction.
---
Duplicate comments:
In `@pkg/rdsdiff/compare.go`:
- Around line 305-317: The YAML unmarshal currently silences errors and replaces
failed parses with empty maps (oldObj/newObj), causing ComputeDiff to report no
changes for two invalid files; instead, capture the unmarshal errors from
sigsyaml.Unmarshal for oldData and newData (e.g., errOld, errNew) and if either
err is non-nil, bypass normalization and return a raw text diff of oldData vs
newData (use the existing raw diff helper or create one) rather than proceeding
with empty maps and calling ComputeDiff. Ensure you reference and update the
logic around oldObj, newObj and sigsyaml.Unmarshal in compare.go so the raw-diff
path is taken whenever an unmarshal fails.
In `@pkg/rdsdiff/download.go`:
- Around line 247-259: Before creating dstPath and copying bytes, validate the
archive member size against MaxExtractedFileSize and return an error if it
exceeds the limit instead of using io.LimitReader; specifically, check
f.UncompressedSize64 for ZIP entries and header.Size for TAR entries (the places
that currently call io.LimitReader when copying to dst in
pkg/rdsdiff/download.go), and fail early prior to os.Create(dstPath) so
truncated writes cannot occur.
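The early size check can be sketched like this, with `maxExtractedFileSize` standing in for the package's `MaxExtractedFileSize` constant:

```go
package main

import "fmt"

// maxExtractedFileSize stands in for the package's MaxExtractedFileSize constant.
const maxExtractedFileSize uint64 = 100 << 20 // 100 MiB

// checkMemberSize rejects an archive member before any bytes are written,
// using the size declared in the header (f.UncompressedSize64 for zip
// entries, header.Size for tar entries).
func checkMemberSize(name string, declared uint64) error {
	if declared > maxExtractedFileSize {
		return fmt.Errorf("archive member %s declares %d bytes, exceeding the %d-byte limit",
			name, declared, maxExtractedFileSize)
	}
	return nil
}

func main() {
	fmt.Println(checkMemberSize("ok.yaml", 1024))
	fmt.Println(checkMemberSize("huge.bin", 1<<40))
}
```

Note that zip headers can understate the real uncompressed size, so keeping a limited copy as a second line of defense is still reasonable for zip entries; for tar, the reader yields at most `header.Size` bytes, so the up-front check is authoritative.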
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 98d02ff0-432e-461c-9168-e5086d07375a
📒 Files selected for processing (7)
- pkg/rdsdiff/compare.go
- pkg/rdsdiff/compare_test.go
- pkg/rdsdiff/constants.go
- pkg/rdsdiff/download.go
- pkg/rdsdiff/download_test.go
- pkg/rdsdiff/policygen.go
- pkg/rdsdiff/storage.go
✅ Files skipped from review due to trivial changes (2)
- pkg/rdsdiff/constants.go
- pkg/rdsdiff/download_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/rdsdiff/storage.go
- pkg/rdsdiff/policygen.go
```go
func unmarshalMultiDocYAML(data []byte) []map[string]any {
	var docs []map[string]any
	content := string(data)
	// Strip leading document marker so splitting on "\n---" works for all cases
	content = strings.TrimPrefix(content, "---\n")
	content = strings.TrimPrefix(content, "---\r\n")
	parts := strings.Split(content, "\n---")
	for _, part := range parts {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		var doc map[string]any
		if err := sigsyaml.Unmarshal([]byte(part), &doc); err != nil {
			continue
		}
		if doc != nil {
			docs = append(docs, doc)
		}
	}
	if len(docs) == 0 {
		var single map[string]any
		if err := sigsyaml.Unmarshal(data, &single); err == nil && single != nil {
			docs = []map[string]any{single}
		}
	}
	return docs
}
```
🧩 Analysis chain
🏁 Script executed:
# Locate the file and examine its structure
find . -name "compare.go" -path "*/rdsdiff/*" | head -20
Repository: openshift/kube-compare
Length of output: 90
🏁 Script executed:
# Get an overview of the file
wc -l pkg/rdsdiff/compare.go
Repository: openshift/kube-compare
Length of output: 92
🏁 Script executed:
# Examine the unmarshalMultiDocYAML function and surrounding context
sed -n '130,180p' pkg/rdsdiff/compare.go
Repository: openshift/kube-compare
Length of output: 1342
🏁 Script executed:
# Search for callers of unmarshalMultiDocYAML
rg "unmarshalMultiDocYAML" --type go -B 2 -A 2
Repository: openshift/kube-compare
Length of output: 549
🏁 Script executed:
# Examine ExtractCRs function to understand the flow
rg "ExtractCRs" --type go -B 2 -A 10
Repository: openshift/kube-compare
Length of output: 10005
🏁 Script executed:
# Check the computeDiff function mentioned in the scratchpad
rg "computeDiff" --type go -B 2 -A 15
Repository: openshift/kube-compare
Length of output: 3198
Return an error from unmarshalMultiDocYAML instead of silently skipping malformed documents.
When YAML parsing fails for a document, the function silently skips it with continue, causing ExtractCRs to succeed with incomplete results and produce false comparisons. Change the function signature to ([]map[string]any, error) and return an error when a document cannot be unmarshaled, then handle this error appropriately in the caller.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/rdsdiff/compare.go` around lines 140 - 166, Change unmarshalMultiDocYAML
to return ([]map[string]any, error) and stop silently skipping parse failures:
inside unmarshalMultiDocYAML (function name) return a descriptive error when
sigsyaml.Unmarshal fails for any non-empty document part instead of continue;
keep the existing single-document fallback but return its error if that also
fails. Update the caller(s) such as ExtractCRs to handle the new error return
(propagate or wrap and return) so malformed YAML causes a visible error rather
than producing incomplete results.
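A runnable sketch of the propagate-instead-of-skip pattern the prompt asks for; `json.Unmarshal` stands in for `sigsyaml.Unmarshal` so the example needs only the standard library, and `unmarshalMultiDoc` is an illustrative name:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// unmarshalMultiDoc mirrors the multi-document splitting in compare.go but
// propagates decode failures instead of skipping them.
func unmarshalMultiDoc(data []byte) ([]map[string]any, error) {
	var docs []map[string]any
	content := strings.TrimPrefix(string(data), "---\n")
	for i, part := range strings.Split(content, "\n---") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue // blank separators are fine to skip
		}
		var doc map[string]any
		if err := json.Unmarshal([]byte(part), &doc); err != nil {
			// Return instead of continue so malformed input is visible.
			return nil, fmt.Errorf("unmarshal document %d: %w", i, err)
		}
		docs = append(docs, doc)
	}
	return docs, nil
}

func main() {
	good := []byte(`{"kind":"ConfigMap"}` + "\n---\n" + `{"kind":"Policy"}`)
	docs, err := unmarshalMultiDoc(good)
	fmt.Println(len(docs), err)

	bad := []byte(`{"kind":"ConfigMap"}` + "\n---\n" + `{broken`)
	_, err = unmarshalMultiDoc(bad)
	fmt.Println(err != nil)
}
```

Only genuinely empty document parts are skipped; any non-empty part that fails to decode aborts the whole extraction, which is what lets the caller surface the error.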
```go
	resp, err := client.Get(archiveURL) // nolint:noctx
	if err != nil {
		return "", fmt.Errorf("download %s: %w", archiveURL, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("download %s: status %s", archiveURL, resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("read archive: %w", err)
	}
```
Cap archive downloads before buffering them into memory.
Both download paths call io.ReadAll(resp.Body) on untrusted responses. A large archive can exhaust process memory before any per-file extraction limit applies. Please enforce a total archive-size cap before buffering, or stream to a temp file and extract from there.
Also applies to: 145-159
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/rdsdiff/download.go` around lines 118 - 129, The code currently calls
io.ReadAll(resp.Body) after client.Get(archiveURL), which can OOM on
large/untrusted archives; replace buffering with a bounded stream or disk-backed
approach: either wrap resp.Body with io.LimitReader using a defined
maxArchiveSize constant and return an error if the limit is hit, or stream the
response directly to a temporary file via os.CreateTemp and io.Copy (then use
that file for extraction), and remove the io.ReadAll usage; apply the same
change to the second occurrence where io.ReadAll(resp.Body) is used so both
download paths (the resp, client.Get and resp.Body usages) enforce the total
archive-size cap before buffering/extraction.
```go
// ValidateConfigurationRoot checks that source-crs and CRSPath exist under root.
func ValidateConfigurationRoot(root string) error {
	sourceCRS := filepath.Join(root, SourceCRSPath)
	crsPath := filepath.Join(root, CRSPath)
	if _, err := os.Stat(sourceCRS); err != nil {
		if os.IsNotExist(err) {
			return fmt.Errorf("configuration root missing %s (path: %s)", SourceCRSPath, sourceCRS)
		}
		return fmt.Errorf("stat %s: %w", sourceCRS, err)
	}
	if _, err := os.Stat(crsPath); err != nil {
		if os.IsNotExist(err) {
			return fmt.Errorf("configuration root missing %s (path: %s)", CRSPath, crsPath)
		}
		return fmt.Errorf("stat %s: %w", crsPath, err)
	}
	return nil
}
```
Validate directory shape, not just path existence.
ValidateConfigurationRoot currently passes if source-crs or argocd/example/acmpolicygenerator is a regular file. The later workflow expects directories, so this should check info.IsDir() and fail here with a targeted validation error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/rdsdiff/download.go` around lines 381 - 397, ValidateConfigurationRoot
currently only checks existence of sourceCRS and crsPath; update it to verify
they are directories. After calling os.Stat for sourceCRS and crsPath (variables
sourceCRS and crsPath in function ValidateConfigurationRoot), inspect the
returned FileInfo.IsDir() and return a clear error (mentioning SourceCRSPath or
CRSPath and the full path) if either is not a directory; keep existing error
handling for os.IsNotExist and other stat errors.
f2c4478 to 1306b54 (Compare)
Actionable comments posted: 2
♻️ Duplicate comments (2)
pkg/rdsdiff/compare.go (2)
146-172: ⚠️ Potential issue | 🟠 Major — Don't silently skip malformed YAML documents.
The `continue` on Line 159 turns a bad policy file into a partial extraction, so the comparison can miss CRs without surfacing any error. `unmarshalMultiDocYAML` should return an error for any non-empty document that fails to decode, and `ExtractCRs` should propagate it.
Suggested fix

```diff
-func unmarshalMultiDocYAML(data []byte) []map[string]any {
+func unmarshalMultiDocYAML(data []byte) ([]map[string]any, error) {
 	var docs []map[string]any
 	content := string(data)
 	// Strip leading document marker so splitting on "\n---" works for all cases
 	content = strings.TrimPrefix(content, "---\n")
 	content = strings.TrimPrefix(content, "---\r\n")
 	parts := strings.Split(content, "\n---")
 	for _, part := range parts {
 		part = strings.TrimSpace(part)
 		if part == "" {
 			continue
 		}
 		var doc map[string]any
 		if err := sigsyaml.Unmarshal([]byte(part), &doc); err != nil {
-			continue
+			return nil, fmt.Errorf("unmarshal YAML document: %w", err)
 		}
 		if doc != nil {
 			docs = append(docs, doc)
 		}
 	}
 	if len(docs) == 0 {
 		var single map[string]any
-		if err := sigsyaml.Unmarshal(data, &single); err == nil && single != nil {
-			docs = []map[string]any{single}
+		if err := sigsyaml.Unmarshal(data, &single); err != nil {
+			return nil, fmt.Errorf("unmarshal YAML: %w", err)
+		}
+		if single != nil {
+			docs = []map[string]any{single}
 		}
 	}
-	return docs
+	return docs, nil
 }
```

```diff
-	docs := unmarshalMultiDocYAML(data)
+	docs, err := unmarshalMultiDocYAML(data)
+	if err != nil {
+		return fmt.Errorf("parse %s: %w", path, err)
+	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/rdsdiff/compare.go` around lines 146 - 172, unmarshalMultiDocYAML currently swallows YAML unmarshal errors (the continue after sigsyaml.Unmarshal) which allows malformed documents to silently produce partial results; change unmarshalMultiDocYAML to return ([]map[string]any, error) and on any non-empty part that fails sigsyaml.Unmarshal return a descriptive error (including part/index) instead of continuing, and update the caller ExtractCRs to accept and propagate that error so malformed policy files surface as failures rather than partial extractions.
311-332: ⚠️ Potential issue | 🟠 Major — Fall back to raw diff when YAML normalization fails.
If either file is invalid YAML, Lines 312-323 collapse that side to an empty map. When both parses fail, `ComputeDiff` returns "", which is a false "no diff" result for two malformed but different files.
Suggested fix

```diff
 	var oldObj, newObj map[string]any
-	if err := sigsyaml.Unmarshal(oldData, &oldObj); err != nil {
-		oldObj = nil
-	}
-	if err := sigsyaml.Unmarshal(newData, &newObj); err != nil {
-		newObj = nil
-	}
+	oldErr := sigsyaml.Unmarshal(oldData, &oldObj)
+	newErr := sigsyaml.Unmarshal(newData, &newObj)
+	if oldErr != nil || newErr != nil {
+		return unifiedDiff(string(oldData), string(newData), filepath.Base(oldPath), filepath.Base(newPath)), nil
+	}
 	if oldObj == nil {
 		oldObj = make(map[string]any)
 	}
 	if newObj == nil {
 		newObj = make(map[string]any)
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/rdsdiff/compare.go` around lines 311 - 332, The current code drops a side to an empty map when sigsyaml.Unmarshal fails causing two malformed-but-different files to appear identical; change ComputeDiff (the block using sigsyaml.Unmarshal, normalizeYAML and unifiedDiff) to detect any unmarshal error and in that case skip YAML normalization and call unifiedDiff on the raw file contents (convert oldData/newData to strings) instead of using empty maps; keep using normalizeYAML/unifiedDiff only when both unmarshals succeed so that malformed YAML falls back to a raw diff rather than producing an empty no-diff result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/rdsdiff/download_test.go`:
- Around line 14-32: ParseGitHubTreeURL currently splits on slashes and treats
parts[3] as the ref which breaks when refs contain '/' — update
ParseGitHubTreeURL (pkg/rdsdiff/download.go) to parse the path with a regex that
allows the ref to contain slashes and an optional trailing path, e.g. match
^/([^/]+)/([^/]+)/tree/([^/]+(?:/[^/]+)*)(?:/(.*))?$ and assign Owner=group1,
Repo=group2, Branch=group3, Path=group4 (empty if missing); keep ArchiveURL()
and TopLevelDir() logic but base them on the captured Branch. Ensure tests
(TestParseGitHubTreeURL_* in pkg/rdsdiff/download_test.go) cover refs with
slashes.
- Around line 88-106: The tests for SessionDir should assert the created
directory is actually inside and created: in TestSessionDir_CreatesUnderWorkDir,
resolve absolute paths for workDir and sessionDir (use filepath.Abs) and assert
sessionDir has workDir as a prefix (not just contains), then call
os.Stat(sessionDir) and assert it exists and is a directory; in
TestSessionDir_UsesTempWhenEmpty, after calling SessionDir("", "req-2") defer
os.RemoveAll(sessionDir), perform os.Stat(sessionDir) and assert it exists and
is a directory to ensure the temp directory was created; reference the
SessionDir function and the two test names when making the changes.
---
Duplicate comments:
In `@pkg/rdsdiff/compare.go`:
- Around line 146-172: unmarshalMultiDocYAML currently swallows YAML unmarshal
errors (the continue after sigsyaml.Unmarshal) which allows malformed documents
to silently produce partial results; change unmarshalMultiDocYAML to return
([]map[string]any, error) and on any non-empty part that fails
sigsyaml.Unmarshal return a descriptive error (including part/index) instead of
continuing, and update the caller ExtractCRs to accept and propagate that error
so malformed policy files surface as failures rather than partial extractions.
- Around line 311-332: The current code drops a side to an empty map when
sigsyaml.Unmarshal fails causing two malformed-but-different files to appear
identical; change ComputeDiff (the block using sigsyaml.Unmarshal, normalizeYAML
and unifiedDiff) to detect any unmarshal error and in that case skip YAML
normalization and call unifiedDiff on the raw file contents (convert
oldData/newData to strings) instead of using empty maps; keep using
normalizeYAML/unifiedDiff only when both unmarshals succeed so that malformed
YAML falls back to a raw diff rather than producing an empty no-diff result.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 058e8cca-0b06-4e73-81ee-d48bea882604
📒 Files selected for processing (7)
- pkg/rdsdiff/compare.go
- pkg/rdsdiff/compare_test.go
- pkg/rdsdiff/constants.go
- pkg/rdsdiff/download.go
- pkg/rdsdiff/download_test.go
- pkg/rdsdiff/policygen.go
- pkg/rdsdiff/storage.go
✅ Files skipped from review due to trivial changes (3)
- pkg/rdsdiff/constants.go
- pkg/rdsdiff/storage.go
- pkg/rdsdiff/download.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/rdsdiff/compare_test.go
- pkg/rdsdiff/policygen.go
```go
func TestParseGitHubTreeURL_FullURL(t *testing.T) {
	u, err := ParseGitHubTreeURL("https://github.com/openshift-kni/telco-reference/tree/konflux-telco-core-rds-4-20/telco-ran/configuration")
	require.NoError(t, err)
	assert.Equal(t, "openshift-kni", u.Owner)
	assert.Equal(t, "telco-reference", u.Repo)
	assert.Equal(t, "konflux-telco-core-rds-4-20", u.Branch)
	assert.Equal(t, "telco-ran/configuration", u.Path)
	assert.Equal(t, "https://github.com/openshift-kni/telco-reference/archive/refs/heads/konflux-telco-core-rds-4-20.zip", u.ArchiveURL())
	assert.Equal(t, "telco-reference-konflux-telco-core-rds-4-20", u.TopLevelDir())
}

func TestParseGitHubTreeURL_BranchOnly(t *testing.T) {
	u, err := ParseGitHubTreeURL("https://github.com/org/repo/tree/main")
	require.NoError(t, err)
	assert.Equal(t, "org", u.Owner)
	assert.Equal(t, "repo", u.Repo)
	assert.Equal(t, "main", u.Branch)
	assert.Empty(t, u.Path)
}
```
Add coverage for refs that contain /.
ParseGitHubTreeURL in pkg/rdsdiff/download.go:57-88 currently takes parts[3] as the ref, so a valid tree URL like /tree/release/4.20/telco-ran/configuration is misparsed as Branch=release and Path=4.20/.... This suite only covers slash-free refs, so that bug stays invisible.
Suggested test to expose the bug
```diff
 func TestParseGitHubTreeURL_BranchOnly(t *testing.T) {
 	u, err := ParseGitHubTreeURL("https://github.com/org/repo/tree/main")
 	require.NoError(t, err)
 	assert.Equal(t, "org", u.Owner)
 	assert.Equal(t, "repo", u.Repo)
 	assert.Equal(t, "main", u.Branch)
 	assert.Empty(t, u.Path)
 }
+
+func TestParseGitHubTreeURL_BranchWithSlash(t *testing.T) {
+	u, err := ParseGitHubTreeURL("https://github.com/org/repo/tree/release/4.20/telco-ran/configuration")
+	require.NoError(t, err)
+	assert.Equal(t, "release/4.20", u.Branch)
+	assert.Equal(t, "telco-ran/configuration", u.Path)
+}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/rdsdiff/download_test.go` around lines 14 - 32, ParseGitHubTreeURL
currently splits on slashes and treats parts[3] as the ref which breaks when
refs contain '/' — update ParseGitHubTreeURL (pkg/rdsdiff/download.go) to parse
the path with a regex that allows the ref to contain slashes and an optional
trailing path, e.g. match ^/([^/]+)/([^/]+)/tree/([^/]+(?:/[^/]+)*)(?:/(.*))?$
and assign Owner=group1, Repo=group2, Branch=group3, Path=group4 (empty if
missing); keep ArchiveURL() and TopLevelDir() logic but base them on the
captured Branch. Ensure tests (TestParseGitHubTreeURL_* in
pkg/rdsdiff/download_test.go) cover refs with slashes.
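One caveat worth noting: plain URL syntax cannot tell where a slashed ref ends and the in-repo path begins (GitHub resolves this against the repository's actual refs). The sketch below illustrates one way to handle that with a known-refs list plus the first-segment fallback; `splitTreeURL` and `splitRef` are hypothetical names, not functions from the PR:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitTreeURL extracts owner, repo, and everything after /tree/.
func splitTreeURL(raw string) (owner, repo, rest string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	parts := strings.SplitN(strings.Trim(u.Path, "/"), "/", 4)
	if len(parts) < 4 || parts[2] != "tree" {
		return "", "", "", fmt.Errorf("not a tree URL: %s", raw)
	}
	return parts[0], parts[1], parts[3], nil
}

// splitRef picks the longest known ref that prefixes rest; without such a
// list, the ref/path boundary in a slashed ref is ambiguous.
func splitRef(rest string, knownRefs []string) (ref, path string) {
	for _, r := range knownRefs {
		if (rest == r || strings.HasPrefix(rest, r+"/")) && len(r) > len(ref) {
			ref = r
			path = strings.TrimPrefix(strings.TrimPrefix(rest, r), "/")
		}
	}
	if ref == "" { // fall back to the first-segment heuristic
		ref, path, _ = strings.Cut(rest, "/")
	}
	return ref, path
}

func main() {
	_, _, rest, _ := splitTreeURL("https://github.com/org/repo/tree/release/4.20/telco-ran/configuration")
	ref, p := splitRef(rest, []string{"main", "release/4.20"})
	fmt.Println(ref, p) // release/4.20 telco-ran/configuration
}
```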
```go
func TestSessionDir_CreatesUnderWorkDir(t *testing.T) {
	workDir := t.TempDir()
	sessionDir, err := SessionDir(workDir, "req-1")
	require.NoError(t, err)
	assert.True(t, len(sessionDir) > len(workDir))
	assert.Contains(t, sessionDir, "rds-diff-req-1")

	info, err := os.Stat(sessionDir)
	require.NoError(t, err)
	assert.True(t, info.IsDir())
}

func TestSessionDir_UsesTempWhenEmpty(t *testing.T) {
	sessionDir, err := SessionDir("", "req-2")
	require.NoError(t, err)
	defer os.RemoveAll(sessionDir)
	assert.NotEmpty(t, sessionDir)
	assert.Contains(t, sessionDir, "rds-diff-req-2")
}
```
Strengthen the SessionDir assertions.
These tests mostly validate string shape. They would still pass if SessionDir returned a sibling path outside workDir, and the empty-workDir case never verifies that the directory was actually created.
Suggested assertion tightening
```diff
 func TestSessionDir_CreatesUnderWorkDir(t *testing.T) {
 	workDir := t.TempDir()
 	sessionDir, err := SessionDir(workDir, "req-1")
 	require.NoError(t, err)
-	assert.True(t, len(sessionDir) > len(workDir))
 	assert.Contains(t, sessionDir, "rds-diff-req-1")
+	rel, err := filepath.Rel(workDir, sessionDir)
+	require.NoError(t, err)
+	assert.NotContains(t, rel, "..")
 	info, err := os.Stat(sessionDir)
 	require.NoError(t, err)
 	assert.True(t, info.IsDir())
 }

 func TestSessionDir_UsesTempWhenEmpty(t *testing.T) {
 	sessionDir, err := SessionDir("", "req-2")
 	require.NoError(t, err)
 	defer os.RemoveAll(sessionDir)
 	assert.NotEmpty(t, sessionDir)
 	assert.Contains(t, sessionDir, "rds-diff-req-2")
+	info, err := os.Stat(sessionDir)
+	require.NoError(t, err)
+	assert.True(t, info.IsDir())
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/rdsdiff/download_test.go` around lines 88 - 106, The tests for SessionDir
should assert the created directory is actually inside and created: in
TestSessionDir_CreatesUnderWorkDir, resolve absolute paths for workDir and
sessionDir (use filepath.Abs) and assert sessionDir has workDir as a prefix (not
just contains), then call os.Stat(sessionDir) and assert it exists and is a
directory; in TestSessionDir_UsesTempWhenEmpty, after calling SessionDir("",
"req-2") defer os.RemoveAll(sessionDir), perform os.Stat(sessionDir) and assert
it exists and is a directory to ensure the temp directory was created; reference
the SessionDir function and the two test names when making the changes.
Add a reusable library for comparing two versions of a telco RDS (Reference Design Specification). Given two downloadable RDS sources (GitHub tree URLs or direct archive links), the library handles the full workflow: download and extract archives, run the PolicyGenerator binary to produce policies, extract individual CRs from the generated ACM Policy documents, normalize and diff the CRs, and produce both a human-readable text report and a structured JSON comparison summary. This package is intended to be consumed by tools such as kube-compare-mcp, which can wrap it as a thin MCP tool handler without embedding domain logic. New package: pkg/rdsdiff/ Source files: - constants.go: telco-ran path constants (CRSPath, SourceCRSPath) and the SNO policy file list used by PolicyGenerator. - download.go: GitHub tree URL parsing (ParseGitHubTreeURL), archive download and extraction for both zip and tar.gz formats, with zip-slip prevention and top-level directory detection. - policygen.go: PolicyGenerator binary wrapper (RunPolicyGen) that copies source-crs into the expected layout and runs the generator for each SNO policy file. - compare.go: CR extraction from ACM Policy documents, YAML normalization with sorted keys, key correlation across versions, line-level unified diff using diffmatchpatch, and report/JSON generation (Run, RunCompare, ExtractCRs). - storage.go: session directory management with path traversal protection (SessionDir, SessionID, ResolveSessionPath). Test files: - compare_test.go: 24 tests covering CR extraction, normalization, correlation, diff computation, and end-to-end report generation. - download_test.go: 7 tests covering URL parsing, validation, download error handling, and session directory creation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Talor Itzhak <titzhak@redhat.com>
1306b54 to 72c0b75 (Compare)
Actionable comments posted: 2
🧹 Nitpick comments (2)
pkg/rdsdiff/compare_test.go (1)
430-435: Tighten the `ListOfCRsForSNO` contract check.
`assert.Contains(..., "acm-common-ranGen.yaml")` plus `assert.Len(..., 4)` is a weak signal: it can miss wrong replacements for the other entries, and it will also fail on a harmless addition. Prefer asserting the exact expected set if this list is part of the contract, or only the specific invariant you care about if the list is expected to evolve.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/rdsdiff/compare_test.go` around lines 430 - 435, TestConstants currently checks ListOfCRsForSNO with assert.Contains and assert.Len which is brittle; update the test to assert the exact expected entries (or use assert.ElementsMatch to check set equality) against ListOfCRsForSNO so any unexpected changes are caught. Modify the TestConstants function to replace the assert.Contains and assert.Len checks on ListOfCRsForSNO with a single assertion that compares ListOfCRsForSNO to the expected slice (or uses ElementsMatch if order is not important).
pkg/rdsdiff/download.go (1)
179-186: Consider preserving extraction errors in the fallback path.
When format detection fails and both `extractZip` and `extractTarGz` are attempted, their error messages are discarded. The final error only mentions Content-Type, which may not help diagnose malformed archives.
♻️ Suggested improvement (note: capture the errors outside the `if` statements so they stay in scope for the final message)

```diff
-	if effectiveRoot, err := extractZip(data, destDir); err == nil {
+	effectiveRoot, zipErr := extractZip(data, destDir)
+	if zipErr == nil {
 		return effectiveRoot, nil
 	}
-	if effectiveRoot, err := extractTarGz(data, destDir); err == nil {
+	effectiveRoot, tarErr := extractTarGz(data, destDir)
+	if tarErr == nil {
 		return effectiveRoot, nil
 	}
-	return "", fmt.Errorf("unsupported archive format (expected zip or tar.gz); Content-Type: %q", ct)
+	return "", fmt.Errorf("unsupported archive format (expected zip or tar.gz); Content-Type: %q; zip: %v; tar.gz: %v", ct, zipErr, tarErr)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/rdsdiff/download.go` around lines 179 - 186, The fallback path discards errors from extractZip and extractTarGz, losing useful diagnostic info; update the logic in the function containing the current snippet to capture both errors (from extractZip and extractTarGz), and include them in the returned error (e.g., wrap or concatenate the two errors along with the Content-Type) so callers see why each extractor failed; reference extractZip and extractTarGz to locate the failing calls and modify the final fmt.Errorf to include those error details.
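A standard-library sketch of keeping both extractor failures via `errors.Join`; `tryZip`, `tryTarGz`, and `extractAny` are stand-ins for the real `extractZip`/`extractTarGz` and their caller:

```go
package main

import (
	"errors"
	"fmt"
)

// tryZip and tryTarGz stand in for extractZip/extractTarGz in download.go;
// here they always fail so the combined error path is exercised.
func tryZip(data []byte) (string, error)   { return "", errors.New("zip: not a valid zip archive") }
func tryTarGz(data []byte) (string, error) { return "", errors.New("tar.gz: gzip: invalid header") }

// extractAny keeps both failure reasons instead of discarding them
// in the fallback path.
func extractAny(data []byte, contentType string) (string, error) {
	root, zipErr := tryZip(data)
	if zipErr == nil {
		return root, nil
	}
	root, tarErr := tryTarGz(data)
	if tarErr == nil {
		return root, nil
	}
	return "", fmt.Errorf("unsupported archive format (Content-Type: %q): %w",
		contentType, errors.Join(zipErr, tarErr))
}

func main() {
	_, err := extractAny([]byte("junk"), "application/octet-stream")
	fmt.Println(err)
}
```

`errors.Join` (Go 1.20+) keeps both causes visible in `err.Error()` and reachable via `errors.Is`/`errors.As`, rather than flattening them into an opaque string.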
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/rdsdiff/compare_test.go`:
- Around line 411-428: The test TestRunCompare_ExtractsAndWritesArtifacts
currently only checks artifact existence; update it to assert that RunCompare
actually extracted per-CR outputs by verifying either that specific extracted CR
files exist under sessionDir (e.g. files named like
ConfigMap_openshift-monitoring_my-config.yaml and
ImageContentSourcePolicy_my-icsp.yaml) or by reading and unmarshalling
comparison.json and asserting it contains the expected CR keys (e.g.
"ConfigMap_openshift-monitoring_my-config" and
"ImageContentSourcePolicy_my-icsp"); ensure you reference RunCompare, sessionDir
and comparison.json when adding these assertions so the end-to-end extraction
path is covered.
In `@pkg/rdsdiff/storage.go`:
- Around line 16-30: The SessionDir function constructs a directory name using
the raw requestID which can include path separators or special entries and cause
nested or unexpected directories; update SessionDir to validate/sanitize
requestID (similar to ResolveSessionPath) by rejecting empty values, any path
separators ('/' or filepath.Separator, backslash), and path traversal tokens
like "..", returning an error if invalid, before formatting the session name and
creating the directory so only a single session folder is created.
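The sanitization described above can be sketched as a standalone check; `validateRequestID` is an illustrative name, not an identifier from the PR:

```go
package main

import (
	"fmt"
	"strings"
)

// validateRequestID rejects IDs that could escape the session root when
// joined into a directory name. The ".." check is deliberately strict and
// also rejects IDs that merely contain the token (e.g. "a..b").
func validateRequestID(id string) error {
	switch {
	case id == "":
		return fmt.Errorf("request ID must not be empty")
	case strings.ContainsAny(id, `/\`):
		return fmt.Errorf("request ID %q must not contain path separators", id)
	case strings.Contains(id, ".."):
		return fmt.Errorf("request ID %q must not contain traversal tokens", id)
	}
	return nil
}

func main() {
	for _, id := range []string{"req-1", "../etc", `a\b`, ""} {
		fmt.Printf("%q -> %v\n", id, validateRequestID(id))
	}
}
```

Rejecting both `/` and `\` matters because `filepath.Separator` differs by OS, while user input may contain either.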
---
Nitpick comments:
In `@pkg/rdsdiff/compare_test.go`:
- Around line 430-435: TestConstants currently checks ListOfCRsForSNO with
assert.Contains and assert.Len which is brittle; update the test to assert the
exact expected entries (or use assert.ElementsMatch to check set equality)
against ListOfCRsForSNO so any unexpected changes are caught. Modify the
TestConstants function to replace the assert.Contains and assert.Len checks on
ListOfCRsForSNO with a single assertion that compares ListOfCRsForSNO to the
expected slice (or uses ElementsMatch if order is not important).
In `@pkg/rdsdiff/download.go`:
- Around line 179-186: The fallback path discards errors from extractZip and
extractTarGz, losing useful diagnostic info; update the logic in the function
containing the current snippet to capture both errors (from extractZip and
extractTarGz), and include them in the returned error (e.g., wrap or concatenate
the two errors along with the Content-Type) so callers see why each extractor
failed; reference extractZip and extractTarGz to locate the failing calls and
modify the final fmt.Errorf to include those error details.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 70668fce-ec32-4cbe-9ad0-1f7d02fefa5d
📒 Files selected for processing (8)
- pkg/rdsdiff/compare.go
- pkg/rdsdiff/compare_test.go
- pkg/rdsdiff/constants.go
- pkg/rdsdiff/download.go
- pkg/rdsdiff/download_test.go
- pkg/rdsdiff/fileio.go
- pkg/rdsdiff/policygen.go
- pkg/rdsdiff/storage.go
✅ Files skipped from review due to trivial changes (3)
- pkg/rdsdiff/fileio.go
- pkg/rdsdiff/constants.go
- pkg/rdsdiff/download_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/rdsdiff/policygen.go
- pkg/rdsdiff/compare.go
```go
func TestRunCompare_ExtractsAndWritesArtifacts(t *testing.T) {
	root := t.TempDir()
	oldGen := filepath.Join(root, "old-gen")
	newGen := filepath.Join(root, "new-gen")
	require.NoError(t, os.MkdirAll(oldGen, 0o750))
	require.NoError(t, os.MkdirAll(newGen, 0o750))
	require.NoError(t, os.WriteFile(filepath.Join(oldGen, "p.yaml"), []byte(minimalPolicyYAML), 0o600))
	require.NoError(t, os.WriteFile(filepath.Join(newGen, "p.yaml"), []byte(minimalPolicyYAML), 0o600))

	sessionDir := filepath.Join(root, "session")
	require.NoError(t, os.MkdirAll(sessionDir, 0o750))

	summary, err := RunCompare(oldGen, newGen, sessionDir)
	require.NoError(t, err)
	assert.Contains(t, summary, "Comparison summary")
	assert.FileExists(t, filepath.Join(sessionDir, "diff-report.txt"))
	assert.FileExists(t, filepath.Join(sessionDir, "comparison.json"))
}
```
Assert the extracted CR outputs here, not just artifact existence.
This test can still pass if RunCompare skips the extraction step and compares the raw generated p.yaml files directly. Please assert at least one extracted per-CR file under sessionDir or inspect comparison.json for the expected CR keys (for example ConfigMap_openshift-monitoring_my-config and ImageContentSourcePolicy_my-icsp) so the end-to-end path is actually covered.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/rdsdiff/compare_test.go` around lines 411 - 428, The test
TestRunCompare_ExtractsAndWritesArtifacts currently only checks artifact
existence; update it to assert that RunCompare actually extracted per-CR outputs
by verifying either that specific extracted CR files exist under sessionDir
(e.g. files named like ConfigMap_openshift-monitoring_my-config.yaml and
ImageContentSourcePolicy_my-icsp.yaml) or by reading and unmarshalling
comparison.json and asserting it contains the expected CR keys (e.g.
"ConfigMap_openshift-monitoring_my-config" and
"ImageContentSourcePolicy_my-icsp"); ensure you reference RunCompare, sessionDir
and comparison.json when adding these assertions so the end-to-end extraction
path is covered.
```go
func SessionDir(workDir, requestID string) (string, error) {
	if workDir == "" {
		workDir = os.TempDir()
	}
	workDir = filepath.Clean(workDir)
	if err := os.MkdirAll(workDir, 0o750); err != nil {
		return "", fmt.Errorf("mkdir work dir: %w", err)
	}
	name := fmt.Sprintf("rds-diff-%s-%d", requestID, time.Now().Unix())
	sessionPath := filepath.Join(workDir, name)
	if err := os.MkdirAll(sessionPath, 0o750); err != nil {
		return "", fmt.Errorf("mkdir session: %w", err)
	}
	return sessionPath, nil
}
```
Unsanitized requestID can create unexpected directory structures.
requestID is used directly in the session directory name without validation. If it contains path separators (e.g., foo/bar) or special characters, os.MkdirAll will create nested directories rather than a single session folder. This inconsistency with the strict validation in ResolveSessionPath could cause confusion or allow unintended directory layouts.
Consider applying similar validation (reject /, \, .., or empty values) to requestID before constructing the session name.
🛡️ Proposed fix

```diff
 func SessionDir(workDir, requestID string) (string, error) {
 	if workDir == "" {
 		workDir = os.TempDir()
 	}
 	workDir = filepath.Clean(workDir)
+	requestID = strings.TrimSpace(requestID)
+	if requestID == "" || strings.ContainsAny(requestID, `/\`) || strings.Contains(requestID, "..") {
+		return "", errors.New("invalid request ID")
+	}
 	if err := os.MkdirAll(workDir, 0o750); err != nil {
 		return "", fmt.Errorf("mkdir work dir: %w", err)
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/rdsdiff/storage.go` around lines 16 - 30, The SessionDir function
constructs a directory name using the raw requestID which can include path
separators or special entries and cause nested or unexpected directories; update
SessionDir to validate/sanitize requestID (similar to ResolveSessionPath) by
rejecting empty values, any path separators ('/' or filepath.Separator,
backslash), and path traversal tokens like "..", returning an error if invalid,
before formatting the session name and creating the directory so only a single
session folder is created.
@Tal-or: all tests passed! Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
For a new feature like this, can we add a markdown document to the docs/ showing how and why this will be used, maybe with examples?

Sure thing, I'll add that, Thanks!
The concern here is that we don't want to introduce anything telco-specific (or RDS-specific, since that is telco-specific), as this is intended to be a generic tool. If this is a generic feature, a document explaining its purpose, examples, and how it works with the existing features of this tool would be a great way to understand it.
Add a reusable library for comparing two versions of a telco RDS (Reference Design Specification). Given two downloadable RDS sources (GitHub tree URLs or direct archive links), the library handles the full workflow: download and extract archives, run the PolicyGenerator binary to produce policies, extract individual CRs from the generated ACM Policy documents, normalize and diff the CRs, and produce both a human-readable text report and a structured JSON comparison summary.
This package is intended to be consumed by tools such as kube-compare-mcp, which can wrap it as a thin MCP tool handler without embedding domain logic.
NOTE:
At the moment, the tool can perform diffs only for the telco-ran RDS.
Signed-off-by: Talor Itzhak titzhak@redhat.com
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com