feat(patch_set): support git diff application#59
feat(patch_set): support git diff application#59weihanglo wants to merge 10 commits intobmwill:masterfrom
Conversation
| } | ||
| }; | ||
|
|
||
| // FIXME: error spans point at `diff --git` line, not the specific offending line |
There was a problem hiding this comment.
Leave it for future, I am lazy again to write new code 🙇🏾♂️.
There was a problem hiding this comment.
Yeah if there are explicit limitation or things that we know will be addressed in the future as long as we document the issue i'm fine with avoiding scope bloat on individual PRs
ffb8730 to
2515dd3
Compare
| // Can't use `--numstat` for GitDiff: it shows `-\t-\t` for both | ||
| // actual binary diffs AND pure binary renames (100% similarity). | ||
| // Parser correctly handles pure renames (rename headers, no binary content). | ||
| // | ||
| // Use `--raw` for total count, subtract actual binary markers from diff. | ||
| // | ||
| // TODO: once `--binary` is passed above, count ALL `--raw` | ||
| // entries — every file will have patch data (delta, literal, or text). | ||
| let raw = git(repo, &["diff", "--raw", parent, child]); | ||
| let total = raw.lines().filter(|l| !l.is_empty()).count(); | ||
| let type_changes = count_type_changes(&raw); | ||
| let binary = diff_output | ||
| .lines() | ||
| .filter(|l| l.starts_with("Binary files ") || l.starts_with("GIT binary patch")) | ||
| .count(); | ||
| skipped += binary; | ||
| total - binary + type_changes |
There was a problem hiding this comment.
This is a partial port of https://github.com/weihanglo/diffy/blob/e50f9286b95671b0f131fb5ac4f5725933e78232/tests/replay.rs#L292-L304. We will remove the binary filter eventually to avoid duplicate the parsing logic in test when we add binary patch support.
eaa47ff to
49376b2
Compare
`.lines()` strips line endings, so callers tracking byte offsets need to re-add the `\r\n` or `\n` length manually. Extract the repeated inline pattern into a reusable helper.
* Parse `diff --git` extended headers * split multi-file git diffs at `diff --git` boundaries
Compat test for also `git apply`.
Unlike unidiff, gitdiff produces patches for empty file creations/deletions (`0\t0` in numstat) because they carry `diff --git` + extended headers even without hunks. Binary files (`-\t-\t`) are skipped in gitdiff mode for now.
| /// Skip binary diffs silently. | ||
| pub fn skip_binary(mut self) -> Self { | ||
| self.binary = Binary::Skip; | ||
| self | ||
| } |
There was a problem hiding this comment.
This is not good actually. We may want a binary marker like patch so people explicitly know they are skipping something
| /// * `a/<path> b/<path>` (default prefix) | ||
| /// * `<path> <path>` (`git diff --no-prefix`) | ||
| /// * `<src-prefix><path> <dst-prefix><path>` (custom prefix) | ||
| /// * `"<prefix><path>" "<prefix><path>"` (quoted, with escapes) |
There was a problem hiding this comment.
We should add more compat tests around this logic.
| return Ok(FileOperation::Rename { | ||
| from: Cow::Borrowed(from), | ||
| to: Cow::Borrowed(to), | ||
| }); | ||
| } | ||
| if let (Some(from), Some(to)) = (header.copy_from, header.copy_to) { | ||
| return Ok(FileOperation::Copy { | ||
| from: Cow::Borrowed(from), | ||
| to: Cow::Borrowed(to), |
There was a problem hiding this comment.
We might return the quoted filename. This need to be fixed.
| fn extract_file_op_gitdiff<'a>( | ||
| header: &GitHeader<'a>, | ||
| patch: &Patch<'a, str>, | ||
| ) -> Result<FileOperation<'a>, PatchSetParseError> { |
There was a problem hiding this comment.
Pre-existing issue: This FileOperation preserves the raw path with prefix unstripped, e.g., a/src/lib.rs and b/src/lib.rs. I personally think this is the right chose on syntactic level and consumer should know their patch better than us. However, the API doc should call this out more explicitly.
There was a problem hiding this comment.
Also note that we have yet supported non-UTF8 path even in #64.
There was a problem hiding this comment.
Also note that we have yet supported non-UTF8 path
This was one other thing i was going to mention. How do you want to handle that? punt on that for a follow up PR?
There was a problem hiding this comment.
Can also cherry pick whatever in #64, if that is preferred.
| Binary::Skip => { | ||
| return self.next_gitdiff_patch(); | ||
| } |
There was a problem hiding this comment.
This is recursive and some inputs could trivially overflow the stack. This should hopefully be fairly easy to restructure into a looping construct instead which can avoid the possibility of blowing the stack.
There was a problem hiding this comment.
Or better, instead of silent skip, we always emit a binary marker patch.
| } | ||
| // Select split with longest common path suffix (matches Git behavior) | ||
| if let Some(path) = longest_common_path_suffix(left, right) { | ||
| if path.len() > longest_path.len() { |
There was a problem hiding this comment.
If there are multiple splits with the same length what does git do in those situations? as-is this will prefer the first one we encounter.
| /// Path component boundary means: | ||
| /// | ||
| /// * At `/` character (e.g., `foo/bar.rs` vs `fooo/bar.rs` → `bar.rs`) | ||
| /// * Or the entire string is identical |
There was a problem hiding this comment.
The "entire string is identical" case takes care of when a and b have no/s correct?
| // Strip email preamble once at construction | ||
| let input = strip_email_preamble(input); |
There was a problem hiding this comment.
Not new in this change but I was just wondering how we would properly handle mbox streams (a concatenation of a bunch of email patchsets). Maybe worth adding a comment to comeback and address in a followup?
There was a problem hiding this comment.
I remembered last time I tried, GNU patch and git apply both failed on this case. We now have compat test infra we can verify with some tests.
| } | ||
|
|
||
| /// See [`parse_diff_git_path`]. | ||
| fn parse_quoted_diff_git_path(line: &str) -> Option<(Cow<'_, str>, Cow<'_, str>)> { |
There was a problem hiding this comment.
Can we add a test for quote-within-quote eg diff --git "a/with\"quote"
| let end = loop { | ||
| match bytes.get(i)? { | ||
| b'"' => break i + 1, | ||
| b'\\' => i += 2, |
There was a problem hiding this comment.
Do we need to be concerned about full octal awareness here? If we don't then maybe we should add a comment indicating why its ok?
Octal escape \377 decodes to 0xFF, which is not valid UTF-8. When parsing into `Patch<'_, str>`, `convert_cow_to_str` panics via `unwrap()` instead of returning a parse error. This documents the pre-existing bug that the reviewer flagged: bmwill#59 (comment)
Octal escape \377 decodes to 0xFF, which is not valid UTF-8. When parsing into `Patch<'_, str>`, `convert_cow_to_str` panics via `unwrap()` instead of returning a parse error. See bmwill#59 (comment)
Octal escape \377 decodes to 0xFF, which is not valid UTF-8. When parsing into `Patch<'_, str>`, `convert_cow_to_str` panics via `unwrap()` instead of returning a parse error. See #59 (comment)
Add git diff output parsing and application support. Some hightlights:
Patch, making skipping preamble and rejecting hunks configurable. Optional preamble skipping is required forgit diffparsing becausegit diffpatch may have only git headers without any unidiff portion, followed by the nextdiff --githeader. If we still skip preamble, the nextdiff --gitwill be swallowed.Keepenum variant when we add the support the binary patch parsing/application. (See https://github.com/weihanglo/diffy/blob/4fc7c4f0e4bc6db55da6a8eb92f4cdf20113225c/src/patches/parse.rs#L172-L197)