Skip to content

feat(doc,drive): normalize curly quotes and CRLF in --selection-with-ellipsis#621

Open
herbertliu wants to merge 2 commits intomainfrom
feat/doc-selection-normalization
Open

feat(doc,drive): normalize curly quotes and CRLF in --selection-with-ellipsis#621
herbertliu wants to merge 2 commits intomainfrom
feat/doc-selection-normalization

Conversation

@herbertliu
Copy link
Copy Markdown
Collaborator

@herbertliu herbertliu commented Apr 23, 2026

Summary

Addresses Case 7 in the lark-cli pitfall list: user-typed --selection-with-ellipsis values frequently come from pasted prose where auto-correction has rewritten straight quotes to curly quotes (U+2018/2019/201C/201D) or inserted CRLF. The Lark docx store keeps punctuation canonical (straight ASCII, LF), and server-side matching is strict byte comparison — a single curly quote defeats an otherwise correct selection. Agents currently have to retry with guesswork.

Normalize the deterministic-safe subset at the CLI layer before sending to MCP.

Changes

  • shortcuts/common/selection_normalize.go (new): NormalizeSelectionWithEllipsis rewrites curly quotes → straight, CRLF / CR → LF; returns (normalized, changed). CJK punctuation and full-width Latin are deliberately left alone since those appear verbatim in real document prose.
  • shortcuts/doc/docs_update.go: apply in both DryRun and Execute paths; Execute emits a note: to stderr when normalization actually changed the input so the user sees what was rewritten.
  • shortcuts/doc/doc_media_insert.go: apply silently at the two call sites (existing locate-doc step surfaces feedback on miss).
  • shortcuts/drive/drive_add_comment.go: apply in dry-run and Execute; Execute emits the same note:.
  • shortcuts/common/selection_normalize_test.go: 10-case table-driven coverage (empty, ASCII-only, curly singles/doubles, CRLF, standalone CR, CJK punctuation left alone, full-width Latin left alone).

Scope notes

  • The helper is conservative by design — see the GoDoc for the rationale on why CJK/full-width are not rewritten.
  • No new flag; normalization is always-on. If a caller ever needs to match a document with literal CR bytes, the behavior is documented and we can add an escape hatch later.
  • --selection-by-title is not normalized — it's parsed as markdown heading syntax, and the ATX heading check from PR feat(doc): warn when replace_range targets a heading but markdown lacks matching heading #619 already handles the conservative whitespace cases there.

Test Plan

  • go test ./shortcuts/common/... passes (new helper tests)
  • go test ./shortcuts/... passes
  • go vet ./... clean
  • gofmt -l clean

Summary by CodeRabbit

  • New Features

    • Selection text normalization now converts typographic curly quotes to ASCII equivalents and normalizes line endings (CRLF/CR → LF) for more reliable matching.
  • Improvements

    • Commands that use selection matching now apply the normalized text; when normalization alters input, a diagnostic note is printed during execution to explain the rewrite.

…ellipsis before matching

User-typed --selection-with-ellipsis values frequently come from pasted
prose that auto-correction has rewritten to curly quotes (U+2018/2019/
201C/201D) or CRLF line endings, while the Lark docx store keeps
punctuation in straight ASCII / LF form. Server-side matching is strict
byte comparison, so a single curly quote defeats an otherwise correct
selection and the Agent has to retry with guesswork (Case 7 in the
lark-cli pitfall list).

Add shortcuts/common.NormalizeSelectionWithEllipsis that rewrites only
the transformations that are virtually always safe (curly→straight
quotes and CR→LF). CJK punctuation and full-width Latin are
intentionally left alone — those can legitimately appear verbatim in
document prose, so rewriting them would break otherwise-valid
selections.

Wire it into the three call sites that send --selection-with-ellipsis
to MCP:
- docs +update (DryRun silent, Execute emits a 'note:' to stderr when
  normalization happened so the user can see what was changed)
- docs +media-insert (silent — the existing locate-doc step already
  surfaces selection feedback on miss)
- drive +add-comment Execute (emits the same 'note:' as docs +update)

The helper lives in shortcuts/common so both doc and drive packages can
share it without duplication.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 917584aa-6dea-41a3-b92a-504b4646160c

📥 Commits

Reviewing files that changed from the base of the PR and between c9c37e5 and e7b7446.

📒 Files selected for processing (1)
  • shortcuts/common/selection_normalize_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • shortcuts/common/selection_normalize_test.go

📝 Walkthrough

Walkthrough

Adds a normalization utility for --selection-with-ellipsis that converts curly quotes to ASCII and normalizes CR/CRLF to LF, plus tests and integration into multiple command handlers which may emit a diagnostic when normalization occurs.

Changes

Cohort / File(s) Summary
Normalization Utility
shortcuts/common/selection_normalize.go, shortcuts/common/selection_normalize_test.go
New NormalizeSelectionWithEllipsis(s string) (string, bool) and table-driven tests. Converts curly single/double quotes to ASCII equivalents and normalizes CRLF/CR to LF; returns whether a rewrite occurred.
Doc Media Insert
shortcuts/doc/doc_media_insert.go
Use normalized selection (via NormalizeSelectionWithEllipsis) in both DryRun and Execute paths before matching/locating blocks.
Docs Update
shortcuts/doc/docs_update.go
Preprocess --selection-with-ellipsis with normalizer for dry-run and execute; emit a stderr diagnostic during Execute if normalization changed the input.
Drive Add Comment
shortcuts/drive/drive_add_comment.go
Normalize --selection-with-ellipsis before computing comment mode and orchestration; emit a diagnostic on Execute when normalization alters input.

Sequence Diagram(s)

sequenceDiagram
  participant User as "CLI User"
  participant CLI as "Command Handler"
  participant Norm as "NormalizeSelectionWithEllipsis"
  participant MCP as "MCP / Backend"
  rect rgba(200,230,255,0.5)
    User->>CLI: provide --selection-with-ellipsis
    CLI->>Norm: NormalizeSelectionWithEllipsis(input)
    Norm-->>CLI: normalizedSelection, changedFlag
    alt changedFlag == true
      CLI->>CLI: print diagnostic to stderr
    end
    CLI->>MCP: call with normalizedSelection
    MCP-->>CLI: response
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

size/M

Suggested reviewers

  • fangshuyu-768

Poem

🐰 I nibble curly quotes into straight little sticks,
Turn CRLF rivers into LF-friendly ticks,
Across four commands I hop with a cheer,
Matching selections tidy and clear,
Hooray — ASCII leaps, the rabbits appear!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title accurately summarizes the main change: normalizing curly quotes and CRLF in --selection-with-ellipsis across doc and drive commands.
Description check ✅ Passed Description comprehensively covers summary, detailed changes across multiple files, scope notes, and test plan with checkmarks confirming verification.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/doc-selection-normalization

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added domain/ccm PR touches the ccm domain size/L Large or sensitive change across domains or core paths labels Apr 23, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
shortcuts/drive/drive_add_comment.go (1)

244-249: Nit: variable name normalized is the changed bool, not the normalized string.

This swaps the semantic roles of selection and normalized compared to readers' expectations (and compared to the helper's own docstring which calls the second return value "a flag indicating whether any rewrite happened"). Rename to changed (or didNormalize) to match the helper and the sibling call in docs_update.go (line 115), which already uses changed.

♻️ Proposed rename
-		rawSelection := runtime.Str("selection-with-ellipsis")
-		selection, normalized := common.NormalizeSelectionWithEllipsis(rawSelection)
-		if normalized {
+		selection, changed := common.NormalizeSelectionWithEllipsis(runtime.Str("selection-with-ellipsis"))
+		if changed {
 			fmt.Fprintf(runtime.IO().ErrOut,
 				"note: normalized --selection-with-ellipsis (curly quotes / CR line endings rewritten to canonical ASCII form for matching)\n")
 		}

(rawSelection is unused after this snippet, so it can be dropped too.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/drive/drive_add_comment.go` around lines 244 - 249, The variable
naming is confusing: the second return from
common.NormalizeSelectionWithEllipsis is a boolean flag indicating whether a
rewrite happened, not the normalized string; rename the local `normalized`
variable to `changed` (or `didNormalize`) and adjust usage so the call becomes
something like `selection, changed :=
common.NormalizeSelectionWithEllipsis(runtime.Str("selection-with-ellipsis"))`,
then use `changed` in the if-check and drop the now-unused `rawSelection`
variable; reference the call to common.NormalizeSelectionWithEllipsis and the
local variables `selection` and `normalized` when making the rename to match the
helper's docstring and the sibling usage in docs_update.go.
shortcuts/doc/doc_media_insert.go (1)

112-113: Silent normalization here is inconsistent with docs +update / drive +add-comment.

The other two call sites emit a note: normalized --selection-with-ellipsis (...) on stderr when rewriting occurs; docs +media-insert discards the changed flag with _ at both sites. The PR summary flags this as intentional, but from a user's perspective the discrepancy is surprising: pasting curly quotes into --selection-with-ellipsis produces a helpful hint for two commands and silence for the third. Consider surfacing the same note for parity — it is a one-line stderr print and does not affect stdout JSON.

♻️ Proposed diff (Execute site, line 254)
-		selection, _ := common.NormalizeSelectionWithEllipsis(strings.TrimSpace(runtime.Str("selection-with-ellipsis")))
+		selection, selNormalized := common.NormalizeSelectionWithEllipsis(strings.TrimSpace(runtime.Str("selection-with-ellipsis")))
+		if selNormalized {
+			fmt.Fprintf(runtime.IO().ErrOut,
+				"note: normalized --selection-with-ellipsis (curly quotes / CR line endings rewritten to canonical ASCII form for matching)\n")
+		}

Also applies to: 254-254

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/doc_media_insert.go` around lines 112 - 113, The code in docs
+media-insert silently drops the normalization "changed" flag (call to
common.NormalizeSelectionWithEllipsis assigns to selection, _), causing it to
not emit the stderr note other commands do; change the assignment to capture the
boolean (e.g., selection, changed := common.NormalizeSelectionWithEllipsis(...))
and, after assignment, if changed then print the same stderr note used by docs
+update / drive +add-comment (a one-line note about normalized
--selection-with-ellipsis) so behavior is consistent.
shortcuts/doc/docs_update.go (1)

80-94: DryRun body diverges from Execute when normalization changes the input.

Execute sends the normalized value and prints a stderr note, but DryRun silently rewrites with _. Since dry-run is meant to show exactly what would be sent, that is consistent — however, a user passing curly quotes in --dry-run sees a rewritten payload with no indication that normalization occurred. Consider emitting the same stderr note in DryRun (stderr, so it doesn't corrupt stdout JSON), or at minimum Set("selection_normalized", true) on the dry-run envelope so the rewrite is visible to automation. Optional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/docs_update.go` around lines 80 - 94, The dry-run path
currently rewrites selection-with-ellipsis via
common.NormalizeSelectionWithEllipsis but emits no indication of that change;
update the DryRun payload builder (the code using common.NewDryRunAPI(),
POST(...), Body(...).Set("mcp_tool","update-doc").Set("args", args)) to signal
normalization—after calling common.NormalizeSelectionWithEllipsis and assigning
args["selection_with_ellipsis"]=normalized, if the normalized value differs from
the original runtime.Str("selection-with-ellipsis") then either write the same
stderr note used by Execute or (simpler) call Set("selection_normalized", true)
on the dry-run envelope so consumers see the rewrite. Ensure you reference the
runtime.Str("selection-with-ellipsis") original string to compare against the
normalized value before setting the flag.
shortcuts/common/selection_normalize.go (1)

27-38: Optional: collapse the six passes into a single strings.NewReplacer.

Each strings.ReplaceAll performs a full scan/allocation; a single strings.NewReplacer(...).Replace(s) does one pass and is the idiomatic Go choice for multi-literal substitution. Given these inputs are typically short, this is purely a readability/micro-perf nit.

♻️ Proposed refactor
-	out := s
-	// Curly single quotes → ASCII apostrophe.
-	out = strings.ReplaceAll(out, "\u2018", "'")
-	out = strings.ReplaceAll(out, "\u2019", "'")
-	// Curly double quotes → ASCII double quote.
-	out = strings.ReplaceAll(out, "\u201C", "\"")
-	out = strings.ReplaceAll(out, "\u201D", "\"")
-	// CRLF / standalone CR → LF. Lark stores LF internally; sending CRLF in
-	// a selection would require the document to contain literal CR bytes,
-	// which it never does.
-	out = strings.ReplaceAll(out, "\r\n", "\n")
-	out = strings.ReplaceAll(out, "\r", "\n")
+	// Order matters: "\r\n" must precede "\r" so CRLF collapses to a single LF.
+	out := selectionNormalizer.Replace(s)

And at package scope:

var selectionNormalizer = strings.NewReplacer(
    "\u2018", "'", "\u2019", "'",
    "\u201C", "\"", "\u201D", "\"",
    "\r\n", "\n", "\r", "\n",
)

strings.NewReplacer applies replacements in argument order, giving earlier patterns precedence when they match at the same position, so \r\n (appearing first) takes precedence over \r — equivalent to the current two-step CR/LF handling.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/common/selection_normalize.go` around lines 27 - 38, Replace the
six separate strings.ReplaceAll calls in selection_normalize.go with a single
strings.NewReplacer to avoid multiple full-string scans; define a package-scope
variable (e.g., selectionNormalizer) initialized with the pairs "\u2018","'",
"\u2019","'", "\u201C","\"", "\u201D","\"", "\r\n","\n", "\r","\n" (ensure
"\r\n" appears before "\r" to preserve precedence) and change the function that
currently sets out := s and calls ReplaceAll to use
selectionNormalizer.Replace(s) instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@shortcuts/common/selection_normalize_test.go`:
- Around line 23-28: The test case name "ascii-only selection is untouched" in
selection_normalize_test.go is misleading because the input is CJK; rename that
table entry's name to something like "cjk-only selection is untouched" to
reflect the input, and also add a separate genuine ASCII-only no-op case (e.g.,
input "hello world", want "hello world", wantChanged false) to the same test
table so ASCII coverage is explicit; update only the table entry's name and/or
add the new case in the test data (no other code logic changes).

---

Nitpick comments:
In `@shortcuts/common/selection_normalize.go`:
- Around line 27-38: Replace the six separate strings.ReplaceAll calls in
selection_normalize.go with a single strings.NewReplacer to avoid multiple
full-string scans; define a package-scope variable (e.g., selectionNormalizer)
initialized with the pairs "\u2018","'", "\u2019","'", "\u201C","\"",
"\u201D","\"", "\r\n","\n", "\r","\n" (ensure "\r\n" appears before "\r" to
preserve precedence) and change the function that currently sets out := s and
calls ReplaceAll to use selectionNormalizer.Replace(s) instead.

In `@shortcuts/doc/doc_media_insert.go`:
- Around line 112-113: The code in docs +media-insert silently drops the
normalization "changed" flag (call to common.NormalizeSelectionWithEllipsis
assigns to selection, _), causing it to not emit the stderr note other commands
do; change the assignment to capture the boolean (e.g., selection, changed :=
common.NormalizeSelectionWithEllipsis(...)) and, after assignment, if changed
then print the same stderr note used by docs +update / drive +add-comment (a
one-line note about normalized --selection-with-ellipsis) so behavior is
consistent.

In `@shortcuts/doc/docs_update.go`:
- Around line 80-94: The dry-run path currently rewrites selection-with-ellipsis
via common.NormalizeSelectionWithEllipsis but emits no indication of that
change; update the DryRun payload builder (the code using common.NewDryRunAPI(),
POST(...), Body(...).Set("mcp_tool","update-doc").Set("args", args)) to signal
normalization—after calling common.NormalizeSelectionWithEllipsis and assigning
args["selection_with_ellipsis"]=normalized, if the normalized value differs from
the original runtime.Str("selection-with-ellipsis") then either write the same
stderr note used by Execute or (simpler) call Set("selection_normalized", true)
on the dry-run envelope so consumers see the rewrite. Ensure you reference the
runtime.Str("selection-with-ellipsis") original string to compare against the
normalized value before setting the flag.

In `@shortcuts/drive/drive_add_comment.go`:
- Around line 244-249: The variable naming is confusing: the second return from
common.NormalizeSelectionWithEllipsis is a boolean flag indicating whether a
rewrite happened, not the normalized string; rename the local `normalized`
variable to `changed` (or `didNormalize`) and adjust usage so the call becomes
something like `selection, changed :=
common.NormalizeSelectionWithEllipsis(runtime.Str("selection-with-ellipsis"))`,
then use `changed` in the if-check and drop the now-unused `rawSelection`
variable; reference the call to common.NormalizeSelectionWithEllipsis and the
local variables `selection` and `normalized` when making the rename to match the
helper's docstring and the sibling usage in docs_update.go.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a130595e-8de0-40f2-8e55-237d868ad6c0

📥 Commits

Reviewing files that changed from the base of the PR and between 6b7263a and c9c37e5.

📒 Files selected for processing (5)
  • shortcuts/common/selection_normalize.go
  • shortcuts/common/selection_normalize_test.go
  • shortcuts/doc/doc_media_insert.go
  • shortcuts/doc/docs_update.go
  • shortcuts/drive/drive_add_comment.go

Comment thread shortcuts/common/selection_normalize_test.go
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 64.70588% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.05%. Comparing base (bc6590a) to head (e7b7446).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
shortcuts/doc/docs_update.go 0.00% 8 Missing ⚠️
shortcuts/drive/drive_add_comment.go 42.85% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #621   +/-   ##
=======================================
  Coverage   60.05%   60.05%           
=======================================
  Files         406      407    +1     
  Lines       43089    43116   +27     
=======================================
+ Hits        25877    25895   +18     
- Misses      15194    15202    +8     
- Partials     2018     2019    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 23, 2026

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@e7b7446ae347f1d7ff43a534775c5ce1bfe156b5

🧩 Skill update

npx skills add larksuite/cli#feat/doc-selection-normalization -y -g

CodeRabbit pointed out the 'ascii-only selection is untouched' case
actually had a CJK input ('欢迎大家多给反馈'), which was misleading.
Rename that entry to 'cjk-only selection is untouched' and add a
separate ASCII-only case with 'hello world' so the table now covers
both no-op scripts explicitly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain/ccm PR touches the ccm domain size/L Large or sensitive change across domains or core paths

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant