Render ScheduleWakeup and Cron* tools (#148)#152
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds ScheduleWakeup, CronCreate, CronList, and CronDelete tool support across models, regex-based output parsers and registrations, HTML/Markdown formatters and renderer dispatch, emoji-range adjustment, CSS/snapshots, renderer cross-linking for cron backlinks, and tests with a JSONL fixture. ChangesScheduling & Cron Tool Support Pipeline
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@claude_code_log/factories/tool_factory.py`:
- Around line 829-838: The code treats any trailing bracketed text captured in
m.group("flags") as flags and strips it from the prompt, which drops legitimate
prompt suffixes; to fix, validate and parse m.group("flags") before removing it
from the prompt: inspect the tokens derived in flag_tokens and only
consume/strip bracketed text when at least one token matches known flags (e.g.,
"recurring", "durable"); otherwise leave the original prompt intact when
constructing the CronListItem (update the prompt assignment and the
recurring/durable logic that reads flag_tokens so prompt suffixes that are not
real flags are preserved).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 215a9a84-6704-4a67-88c2-58f66308ae57
📒 Files selected for processing (7)
claude_code_log/factories/tool_factory.pyclaude_code_log/html/renderer.pyclaude_code_log/html/tool_formatters.pyclaude_code_log/markdown/renderer.pyclaude_code_log/models.pytest/test_cron_rendering.pytest/test_data/cron_tools.jsonl
|
(Claude) Status update on the latest CodeRabbit review: One actionable finding applied — Real bug in Fix follows CR's suggestion: validate captured flags against a known-flags whitelist ( Two new regression tests in
While I was in the parser, also folded in a stylistic micro-nit from the earlier monk review on this branch (msg #2779): CI on alice (pre-push): Thanks — the bracketed-suffix catch is a nice one. The original parser shape was overly trusting of the |
CR caught a real bug in ``parse_cronlist_output``: the optional
``[flags]`` group at the end of ``_CRON_LIST_LINE_RE`` greedily
captured *any* trailing bracketed text, including legitimate prompt
suffixes. A row like::
- cj_xyz: 0 9 * * * => /loop tick [important context]
parsed with ``prompt='/loop tick'`` and ``flags='important context'``
— silently dropping the suffix and producing nonsense recurring /
durable values.
## Fix
Validate captured flags against a known-flags whitelist
(``_KNOWN_CRON_FLAGS = {'recurring', 'durable'}``). When none of the
captured tokens match a known flag, restore the bracketed content to
the prompt and treat the row as flag-less.
When at least one token matches a known flag, accept the bracket as
flags (CR explicitly endorsed this trade-off — unknown stragglers
inside an otherwise-known flag list are dropped). The whitelist
keeps the contract honest: only documented flags are real flags.
## Folded in: monk's #148 micro-nit
While in the parser, replaced ``True or None`` / ``False or None``
with the more self-documenting ``True if X else None`` per monk's
review nit on #148 (msg #2779). Same behaviour, clearer intent.
## Tests
Two new cases in ``TestSchedulingOutputParsers``:
- ``test_cronlist_preserves_bracketed_prompt_suffix`` — pins the
exact CR scenario; pre-fix would have failed by stripping
``[important context]`` from the prompt.
- ``test_cronlist_keeps_real_flags_alongside_known_tokens`` —
confirms recognised flags still parse correctly when the prompt
body itself contains other punctuation.
CI: 26/26 ``test_cron_rendering.py`` pass; ruff + pyright clean.
Three concrete improvements after rendering against a real-data sample (live CronCreate / CronList / CronDelete experiment in a session, captured before this commit): ## 1. Fix the wrench prefix on ⏰-titled tools (``starts_with_emoji``) The transcript template prepends ``🛠️`` to a tool-use title unless the title already starts with an emoji recognised by ``html/utils.py::starts_with_emoji``. That function whitelists specific Unicode ranges, but the ranges miss the **Misc Technical** block (U+2300–U+23FF) where ``⏰`` (U+23F0) lives. So titles like ``⏰ ScheduleWakeup …`` rendered as ``🛠️ ⏰ ScheduleWakeup …``. Added the Misc Technical range to the whitelist (along with a docstring callout listing all glyphs that benefit: ``⏰ ⏳ ⏱️ ⏲️ ⏸ ⏹ ⏺ ⏏``). Other tools using emoji from this block (currently none in the codebase, but several candidates in the range) get the suppression for free. ## 2. ScheduleWakeup / CronCreate body = prompt only Both tools' titles already carry the most informative scalars (``+<delay>s — <reason>`` for ScheduleWakeup, ``<cron>`` for CronCreate). The body's previous params-table repeated the same values plus a ``recurring`` / ``durable`` row that the harness's own confirmation echoes back in human-readable form (``Scheduled recurring job <id> ... Session-only ...``). Drop the table entirely; render just the prompt as collapsible code via the existing ``_format_collapsible_prompt`` helper. The ``_row_html`` helper that backed the table is now dead — removed. ## 3. Parser regexes match the harness's actual output format A live experiment (CronCreate every 2 minutes, CronList, CronDelete within minutes) showed my synthetic fixtures and parser regexes were misaligned with what the harness really emits: - **CronCreate**: ``Scheduled cron job <id>`` → real: ``Scheduled recurring job <id> (Every 2 minutes). Session-only…`` (or ``one-shot`` for non-recurring). Updated regex to accept both ``recurring`` and ``one-shot`` kinds. - **CronList row**: ``- <id>: <cron> => <prompt> [flags]`` → real: ``<id> — <description> (recurring) [session-only]: <prompt>…``. No leading hyphen, em-dash separator, parens for the kind, brackets for the scope, no cron expression (only a human-readable ``description``), prompt often truncated by the harness with ``…``. Regex completely rewritten; ``CronListItem.cron`` field renamed to ``description`` to reflect the actual content. - **CronDelete**: ``Deleted cron job <id>`` → real: ``Cancelled job <id>.``. Parser unchanged (still text-verbatim); fixture and tests updated. The previous fix for the bracketed-prompt-suffix bug (CR finding on PR #152) and the ``_KNOWN_CRON_FLAGS`` whitelist are dropped because the new format separates flags (parens) from prompts (after a colon) — there's no greedy-bracket vulnerability anymore. Fixture ``test/test_data/cron_tools.jsonl`` updated with real harness text. Tests rewritten against the actual format with two extra edge cases: - ``test_croncreate_extracts_job_id_oneshot`` — confirms the ``one-shot`` kind variant. - ``test_cronlist_durable_scope_sets_durable_flag`` — confirms the durable scope (separate from the recurring/one-shot kind). ## 4. tool-renderer skill updated Added a "Watch out: the template's wrench-suppression is emoji-range-gated" section under "Add Title Method" with the full list of recognised Unicode ranges and the recipe for verifying your icon (render fixture + grep for ``🛠️`` co-occurring) / extending the range when your icon falls outside. ## CI - ``just test``: 1664 pass, 7 skipped (was 1659 with the prior bracketed-suffix tests; the net +5 reflects the test rewrite — removed 2 obsolete bracket-handling tests, added 5 real-format tests including the wrench-suppression regression guard). - ruff + pyright clean. - Fixture rendered locally to verify: ``🛠️ ⏰`` co-occurrence is zero (the suppression works on the real fixture).
|
(Claude) Polish round on user feedback — Three concrete changes after rendering against real-data fixtures (live 1. Wrench prefix gone from ⏰-titled tools. The transcript template prepends 2. ScheduleWakeup / CronCreate body = prompt only. Both tools' titles already carry the most informative scalars ( 3. Parser regexes updated to match real harness output. The live experiment showed my synthetic fixtures and parser regexes were misaligned with what the harness actually emits:
The previous fix for the bracketed-prompt-suffix bug (CR finding earlier on this PR) and the Bonus: tool-renderer skill updated. Added a "Watch out: the template's wrench-suppression is emoji-range-gated" section listing all recognised Unicode ranges and the verification recipe. Future tools using emoji from outside the whitelist will hit the same trap I did, so the skill now warns about it explicitly. CI: 1664 unit pass, ruff + pyright clean. Test count net change reflects the rewrite (removed 2 obsolete bracket-handling tests, added 5 real-format tests including a wrench-suppression regression guard). Fixture rendered locally to verify: |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/test_cron_rendering.py (1)
143-180: ⚡ Quick winAdd a mixed-format CronList regression case.
These tests cover all-valid and all-invalid CronList text, but not a mixed payload (one valid row + one drifted row). That case is important to prevent partial-table rendering from hiding unparsed content.
Suggested test addition
class TestSchedulingOutputParsers: @@ def test_cronlist_falls_back_on_unrecognised_format(self) -> None: out = parse_cronlist_output( _result("Just a free-form summary, no rows here."), None ) assert out is not None assert out.jobs == [] assert "free-form summary" in out.text + + def test_cronlist_mixed_lines_falls_back_to_raw_text(self) -> None: + text = "\n".join( + [ + "abc — Daily at 9am (recurring) [durable]: /morning-checkin", + "unexpected trailer line that does not match cron row format", + ] + ) + out = parse_cronlist_output(_result(text), None) + assert out is not None + assert out.jobs == [] + assert "unexpected trailer line" in out.text🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/test_cron_rendering.py` around lines 143 - 180, Add a regression test to test_cron_rendering.py that feeds parse_cronlist_output a mixed payload containing one valid cron row and one unrecognised/free-form line to ensure partial parsing doesn't drop unparsed content; use the existing helper _result(...) to wrap the string, call parse_cronlist_output, assert the returned value is a CronListOutput with one parsed job (check job.id/job.description or other fields) and that out.text still contains the unparsed free-form line, and mirror the style of existing tests like test_cronlist_parses_real_format and test_cronlist_falls_back_on_unrecognised_format so it fits with parse_cronlist_output and CronListOutput expectations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@claude_code_log/factories/tool_factory.py`:
- Around line 838-853: The parser currently skips non-empty lines that don't
match _CRON_LIST_LINE_RE, which causes partial parses to hide rows; change the
loop that builds jobs so unmatched non-empty lines are preserved instead of
ignored—when m is None and line.strip() is non-empty, append a CronListItem (or
similar sentinel) that preserves the original line (e.g., id=None,
description=line.strip(), and a parsing_error or raw=True flag) so the UI can
show those rows or fall back to text mode; ensure CronListOutput(text=text,
jobs=jobs) still returns all lines and that any new fields on CronListItem are
optional to avoid breaking downstream consumers.
---
Nitpick comments:
In `@test/test_cron_rendering.py`:
- Around line 143-180: Add a regression test to test_cron_rendering.py that
feeds parse_cronlist_output a mixed payload containing one valid cron row and
one unrecognised/free-form line to ensure partial parsing doesn't drop unparsed
content; use the existing helper _result(...) to wrap the string, call
parse_cronlist_output, assert the returned value is a CronListOutput with one
parsed job (check job.id/job.description or other fields) and that out.text
still contains the unparsed free-form line, and mirror the style of existing
tests like test_cronlist_parses_real_format and
test_cronlist_falls_back_on_unrecognised_format so it fits with
parse_cronlist_output and CronListOutput expectations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6023af08-af5b-4436-9937-1b5771245fc6
📒 Files selected for processing (8)
.claude/skills/tool-renderer/SKILL.mdclaude_code_log/factories/tool_factory.pyclaude_code_log/html/tool_formatters.pyclaude_code_log/html/utils.pyclaude_code_log/markdown/renderer.pyclaude_code_log/models.pytest/test_cron_rendering.pytest/test_data/cron_tools.jsonl
✅ Files skipped from review due to trivial changes (2)
- .claude/skills/tool-renderer/SKILL.md
- test/test_data/cron_tools.jsonl
🚧 Files skipped from review as they are similar to previous changes (2)
- claude_code_log/models.py
- claude_code_log/markdown/renderer.py
|
(Claude) Round-4 — Applied the bail-out approach: Blank lines are still skipped without bailing (carve-out for whitespace; verified by a separate test). Skipped the alternative actionable suggestion (preserve unparsed rows as sentinel CR's exact suggested test ( CI: 29/29 in |
|
(Claude) Visual polish round on user feedback after rendering against the real fixture — 1. ScheduleWakeup / CronCreate prompts render as Markdown. The prompt is typically Markdown content (slash commands, inline code, prose) rather than preformatted text. HTML side switches from 2. Light borders + padded cells on CronList table. Default unstyled tables read as raw rows pasted into the page; new CSS class 3. Cross-link CronList row ids and CronDelete result ids back to the originating CronCreate card. New renderer pass Same pattern as PR #147's Monitor → TaskNotification backlink, but in the reverse direction (consumer → originating call) and matched by job id rather than tool_use_id. Markdown side keeps the inline-code id without a clickable link (per the established finding from #147 — Markdown only has session-level anchors, no message-level targets). 4. CronDelete parser extracts the cancelled job id so the cross-link pass has a key. New The cross-link affordance is subtle (dotted underline, inherited colour, accent on hover) — visible to anyone scanning for affordances but doesn't draw attention away from the structured content. Tests: 2 new cross-link assertions ( Snapshot drift is 308 lines added across 8 fixtures — all CSS additions for the new CI: 1667 unit pass, ruff + pyright clean. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@claude_code_log/html/templates/components/message_styles.css`:
- Line 485: The CSS uses the keyword value "currentColor" which violates the
stylelint rule expecting lowercase; update the declaration that sets
border-bottom: 1px dotted currentColor; to use the lowercase keyword
"currentcolor" (i.e., change currentColor → currentcolor) so the value casing
matches lint rules and other declarations that reference the same keyword.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d120c426-63d9-4670-804c-b4552ad6f9be
📒 Files selected for processing (8)
claude_code_log/factories/tool_factory.pyclaude_code_log/html/templates/components/message_styles.cssclaude_code_log/html/tool_formatters.pyclaude_code_log/markdown/renderer.pyclaude_code_log/models.pyclaude_code_log/renderer.pytest/__snapshots__/test_snapshot_html.ambrtest/test_cron_rendering.py
🚧 Files skipped from review as they are similar to previous changes (1)
- claude_code_log/factories/tool_factory.py
The built-in scheduling tool family — ``ScheduleWakeup``, ``CronCreate``, ``CronList``, ``CronDelete`` — used to fall through to the generic params-table render. Add specialised rendering across HTML and Markdown for all four, following the same recipe as the ``Monitor`` tool work in #142/PR #147. ## Models - ``ScheduleWakeupInput`` (Pydantic): ``delaySeconds`` / ``reason`` / ``prompt``. Used by ``/loop`` dynamic mode for self-paced ticks. - ``CronCreateInput`` (Pydantic): ``cron`` / ``prompt`` (required) + ``recurring`` / ``durable`` (optional). Schedules a prompt at a cron-matched time. - ``CronListInput`` (Pydantic, no fields): kept as an explicit model so dispatch routes through the typed-input pipeline rather than the generic fallback. - ``CronDeleteInput`` (Pydantic): ``id``. - Output dataclasses: ``ScheduleWakeupOutput`` (parsed clock / delay), ``CronCreateOutput`` (parsed job_id), ``CronListOutput`` (parsed list of ``CronListItem`` rows), ``CronDeleteOutput`` (text only). Each captures the raw text as fallback so the renderer always has something to show even when the harness's exact format drifts. ## Factory parsers Output parsers extract structured fields when the format matches and fall back gracefully when it doesn't. Char classes are tolerant (``[\\w-]+`` for ids per monk's #147 nit). The cron-list row regex allows the prompt body to contain ``=>`` and the optional ``[recurring, durable]`` flag tokens; lines that don't match are quietly skipped, leaving the raw text available as fallback. ## HTML formatters A small shared helper, ``_format_collapsible_prompt``, renders the ``prompt`` field with the same line-count-badge collapsibility as the Monitor tool's ``command`` field — multi-line / long prompts collapse via ``render_collapsible_code``, short single-line prompts render in a plain ``<pre>``. Used by both ``ScheduleWakeup`` and ``CronCreate``. Per-tool formatters compose: - ``format_schedulewakeup_input``: 3-row grid, prompt collapsible. - ``format_croncreate_input``: cron + optional flags + collapsible prompt. Optional flags only render when explicitly set (the harness's defaults are near-universal; showing ``recurring=True`` on every row would be noise). - ``format_cronlist_input``: empty body — title is sufficient. - ``format_cronlist_output``: structured table when ``jobs`` is populated, plain ``<pre>`` fallback otherwise. Prompt previews truncated at 80 chars in the list view since the full prompt lives in the originating ``CronCreate`` card upstream. - ``format_crondelete_input``: empty body — id is in the title. - All output formatters render the harness status line verbatim in a typed ``<div>`` (matching the Monitor output shape). ## HTML titles All four tools share the ``⏰`` (alarm clock) icon for the family. Per-tool summary in the title so the reader can identify the call without expanding: - ``⏰ ScheduleWakeup +<delay>s — <reason>`` - ``⏰ CronCreate <cron>`` - ``⏰ CronList`` - ``⏰ CronDelete <id>`` ## Markdown rendering Mirrors the HTML shape in plain Markdown: - Bullet list of scalar fields, prompt in a fenced ``_code_fence`` block (adaptive width, handles backticks). - Titles wrap user-supplied values in ``_inline_code`` per the same ``WebSearch`` / ``WebFetch`` / ``Skill`` / ``Monitor`` precedent — protects against markdown emphasis / heading metacharacters in reasons / cron expressions / ids. - ``CronList`` body renders structured rows as a bullet list when parseable, fenced raw text otherwise. ## Tests (24 new) ``test/test_cron_rendering.py``: - ``TestSchedulingInputModels`` (4) — the four input models. - ``TestSchedulingOutputParsers`` (8) — happy path + fallback for each parser, including the cron-list "unrecognised format" path that exercises the graceful-degradation branch. - ``TestSchedulingHtmlFormatters`` (6) — grid completeness, long-prompt collapsibility, optional-flag omission, structured- vs-fallback CronList rendering. - ``TestSchedulingFixtureRendering`` (6) — drives the real HTML/Markdown renderers against a JSONL fixture with one call per tool in the family. Asserts titles, grids, the structured cron list, and the markdown inline-code escaping for user-supplied values (reason, cron, id). Fixture ``test/test_data/cron_tools.jsonl``: ScheduleWakeup → CronCreate → CronList → CronDelete in sequence; ScheduleWakeup adapted from a real invocation in alice's transcripts, the Cron* calls constructed against the documented schemas (no real Cron* invocations exist in any of my transcripts yet — the tools are listed as deferred but haven't been used in the wild). ## CI - ``just test``: 1659 pass, 7 skipped (was 1635; +24 new). - ``just test-tui``: 67 pass. - ``just test-browser``: 39 pass, 1 skipped. - ruff + pyright clean on production files. - No snapshot drift — no existing fixtures carry these tools.
``title_CronListInput`` passed the tool name ("CronList") as the
``summary`` argument to ``_tool_title``. But ``_tool_title`` already
emits the message's tool name (``content.tool_name``) ahead of any
summary span — passing the same value as the summary therefore
rendered ``CronList`` twice in the HTML title:
🛠️ ⏰ CronList <span class='tool-summary'>CronList</span>
Drop the literal summary so the no-summary branch fires, matching
the docstring's intent and the Markdown title shape (``⏰ CronList``).
The other three tools in the family pass distinct summaries (delay,
cron expression, id), so they don't trigger the duplication — the
guard is specifically for zero-input tools whose title summary
would naturally be the tool name.
## Test tightening
The original assertion was ``"CronList" in html``, which is satisfied
by both the correct (one occurrence) and broken (two occurrences)
output. Tightened to:
- ``html.count("CronList") == 1`` — pins the cardinality.
- ``"<span class='tool-summary'>CronList" not in html`` — pins the
absence of the duplicate-summary shape specifically.
Both assertions catch the same regression class; either alone would
fail loudly if a future zero-input-tool title accidentally passes
its name as the summary.
CI: ``test_cron_rendering.py`` 24/24 still pass; ruff + pyright
clean.
CR caught a real bug in ``parse_cronlist_output``: the optional
``[flags]`` group at the end of ``_CRON_LIST_LINE_RE`` greedily
captured *any* trailing bracketed text, including legitimate prompt
suffixes. A row like::
- cj_xyz: 0 9 * * * => /loop tick [important context]
parsed with ``prompt='/loop tick'`` and ``flags='important context'``
— silently dropping the suffix and producing nonsense recurring /
durable values.
## Fix
Validate captured flags against a known-flags whitelist
(``_KNOWN_CRON_FLAGS = {'recurring', 'durable'}``). When none of the
captured tokens match a known flag, restore the bracketed content to
the prompt and treat the row as flag-less.
When at least one token matches a known flag, accept the bracket as
flags (CR explicitly endorsed this trade-off — unknown stragglers
inside an otherwise-known flag list are dropped). The whitelist
keeps the contract honest: only documented flags are real flags.
## Folded in: monk's #148 micro-nit
While in the parser, replaced ``True or None`` / ``False or None``
with the more self-documenting ``True if X else None`` per monk's
review nit on #148 (msg #2779). Same behaviour, clearer intent.
## Tests
Two new cases in ``TestSchedulingOutputParsers``:
- ``test_cronlist_preserves_bracketed_prompt_suffix`` — pins the
exact CR scenario; pre-fix would have failed by stripping
``[important context]`` from the prompt.
- ``test_cronlist_keeps_real_flags_alongside_known_tokens`` —
confirms recognised flags still parse correctly when the prompt
body itself contains other punctuation.
CI: 26/26 ``test_cron_rendering.py`` pass; ruff + pyright clean.
Three concrete improvements after rendering against a real-data sample (live CronCreate / CronList / CronDelete experiment in a session, captured before this commit): ## 1. Fix the wrench prefix on ⏰-titled tools (``starts_with_emoji``) The transcript template prepends ``🛠️`` to a tool-use title unless the title already starts with an emoji recognised by ``html/utils.py::starts_with_emoji``. That function whitelists specific Unicode ranges, but the ranges miss the **Misc Technical** block (U+2300–U+23FF) where ``⏰`` (U+23F0) lives. So titles like ``⏰ ScheduleWakeup …`` rendered as ``🛠️ ⏰ ScheduleWakeup …``. Added the Misc Technical range to the whitelist (along with a docstring callout listing all glyphs that benefit: ``⏰ ⏳ ⏱️ ⏲️ ⏸ ⏹ ⏺ ⏏``). Other tools using emoji from this block (currently none in the codebase, but several candidates in the range) get the suppression for free. ## 2. ScheduleWakeup / CronCreate body = prompt only Both tools' titles already carry the most informative scalars (``+<delay>s — <reason>`` for ScheduleWakeup, ``<cron>`` for CronCreate). The body's previous params-table repeated the same values plus a ``recurring`` / ``durable`` row that the harness's own confirmation echoes back in human-readable form (``Scheduled recurring job <id> ... Session-only ...``). Drop the table entirely; render just the prompt as collapsible code via the existing ``_format_collapsible_prompt`` helper. The ``_row_html`` helper that backed the table is now dead — removed. ## 3. Parser regexes match the harness's actual output format A live experiment (CronCreate every 2 minutes, CronList, CronDelete within minutes) showed my synthetic fixtures and parser regexes were misaligned with what the harness really emits: - **CronCreate**: ``Scheduled cron job <id>`` → real: ``Scheduled recurring job <id> (Every 2 minutes). Session-only…`` (or ``one-shot`` for non-recurring). Updated regex to accept both ``recurring`` and ``one-shot`` kinds. - **CronList row**: ``- <id>: <cron> => <prompt> [flags]`` → real: ``<id> — <description> (recurring) [session-only]: <prompt>…``. No leading hyphen, em-dash separator, parens for the kind, brackets for the scope, no cron expression (only a human-readable ``description``), prompt often truncated by the harness with ``…``. Regex completely rewritten; ``CronListItem.cron`` field renamed to ``description`` to reflect the actual content. - **CronDelete**: ``Deleted cron job <id>`` → real: ``Cancelled job <id>.``. Parser unchanged (still text-verbatim); fixture and tests updated. The previous fix for the bracketed-prompt-suffix bug (CR finding on PR #152) and the ``_KNOWN_CRON_FLAGS`` whitelist are dropped because the new format separates flags (parens) from prompts (after a colon) — there's no greedy-bracket vulnerability anymore. Fixture ``test/test_data/cron_tools.jsonl`` updated with real harness text. Tests rewritten against the actual format with two extra edge cases: - ``test_croncreate_extracts_job_id_oneshot`` — confirms the ``one-shot`` kind variant. - ``test_cronlist_durable_scope_sets_durable_flag`` — confirms the durable scope (separate from the recurring/one-shot kind). ## 4. tool-renderer skill updated Added a "Watch out: the template's wrench-suppression is emoji-range-gated" section under "Add Title Method" with the full list of recognised Unicode ranges and the recipe for verifying your icon (render fixture + grep for ``🛠️`` co-occurring) / extending the range when your icon falls outside. ## CI - ``just test``: 1664 pass, 7 skipped (was 1659 with the prior bracketed-suffix tests; the net +5 reflects the test rewrite — removed 2 obsolete bracket-handling tests, added 5 real-format tests including the wrench-suppression regression guard). - ruff + pyright clean. - Fixture rendered locally to verify: ``🛠️ ⏰`` co-occurrence is zero (the suppression works on the real fixture).
Partial-table failure mode: the per-line ``continue`` in ``parse_cronlist_output`` would silently drop unparsed lines from the structured output. A 9-of-10-rows CronList payload with one drifted format would render as a 9-row table, hiding the 10th line behind structured chrome. Tighten to all-or-nothing: if any non-empty line fails the row regex, return ``jobs=[]`` so the renderer's existing ``if not output.jobs:`` branch fires and shows the raw text. This matches the documented "structured-when-parseable, raw text otherwise" semantics — at the granularity of the whole list, not per-line. Blank lines are skipped without triggering the bail-out (they're noise, not data). The real harness output is single-line per row with no headers (verified against the live experiment), so the bail-out doesn't fire on real data today; it kicks in only when the format drifts in a way the parser doesn't recognise. ## Tests - ``test_cronlist_mixed_lines_falls_back_to_raw_text`` — pins the partial-parse bail-out behaviour (CR's nitpick test, mirrored verbatim). - ``test_cronlist_blank_lines_dont_bail_out`` — guards the blank- line carve-out so the bail-out doesn't trigger on whitespace. ## Skipped CR's actionable suggested an alternative: preserve unparsed rows as sentinel ``CronListItem`` entries with a ``raw=True`` flag. The bail-out approach is simpler (no new dataclass field, single return path) and matches the existing fallback semantics. Either shape addresses the partial-table issue; bail-out is the cheaper fix. CI: ``test_cron_rendering.py`` 29/29 pass; ruff + pyright clean.
User feedback after rendering against the real-data fixture: 1. **ScheduleWakeup / CronCreate prompts render as Markdown.** The prompt is typically Markdown content (slash commands, inline code, prose) rather than preformatted text. Switch HTML side from ``_format_collapsible_prompt`` (which wrapped in ``<pre>``) to ``render_markdown_collapsible``; switch Markdown side from ``self._code_fence(prompt)`` to emitting the prompt directly into the document so CommonMark renders it. Removed the now-unused ``_format_collapsible_prompt`` helper. 2. **Light borders + padded cells on CronList table.** Default unstyled tables read as raw rows pasted into the page; new CSS class ``.cronlist-output-table`` matches the visual weight of the sibling ``.system-content`` / ``.command-output-content`` panels (#137 lineage). 3. **Cross-link CronList row ids and CronDelete result ids back to the originating CronCreate card.** New renderer pass ``_link_cron_jobs_by_id`` builds a ``job_id → message_index`` map from CronCreate outputs (parsed by ``parse_croncreate_output``) and stamps ``creating_call_message_index`` on each consumer (``CronListItem``, ``CronDeleteOutput``). HTML formatters wrap the rendered id in an ``<a class='cron-id-backlink'>`` pointing at the card's anchor when the field is set; otherwise plain ``<code>`` as before. Same pattern as PR #147's Monitor → TaskNotification backlink, but in the reverse direction (consumer → originating call) and matched by job id rather than tool_use_id. Markdown side keeps the inline-code id without a clickable link (per the established finding from #147 — Markdown only has session-level anchors, no message-level targets). 4. **CronDelete parser extracts the cancelled job id.** Added ``_CRON_DELETE_ID_RE`` and ``CronDeleteOutput.job_id`` so the cross-link pass has a key to look up. Falls back to ``None`` for unrecognised formats; rendering degrades to plain status text. ## CronList CSS ```css .cronlist-output-table { margin-top: 8px; border: 1px solid #11; border-radius: 4px; background-color: #1e1e1e08; border-collapse: collapse; font-size: 0.9em; overflow: hidden; } .cronlist-output-table th, .cronlist-output-table td { padding: 6px 12px; border-bottom: 1px solid #11; } .cronlist-output-table tr:last-child td { border-bottom: none; } .cron-id-backlink { color: inherit; text-decoration: none; border-bottom: 1px dotted currentColor; } .cron-id-backlink:hover { color: var(--user-color); border-bottom-style: solid; } ``` The cross-link affordance is subtle (dotted underline, inherited colour) — visible to anyone scanning for affordances but doesn't draw attention away from the structured row content. ## Tests Updated assertions for the new shapes; added two cross-link tests: - ``test_html_cronlist_id_links_back_to_croncreate`` — pins the anchor href against the CronCreate's ``msg-d-N`` id. - ``test_html_crondelete_result_links_back_to_croncreate`` — pins the inline anchor wrapping the cancelled id within the status text. Replaced the brittle ``html.count("CronList") == 1`` assertion (CSS comments now legitimately mention the tool name) with the more specific ``"<span class='tool-summary'>CronList" not in html`` shape check — same regression class, no false positives from documentation strings. Snapshot drift: 308 lines added to ``test_snapshot_html.ambr``, all CSS additions across 8 fixtures (the new ``.cronlist-output- table`` and ``.cron-id-backlink`` rules). No content drift. ## CI - ``just test``: 1667 pass, 7 skipped (was 1664; +3 net from new tests minus the count assertion replacement). - ruff + pyright clean.
Rebased onto current ``main`` (which advanced to ``7c2e6f6 Style sidechain filter toggle with dashed border``). The new CSS rule embeds in every rendered HTML, so all 7 snapshots gain a +4-line addition for the ``.filter-toggle[data-type="sidechain"]`` rule. Pure CSS-rule drift; no content drift. CI on the previous push caught the rebase staleness — local snapshots had been updated against the older ``message_styles.css`` set, but CI generated HTML that included main's newer sidechain-filter CSS, producing the diff. Updating the snapshot file against the current main brings the two views back in sync.
7f4ff89 to
0045805
Compare
CodeRabbit's stylelint rule expects lowercase ``currentcolor`` (matching the CSS spec — keywords are case-insensitive but the project lints for the canonical lowercase form). One occurrence in the new ``.cron-id-backlink`` rule. Snapshot drift is the same change × 7 fixtures — pure casing flip, no semantic difference.
|
(Claude) The latest CR review's actionable ( |
Summary
Closes #148. Adds specialised rendering for the built-in scheduling tool family —
ScheduleWakeup,CronCreate,CronList,CronDelete— which previously fell through to the generic params-table render. Same per-tool-family pattern as theMonitortool work in #147.Two commits, kept separate for the post-mortem story:
<sha1>— core implementation (4-tool family rendering + 24 regression tests). Models, factory parsers (with graceful-degradation fallbacks), HTML/Markdown formatters, title methods, and a JSONL fixture covering one call per tool. Shared_format_collapsible_prompthelper betweenScheduleWakeupandCronCreate(both carry a long prompt field that benefits from the line-count-badge collapsibility used by Monitor'scommand).<sha2>— fix-up (CronList HTML title duplication).title_CronListInputoriginally passed"CronList"as the summary to_tool_title, which already emits the tool name fromcontent.tool_name— duplicating it in the HTML output. Fix is one line; tests tightened with a cardinality check (html.count("CronList") == 1) and a shape-specific check ("<span class='tool-summary'>CronList" not in html) to catch the same regression class via different mechanisms.Defensive scaffolding heads-up
ScheduleWakeupis the only tool with real fixtures derived from actual transcripts.CronCreate/CronList/CronDeletefixtures are constructed against the documented schemas — no real invocations exist in the project's transcripts yet (the tools are listed as deferred but haven't been used in the wild). The output parsers degrade gracefully: every output dataclass carries atextfield even when structured parsing succeeds, and renderers fall back to raw text whenever parsing yields no structured fields. So when real invocations show up and the format turns out to differ from the fixture, the rendering will visibly degrade to text rather than crash. Worth a follow-up to tighten the parsers when real Cron* output lands.Test plan
just test: 1661 pass, 7 skipped (was 1635 onmain; +26 new)just test-tui: 67 passjust test-browser: 39 pass, 1 skippedjust ci: ruff + pyright + ty all greentest/test_cron_rendering.py(24 cases):test/test_data/cron_tools.jsonl, asserting titles + grids + structured cron list + Markdown inline-code escaping)🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
UI / Rendering
Tests
Documentation