fix: improve todo completion reliability by trungutt · Pull Request #2048 · docker/docker-agent

trungutt · 2026-03-10T18:14:22Z

Problem

Agents create todos (e.g. 5 items), do the work, but only mark a subset as completed (e.g. 3 out of 5). The remaining items stay "pending" in the sidebar even though the underlying tasks were actually done.

This is an LLM attention problem. The CRUD model (update_todos with just an ID) lets the LLM update one item without ever seeing the rest. As the conversation grows and old tool results get truncated by the 40k-token budget, the LLM loses awareness of what it originally planned.

Changes

Full-state responses on every tool call. create_todo, create_todos, and update_todos now include an all_todos field containing the complete current todo list, plus a reminder listing any incomplete items. This means the LLM sees the full state every time it touches its todos — it cannot mark todo_3 as completed without also seeing that todo_1 and todo_5 are still pending. This mirrors the approach used by OpenCode, where every todo write forces the full list into the response as a cognitive forcing function.
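A minimal sketch of the full-state idea, not the PR's actual code: the `Todo` fields, the `Updated` field, the `updateTodo` helper, and the reminder wording are all assumptions made for illustration. Only the `AllTodos`/`all_todos` and `reminder` names come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// Todo is a simplified stand-in for the tool's todo item (hypothetical fields).
type Todo struct {
	ID          string
	Description string
	Status      string // "pending", "in-progress", or "completed"
}

// UpdateTodosOutput sketches a full-state response: besides the IDs that
// changed, it carries every todo plus a reminder about unfinished ones.
type UpdateTodosOutput struct {
	Updated  []string
	AllTodos []Todo
	Reminder string
}

// incompleteReminder lists non-completed todos, or returns "" if none remain.
func incompleteReminder(all []Todo) string {
	var pending []string
	for _, t := range all {
		if t.Status != "completed" {
			pending = append(pending, fmt.Sprintf("%s (%s)", t.ID, t.Status))
		}
	}
	if len(pending) == 0 {
		return ""
	}
	return "Incomplete todos: " + strings.Join(pending, ", ")
}

// updateTodo sets the status of one item in place and returns the complete
// list alongside the change, so the caller always sees the whole state.
func updateTodo(all []Todo, id, status string) UpdateTodosOutput {
	out := UpdateTodosOutput{}
	for i := range all {
		if all[i].ID == id {
			all[i].Status = status
			out.Updated = append(out.Updated, id)
		}
	}
	out.AllTodos = all
	out.Reminder = incompleteReminder(all)
	return out
}

func main() {
	todos := []Todo{
		{ID: "todo_1", Description: "write tests", Status: "pending"},
		{ID: "todo_3", Description: "fix bug", Status: "in-progress"},
		{ID: "todo_5", Description: "update docs", Status: "pending"},
	}
	out := updateTodo(todos, "todo_3", "completed")
	fmt.Println(out.Reminder)
}
```

Completing `todo_3` here still returns `todo_1` and `todo_5` in both `AllTodos` and `Reminder`, which is the forcing effect the description relies on.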

Stronger instructions. The Instructions() prompt now explicitly requires the LLM to call list_todos before its final response and to never leave items in a pending/in-progress state.

Remove auto-clear on all-completed. Previously, marking all items as completed would call storage.Clear(), wiping the list. This destroyed the audit trail and made list_todos return empty, which could confuse the LLM if it tried to verify its work. Completed items now remain visible in storage.
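The retained-history behavior can be illustrated with a small in-memory store. This is a sketch under assumptions: `memoryStorage`, `Add`, and `Complete` are hypothetical names, and the project's real storage interface may look different.

```go
package main

import "fmt"

// Todo is a simplified stand-in for the tool's todo item.
type Todo struct {
	ID     string
	Status string
}

// memoryStorage is a hypothetical in-memory store, not the project's real one.
type memoryStorage struct {
	todos []Todo
}

// Add appends a todo to the store.
func (s *memoryStorage) Add(t Todo) { s.todos = append(s.todos, t) }

// All returns a copy so callers cannot mutate the stored slice.
func (s *memoryStorage) All() []Todo {
	return append([]Todo(nil), s.todos...)
}

// Complete marks a todo as completed and reports whether the ID was found.
// It deliberately never clears the store, even once every item is completed.
func (s *memoryStorage) Complete(id string) bool {
	for i := range s.todos {
		if s.todos[i].ID == id {
			s.todos[i].Status = "completed"
			return true
		}
	}
	return false
}

func main() {
	s := &memoryStorage{}
	s.Add(Todo{ID: "todo_1", Status: "pending"})
	s.Add(Todo{ID: "todo_2", Status: "pending"})
	s.Complete("todo_1")
	s.Complete("todo_2")
	// Both items are still listed, so a verification pass sees the finished work.
	for _, t := range s.All() {
		fmt.Printf("%s: %s\n", t.ID, t.Status)
	}
}
```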

New CreateTodoOutput type. create_todo previously returned a bare Todo. It now returns CreateTodoOutput with created, all_todos, and reminder fields, consistent with the other tool responses.
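As a rough illustration of the response shape, the sketch below serializes a `CreateTodoOutput` to JSON. The three field names come from the description; the `Todo` fields, the JSON tags, the `omitempty` choice, and the `encodeCreateOutput` helper are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Todo is a simplified stand-in for the tool's todo item (hypothetical fields).
type Todo struct {
	ID          string `json:"id"`
	Description string `json:"description"`
	Status      string `json:"status"`
}

// CreateTodoOutput mirrors the three fields named in the PR description.
// The JSON tags and the omitempty on Reminder are assumptions.
type CreateTodoOutput struct {
	Created  Todo   `json:"created"`
	AllTodos []Todo `json:"all_todos"`
	Reminder string `json:"reminder,omitempty"`
}

// encodeCreateOutput renders the response as a JSON string.
func encodeCreateOutput(out CreateTodoOutput) (string, error) {
	b, err := json.Marshal(out)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	created := Todo{ID: "todo_2", Description: "update docs", Status: "pending"}
	out := CreateTodoOutput{
		Created: created,
		AllTodos: []Todo{
			{ID: "todo_1", Description: "write tests", Status: "completed"},
			created,
		},
		Reminder: "Incomplete todos: todo_2",
	}
	s, err := encodeCreateOutput(out)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```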

docker-agent

Review Summary

Assessment: 🟢 APPROVE

This PR successfully addresses the LLM attention problem by adding reminder strings that alert the agent about incomplete todos. The implementation is sound with proper control flow:

✅ Clear separation between completed/pending/in-progress states
✅ Correct conditional logic (reminders only shown when needed)
✅ Storage clearing happens at the right time (after all completed)
✅ Test updated to reflect new output format

The changes effectively solve the problem of LLMs forgetting to mark todos as completed by bringing incomplete items back into their attention window.

Minor optimization opportunities (informational only, not blocking):

Two small inefficiencies, noted below, could be addressed in a future cleanup.

Posted via docker-agent PR review

docker-agent · 2026-03-10T18:18:27Z

pkg/tools/builtin/todo.go

+// incompleteReminder returns a reminder string listing any non-completed todos,
+// or an empty string if all are completed (or storage is empty).
+func (h *todoHandler) incompleteReminder() string {
+	all := h.storage.All()


[LOW] Minor inefficiency: storage.All() called twice

When incompleteReminder() is called, allCompleted() has already called h.storage.All() to check completion status. This results in iterating the storage twice. Consider refactoring to pass the slice or cache the result.

This doesn't affect correctness, just performance (minimal impact in practice given todo lists are typically small).
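One way to apply this suggestion, sketched with free functions instead of the handler's methods: read the list once and pass the slice to both helpers. The `summarize` wrapper, the reminder wording, and the choice that an empty list does not count as "all completed" are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Todo is a simplified stand-in for the tool's todo item.
type Todo struct {
	ID     string
	Status string
}

// allCompleted reports whether the list is non-empty and every item is
// completed. Treating an empty list as "not all completed" is an assumption.
func allCompleted(all []Todo) bool {
	if len(all) == 0 {
		return false
	}
	for _, t := range all {
		if t.Status != "completed" {
			return false
		}
	}
	return true
}

// incompleteReminder lists non-completed todos, or returns "" if none remain.
func incompleteReminder(all []Todo) string {
	var ids []string
	for _, t := range all {
		if t.Status != "completed" {
			ids = append(ids, t.ID)
		}
	}
	if len(ids) == 0 {
		return ""
	}
	return "Incomplete todos: " + strings.Join(ids, ", ")
}

// summarize receives the slice from a single storage read and shares it
// between both helpers, so the storage itself is queried only once.
func summarize(all []Todo) (done bool, reminder string) {
	return allCompleted(all), incompleteReminder(all)
}

func main() {
	all := []Todo{
		{ID: "todo_1", Status: "completed"},
		{ID: "todo_2", Status: "pending"},
	}
	done, reminder := summarize(all)
	fmt.Println(done, reminder)
}
```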

docker-agent · 2026-03-10T18:18:27Z

pkg/tools/builtin/todo.go

+			if todo.Status == "completed" {
+				completed++
+			}
+			total++


[LOW] Redundant counter: total is always len(all)

The total variable is incremented for every item but will always equal len(all). You can simplify this:

	var completed int
	for _, todo := range all {
		if todo.Status == "completed" {
			completed++
		}
	}
	fmt.Fprintf(&output, "Current todos (%d/%d completed):\n", completed, len(all))

This is a minor code clarity improvement with no functional impact.

LLMs frequently create todos but fail to mark all of them as completed, leaving the todo sidebar in a partially-done state. This happens because the instruction to complete todos is far back in the system prompt by the time the LLM finishes its work. Add an incomplete-todo reminder to update_todos and list_todos tool output so the LLM sees unfinished items directly in its immediate context. Also strengthen the system instructions to emphasize that every todo must be completed before responding.

Add AllTodos field to CreateTodoOutput, CreateTodosOutput, and UpdateTodosOutput so every response includes the complete current state of all todo items. This gives the LLM full visibility into the todo list without needing a separate list_todos call. Also removes the auto-clear-on-all-completed behavior so that completed items remain visible, and adds CreateTodoOutput as a dedicated output type for create_todo (replacing bare Todo).

trungutt marked this pull request as ready for review March 10, 2026 18:16

trungutt requested a review from a team as a code owner March 10, 2026 18:16

docker-agent bot reviewed Mar 10, 2026


trungutt force-pushed the trungutt/improve-todo-completion-reliability branch from 01218b3 to 9c1cab4 Compare March 11, 2026 09:57

trungutt changed the title ~~fix: improve todo tool reliability by reminding LLM of incomplete items~~ fix: improve todo tool reliability with full-state responses and stronger instructions Mar 11, 2026

trungutt changed the title ~~fix: improve todo tool reliability with full-state responses and stronger instructions~~ fix: improve todo completion reliability Mar 11, 2026

dgageot approved these changes Mar 11, 2026


trungutt wants to merge 2 commits into docker:main from trungutt:trungutt/improve-todo-completion-reliability
