feat(mcp): resolve local-element declarations on top-level miss #8

Merged: caio-pizzol merged 2 commits into main from caio/ooxml-mcp-local-element-resolution on May 12, 2026

@caio-pizzol (Contributor)

A real agent calling ooxml_attributes for w:cs got Not found, even though w:cs shows up in real .docx files. Reason: cs, rtl, lang, dir, and bdo are declared inline inside the EG_RPrBase / EG_ContentRunContent groups in the WML XSD, not as top-level xsd:elements, so the global-only lookup misses them. The agent had to pivot to prose search and reconstruct a structural answer that the database already holds.

ooxml_attributes, ooxml_children, and ooxml_element now fall back through local-element declarations when the top-level lookup misses. Conservative scoping: top-level still wins, fallback only on miss, never guesses.

  • Helper findLocalElementsInNamespace(localName, namespace, profile) returns local-element rows with parent kind/name and type_ref.
  • If exactly one local declaration exists, or all matches share the same type_ref, follow that type and return the attribute/children report with a "resolved via local element in <kind> <name>, type X" header.
  • If multiple declarations have different types (the tblGrid-style case), return a disambiguation list. No guess.
  • For ooxml_element specifically, return a ## Local element: X report rather than pretending it's a global Element.
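
As a rough illustration of the helper in the first bullet, here is a minimal in-memory sketch. It is not the PR's implementation: the real helper takes (localName, namespace, profile) and presumably queries the database, whereas this version takes the rows as an extra first argument so it can run standalone. Only parent_symbol_id and type_ref are names from the PR; the other fields, the "transitional" profile label, and the sample rows are assumptions.

```typescript
// Hypothetical row shape: only parent_symbol_id and type_ref are names taken
// from the PR text; every other field is an assumption for illustration.
interface SymbolRow {
  id: number;
  kind: string; // e.g. "element", "group", "complexType"
  localName: string;
  namespace: string;
  profile: string;
  parent_symbol_id: number | null; // null marks a top-level declaration
  type_ref: string | null; // null marks an inline (anonymous) type
}

interface LocalElementHit {
  localName: string;
  parentKind: string;
  parentName: string;
  type_ref: string | null;
}

function findLocalElementsInNamespace(
  rows: SymbolRow[],
  localName: string,
  namespace: string,
  profile: string,
): LocalElementHit[] {
  // Index rows by id so each hit can report its parent's kind and name.
  const byId: Record<number, SymbolRow> = {};
  rows.forEach((r) => {
    byId[r.id] = r;
  });

  const hits: LocalElementHit[] = [];
  for (const row of rows) {
    if (row.kind !== "element") continue;
    if (row.localName !== localName) continue;
    if (row.namespace !== namespace || row.profile !== profile) continue;
    if (row.parent_symbol_id === null) continue; // top-level rows are excluded
    const parent = byId[row.parent_symbol_id];
    if (parent === undefined) continue; // dangling parent id: skip the row
    hits.push({
      localName: row.localName,
      parentKind: parent.kind,
      parentName: parent.localName,
      type_ref: row.type_ref,
    });
  }
  return hits;
}

// Sample data (assumed): cs declared inline in the EG_RPrBase group.
const WML_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
const sampleRows: SymbolRow[] = [
  { id: 1, kind: "group", localName: "EG_RPrBase", namespace: WML_NS, profile: "transitional", parent_symbol_id: null, type_ref: null },
  { id: 2, kind: "element", localName: "cs", namespace: WML_NS, profile: "transitional", parent_symbol_id: 1, type_ref: "CT_OnOff" },
  { id: 3, kind: "element", localName: "p", namespace: WML_NS, profile: "transitional", parent_symbol_id: null, type_ref: "CT_P" },
];
const sampleHits = findLocalElementsInNamespace(sampleRows, "cs", WML_NS, "transitional");
```

In this sketch, looking up "p" returns no hits because its only declaration is top-level, which matches the "excludes top-level" behavior the tests describe.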

Same-namespace local resolution suppresses the cross-vocab did-you-mean that PR 1 introduced. The trace that motivated this PR showed w:cs surfacing r:cs (a relationship attribute) as a suggestion — technically true but unhelpful.

No DB migration. parent_symbol_id and type_ref were already populated.

Acceptance criteria (all covered by tests, green locally against the real Transitional bundle):

  • ooxml_attributes w:cs / w:rtl → CT_OnOff.val
  • ooxml_attributes w:lang → CT_Language.{val, eastAsia, bidi}
  • ooxml_attributes w:dir → CT_DirContentRun.val
  • ooxml_attributes w:bdo → CT_BdoContentRun.val
  • Cross-vocab did-you-mean suppressed when same-namespace local resolves
  • Different-type ambiguity → list, not guess

Verified: 88 pass / 0 fail / 0 skip. Format / lint / typecheck / build all clean.

A real agent calling ooxml_attributes for w:cs, w:rtl, w:lang, w:dir,
w:bdo - all elements that show up in real .docx files - got Not found.
Reason: those elements are declared inline inside EG_RPrBase /
EG_ContentRunContent groups in the WML XSD, not as top-level
xsd:elements, so the global-only lookup misses them. The agent had to
fall back to prose search and reconstruct an answer the schema graph
actually has.

ooxml_attributes / ooxml_children / ooxml_element now fall back through
local-element declarations when the top-level lookup misses:

- Search xsd_symbols for local elements with the same local name in
  the same namespace (parent_symbol_id IS NOT NULL).
- If exactly one local declaration exists, or all matching declarations
  share the same type_ref, follow that type and return the report. The
  header surfaces "resolved via local element in <kind> <name>, type X"
  so the agent understands the indirection.
- If multiple declarations have different type_refs (the tblGrid case),
  return a disambiguation list - never a guess.
- For ooxml_element specifically, return a local-element report
  ("## Local element: X") rather than pretending it's a global Element.

Same-namespace local resolution suppresses the cross-vocab did-you-mean
that PR 1 introduced. The trace that motivated this work showed w:cs
surfacing r:cs (a relationship attribute) as a suggestion - technically
true but unhelpful. Local resolution wins; cross-vocab stays as the
last-resort fallback only when no same-namespace local exists.
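
The precedence described above (top-level first, then same-namespace local, then cross-vocab did-you-mean) can be sketched roughly as follows. This is an illustration only: SymbolIndex, LookupResult, and resolveName are hypothetical names, and the real dispatchers work against the database rather than in-memory lists.

```typescript
// Hypothetical in-memory index; all names here are assumptions for the sketch.
interface SymbolIndex {
  topLevel: string[]; // qualified names of global elements, e.g. "w:p"
  locals: Record<string, string[]>; // qualified local-element name -> parent names
  allNames: string[]; // every qualified name known across vocabularies
}

type LookupResult =
  | { kind: "top-level"; name: string }
  | { kind: "local"; name: string; parents: string[] }
  | { kind: "did-you-mean"; suggestions: string[] }
  | { kind: "not-found" };

function resolveName(index: SymbolIndex, prefix: string, localName: string): LookupResult {
  const qname = prefix + ":" + localName;

  // 1. A top-level declaration always wins.
  if (index.topLevel.indexOf(qname) !== -1) {
    return { kind: "top-level", name: qname };
  }

  // 2. On a top-level miss, try local declarations in the same namespace.
  const parents = index.locals[qname];
  if (parents !== undefined && parents.length > 0) {
    return { kind: "local", name: qname, parents: parents };
  }

  // 3. Last resort: suggest the same local name under a different prefix.
  const suggestions = index.allNames.filter((candidate) => {
    const parts = candidate.split(":");
    return parts[1] === localName && parts[0] !== prefix;
  });
  if (suggestions.length > 0) {
    return { kind: "did-you-mean", suggestions: suggestions };
  }
  return { kind: "not-found" };
}

// With a local w:cs available, the r:cs suggestion is never reached.
const withLocal: SymbolIndex = {
  topLevel: ["w:p"],
  locals: { "w:cs": ["EG_RPrBase"] },
  allNames: ["w:p", "w:cs", "r:cs"],
};
// Without it, the lookup falls through to the cross-vocab hint.
const withoutLocal: SymbolIndex = {
  topLevel: ["w:p"],
  locals: {},
  allNames: ["w:p", "r:cs"],
};
const localResult = resolveName(withLocal, "w", "cs");
const hintResult = resolveName(withoutLocal, "w", "cs");
```

Ordering the checks this way means the cross-vocab suggestion can only appear when both earlier steps miss, which is the suppression behavior the commit message describes.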

No DB migration: parent_symbol_id and type_ref were already populated
during ingest. This is a query/dispatch enhancement.

Tests cover both layers:

- Helper level (fixture XSDs): findLocalElementsInNamespace returns
  scoped results, excludes top-level, reports parent kind and name,
  preserves per-parent type_refs for the ambiguous case.
- Dispatch level: a new fixture (EG_LocalCase containing local_para
  typed CT_Para) exercises the single-match resolution end to end,
  asserting "resolved via local element" headers and the expected
  attributes. The ambiguous-shared fixture confirms disambiguation
  without guessing.
- Real-cache acceptance (gated on the full Transitional bundle):
  w:cs / w:rtl resolve to CT_OnOff.val; w:lang to CT_Language with
  val, eastAsia, bidi; w:dir / w:bdo to their respective types' val
  attribute; w:cs no longer leads with the cross-vocab r:cs hint.

- ooxml_element on an ambiguous local name (e.g. fixture w:shared, declared
  in CT_OuterA as ST_Jc and in CT_OuterB as xsd:string) previously called
  formatLocalElementReport, which promoted locals[0] as the canonical type
  and listed the rest under "also declared". That implied a primary
  answer where none exists. ooxml_element now goes through
  resolveLocalElement like the other dispatchers, with the same
  single/ambiguous policy: ambiguous cases use
  formatLocalElementAmbiguous (no primary), and formatLocalElementReport
  is reached only when the caller has proved the locals share one
  type_ref.
- resolveLocalElement previously filtered null type_refs out before
  checking for a single value. A mix like [null, CT_X] would collapse to
  {CT_X} and resolve incorrectly, because the inline-typed declaration
  has its own content model that the type symbol can't represent. It now
  requires every local hit to share the same non-null type_ref before
  resolving as single; anything else (mixed nulls, multiple non-null
  values) goes to ambiguous.
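
A minimal sketch of the single/ambiguous rule as stated above, assuming a simplified hit shape. The function name resolveLocalElement comes from the commit message, but the types, return shape, and body here are assumptions, not the merged code.

```typescript
// Hypothetical shapes for illustration; only type_ref is a name from the PR.
interface LocalHit {
  parentKind: string;
  parentName: string;
  type_ref: string | null; // null means the declaration has an inline type
}

type Resolution =
  | { status: "none" }
  | { status: "single"; type_ref: string; hits: LocalHit[] }
  | { status: "ambiguous"; hits: LocalHit[] };

function resolveLocalElement(hits: LocalHit[]): Resolution {
  if (hits.length === 0) return { status: "none" };

  // Nulls are NOT filtered out first: every hit must carry the same
  // non-null type_ref, otherwise the result is ambiguous.
  const first = hits[0].type_ref;
  if (first !== null && hits.every((h) => h.type_ref === first)) {
    return { status: "single", type_ref: first, hits: hits };
  }
  return { status: "ambiguous", hits: hits };
}

// [null, CT_X]: previously collapsed to {CT_X}; here it stays ambiguous.
const mixedNull = resolveLocalElement([
  { parentKind: "complexType", parentName: "CT_OuterA", type_ref: null },
  { parentKind: "complexType", parentName: "CT_OuterB", type_ref: "CT_X" },
]);
// Two declarations sharing one type resolve as single.
const sharedType = resolveLocalElement([
  { parentKind: "complexType", parentName: "CT_OuterA", type_ref: "CT_OnOff" },
  { parentKind: "complexType", parentName: "CT_OuterB", type_ref: "CT_OnOff" },
]);
// Different types (the w:shared fixture case) are ambiguous.
const differentTypes = resolveLocalElement([
  { parentKind: "complexType", parentName: "CT_OuterA", type_ref: "ST_Jc" },
  { parentKind: "complexType", parentName: "CT_OuterB", type_ref: "xsd:string" },
]);
```

Under this reading of the rule, a lone declaration with a null type_ref also lands in the ambiguous branch, since there is no named type to follow.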

The formatLocalElementReport signature changes to take the resolved
(first, locals) explicitly, with a comment documenting the "all share
one type_ref" invariant. "Also declared in N other contexts" now drops
the per-context type_ref (it is the same as the primary) and says "with
the same type" to make the invariant visible to readers.
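
For illustration, a formatter along these lines might look like the sketch below. The heading and footer strings follow the fragments quoted in this PR, but the exact wording, the extra qname parameter, and the ResolvedLocal shape are assumptions.

```typescript
// Hypothetical shape; type_ref is non-null because callers only reach this
// formatter after proving that all locals share one type_ref.
interface ResolvedLocal {
  parentKind: string;
  parentName: string;
  type_ref: string;
}

// Invariant (from the PR): every entry in `locals` shares first.type_ref.
// The qname parameter is an addition made for this sketch.
function formatLocalElementReport(
  qname: string,
  first: ResolvedLocal,
  locals: ResolvedLocal[],
): string {
  const lines: string[] = [
    "## Local element: " + qname,
    "Resolved via local element in " + first.parentKind + " " + first.parentName + ", type " + first.type_ref,
  ];

  // The footer lists only the other parents; the type is not repeated
  // because the invariant guarantees it matches the primary.
  const others = locals.filter((hit) => hit !== first);
  if (others.length > 0) {
    const noun = others.length === 1 ? "context" : "contexts";
    lines.push("Also declared in " + others.length + " other " + noun + " with the same type:");
    others.forEach((hit) => {
      lines.push("- " + hit.parentKind + " " + hit.parentName);
    });
  }
  return lines.join("\n");
}

// Hypothetical fixture: w:flag declared in two complex types, both CT_OnOff.
const outerA: ResolvedLocal = { parentKind: "complexType", parentName: "CT_OuterA", type_ref: "CT_OnOff" };
const outerB: ResolvedLocal = { parentKind: "complexType", parentName: "CT_OuterB", type_ref: "CT_OnOff" };
const report = formatLocalElementReport("w:flag", outerA, [outerA, outerB]);
```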

Tests cover the ooxml_element regression: w:shared (the existing
fixture for type-disagreement) now exercises the ambiguous path,
verifying the single-resolution heading does not appear and no
"Also declared in" footer leaks the first hit as canonical.

caio-pizzol merged commit 48e8f0f into main on May 12, 2026
1 check passed