
LCORE-1594: Agent Skills feature breakdown#1484

Open
jrobertboos wants to merge 1 commit into lightspeed-core:main from
jrobertboos:lcore-1594-spike-agent-skills

Conversation

@jrobertboos
Contributor

@jrobertboos jrobertboos commented Apr 10, 2026

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Documentation
    • Added design documentation for the Agent Skills extension framework, including spike analysis and comprehensive design specifications for the planned feature.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 10, 2026

Walkthrough

Two new design documents are added to define the Agent Skills extension model for Lightspeed Core. The documents specify a configuration-driven system with lightspeed-stack.yaml integration, file-based skill discovery, security constraints, and implementation requirements including schema validation, system prompt injection, tool registration, and reference-file access controls.
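As a rough illustration of the configuration-driven model described there, a `lightspeed-stack.yaml` fragment might look like the following. This is a sketch based on the walkthrough, not copied from the design docs; the key names are hypothetical:

```yaml
# Hypothetical skills section; field names are illustrative only.
skills:
  - name: log-analysis
    path: /etc/lightspeed/skills/log-analysis      # directory containing SKILL.md
  - name: cluster-upgrade
    path: /etc/lightspeed/skills/cluster-upgrade   # may also hold a references/ subdirectory
```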

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Agent Skills Design Documentation**<br>`docs/design/agent-skills/agent-skills.md`, `docs/design/agent-skills/agent-skills-spike.md` | Two design documents defining the Agent Skills feature: a comprehensive specification (`agent-skills.md`) covering configuration, validation, system prompts, tool workflow, reference constraints, and implementation details; and a spike summary (`agent-skills-spike.md`) outlining scope decisions, path-based content model, catalog mechanism, security considerations, and JIRA-tracked implementation tasks. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

🚥 Pre-merge checks | ✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'LCORE-1594: Agent Skills feature breakdown' is directly related to the changeset, which adds design documents for the Agent Skills feature with architectural breakdown, scope decisions, and implementation proposals. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



@jrobertboos jrobertboos marked this pull request as ready for review April 15, 2026 15:21
@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 14

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/agent-skills/agent-skills-spike.md`:
- Around line 297-304: The SkillConfiguration model currently only enforces
length for the name field; add a Pydantic field validator on SkillConfiguration
for the name attribute (using field_validator('name') as a `@classmethod`, e.g.,
validate_name_format) that checks the regex ^[a-z0-9-]+$ and raises a ValueError
if it fails, preserving the existing min_length/max_length constraints so names
remain 1-64 chars but now also must be lowercase letters, numbers, and hyphens
only.
- Line 9: Update the "PoC validation" section header and sentence: replace the
confusing line starting with "**PoC validation**: Not applicable for this
spike." followed by the product list with a clear validation statement such as
"PoC validation: Validated by 30+ production implementations including Claude
Code, GitHub Copilot, Cursor, and OpenAI Codex" so the intent (that the standard
is proven) is explicit; edit the sentence under the "PoC validation" header to
use this rewording.
- Around line 133-135: The comment flags fragile line-number references; update
the docs to remove hard-coded line numbers and instead reference durable symbols
like the Configuration class definition in src/models/config.py and the
ModelContextProtocolServer class (MCP server config pattern), or add a note that
any line numbers should be verified/updated if code moves; ensure all
occurrences in agent-skills-spike.md use class names or file+symbol references
rather than specific line numbers.
- Around line 145-148: Clarify in the design/acceptance notes whether
build_skill_catalog() output is appended to per-request system prompts returned
by get_system_prompt() or only to resolved default prompts: update the
implementation plan and acceptance criteria to state the exact behavior (e.g.,
"skills are injected into the final resolved system prompt even when a
per-request system_prompt is provided" or "providing a per-request system_prompt
disables skill injection"), and mention interaction with the
disable_query_system_prompt flag (which when True prevents per-request prompts
and skill injection). Reference get_system_prompt(), build_skill_catalog(), and
disable_query_system_prompt in the note so reviewers and implementers know
exactly where to enforce or check this behavior.

In `@docs/design/agent-skills/agent-skills.md`:
- Around line 164-165: Add a brief comment above the PrivateAttr declarations
for _content and _references explaining these fields are runtime-only caches
(not serialized), are populated once during startup config validation (from
parsing SKILL.md and scanning the references directory), and should not be
relied on for persisted state; reference the _content, _references fields and
the use of PrivateAttr so future maintainers know their purpose and lifecycle.
- Around line 264-276: The pseudo-code in the PR appends the skill catalog to a
non-existent `resolved_prompt` variable but the real get_system_prompt() has
multiple early returns; update get_system_prompt() to first compute and store
the resolved prompt in a single variable (e.g., `base_prompt`) following the
existing precedence (explicit system_prompt, customization.custom_profile
default, customization.system_prompt, constants.DEFAULT_SYSTEM_PROMPT), then
call build_skill_catalog(configuration.skills) and if it returns content append
it to `base_prompt` and return that combined string, otherwise return
`base_prompt`; reference the get_system_prompt function, the build_skill_catalog
call, configuration.skills, and constants.DEFAULT_SYSTEM_PROMPT when making the
change.
- Around line 412-416: The current is_skill_content function uses simple
substring checks which can produce false positives; update is_skill_content to
use a regex that matches the full tag structure instead: compile and use the
pattern r'<skill_content\s+name="[^"]+">.*</skill_content>' with the DOTALL flag
(and import re if not present) and return whether the regex finds a match; keep
the function name is_skill_content and ensure the regex-based check replaces the
existing substring checks.
- Around line 423-439: SkillTracker currently keeps an unbounded _activated dict
(class SkillTracker, methods is_activated, mark_activated, clear) which can leak
memory; update it so entries are removed when conversations end and stale
entries expire: hook clear(conversation_id) into the conversation
deletion/cleanup pipeline, and replace or wrap _activated with a size-limited
LRU or TTL cache (e.g., implement an LRU eviction policy with a max_entries
parameter and/or per-entry expiry timestamps to purge abandoned conversations)
so that mark_activated and is_activated operate against the capped/expiring
storage and expired entries are evicted on access or by a periodic cleanup.
- Around line 520-531: The list_references function currently returns files
under refs_dir without excluding symbolic links, allowing symlinks to point
outside the skill directory; update list_references (the function and its use of
refs_dir and the rglob/f.is_file() check) to skip symlinks by filtering out
entries where f.is_symlink() is True (or use
f.resolve().is_relative_to(skill_path) as an extra safety check) so only regular
files inside the skill_path/references tree are returned; ensure the returned
paths remain relative_to(skill_path).
- Line 485: The line currently references the wrong tool name `read_skill`;
update it to the correct tool name `activate_skill` so the document consistently
uses `activate_skill` (verify and replace any stray `read_skill` mentions with
`activate_skill` in the same section), ensuring references to the tool handler
and function type match the rest of the doc and examples.
- Around line 167-186: In validate_skill_path (the `@model_validator` method
validate_skill_path) add an explicit check that Path(self.path) exists and is a
directory before checking for SKILL.md: compute skill_dir = Path(self.path), if
not skill_dir.exists() or not skill_dir.is_dir() raise a ValueError like "Skill
path not found or not a directory: {skill_dir}" so callers get a clear error;
then continue to build skill_md_path = skill_dir / "SKILL.md" and keep the
existing parse_skill_md, name comparison, setting of self._content and
self._references (list_references(Path(self.path))).
- Around line 327-354: The tool response in handle_activate_skill interpolates
unescaped values into XML; escape XML special characters for skill.name
(attribute), skill._content (body), skill.path, and each item in
skill._references before building lines. Use a proper XML/HTML escape utility
(e.g., html.escape or xml.sax.saxutils.escape) to replace &, <, >, ", ' and
apply it where skill.name, skill._content, skill.path and ref are inserted so
that the attribute, element text, and file entries are safe; keep the same tags
and structure but feed escaped strings into the list assembly.
- Around line 494-517: Update parse_skill_md to use a more flexible frontmatter
regex and stronger field validation: relax the regex to accept CRLF or LF line
endings and optional whitespace after the closing --- (e.g., use \r?\n and allow
whitespace or no trailing newline so frontmatter/body split still works), keep
re.DOTALL, and fall back to a match that allows end-of-file without an extra
newline; then validate that frontmatter (the YAML-safe_load result) is a dict
and that frontmatter["name"] and frontmatter["description"] exist, are instances
of str, and are non-empty after stripping (raise clear ValueError messages if
not). Ensure you update error messages from parse_skill_md to mention the
specific missing/invalid field names and include the variable names frontmatter
and body in the checks to make locating the code easier.
- Around line 252-258: The loop that builds XML for each skill currently
interpolates skill.name and skill.description directly (see the for skill in
skills: block) which can produce malformed XML or injection; update this to
escape XML special characters before interpolation (e.g., call a sanitizer like
html.escape or xml.sax.saxutils.escape on skill.name and skill.description) so
the lines appended use the escaped values; ensure both name and description are
escaped and keep the surrounding tags intact in the block that extends lines.
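The escaping that the two comments above call for can be sketched with the standard library. `skill_catalog_entry` is a hypothetical helper name and the tag shape is illustrative; the point is to feed escaped strings into the markup rather than interpolating raw skill fields:

```python
from xml.sax.saxutils import escape, quoteattr

def skill_catalog_entry(name: str, description: str) -> str:
    """Build one catalog entry with escaped values.

    quoteattr() quotes and escapes the attribute value; escape()
    handles the element text (&, <, >).
    """
    return f"<skill name={quoteattr(name)}>{escape(description)}</skill>"
```

The same approach would apply to `skill._content`, `skill.path`, and each reference path in `handle_activate_skill`.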

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: adb4ced2-4778-4147-9a10-41eeb61a92eb

📥 Commits

Reviewing files that changed from the base of the PR and between 07ca1b1 and 6208b62.

📒 Files selected for processing (2)
  • docs/design/agent-skills/agent-skills-spike.md
  • docs/design/agent-skills/agent-skills.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T16:38:30.287Z
Learning: Applies to AGENTS.md : Document agent implementations and their configurations in AGENTS.md
📚 Learning: 2026-03-02T16:38:30.287Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T16:38:30.287Z
Learning: Applies to AGENTS.md : Document agent implementations and their configurations in AGENTS.md

Applied to files:

  • docs/design/agent-skills/agent-skills.md
🪛 LanguageTool
docs/design/agent-skills/agent-skills.md

[uncategorized] ~26-~26: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...main-specific expertise the LLM can use on demand - Sharing knowledge across deployments ...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[style] ~47-~47: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...n issues - U2: As a skill author, I want to create a SKILL.md file with instruction...

(REP_WANT_TO_VB)


[style] ~49-~49: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... - U4: As an enterprise customer, I want to deploy custom skills so that the LLM un...

(REP_WANT_TO_VB)

🔇 Additional comments (4)
docs/design/agent-skills/agent-skills.md (2)

286-317: LGTM: Well-designed tool registration.

The get_skill_tool implementation follows best practices:

  • Only registers when skills are configured (no empty tool)
  • Uses enum constraint to prevent hallucinated skill names
  • Clear description for the model

The use of an enum for the name parameter is particularly good—it prevents the model from requesting non-existent skills.


389-400: LGTM: Secure path validation.

The is_path_in_skill_directory implementation correctly prevents directory traversal attacks by:

  • Using Path.resolve() to normalize and resolve symlinks
  • Using relative_to() to verify the path is within the skill directory
  • Properly handling the ValueError exception

This approach handles edge cases like parent directory traversal (../../../etc/passwd) and symlinks correctly.
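A sketch of that validation, reconstructed from the summary above (the design doc's actual implementation may differ in details):

```python
from pathlib import Path

def is_path_in_skill_directory(path: str, skill_dir: str) -> bool:
    """Return True only when `path` resolves to a location inside `skill_dir`.

    resolve() normalizes `..` components and follows symlinks, so a
    traversal attempt like `../../../etc/passwd` fails the relative_to()
    check, which raises ValueError for paths outside the base directory.
    """
    try:
        Path(path).resolve().relative_to(Path(skill_dir).resolve())
        return True
    except ValueError:
        return False
```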

docs/design/agent-skills/agent-skills-spike.md (2)

329-336: 🧹 Nitpick | 🔵 Trivial

Consider documenting recommended file permissions.

The security considerations note that configured skill paths are trusted, but don't address the risk of skill directories being writable by the application. If an attacker gains code execution or the application user account is compromised, they could modify SKILL.md files to inject malicious instructions.

Consider adding a recommendation that skill directories should be:

  • Owned by root or a dedicated user (not the application user)
  • Read-only for the application user
  • Not writable by the application process
⛔ Skipped due to learnings
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 417
File: src/lightspeed_stack.py:60-63
Timestamp: 2025-08-19T08:57:27.714Z
Learning: In the lightspeed-stack project, file permission hardening (chmod 0o600) for stored configuration JSON files is not required as it's not considered a security concern in their deployment environment.

196-218: Design appropriately addresses path security requirements.

The design document provides a sound validation approach: the is_path_in_skill_directory() function at lines 389–399 uses Path.resolve() and relative_to() to prevent directory traversal attacks. The activate_skill tool returns the skill's base_path, constraining the LLM to construct paths like {base_path}/references/guide.md. Permission allowlisting at the file-read tool level (per agentskills.io guidance) ensures bundled resources are accessible without prompts.

The specific MCP or file-read tool integration can be confirmed during implementation, but the design-level validation and path restriction strategy is correct.


**The recommendation**: Implement the [Agent Skills open standard](https://agentskills.io) with a config-based discovery model. Skills are defined in `lightspeed-stack.yaml` pointing to directories containing `SKILL.md` files. The LLM sees a skill catalog (name + description) in the system prompt and can request full instructions on demand via a dedicated tool.

**PoC validation**: Not applicable for this spike. The feature is well-defined by the agentskills.io specification and has been implemented by 30+ agent products including Claude Code, GitHub Copilot, Cursor, and OpenAI Codex.

🧹 Nitpick | 🔵 Trivial

Clarify PoC validation statement.

The statement "Not applicable for this spike" followed by listing 30+ products that have implemented the feature is potentially confusing. Consider rewording to something like "PoC validation: Validated by 30+ production implementations including Claude Code, GitHub Copilot, Cursor, and OpenAI Codex" to clarify that the standard itself is proven rather than requiring a separate PoC.


Comment on lines +133 to +135
```
Key files: src/models/config.py (around line 1817, Configuration class).
Follow the MCP server config pattern (ModelContextProtocolServer class, line 468).
```

🧹 Nitpick | 🔵 Trivial

Line numbers in agentic instructions may become stale.

The agentic tool instruction references specific line numbers (line 1817, line 468) that could become outdated as the codebase evolves. Consider using more durable references like "the Configuration class definition" or "the ModelContextProtocolServer class" without line numbers, or note that these should be updated if the code structure changes.


Comment on lines +145 to +148
- Create `src/utils/skills.py` module
- Implement `build_skill_catalog()` function that formats skills as XML/structured text
- Modify `get_system_prompt()` in `src/utils/prompts.py` to append skill catalog
- Add behavioral instructions telling the LLM how to use skills

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Verify how disable_query_system_prompt and per-request prompts interact with configuration

rg -n -A5 -B5 'disable_query_system_prompt' --type=py
```

Repository: lightspeed-core/lightspeed-stack

Length of output: 19937


🏁 Script executed:

```shell
cat -n docs/design/agent-skills/agent-skills-spike.md | sed -n '145,280p'
```

Repository: lightspeed-core/lightspeed-stack

Length of output: 5939


🏁 Script executed:

```shell
grep -n "System prompt injection" docs/design/agent-skills/agent-skills.md -A 30
```

Repository: lightspeed-core/lightspeed-stack

Length of output: 2514


🏁 Script executed:

```shell
sed -n '25,85p' src/utils/prompts.py
```

Repository: lightspeed-core/lightspeed-stack

Length of output: 2351


🏁 Script executed:

```shell
grep -n "per-request\|disable_query\|precedence" docs/design/agent-skills/agent-skills.md
```

Repository: lightspeed-core/lightspeed-stack

Length of output: 136


🏁 Script executed:

```shell
sed -n '199,250p' docs/design/agent-skills/agent-skills.md
```

Repository: lightspeed-core/lightspeed-stack

Length of output: 1757


Clarify skill injection behavior with per-request system prompts in the implementation plan.

The design specifies appending skill catalog to the resolved system prompt, but doesn't explicitly document whether this injection occurs when users provide per-request system_prompt values. Since get_system_prompt() returns per-request prompts immediately (with highest precedence), the implementation should confirm: Do skills get injected into user-provided prompts, or does providing a custom system prompt disable skill injection? Document this behavior in acceptance criteria or implementation notes to ensure the feature works as intended.

(Note: The disable_query_system_prompt configuration flag already provides control at the precedence level—when set to True, it prevents both per-request customization and any skill injection. This is documented in the code.)
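One way to make the "resolve first, then inject" shape concrete is the self-contained sketch below. All names here are stand-ins (the real `get_system_prompt()` takes different arguments and has more precedence levels), and it shows just one of the two behaviors the comment asks the design to pick: skills are injected even when a per-request prompt is provided.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."  # stand-in constant

@dataclass
class Skill:  # minimal stand-in for the doc's SkillConfiguration
    name: str
    description: str

def build_skill_catalog(skills: list[Skill]) -> str:
    """Return a plain-text catalog, or "" when no skills are configured."""
    if not skills:
        return ""
    entries = "\n".join(f"- {s.name}: {s.description}" for s in skills)
    return f"Available skills:\n{entries}"

def get_system_prompt(request_prompt: Optional[str], skills: list[Skill]) -> str:
    # Resolve the base prompt first (per-request prompt wins), then append
    # the catalog once -- avoiding early returns that would skip injection.
    base_prompt = request_prompt or DEFAULT_SYSTEM_PROMPT
    catalog = build_skill_catalog(skills)
    return f"{base_prompt}\n\n{catalog}" if catalog else base_prompt
```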


Comment on lines +297 to +304
**Frontmatter fields**:
| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | 1-64 chars, lowercase, hyphens only |
| `description` | Yes | Max 1024 chars, describes what and when |
| `license` | No | License name or file reference |
| `compatibility` | No | Environment requirements |
| `metadata` | No | Arbitrary key-value pairs |
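For illustration, a minimal `SKILL.md` satisfying the required fields might look like this (the skill itself is hypothetical):

```markdown
---
name: log-analysis
description: Analyze application logs to identify error patterns and suggest likely fixes.
license: Apache-2.0
---

# Log Analysis

When the user asks about log errors, request the relevant log excerpt,
group entries by error class, and summarize probable root causes.
```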

🛠️ Refactor suggestion | 🟠 Major

Validation discrepancy: name format not enforced.

The Agent Skills specification (line 300) states that name must be "1-64 chars, lowercase, hyphens only", but the proposed SkillConfiguration model in the design doc (agent-skills.md lines 141-147) only validates length (min_length=1, max_length=64) without enforcing the lowercase/hyphens-only constraint.

This could lead to configuration that violates the agentskills.io standard. Consider adding a Pydantic validator to enforce the format:

```python
@field_validator('name')
@classmethod
def validate_name_format(cls, v: str) -> str:
    if not re.match(r'^[a-z0-9-]+$', v):
        raise ValueError('Skill name must contain only lowercase letters, numbers, and hyphens')
    return v
```

Comment on lines +164 to +165
```python
_content: str = PrivateAttr(default="")
_references: list[str] = PrivateAttr(default_factory=list)
```

🧹 Nitpick | 🔵 Trivial

Document runtime state nature of private attributes.

The _content and _references private attributes store runtime state populated during config validation. While this approach is correct, consider adding a comment clarifying that these are runtime-only (not serializable) and are populated once at startup during validation.

This helps future maintainers understand that these fields contain cached data from parsing SKILL.md and listing the references directory.


Comment on lines +412 to +416
```python
def is_skill_content(message: str) -> bool:
    """Check if a message contains skill content that should be protected."""
    return "<skill_content" in message and "</skill_content>" in message
```

🧹 Nitpick | 🔵 Trivial

String-based tag check is conservative but acceptable.

The is_skill_content function uses simple string matching to identify skill content. This is conservative (may have false positives if users mention these tags) but appropriate for a protection mechanism—better to over-preserve content than accidentally remove skill instructions.

If false positives become an issue, consider using regex to match the full tag structure: r'<skill_content\s+name="[^"]+">.*</skill_content>' with DOTALL flag.
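The suggested regex variant could look like this. It is a sketch of the reviewer's alternative, not the design doc's version (which uses the plain substring checks shown above):

```python
import re

# Stricter check matching the full tag structure, per the suggestion above.
_SKILL_CONTENT_RE = re.compile(
    r'<skill_content\s+name="[^"]+">.*</skill_content>',
    re.DOTALL,  # skill bodies span multiple lines
)

def is_skill_content(message: str) -> bool:
    """Check if a message contains skill content that should be protected."""
    return _SKILL_CONTENT_RE.search(message) is not None
```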


Comment on lines +423 to +439
class SkillTracker:
"""Track activated skills per conversation."""

def __init__(self):
self._activated: dict[str, set[str]] = {} # conversation_id -> skill names

def is_activated(self, conversation_id: str, skill_name: str) -> bool:
return skill_name in self._activated.get(conversation_id, set())

def mark_activated(self, conversation_id: str, skill_name: str) -> None:
if conversation_id not in self._activated:
self._activated[conversation_id] = set()
self._activated[conversation_id].add(skill_name)

def clear(self, conversation_id: str) -> None:
self._activated.pop(conversation_id, None)
```

⚠️ Potential issue | 🟠 Major

SkillTracker needs conversation lifecycle integration.

The SkillTracker class maintains an unbounded dictionary mapping conversation IDs to activated skills. This will grow indefinitely unless clear() is called when conversations end.

Consider:

  1. Integrating with conversation cleanup/deletion events
  2. Implementing TTL-based expiry for abandoned conversations
  3. Adding a maximum size with LRU eviction

Without this, long-running instances could accumulate thousands of conversation tracking entries, causing a memory leak.

Would you like me to propose an implementation with LRU caching or TTL-based expiry?
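A size-capped variant along the suggested lines, using `OrderedDict` for LRU eviction, could look like this sketch (`max_entries` and its default are hypothetical):

```python
from collections import OrderedDict

class BoundedSkillTracker:
    """Like SkillTracker, but capped: once max_entries conversations are
    tracked, the least recently touched one is evicted. clear() should
    still be hooked into the conversation deletion/cleanup pipeline."""

    def __init__(self, max_entries: int = 10_000):
        self._max_entries = max_entries
        self._activated = OrderedDict()  # conversation_id -> set of skill names

    def is_activated(self, conversation_id: str, skill_name: str) -> bool:
        if conversation_id not in self._activated:
            return False
        self._activated.move_to_end(conversation_id)  # refresh recency
        return skill_name in self._activated[conversation_id]

    def mark_activated(self, conversation_id: str, skill_name: str) -> None:
        if conversation_id not in self._activated:
            if len(self._activated) >= self._max_entries:
                self._activated.popitem(last=False)  # evict oldest entry
            self._activated[conversation_id] = set()
        self._activated.move_to_end(conversation_id)
        self._activated[conversation_id].add(skill_name)

    def clear(self, conversation_id: str) -> None:
        self._activated.pop(conversation_id, None)
```

A TTL per entry would handle abandoned conversations that are never the oldest; the LRU cap alone only bounds total memory.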


**Tool registration** (`src/utils/responses.py`):
- `prepare_tools()` at line 204 builds the tool list
- Add skill tool after MCP tools (around line 260)
- Tool handler needs to be registered for the `read_skill` function type

⚠️ Potential issue | 🟡 Minor

Incorrect tool name reference.

This line mentions read_skill but the tool is consistently named activate_skill throughout both documents (lines 16, 36, 84, 169, 302). This appears to be a typo or outdated reference.

📝 Proposed fix
```diff
-Tool handler needs to be registered for the `read_skill` function type
+Tool handler needs to be registered for the `activate_skill` function type
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-- Tool handler needs to be registered for the `read_skill` function type
+- Tool handler needs to be registered for the `activate_skill` function type
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` at line 485, The line currently
references the wrong tool name `read_skill`; update it to the correct tool name
`activate_skill` so the document consistently uses `activate_skill` (verify and
replace any stray `read_skill` mentions with `activate_skill` in the same
section), ensuring references to the tool handler and function type match the
rest of the doc and examples.

Comment on lines +494 to +517
```python
import re

import yaml


def parse_skill_md(content: str) -> tuple[dict, str]:
    """Parse SKILL.md into frontmatter dict and body string."""
    # Match YAML frontmatter between --- delimiters
    match = re.match(r"^---\s*\n(.*?)\n---\s*\n(.*)$", content, re.DOTALL)
    if not match:
        raise ValueError("SKILL.md must have YAML frontmatter between --- delimiters")

    frontmatter_text, body = match.groups()

    try:
        frontmatter = yaml.safe_load(frontmatter_text)
    except yaml.YAMLError as e:
        raise ValueError(f"Invalid YAML frontmatter: {e}") from e

    if not isinstance(frontmatter, dict):
        raise ValueError("Frontmatter must be a YAML mapping")

    if "name" not in frontmatter:
        raise ValueError("SKILL.md frontmatter must include 'name' field")

    if "description" not in frontmatter:
        raise ValueError("SKILL.md frontmatter must include 'description' field")

    return frontmatter, body.strip()
```
Contributor
🧹 Nitpick | 🔵 Trivial

Improve SKILL.md parsing robustness.

The frontmatter regex `r"^---\s*\n(.*?)\n---\s*\n(.*)$"` could be more flexible to handle:

  • Different line endings (Windows \r\n)
  • Extra whitespace after closing delimiters
  • Files without trailing newlines

Also, the validation checks for presence of name and description but doesn't validate their types (e.g., ensuring they're strings, not empty, etc.).

♻️ More robust parsing

```diff
 def parse_skill_md(content: str) -> tuple[dict, str]:
     """Parse SKILL.md into frontmatter dict and body string."""
     # Match YAML frontmatter between --- delimiters
-    match = re.match(r"^---\s*\n(.*?)\n---\s*\n(.*)$", content, re.DOTALL)
+    # Allow for different line endings and flexible whitespace
+    match = re.match(r"^---\s*[\r\n]+(.*?)[\r\n]+---\s*[\r\n]+(.*)$", content, re.DOTALL)
     if not match:
         raise ValueError("SKILL.md must have YAML frontmatter between --- delimiters")

     frontmatter_text, body = match.groups()

     try:
         frontmatter = yaml.safe_load(frontmatter_text)
     except yaml.YAMLError as e:
         raise ValueError(f"Invalid YAML frontmatter: {e}") from e

     if not isinstance(frontmatter, dict):
         raise ValueError("Frontmatter must be a YAML mapping")

-    if "name" not in frontmatter:
-        raise ValueError("SKILL.md frontmatter must include 'name' field")
+    if "name" not in frontmatter or not isinstance(frontmatter["name"], str) or not frontmatter["name"].strip():
+        raise ValueError("SKILL.md frontmatter must include non-empty 'name' field")

-    if "description" not in frontmatter:
-        raise ValueError("SKILL.md frontmatter must include 'description' field")
+    if "description" not in frontmatter or not isinstance(frontmatter["description"], str) or not frontmatter["description"].strip():
+        raise ValueError("SKILL.md frontmatter must include non-empty 'description' field")

     return frontmatter, body.strip()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` around lines 494 - 517, Update
parse_skill_md to use a more flexible frontmatter regex and stronger field
validation: relax the regex to accept CRLF or LF line endings and optional
whitespace after the closing --- (e.g., use \r?\n and allow whitespace or no
trailing newline so frontmatter/body split still works), keep re.DOTALL, and
fall back to a match that allows end-of-file without an extra newline; then
validate that frontmatter (the YAML-safe_load result) is a dict and that
frontmatter["name"] and frontmatter["description"] exist, are instances of str,
and are non-empty after stripping (raise clear ValueError messages if not).
Ensure you update error messages from parse_skill_md to mention the specific
missing/invalid field names and include the variable names frontmatter and body
in the checks to make locating the code easier.

Comment on lines +520 to +531
```python
from pathlib import Path


def list_references(skill_path: Path) -> list[str]:
    """List files in the skill's references/ subdirectory."""
    refs_dir = skill_path / "references"
    if not refs_dir.is_dir():
        return []

    return [
        str(f.relative_to(skill_path))
        for f in refs_dir.rglob("*")
        if f.is_file()
    ]
```
Contributor
🧹 Nitpick | 🔵 Trivial

Consider symlink handling in reference listing.

The `list_references` function lists all files in the `references/` directory but doesn't exclude symlinks. A symbolic link could point outside the skill directory, potentially exposing unintended files.

🔒 Filter out symlinks for security

```diff
 def list_references(skill_path: Path) -> list[str]:
     """List files in the skill's references/ subdirectory."""
     refs_dir = skill_path / "references"
     if not refs_dir.is_dir():
         return []

     return [
         str(f.relative_to(skill_path))
         for f in refs_dir.rglob("*")
-        if f.is_file()
+        if f.is_file() and not f.is_symlink()
     ]
```

This prevents accidentally listing files outside the skill directory via symlinks. If symlink support is desired for legitimate use cases, add explicit documentation about the security implications.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/agent-skills/agent-skills.md` around lines 520 - 531, The
list_references function currently returns files under refs_dir without
excluding symbolic links, allowing symlinks to point outside the skill
directory; update list_references (the function and its use of refs_dir and the
rglob/f.is_file() check) to skip symlinks by filtering out entries where
f.is_symlink() is True (or use f.resolve().is_relative_to(skill_path) as an
extra safety check) so only regular files inside the skill_path/references tree
are returned; ensure the returned paths remain relative_to(skill_path).


**The problem**: Lightspeed Core has no mechanism for extending agent capabilities with specialized instructions or domain knowledge. Users cannot package reusable workflows, troubleshooting guides, or domain expertise into portable, discoverable units that the LLM can use on demand.

**The recommendation**: Implement the [Agent Skills open standard](https://agentskills.io) with a config-based discovery model. Skills are defined in `lightspeed-stack.yaml` pointing to directories containing `SKILL.md` files. The LLM sees a skill catalog (name + description) in the system prompt and can request full instructions on demand via a dedicated tool.
Contributor
I've noticed this "pointing to directories" pattern throughout this project, but want to potentially explore a new way via this feature if possible -

If we look to the Kubernetes world to guide us in this for example, helm charts can be found in https://artifacthub.io, and some helm charts are distributed using container registries, some distributed using GitHub, and some using standalone websites. Here's a popular helm chart, https://artifacthub.io/packages/helm/percona/ps-db, distributed using GitHub - https://percona.github.io/percona-helm-charts/

Notice that the way for me to use the ps-db helm chart artifacts (which are files in a folder) is not "clone the helm GitHub repository and use this directory"; that wouldn't be scalable once it got to 100+ helm charts.

Which means it won't be scalable for us once we get 10+ agent skill contributions.

Here's examples of the three scenarios if we were to use them for distributing agent skills:

  1. using container registries - https://docker.io/anik120/markdown-to-gdoc-converter-agent-skill
  2. using GitHub - https://anik120.github.io/markdown-to-gdoc-converter-agent-skill/
  3. Using standalone website - https://my-personal-website/markdown-to-gdoc-converter-agent-skill

I think for our use case, the best first step is to start with a GitHub distribution mechanism. That would mean we'd need to build in a mechanism that uses the GitHub APIs to pull down these artifacts (and makes sure they contain files that conform to the standards put forth in agentskills.io).

This will help Red Hat productize agent skills too: we can have Red Hat owned repositories that contain these skills, submitted by ecosystem partners (and others) but vetted by Red Hat.

...the same pattern can then be developed for `providers` and other artifacts, but we can start with agent skills as the implementation guidance for the rest.

Contributor
i.e., to use the example you have below, we'd modify it to look like:

```yaml
skills:
  - name: "code-review"
    description: "Review code for best practices and security issues."
    path: https://redhat.github.io/code-review-agent/
  - name: "troubleshooting"
    description: "Diagnose and fix OpenShift deployment issues."
    path: https://redhat.github.io/openshift-troubleshooter-agent/
```

Contributor
pulling skills down from external urls introduces significant security/trust concerns. Not only "do i trust this url today" but "what happens if the content at that url is compromised or updated in the future?"

For that reason alone I think starting w/ reading from the filesystem is correct.

But in addition, there are many ways, within k8s, to inject content into a container filesystem. So there's no reason that in the future you can't add adapters that pull content from GitHub, copy content over from a sidecar container, use image volumes to pull content from a "skills library" image, or even define skills in configmaps that get mounted in.

So my recommendation would be to start by assuming LCORE gets its skills from the filesystem. Then later you can worry about the best ways to get those skills onto the filesystem from various marketplace/distribution sources.

Contributor Author
I like this idea, however I believe this is something that could be added as a later feature, as there would not be much architectural change in the system whether it is a local "directory"/path or a remote "directory"/path. I do think that a local directory is required, while the remote aspect is a "would be nice", though.

Contributor Author
I can add something, though, to make the path handled by some sort of interface so we can have local and remote handlers for the skill directories.
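A minimal sketch of what such an interface could look like (names like `SkillSource` and `LocalDirectorySource` are hypothetical, not part of the design doc):

```python
# Hypothetical interface sketch: a SkillSource protocol with a
# local-filesystem handler now, leaving room for remote handlers
# (GitHub, OCI registry, sidecar volume, ...) later.
from pathlib import Path
from typing import Protocol


class SkillSource(Protocol):
    def load_skill_md(self, name: str) -> str:
        """Return the raw SKILL.md content for the named skill."""
        ...


class LocalDirectorySource:
    """Reads skills from a directory on the local filesystem."""

    def __init__(self, root: Path):
        self._root = root

    def load_skill_md(self, name: str) -> str:
        return (self._root / name / "SKILL.md").read_text(encoding="utf-8")
```

A remote adapter would implement the same one-method protocol, so the rest of the loader never needs to know where the bytes came from.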

Contributor
Okay, I think that makes sense. With

> So there's no reason that in the future, you can't add adapters that pull content from GitHub, or copy content over from a sidecar container or use image volumes to pull content from a "skills library" image, or even define skills in configmaps that get mounted in.

we don't even have to do

> I can add somewhere tho to make the path handled by some sort of interface so we can have local and remote handlers for the skill directories.

It can just be a plain simple path in the filesystem that is pointed to in the configuration; the abstraction for how the content is pulled is then taken care of at the sidecar level.

| B | Scripts allowed (full spec compliance) |
| C | Deferred (start with no scripts, add later) |

**Recommendation**: **A** (No scripts). As noted in LCORE-1339, there are security concerns with executing arbitrary scripts. The core value of skills is in the instructions — scripts can be added in a future phase after security review if needed.
Contributor
If we adopt the agent skill distribution I suggested above, we can have script execution capabilities - Red Hat verified agent skills (with full, script execution capabilities) in a Red Hat owned repo would be the business value we bring - if you're an enterprise customer using the agent skills from the Red Hat owned repos, you can be sure your system won't be compromised with script execution.

Contributor
until LCORE provides a sandbox mechanism to run said scripts in an ephemeral, isolated environment, I wholeheartedly agree w/ not supporting script execution.

Contributor Author
I like the idea of supporting more than just local paths for skills; however, it would be just as easy to execute scripts from a local directory. The only reason I am recommending not implementing that functionality is that, according to the feature ticket, it is out of scope.


| B | Customer-defined only (end users import their own skill definitions) |
| C | Both built-in and customer-defined |

**Recommendation**: **B** (Customer-defined only). This allows end users to extend Lightspeed with their own domain expertise without requiring changes to the core product. Built-in skills can be added later if needed.
Contributor
This seems to propose that the only way stack products (e.g. RHEL Lightspeed) would be able to ship skills would be if they were a part of (shipped with) Lightspeed Core? Is that the intended statement?

It seems like the primary value of this feature would be to allow product teams to inject better agentic guidance into LCORE that serves their specific intended use cases. And LCORE should enable that in a way that does not require those skills to be packaged with LCORE itself (such as by allowing skills to be injected into the LCORE filesystem via configmaps or container volumes provided by product teams).

Customers adding their own skills also makes sense, but it opens product teams up to the risk that a customer adds a skill that conflicts with their own logic, for better or worse. So I'd expect a lot of careful thought to go into what it means to allow customers to do that. It's effectively like allowing them to edit the system prompt.

Contributor Author
@jrobertboos jrobertboos Apr 21, 2026
So by "customer" I mean an LS app team (e.g. RHEL Lightspeed): they should be able to have and maintain their own skills, host those skills in the container alongside Lightspeed Stack, and point to the included skills using the lightspeed-stack configuration.

The customers (non-Red Hat) of the LS app teams would not have the ability to add skills, just as they do not have the ability to add MCP servers.

Contributor
Ok, in that case I think we are in agreement, but I'd suggest a terminology cleanup because "customers" and "end users" are easy to misunderstand.

Contributor
I read "Customers" as Openshift/RHEL/Ansible users too, might be worth clarifying this as "product team" (or something that identifies Openshift/RHEL/Ansible teams clearly)

| C | API-based (skills registered/managed via REST API) |
| D | Hybrid (built-in via config, customer-defined via filesystem or API) |

**Recommendation**: **B** (Config-based). Skills are defined in `lightspeed-stack.yaml` similar to `mcp_servers`. This provides explicit control over which skills are available, integrates with existing configuration patterns, and avoids filesystem scanning complexity.
Contributor
So the skill definitions would be inlined in the lightspeed-stack.yaml? I think that is a mistake. For one thing, it means the config CR that drives LCORE needs to contain those skills, which potentially makes the CR huge (unless you're going to stitch the lightspeed-stack.yaml together from a config CR plus various Skill CRs).

It also makes the yaml file huge and hard to update when one particular skill needs to be modified.

MCP servers are not an equivalent config concept here because they just have a few well specified config fields that need to be represented.

Loading skills from the filesystem should not be complicated. Getting the skill files onto the filesystem could be complicated, but again k8s offers numerous ways to accomplish this (and for non-k8s deployments, there are also many ways to solve this problem, none of them particularly novel or complicated)

Contributor Author
I agree with you that skill definitions do not necessarily need to be defined in the config (I will update that tomorrow), as they can be fetched from the actual files. The only reason I did it this way was that I was thinking of the config as the skills catalog.

I am not sure, though, why the skills config should not be similar to the MCP config, at least in the sense that at the bare minimum we need a path.

And it could be that we supply just one path, or a list of paths, where each path is expected to lead to a directory of skills, e.g.

```
~/.agents/skills/
├── pdf-processing/
│   ├── SKILL.md          ← discovered
│   └── scripts/
│       └── extract.py
├── data-analysis/
│   └── SKILL.md          ← discovered
└── README.md             ← ignored (not a skill directory)
```

p.s. sorry for the broken English, I am tired haha

Contributor
I am not sure though why the skills should not be similar to the MCP config tho at least in the sense that at the bare minimum we need a path.

No disagreement there; my "this is not the same as MCP config" comment was only intended to mean "we don't need the full config definition inlined". Having the skill config consist entirely of either a path per skill, or just a path to a directory of all the skills, is what I had in mind.
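Under that simplified config, discovery could look like the following sketch (the function name `discover_skills` is illustrative): each subdirectory of the configured path that contains a `SKILL.md` is treated as a skill, and everything else is ignored.

```python
# Sketch of path-only skill discovery under a simplified config that
# supplies just a skills directory. Name and behavior are illustrative.
from pathlib import Path


def discover_skills(skills_dir: Path) -> dict[str, Path]:
    """Map skill directory name -> skill path for subdirs with a SKILL.md."""
    skills: dict[str, Path] = {}
    if not skills_dir.is_dir():
        return skills
    for child in sorted(skills_dir.iterdir()):
        # Only directories containing SKILL.md count; stray files are ignored.
        if child.is_dir() and (child / "SKILL.md").is_file():
            skills[child.name] = child
    return skills
```

This mirrors the tree in the comment above: `pdf-processing/` and `data-analysis/` are discovered, while a stray `README.md` is skipped.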


| B | Inline (instructions embedded directly in YAML) |
| C | Both (support either path or inline content) |

**Recommendation**: **A** (Path-based). This follows the agentskills.io spec directory structure, keeps the YAML config clean, and allows skills to include `references/` subdirectories for additional documentation that can be loaded on demand.
Contributor
Per my other comments, I'm +1 on path-based.

| B | Dedicated tool (register `activate_skill` tool, LLM calls to load) |
| C | Per-request parameter (client specifies which skills to activate) |

**Recommendation**: **A** (System prompt injection) for the catalog, combined with **B** (dedicated tool) for loading full instructions. The system prompt contains the skill catalog (name + description, ~50-100 tokens per skill). When the LLM decides a skill is relevant, it calls the `activate_skill` tool to load the full instructions.
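As a rough sketch of the catalog injection (the exact prompt template is not specified here; `render_skill_catalog` is a hypothetical helper):

```python
# Sketch of rendering the skill catalog section of the system prompt.
# The template is an assumption; each entry costs roughly one line,
# consistent with the ~50-100 tokens per skill cited above.
def render_skill_catalog(skills: list[dict]) -> str:
    lines = [
        "## Available skills",
        "Call the `activate_skill` tool with a skill name to load its full instructions.",
    ]
    for skill in skills:
        lines.append(f"- {skill['name']}: {skill['description']}")
    return "\n".join(lines)
```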
Contributor
I think this is ok, but I'm going to offer up an alternative approach: don't even include the skill descriptions in the system prompt. Just give the LLM a "skill_search" tool. Let it invoke the skill_search tool to try to identify skills that may match the intent, and then it can pick from the search results to invoke activate_skill. (The skill searching would be some sort of keyword + semantic hybrid search.)

This might be a "phase 2" implementation. The risk is that searching is ineffective in finding the right skills, but the upside is you don't risk overwhelming the model with too many skill descriptions in the system prompt.

Maybe consider a hybrid approach: as long as there are fewer than X skills, just include them all in the system prompt; once we get beyond X skills, move to a skill-search approach.

Contributor Author
So the main reason I did it this way, adding the skills to the context, is because of this. I guess this could work well, especially once a lot of skills get added, but you're right that this would be a "phase 2" thing. I might change the PR, though, so that instead of appending to context there is a required skill_search tool that "for now" does not search but just returns the skill catalog list (names + descriptions). That would make it easy to extend the "search" part of the tool in the future so that the model is not overwhelmed by a large number of skills.
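The interim tool described above could be sketched as follows (illustrative only; the real tool signature may differ):

```python
# Sketch of the interim skill_search tool: for now it ignores the query
# and returns the full catalog, so a real keyword + semantic hybrid
# search can be swapped in later ("phase 2") without changing callers.
def skill_search(query: str, catalog: list[dict]) -> list[dict]:
    """Return candidate skills for the query; currently the whole catalog."""
    # Phase 2: rank `catalog` against `query` with hybrid keyword/semantic search.
    return [{"name": s["name"], "description": s["description"]} for s in catalog]
```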

- `description`: What the skill does and when to use it
- `path`: Absolute path to directory containing `SKILL.md`

**Recommendation**: Approved. This structure is consistent with existing config patterns and provides explicit control.
Contributor
Aren't SKILL.md files self-describing? What is the benefit of explicitly configuring individual skills vs. just pointing to a directory of skills, the same way Claude/Cursor/etc. do? (A skills directory, containing a directory per skill.)

I can envision a few benefits to this approach, but it also adds significant config management overhead and opportunity for mistakes (what happens when the path is incorrect? what happens when the name/description don't match the SKILL.md file the path points to?)


If lightspeed-stack implements context compaction (conversation history summarization), skill content must be exempted from pruning. Skill instructions are durable behavioral guidance — losing them mid-conversation silently degrades performance.

The `<skill_content>` tags from structured wrapping enable identification during compaction:
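Assuming well-formed, non-nested tags, the exemption could be sketched like this (hypothetical helper; skill blocks are pulled out before summarization and re-appended untouched):

```python
# Sketch of exempting skill content during context compaction, using
# the <skill_content> wrapper tags. Helper names are assumptions, and
# the regex assumes well-formed, non-nested tags.
import re

SKILL_BLOCK = re.compile(r"<skill_content>.*?</skill_content>", re.DOTALL)


def split_for_compaction(history: str) -> tuple[str, list[str]]:
    """Return (history with skill blocks removed, the preserved blocks)."""
    preserved = SKILL_BLOCK.findall(history)
    remainder = SKILL_BLOCK.sub("", history)
    return remainder, preserved
```

The remainder is what the summarizer sees; the preserved blocks are re-inserted verbatim after compaction.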
Contributor
This seems potentially fragile: what happens if this string shows up in the context for other reasons?

Also, at some point you may be unable to compact at all if you're unwilling to compact skill content.

Contributor Author
@jrobertboos jrobertboos Apr 21, 2026
So the main reason I did this is that it's the recommendation from the Agent Skills spec. Also, I did not think about the compaction problem, but I think that would only really be a problem if there were tons and tons of skills, which I think we could cover in a "phase 2".

"Phase 2" could basically be increasing scalability to allow for integration of more than 100 (or any limiting number of) skills. It would cover working on the skill_search tool, which would be needed for large numbers of skills, and how to handle compaction with a very large skill catalog (I think this is less of an issue, because I am not finding anyone having problems with this online, but it would still be something to safeguard against).

Also, the reason we avoid compaction is that it's defined in the spec here.

```python
self._activated: dict[str, set[str]] = {}  # conversation_id -> skill names

def is_activated(self, conversation_id: str, skill_name: str) -> bool:
    return skill_name in self._activated.get(conversation_id, set())
```
Contributor
Is this persisted somewhere? If not, what are the implications for persisting and resuming conversations?

|----------|-------|
| Skill path doesn't exist | Startup fails: "Skill path does not exist: {path}" |
| SKILL.md not found | Startup fails: "SKILL.md not found at {path}/SKILL.md" |
| Name mismatch | Startup fails: "Skill name mismatch: config has '{x}' but SKILL.md has '{y}'" |
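The table above could map to fail-fast startup checks along these lines (a sketch; `validate_skill_config` and its signature are assumptions, though the error strings follow the table):

```python
# Sketch of fail-fast startup validation for a configured skill.
# Error messages mirror the table; the function is hypothetical.
from pathlib import Path
from typing import Optional


def validate_skill_config(name: str, path: Path, parsed_name: Optional[str]) -> None:
    if not path.exists():
        raise ValueError(f"Skill path does not exist: {path}")
    if not (path / "SKILL.md").is_file():
        raise ValueError(f"SKILL.md not found at {path}/SKILL.md")
    if parsed_name is not None and parsed_name != name:
        raise ValueError(
            f"Skill name mismatch: config has '{name}' but SKILL.md has '{parsed_name}'"
        )
```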
Contributor
Per my earlier comment, you could avoid this entire class of error by simplifying the config.yaml to not include this information.
