Skill issues? Try agent skills!

Skill issues?

/use-skills

It's all just context.

{
  "model": "claude-opus-4-7",
  "messages": [...]          ← text
  "system": "..."            ← also text
  "tools": [...]             ← also text (JSON schemas)
  "temperature": 1.0         ← sampling parameter
  "top_p": 0.95              ← sampling parameter
  "top_k": 40                ← sampling parameter
  "max_tokens": 4096         ← sampling parameter
  "stop_sequences": [...]    ← sampling parameter
}

quality
  ▲
  │      ╱─────╲
  │    ╱         ╲___
  │  ╱                ╲___
  │╱                       ╲___
  └────────────────────────────▶  context size
        sweet spot       too much

Less context = Better output

— Anthropic, Effective context engineering for AI agents

Context = Money

— GitHub Copilot pricing

2023

"Put it in the prompt."

Lands in: system
What broke: "lost in the middle" — instructions buried in long prompts silently stopped being followed.

2024

"Tools as JSON schemas."

Lands in: tools
What broke: too many tools → model picks the wrong one. Schemas count against context.

late 2024

"CLAUDE.md / .cursorrules / AGENTS.md"

Lands in: system (same slot!)
What broke: files grew unbounded, paid every turn, buried rules stopped firing.

2025

"MCP — connect everything!"

Lands in: tools (same slot!)
What broke: 134K tokens of tool defs before the user says hello.

Oct 2025

"Skills — load on demand."

Lands in: system (descriptions), messages (bodies)
What broke: descriptions are fuzzy triggers — they misfire either way. Bodies still drift.

2026 ?

What is a Skill?

a11y-audit/
└── SKILL.md

───
name: a11y-audit
description: Use when the user asks to
  audit accessibility of a component or
  page. Do NOT use for visual design or
  general code review.
───

# Accessibility audit

## Workflow
1. Render the component or load the URL
2. Run scripts/run-axe.sh for violations
3. Run scripts/keyboard-trace.sh for focus order
4. Cross-check against references/wcag-2.2.md
5. Group issues by WCAG level (A / AA / AAA)

When does a skill enter the context?

Automatically.

The model sees the skill's name + description in the system prompt.
When the user's request matches, the full SKILL.md body gets loaded.

user: audit a11y on the checkout modal
→ matches description of a11y-audit
→ load SKILL.md body

Explicitly.

Invoke the skill by name with a slash command.
Skips the routing — loads immediately.

user: /a11y-audit Modal.tsx
→ load SKILL.md body

There's more.

a11y-audit/
├── SKILL.md
├── references/
│   ├── wcag-2.2.md
│   ├── aria-patterns.md
│   └── common-fixes.md
└── scripts/
    ├── run-axe.sh
    └── keyboard-trace.sh

More context?

always:     name + description    ~100 tokens
on match:   SKILL.md body         < 5K tokens
on demand:  references/*.md       only when SKILL.md says to
on demand:  scripts/*.sh          executed — code never enters context

Each level loads only when the previous one says to.

a11y-audit in action.

# startup
system:  a11y-audit — audit accessibility of a component or page
         ~100 tokens. that's all.

user:    audit a11y on the checkout Modal

claude:  cat a11y-audit/SKILL.md
         → instructions enter context

claude:  ./a11y-audit/scripts/run-axe.sh Modal
         → 4 violations (JSON). script code never enters context.

claude:  cat a11y-audit/references/wcag-2.2.md
         → cross-check violations

# aria-patterns.md, common-fixes.md — not needed here.
# never loaded. zero tokens.

Do skills actually work?

Yes*

Deep

Design tokens live in src/tokens/. Spacing uses a 4px base grid.
Never use raw pixel values — always var(--space-{n}).
The Overlay component traps focus and restores it on close;
new modals MUST use it, not div with z-index.

Shallow

Ensure proper ARIA labels, keyboard navigation, and focus management.
Prefer <button> over <div>. Test with keyboard-only navigation.

Sounds expert. It's still just MDN.

SkillsBench          curated, deep skills        +16.2pp
                     self-generated shallow      −1.3pp

Skills in the Wild   34k random skills           worse than none

Breadth without depth is noise.

Write few. Write deep.
Write what the model can't guess.

Resources

Agent skills specification
Anthropic's skills documentation
skills.sh — Over 90000 skills
SkillsBench (Li et al., 2026) — Benchmarking agent skills
Skills in the Wild (Liu et al., 2026) — Real-world skill retrieval
Context rot
Long context isn't the answer