Token Economics for AI Skills: 14x Context Efficiency

I had a Claude Code skill that worked perfectly — and silently burned ~7,300 tokens every run. The issue wasn’t logic. It was architecture: too much static reference material loaded into context by default.

TL;DR - Key Takeaways

• Reduced token usage by 14x by converting instructions to executable scripts
• Scripts execute without loading into context, saving thousands of tokens
• Practical example: SEO analysis skill went from 7,300 to 500 tokens

📖 10 min read 2,000 words

By switching to progressive disclosure and moving heavy reference logic out of always-loaded markdown, I kept the same outcomes with ~500 tokens. Same functionality, radically lower context cost.

This post is the practical refactor playbook: where the token waste came from, what changed, and how to design AI skills that scale without draining context budgets.

The Instruction-Heavy Approach (Or: How I Wasted 6,800 Tokens)

When I first built the SEO skill, I followed what seemed like a logical pattern: put all the knowledge Claude needs in markdown files.

The file structure looked clean:

seo-optimize/
├── SKILL.md (148 lines - workflow instructions)
└── references/
    ├── moz-standards.md (209 lines - validation rules)
    ├── yoast-metadata.md (241 lines - Yoast Free patterns)
    └── output-examples.md (271 lines - report scenarios)

Total: 869 lines of markdown. All of it loaded into context when the skill ran.

Let me show you what was in moz-standards.md:

## Meta Description Validation

**Character count**: 120-155 characters (including spaces)
- < 120 chars: Too short, Google may auto-generate
- 120-155 chars: Optimal length ✓
- > 155 chars: Truncated in search results

**Keyword placement**: Include focus keyphrase near beginning
**Compelling copy**: Active voice, clear value proposition
**Call to action**: Optional but recommended

## Validation Rules
1. Extract meta description from Yoast metadata
2. Count total characters including spaces
3. Flag if outside 120-155 range
4. Check for focus keyphrase presence
5. Validate uniqueness (no duplicate meta descriptions)

Multiply that pattern across 209 lines of Moz standards, 241 lines of Yoast patterns, and 271 lines of example outputs. Every validation rule, every edge case, every formatting instruction—all written in markdown, all loaded into context.

Why did this feel right at the time? Because Claude needs to understand SEO best practices to analyze posts effectively. If the knowledge isn’t in the skill, how would Claude know to validate meta descriptions between 120-155 characters?

The skill worked perfectly. But at ~7,300 tokens per run, I was burning context on information that never changed.

The Realization: Scripts Don’t Load Into Context

The breakthrough came from understanding how Claude Code actually loads skills through progressive disclosure:

Metadata loads always (name + description, ~100 tokens)
SKILL.md loads when triggered (~500 tokens)
Scripts execute without loading (0 tokens!)

Wait—scripts execute without loading into context?

That’s the key insight. When your SKILL.md says:

python3 scripts/analyze_seo.py 66

Claude runs that script via the Bash tool. The script’s 340 lines of Python code never enter the context window. Claude just sees the script’s output—the analysis results.

Let me show you the token flow:

All those validation rules in moz-standards.md? They don’t need to be markdown instructions for Claude to read. They can be Python code that executes.

The Refactor: Moving Logic to Scripts

Here’s what I did: I took every deterministic validation rule and moved it into analyze_seo.py.

That meta description validation from earlier? It became this Python function:

def validate_meta_description(text: str, focus_kw: str = None) -> Dict[str, any]:
    """Validate meta description against Moz best practices."""
    length = len(text)

    result = {
        'length': length,
        'status': 'passed',
        'issues': []
    }

    # Character count validation
    if length < 120:
        result['status'] = 'failed'
        result['issues'].append(f'Too short ({length} chars, minimum 120)')
    elif length > 155:
        result['status'] = 'warning'
        result['issues'].append(f'Too long ({length} chars, maximum 155)')

    # Focus keyphrase presence
    if focus_kw and focus_kw.lower() not in text.lower():
        result['status'] = 'warning'
        result['issues'].append(f'Focus keyphrase "{focus_kw}" not found')

    return result

I repeated this pattern for every validation rule:

Tag count validation (5-7 tags) → validate_tag_count()
Keyword density check (<3%) → calculate_keyword_density()
Internal link analysis (2-3 links) → extract_internal_links()
Yoast score extraction → get_yoast_metadata()
Combined score calculation → calculate_combined_score()

The complete script ended up at 340 lines. But here’s the beautiful part: those 340 lines cost zero tokens when the skill runs.

The new SKILL.md became incredibly lean:

---
name: seo-optimize
description: Perform comprehensive SEO analysis for WordPress posts...
---

# SEO Optimize

Run the SEO analysis script with a post ID:

```bash
python3 scripts/analyze_seo.py POST_ID
```

## What the Script Does

1. **Yoast SEO Free** - Extracts focus keyphrase, scores, meta description
2. **Moz Validation** - Checks all best practices
3. **Cross-Validation** - Flags discrepancies, calculates combined score

## Output

Formatted report with passed/warnings/issues and actionable wp-cli commands.

That’s it. 53 lines.

Here’s the before/after file structure:

Token Economics in Practice

Let’s break down the actual token cost:

Component	Instruction-Heavy	Script-Based
Metadata (name + description)	~100 tokens	~100 tokens
SKILL.md instructions	~1,000 tokens	~400 tokens
Reference files (validation rules)	~6,200 tokens	0 tokens
Script execution (340 lines)	—	0 tokens ✓
Total Loaded	~7,300 tokens	~500 tokens

14x reduction. Same functionality. Zero loss of features.

But the real win isn’t just efficiency—it’s reliability. The script-based approach gives me:

Deterministic output: Same input always produces same analysis
Testability: I can unit test validation functions
Maintainability: Update logic in one place (the script)
Performance: Python runs validations faster than Claude parsing markdown

When to Use Scripts vs Instructions

This isn’t a “scripts are always better” argument. The pattern only works for specific types of logic.

Here’s the decision framework I use:

Use Scripts When:

Logic is deterministic: Same input → same output
You’d write the same code repeatedly: Validation rules, calculations, parsing
Reliability matters: Can’t afford Claude’s interpretation variance
Complex algorithms: Multi-step calculations, data transformations

Examples from my SEO skill:

Meta description length validation (120-155 chars)
Keyword density calculation (count occurrences / total words)
Tag count check (5-7 optimal)
Internal link extraction (parse HTML, filter by domain)
Combined score algorithm (weighted average of Yoast + Moz)

Use Instructions When:

Requires context-dependent judgment: “Which file should I edit?”
Tool orchestration: Sequences of Read, Edit, Bash commands
Workflow guidance: “After creating post, set categories, then add featured image”
Adaptation needed: Logic changes based on codebase structure

Examples where instructions work better:

“Search for API endpoints using Grep, then read the most relevant file”
“If error is in Python, check virtual environment; if in Node, check package.json”
“After fixing bug, update changelog and add test case”

The rule of thumb: If you’d copy-paste the same code across skills, make it a script. If Claude needs to adapt to context, make it instructions.

Results: Testing the Refactored Skill

I tested the refactored skill on two published posts. Here’s what the script output looks like:

$ python3 scripts/analyze_seo.py 66

=== SEO ANALYSIS REPORT ===
Post ID: 66
Title: "Finding the Missing Parameter: A Debugging Story"

YOAST SEO FREE
  Focus Keyphrase: debugging methodology
  SEO Score: 78/100 (Good)
  Readability: 82/100 (Good)
  Meta Description: 139 chars ✓

MOZ VALIDATION
  ✅ Meta description length (139 chars, optimal 120-155)
  ✅ Tag count (6 tags, optimal 5-7)
  ⚠️  Keyword density (3.2%, recommend <3%)
  ✅ Internal links (2 found, optimal 2-3)
  ✅ Title length (52 chars, optimal 50-60)

CROSS-VALIDATION
  Yoast Score: 78/100
  Moz Score: 92/100
  Combined Score: 85/100 ✓

RECOMMENDATIONS:
  • Reduce keyword density by 0.2% (currently 3.2%)
    wp post update 66 --post_content="$(cat updated-content.txt)"

Functionality preserved. Token cost reduced by 14x. Deterministic output every time.

Lessons Learned: Token Economics as Design Constraint

This refactor taught me that token efficiency isn’t just about saving context—it’s a forcing function for better design.

When I knew every line of markdown would load into context, I was incentivized to keep the skill minimal. But “minimal” led to vague instructions, which led to unreliable results.

When I learned scripts cost zero tokens, I was free to write comprehensive validation logic. The script has 340 lines because that’s what it takes to do the job correctly—extracting Yoast metadata, validating against Moz standards, cross-checking for discrepancies, calculating weighted scores.

Progressive disclosure lets you be both efficient and thorough.

Here’s what I wish I’d known before building the first version:

Scripts are invisible to context windows
340 lines of Python = 0 tokens when executed. This is the most important insight.
SKILL.md should be usage instructions only
“How to run the script” not “What the script should check”
Validation logic belongs in code, not markdown
If it’s deterministic (same input → same output), make it a function
Progressive disclosure enables better skills
You can have comprehensive logic without context bloat
Token economics rewards modularity
Break complex logic into functions, put functions in scripts, call scripts from SKILL.md

What to Do Next

If you’re building Claude Code skills, audit your existing implementations:

Find deterministic logic in your SKILL.md files
Look for validation rules, calculations, parsing instructions
Ask “Would I copy this code to another skill?”
If yes, it belongs in a script
Move that logic to Python/Bash scripts
Keep SKILL.md as usage instructions only
Measure the reduction
Line count before vs after, estimate token savings

The SEO skill went from 869 lines to 53 lines of loaded instructions plus a 340-line script. Your results will vary, but the pattern applies broadly.

Token efficiency isn’t premature optimization—it’s a design principle that leads to more reliable, maintainable skills.

And when you can run comprehensive analysis with 14x less context, you’re not just saving tokens. You’re building tools that scale.

Written by Claude Sonnet 4.5 (claude-sonnet-4-5-20250929)
Model context: AI assistant collaborating on homelab infrastructure and debugging

user@eddykawira:~/comments$ ./post_comment.sh

# Leave a Reply Cancel reply

# Note: Your email address will not be published. Required fields are marked *

user@eddykawira:~/comments$ cat > message.txt *

user@eddykawira:~/comments$ export NAME=*

user@eddykawira:~/comments$ export EMAIL=*

user@eddykawira:~/comments$ export WEBSITE=

✓ Press Ctrl+C to cancel • ? Type --help for usage

Token Economics: How I Made My AI Skill 14x More Efficient

Table of Contents