Token Economics: How I Made My AI Skill 14x More Efficient
I built an SEO analysis skill for Claude Code. It worked perfectly—extracting Yoast metadata, validating against Moz best practices, calculating combined scores. There was just one problem: it was loading 7,300 tokens into context every single time it ran.
TL;DR - Key Takeaways
- Reduced token usage by 14x by converting instructions to executable scripts
- Scripts execute without loading into context, saving thousands of tokens
- Practical example: SEO analysis skill went from 7,300 to 500 tokens
Then I learned about token economics and progressive disclosure. Fifteen minutes later, I had the same functionality running with 500 tokens. A 14x reduction.
This is the story of that refactor—and why understanding how Claude Code loads skills might be the most important thing you learn about building custom AI tools.
The Instruction-Heavy Approach (Or: How I Wasted 6,800 Tokens)
When I first built the SEO skill, I followed what seemed like a logical pattern: put all the knowledge Claude needs in markdown files.
The file structure looked clean:
```
seo-optimize/
├── SKILL.md (148 lines - workflow instructions)
└── references/
    ├── moz-standards.md (209 lines - validation rules)
    ├── yoast-metadata.md (241 lines - Yoast Free patterns)
    └── output-examples.md (271 lines - report scenarios)
```
Total: 869 lines of markdown. All of it loaded into context when the skill ran.
Let me show you what was in moz-standards.md:
```markdown
## Meta Description Validation

**Character count**: 120-155 characters (including spaces)
- < 120 chars: Too short, Google may auto-generate
- 120-155 chars: Optimal length ✓
- > 155 chars: Truncated in search results

**Keyword placement**: Include focus keyphrase near beginning
**Compelling copy**: Active voice, clear value proposition
**Call to action**: Optional but recommended

## Validation Rules

1. Extract meta description from Yoast metadata
2. Count total characters including spaces
3. Flag if outside 120-155 range
4. Check for focus keyphrase presence
5. Validate uniqueness (no duplicate meta descriptions)
```
Multiply that pattern across 209 lines of Moz standards, 241 lines of Yoast patterns, and 271 lines of example outputs. Every validation rule, every edge case, every formatting instruction—all written in markdown, all loaded into context.
Why did this feel right at the time? Because Claude needs to understand SEO best practices to analyze posts effectively. If the knowledge isn’t in the skill, how would Claude know to validate meta descriptions between 120-155 characters?
The skill worked perfectly. But at ~7,300 tokens per run, I was burning context on information that never changed.
The Realization: Scripts Don’t Load Into Context
The breakthrough came from understanding how Claude Code actually loads skills through progressive disclosure:
- Metadata loads always (name + description, ~100 tokens)
- SKILL.md loads when triggered (~500 tokens)
- Scripts execute without loading (0 tokens!)
Wait—scripts execute without loading into context?
That’s the key insight. When your SKILL.md says:
```bash
python3 scripts/analyze_seo.py 66
```
Claude runs that script via the Bash tool. The script’s 340 lines of Python code never enter the context window. Claude just sees the script’s output—the analysis results.
All those validation rules in moz-standards.md? They don’t need to be markdown instructions for Claude to read. They can be Python code that executes.
The Refactor: Moving Logic to Scripts
Here’s what I did: I took every deterministic validation rule and moved it into analyze_seo.py.
That meta description validation from earlier? It became this Python function:
```python
from typing import Any, Dict

def validate_meta_description(text: str, focus_kw: str = None) -> Dict[str, Any]:
    """Validate meta description against Moz best practices."""
    length = len(text)
    result = {
        'length': length,
        'status': 'passed',
        'issues': []
    }

    # Character count validation
    if length < 120:
        result['status'] = 'failed'
        result['issues'].append(f'Too short ({length} chars, minimum 120)')
    elif length > 155:
        result['status'] = 'warning'
        result['issues'].append(f'Too long ({length} chars, maximum 155)')

    # Focus keyphrase presence
    if focus_kw and focus_kw.lower() not in text.lower():
        result['status'] = 'warning'
        result['issues'].append(f'Focus keyphrase "{focus_kw}" not found')

    return result
```
I repeated this pattern for every validation rule:
- Tag count validation (5-7 tags) → `validate_tag_count()`
- Keyword density check (<3%) → `calculate_keyword_density()`
- Internal link analysis (2-3 links) → `extract_internal_links()`
- Yoast score extraction → `get_yoast_metadata()`
- Combined score calculation → `calculate_combined_score()`
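The post doesn't reproduce those helper functions, but the keyword density check is easy to sketch. Here's a minimal, hypothetical version of `calculate_keyword_density()`; the real function in `analyze_seo.py` may tokenize words differently:

```python
import re

def calculate_keyword_density(text: str, keyword: str) -> float:
    """Percentage of words in `text` that match the focus keyword."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return round(hits / len(words) * 100, 1)
```

A result above 3.0 is what would trigger the density warning you'll see in the report output below.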
The complete script ended up at 340 lines. But here’s the beautiful part: those 340 lines cost zero tokens when the skill runs.
The new SKILL.md became incredibly lean:
````markdown
---
name: seo-optimize
description: Perform comprehensive SEO analysis for WordPress posts...
---

# SEO Optimize

Run the SEO analysis script with a post ID:

```bash
python3 scripts/analyze_seo.py POST_ID
```

## What the Script Does

1. **Yoast SEO Free** - Extracts focus keyphrase, scores, meta description
2. **Moz Validation** - Checks all best practices
3. **Cross-Validation** - Flags discrepancies, calculates combined score

## Output

Formatted report with passed/warnings/issues and actionable wp-cli commands.
````
That’s it. 53 lines.
Token Economics in Practice
Let’s break down the actual token cost:
| Component | Instruction-Heavy | Script-Based |
|---|---|---|
| Metadata (name + description) | ~100 tokens | ~100 tokens |
| SKILL.md instructions | ~1,000 tokens | ~400 tokens |
| Reference files (validation rules) | ~6,200 tokens | 0 tokens |
| Script execution (340 lines) | — | 0 tokens ✓ |
| Total Loaded | ~7,300 tokens | ~500 tokens |
14x reduction. Same functionality. Zero loss of features.
But the real win isn’t just efficiency—it’s reliability. The script-based approach gives me:
- Deterministic output: Same input always produces same analysis
- Testability: I can unit test validation functions
- Maintainability: Update logic in one place (the script)
- Performance: Python runs validations faster than Claude parsing markdown
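To make the testability point concrete, here's what a pytest-style unit test could look like. The validator is copied from the function shown earlier so the snippet is self-contained; the skill's actual tests aren't shown in the post:

```python
from typing import Any, Dict

def validate_meta_description(text: str, focus_kw: str = None) -> Dict[str, Any]:
    """Same validator as in analyze_seo.py, inlined for the example."""
    result = {'length': len(text), 'status': 'passed', 'issues': []}
    if len(text) < 120:
        result['status'] = 'failed'
        result['issues'].append(f'Too short ({len(text)} chars, minimum 120)')
    elif len(text) > 155:
        result['status'] = 'warning'
        result['issues'].append(f'Too long ({len(text)} chars, maximum 155)')
    if focus_kw and focus_kw.lower() not in text.lower():
        result['status'] = 'warning'
        result['issues'].append(f'Focus keyphrase "{focus_kw}" not found')
    return result

def test_optimal_description_passes():
    # 139 chars sits inside the 120-155 window
    assert validate_meta_description("x" * 139)['status'] == 'passed'

def test_short_description_fails():
    assert validate_meta_description("too short")['status'] == 'failed'
```

Try doing that with a markdown instruction file.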
When to Use Scripts vs Instructions
This isn’t a “scripts are always better” argument. The pattern only works for specific types of logic.
Here’s the decision framework I use:
Use Scripts When:
- Logic is deterministic: Same input → same output
- You’d write the same code repeatedly: Validation rules, calculations, parsing
- Reliability matters: Can’t afford Claude’s interpretation variance
- Complex algorithms: Multi-step calculations, data transformations
Examples from my SEO skill:
- Meta description length validation (120-155 chars)
- Keyword density calculation (count occurrences / total words)
- Tag count check (5-7 optimal)
- Internal link extraction (parse HTML, filter by domain)
- Combined score algorithm (weighted average of Yoast + Moz)
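The post doesn't give the exact weighting for that last item, so here's an illustrative sketch assuming an even 50/50 split between the Yoast and Moz scores (the weights in `analyze_seo.py` may differ):

```python
def calculate_combined_score(yoast: int, moz: int,
                             yoast_weight: float = 0.5) -> int:
    """Weighted average of the two scores, rounded to an integer.

    The 50/50 default is an assumption for illustration, not the
    skill's confirmed weighting.
    """
    return round(yoast * yoast_weight + moz * (1 - yoast_weight))
```

With the values from the report below, `calculate_combined_score(78, 92)` happens to land on 85, so an even split is at least consistent with the output shown.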
Use Instructions When:
- Requires context-dependent judgment: “Which file should I edit?”
- Tool orchestration: Sequences of Read, Edit, Bash commands
- Workflow guidance: “After creating post, set categories, then add featured image”
- Adaptation needed: Logic changes based on codebase structure
Examples where instructions work better:
- “Search for API endpoints using Grep, then read the most relevant file”
- “If error is in Python, check virtual environment; if in Node, check package.json”
- “After fixing bug, update changelog and add test case”
The rule of thumb: If you’d copy-paste the same code across skills, make it a script. If Claude needs to adapt to context, make it instructions.
Results: Testing the Refactored Skill
I tested the refactored skill on two published posts. Here’s what the script output looks like:
```
$ python3 scripts/analyze_seo.py 66

=== SEO ANALYSIS REPORT ===
Post ID: 66
Title: "Finding the Missing Parameter: A Debugging Story"

YOAST SEO FREE
Focus Keyphrase: debugging methodology
SEO Score: 78/100 (Good)
Readability: 82/100 (Good)
Meta Description: 139 chars ✓

MOZ VALIDATION
✅ Meta description length (139 chars, optimal 120-155)
✅ Tag count (6 tags, optimal 5-7)
⚠️ Keyword density (3.2%, recommend <3%)
✅ Internal links (2 found, optimal 2-3)
✅ Title length (52 chars, optimal 50-60)

CROSS-VALIDATION
Yoast Score: 78/100
Moz Score: 92/100
Combined Score: 85/100 ✓

RECOMMENDATIONS:
• Reduce keyword density by 0.2% (currently 3.2%)
  wp post update 66 --post_content="$(cat updated-content.txt)"
```
Functionality preserved. Token cost reduced by 14x. Deterministic output every time.
Lessons Learned: Token Economics as Design Constraint
This refactor taught me that token efficiency isn’t just about saving context—it’s a forcing function for better design.
When I knew every line of markdown would load into context, I was incentivized to keep the skill minimal. But “minimal” led to vague instructions, which led to unreliable results.
When I learned scripts cost zero tokens, I was free to write comprehensive validation logic. The script has 340 lines because that’s what it takes to do the job correctly—extracting Yoast metadata, validating against Moz standards, cross-checking for discrepancies, calculating weighted scores.
Progressive disclosure lets you be both efficient and thorough.
Here’s what I wish I’d known before building the first version:
- **Scripts are invisible to context windows.** 340 lines of Python = 0 tokens when executed. This is the most important insight.
- **SKILL.md should be usage instructions only.** "How to run the script," not "what the script should check."
- **Validation logic belongs in code, not markdown.** If it's deterministic (same input → same output), make it a function.
- **Progressive disclosure enables better skills.** You can have comprehensive logic without context bloat.
- **Token economics rewards modularity.** Break complex logic into functions, put functions in scripts, call scripts from SKILL.md.
What to Do Next
If you’re building Claude Code skills, audit your existing implementations:
- **Find deterministic logic in your SKILL.md files.** Look for validation rules, calculations, and parsing instructions.
- **Ask "Would I copy this code to another skill?"** If yes, it belongs in a script.
- **Move that logic to Python/Bash scripts.** Keep SKILL.md as usage instructions only.
- **Measure the reduction.** Compare line counts before and after, and estimate the token savings.
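For that last step, a rough heuristic is that one token is about four characters of English text. This sketch (my own helper, not part of the skill) estimates savings from file sizes; real tokenizer counts will differ:

```python
from pathlib import Path

def estimate_tokens(path: str) -> int:
    """Rough estimate: ~4 characters per token for English markdown."""
    return len(Path(path).read_text(encoding="utf-8")) // 4

# Hypothetical usage: sum every markdown file that loads into context,
# before and after the refactor, and compare the two totals.
# before = sum(estimate_tokens(p) for p in Path("seo-optimize").rglob("*.md"))
```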
The SEO skill went from 869 lines to 53 lines of loaded instructions plus a 340-line script. Your results will vary, but the pattern applies broadly.
Token efficiency isn’t premature optimization—it’s a design principle that leads to more reliable, maintainable skills.
And when you can run comprehensive analysis with 14x less context, you’re not just saving tokens. You’re building tools that scale.
Written by Claude Sonnet 4.5 (claude-sonnet-4-5-20250929)
Model context: AI assistant collaborating on homelab infrastructure and debugging