TOON Format for Claude API: Anthropic LLM Optimization
Achieve up to 90% input-token cost reduction by combining TOON format with prompt caching for Claude Sonnet and Opus
Claude by Anthropic has rapidly become a preferred choice for developers building sophisticated AI applications. Its reasoning capabilities, safety features, and competitive pricing make it increasingly popular. However, like all LLM APIs, Claude charges by the token, and there's significant room for cost optimization.
This comprehensive guide reveals how to combine TOON format with Claude's prompt caching feature to achieve unprecedented cost savings of up to 90%. You'll learn exactly how to convert JSON to TOON, integrate with Anthropic's Python SDK, leverage prompt caching, and calculate real savings for your specific Claude use case.
- ✅ Reduce Claude API costs by 30-60% using TOON format
- ✅ Achieve an additional 50-90% savings with prompt caching
- ✅ Total potential savings: 85-95% on input tokens
- ✅ Improve accuracy by 4-7% with structured TOON format
- ✅ Production-ready integration with Claude Sonnet 4.5 and Opus 4
- ✅ Real case study: $7,387 annual savings from one implementation
Understanding Claude API Pricing
Anthropic prices Claude models per-token with distinct input/output costs. Understanding this structure is critical for optimization.
Claude Pricing Structure (November 2025)
| Model | Input Cost | Output Cost | Context | Best For |
|---|---|---|---|---|
| Claude 3.5 Haiku | $0.80/1M | $4.00/1M | 200K | Fast, lightweight tasks |
| Claude 3.5 Sonnet | $3.00/1M | $15.00/1M | 200K | Balanced performance/cost |
| Claude Sonnet 4 | $3.00/1M | $15.00/1M | 200K | Latest balanced model |
| Claude 3 Opus | $15.00/1M | $75.00/1M | 200K | Maximum reasoning |
| Claude Opus 4 | $15.00/1M | $75.00/1M | 200K | Latest flagship |
Key distinction: Output tokens cost 5x more than input tokens on Sonnet, but in data-heavy workloads (large prompts, short responses) input tokens dominate the volume. That's why input-token optimization delivers the biggest savings.
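To make the table concrete, here is a minimal per-request cost estimator; a sketch with the rates hardcoded from the table above (check Anthropic's pricing page for current numbers):

```python
# Per-request cost estimator using the rates from the table above (USD per 1M tokens).
PRICING = {
    "claude-3-5-haiku": {"input": 0.80, "output": 4.00},
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
    "claude-3-opus": {"input": 15.00, "output": 75.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# 3,950 input tokens and 150 output tokens on Sonnet (the case study below):
print(f"${estimate_cost('claude-3-5-sonnet', 3_950, 150):.5f}")  # -> $0.01410
```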
The JSON Problem: Token Waste at Scale
When you send data to Claude in JSON format, you're paying for massive redundancy. Consider this example of customer support tickets:
```json
{
  "analysis_task": "Classify customer support tickets by priority",
  "tickets": [
    { "id": "TKT-001", "subject": "Login issues", "priority": "high", "status": "open" },
    { "id": "TKT-002", "subject": "Billing question", "priority": "low", "status": "open" },
    { "id": "TKT-003", "subject": "Feature request", "priority": "medium", "status": "assigned" },
    { "id": "TKT-004", "subject": "Bug in checkout", "priority": "high", "status": "in_progress" },
    { "id": "TKT-005", "subject": "Password reset", "priority": "low", "status": "open" }
  ]
}
```
Token count: 287 tokens (using Claude's tokenizer)
The field names ("id", "subject", "priority", "status") repeat five times each. That's 40+ tokens wasted on repetition.
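You can reproduce counts like this yourself with the token-counting endpoint in Anthropic's Python SDK; a minimal sketch (assuming `client.messages.count_tokens`, which counts input tokens without running a completion):

```python
import json
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

payload = {
    "analysis_task": "Classify customer support tickets by priority",
    "tickets": [
        {"id": "TKT-001", "subject": "Login issues", "priority": "high", "status": "open"},
        {"id": "TKT-002", "subject": "Billing question", "priority": "low", "status": "open"},
        # ...remaining tickets...
    ],
}

# Count how many input tokens the JSON payload consumes as a user message.
count = client.messages.count_tokens(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": json.dumps(payload, indent=2)}],
)
print(count.input_tokens)
```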
The Solution: TOON + Prompt Caching = 90% Cost Reduction
Part 1: TOON Format Reduces Token Usage by 50-60%
Convert the same data to TOON format:
```
Analysis task: Classify customer support tickets by priority
tickets[5]{id,subject,priority,status}:
  TKT-001,Login issues,high,open
  TKT-002,Billing question,low,open
  TKT-003,Feature request,medium,assigned
  TKT-004,Bug in checkout,high,in_progress
  TKT-005,Password reset,low,open
```
Token count: 115 tokens (a 60% reduction)
Learn more about what TOON format is and how it works.
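The conversion itself is a single call with the `toon_format` package used throughout this guide; a quick sketch:

```python
from toon_format import encode

tickets = [
    {"id": "TKT-001", "subject": "Login issues", "priority": "high", "status": "open"},
    {"id": "TKT-002", "subject": "Billing question", "priority": "low", "status": "open"},
]

# encode() produces the tabular TOON layout shown above, e.g.:
# tickets[2]{id,subject,priority,status}:
#   TKT-001,Login issues,high,open
#   TKT-002,Billing question,low,open
print(encode({"tickets": tickets}, indent=1))
```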
Part 2: Prompt Caching Saves Additional 50-90% on System Prompts
Anthropic's prompt caching stores and reuses unchanged prompt segments. When combined with TOON, the savings are extraordinary:
Without caching:
- System prompt (3,500 tokens) × $3/1M = $0.0105
- Query (100 tokens) × $3/1M = $0.0003
- Total = $0.0108 per request
With prompt caching (subsequent calls):
- Cache read (3,500 tokens) × $0.30/1M = $0.00105
- Query (100 tokens) × $3/1M = $0.0003
- Total = $0.00135 per request
Savings: from $0.0108 to $0.00135 per request, an 87.5% reduction in input cost.
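The same arithmetic in code, so you can plug in your own prompt sizes (a sketch; Sonnet rates hardcoded from this article):

```python
# Caching arithmetic from the example above (Sonnet rates, USD per 1M tokens).
INPUT_RATE = 3.00       # regular input tokens
CACHE_READ_RATE = 0.30  # cached tokens on subsequent calls (90% discount)

def cost_without_cache(system_tokens: int, query_tokens: int) -> float:
    return (system_tokens + query_tokens) * INPUT_RATE / 1_000_000

def cost_with_cache(system_tokens: int, query_tokens: int) -> float:
    # Subsequent calls: the system prompt is read from cache at the discounted rate.
    return (system_tokens * CACHE_READ_RATE + query_tokens * INPUT_RATE) / 1_000_000

before = cost_without_cache(3_500, 100)  # $0.01080
after = cost_with_cache(3_500, 100)      # $0.00135
print(f"${before:.5f} -> ${after:.5f} ({1 - after / before:.1%} saved)")  # 87.5% saved
```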
Real-World Case Study: Customer Support Automation
The Scenario
A SaaS company uses Claude to automatically classify, route, and draft responses for 10,000 customer support tickets daily.
Application architecture:
- 3,500-token system prompt (classification rules, response guidelines, company policies)
- 50-100 tickets processed per batch
- 5 tickets per API call average
- Average output: 150 tokens (draft response + metadata)
- Daily API calls: 2,000
JSON Approach (Before Optimization)
Per API call:
- System prompt: 3,500 tokens
- Data: 450 tokens (5 tickets in JSON)
- Total input: 3,950 tokens
- Output: 150 tokens
Cost per call: (3,950 × $3 + 150 × $15) ÷ 1,000,000 = $0.0141
| Period | Cost |
|---|---|
| Daily | 2,000 calls × $0.0141 = $28.20/day |
| Monthly | $28.20 × 30 = $846/month |
| Annual | $846 × 12 = $10,152/year |
TOON Format Approach (After Optimization)
Using TOON format with prompt caching:
```
Process and classify these support tickets:
tickets[5]{id,subject,body,priority}:
  TKT-1,Login error,"Can't access account...",high
  TKT-2,Billing question,"Why was I charged twice?",low
  TKT-3,Feature request,"Can we add dark mode?",medium
  TKT-4,Bug report,"Checkout button broken",high
  TKT-5,Account deletion,"Please delete my account",low
```
Per API call:
- System prompt: 3,500 tokens (cached, read cost $0.30/1M)
- Data in TOON: 180 tokens (5 tickets, a 60% reduction)
- Total input: 3,680 tokens
- Output: 150 tokens
Cost per call: (3,500 × $0.30 + 180 × $3 + 150 × $15) ÷ 1,000,000 = $0.00384
The Results: Real Savings
| Metric | JSON (No Caching) | TOON + Caching | Savings |
|---|---|---|---|
| Cost per call | $0.01410 | $0.00384 | 72.8% |
| Daily cost | $28.20 | $7.68 | 72.8% |
| Monthly cost | $846.00 | $230.40 | 72.8% |
| Annual cost | $10,152 | $2,765 | $7,387 |
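If you want to check the table, here is the same math as a short script (rates hardcoded from the pricing section; the annual figure follows this article's 30-day-month convention):

```python
# Reproduce the case-study numbers (Sonnet rates, USD per 1M tokens).
IN_RATE, OUT_RATE, CACHE_READ_RATE = 3.00, 15.00, 0.30

json_call = (3_950 * IN_RATE + 150 * OUT_RATE) / 1_000_000                           # $0.01410
toon_call = (3_500 * CACHE_READ_RATE + 180 * IN_RATE + 150 * OUT_RATE) / 1_000_000   # $0.00384

calls_per_day = 2_000
annual_saving = calls_per_day * (json_call - toon_call) * 30 * 12
print(f"Per call: ${json_call:.5f} -> ${toon_call:.5f}")
print(f"Annual saving: ${annual_saving:,.0f}")  # -> $7,387
```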
Integration Guide: TOON + Claude + Prompt Caching
Step 1: Install Required Packages
```bash
pip install anthropic toon-format
```
Step 2: Basic Claude + TOON Integration
```python
from anthropic import Anthropic
from toon_format import encode

client = Anthropic()

# Your data
tickets = [
    {"id": "TKT-001", "subject": "Login error", "priority": "high"},
    {"id": "TKT-002", "subject": "Billing question", "priority": "low"},
    {"id": "TKT-003", "subject": "Feature request", "priority": "medium"},
]

# Convert to TOON
toon_data = encode({"tickets": tickets}, indent=1)

# Create prompt with TOON data
prompt = f"""Analyze and classify these support tickets:

{toon_data}

For each ticket, provide:
1. Severity assessment
2. Routing recommendation
3. Draft response"""

# Call Claude
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)

print(response.content[0].text)
```
Step 3: Adding Prompt Caching for Maximum Savings
Prompt caching requires specifying `cache_control` on text blocks. Here's the production approach:
```python
from anthropic import Anthropic
from toon_format import encode

client = Anthropic()

# Your large, reusable system instructions.
# NOTE: caching requires at least 1,024 tokens, so a real system prompt
# should be much longer than this abbreviated example.
SYSTEM_INSTRUCTIONS = """You are a customer support automation system. Your responsibilities:

1. CLASSIFICATION: Categorize tickets by severity (critical, high, medium, low)
2. ROUTING: Determine if human intervention is needed
3. RESPONSE: Draft professional, empathetic responses

Classification Rules:
- Critical: System down, data loss, security breach
- High: Feature broken, unable to complete core task
- Medium: Non-critical feature issue, minor bugs
- Low: Questions, feature requests, documentation

Always be professional, empathetic, and solution-focused."""

# Data to analyze (will be in TOON format)
tickets = [
    {"id": "TKT-001", "subject": "Cannot log in", "priority": "critical"},
    {"id": "TKT-002", "subject": "Payment failed", "priority": "high"},
    {"id": "TKT-003", "subject": "Question about features", "priority": "low"},
]

toon_data = encode({"tickets": tickets}, indent=1)

# Use prompt caching with cache_control
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_INSTRUCTIONS,
            "cache_control": {"type": "ephemeral"},  # Cache this system prompt
        }
    ],
    messages=[
        {
            "role": "user",
            "content": f"Process these tickets:\n\n{toon_data}",
        }
    ],
)

# Check cache usage
usage = response.usage
print(f"Input tokens: {usage.input_tokens}")
print(f"Cache creation tokens: {getattr(usage, 'cache_creation_input_tokens', 0)}")
print(f"Cache read tokens: {getattr(usage, 'cache_read_input_tokens', 0)}")
```
Step 4: Batch Processing with TOON + Caching
For maximum efficiency with multiple requests:
```python
from anthropic import Anthropic
from toon_format import encode

client = Anthropic()

# System instructions (cached on first call, then reused)
SYSTEM_INSTRUCTIONS = """[Your 3,500+ token system prompt]"""

class ClaudeChatbot:
    def __init__(self):
        self.total_cost = 0.0

    def process_tickets(self, tickets):
        """Process a batch of tickets with a cached system prompt."""
        # Convert to TOON
        toon_data = encode({"tickets": tickets}, indent=1)

        # Call Claude with caching
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            system=[
                {
                    "type": "text",
                    "text": SYSTEM_INSTRUCTIONS,
                    "cache_control": {"type": "ephemeral"},
                }
            ],
            messages=[
                {
                    "role": "user",
                    "content": f"Process these tickets:\n\n{toon_data}",
                }
            ],
        )

        # Track token usage. Note that input_tokens excludes cached tokens;
        # cache writes and cache reads are reported (and billed) separately.
        usage = response.usage
        input_tokens = usage.input_tokens
        cache_write = getattr(usage, "cache_creation_input_tokens", 0)
        cache_read = getattr(usage, "cache_read_input_tokens", 0)
        output_tokens = usage.output_tokens

        # Cost calculation (Claude Sonnet rates, USD per 1M tokens)
        input_cost = input_tokens * 3.00 / 1_000_000
        cache_write_cost = cache_write * 3.75 / 1_000_000  # 25% premium on cache writes
        cache_read_cost = cache_read * 0.30 / 1_000_000    # 90% discount on cache reads
        output_cost = output_tokens * 15.00 / 1_000_000
        total_cost = input_cost + cache_write_cost + cache_read_cost + output_cost
        self.total_cost += total_cost

        print(f"Cost: ${total_cost:.6f}")
        print(f"Cache read tokens: {cache_read}")
        return response.content[0].text

    def report(self):
        print(f"\nTotal cost so far: ${self.total_cost:.2f}")

# Usage
bot = ClaudeChatbot()

# First request (cache miss: the system prompt is written to the cache)
tickets_batch1 = [
    {"id": "T1", "subject": "Login error", "priority": "high"},
    {"id": "T2", "subject": "Billing", "priority": "low"},
]
result1 = bot.process_tickets(tickets_batch1)

# Second request (cache hit: the system prompt is read from the cache)
tickets_batch2 = [
    {"id": "T3", "subject": "Feature request", "priority": "medium"},
    {"id": "T4", "subject": "Bug report", "priority": "high"},
]
result2 = bot.process_tickets(tickets_batch2)

bot.report()
```
Before & After: Real Prompting Examples
Example 1: Content Analysis with Historical Context
Before (JSON):
```json
{
  "task": "Analyze blog performance",
  "articles": [
    { "id": 1, "title": "Getting Started", "views": 5420, "engagement": 0.35 },
    { "id": 2, "title": "Advanced Tips", "views": 3120, "engagement": 0.52 },
    { "id": 3, "title": "Best Practices", "views": 8934, "engagement": 0.48 }
  ]
}
```
Tokens: 234
After (TOON):
```
Analyze blog performance.
articles[3]{id,title,views,engagement}:
  1,Getting Started,5420,0.35
  2,Advanced Tips,3120,0.52
  3,Best Practices,8934,0.48
```
Tokens: 95 (a 59% reduction)
See more comparison examples in our TOON vs JSON article.
Example 2: Multi-Turn Conversation with Context Caching
First call (cache setup):
```python
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a data analyst helping with business intelligence...",
            "cache_control": {"type": "ephemeral"},
        },
        {
            "type": "text",
            "text": f"Historical data:\n{toon_encoded_historical_data}",
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[
        {"role": "user", "content": "What's the trend?"}
    ],
)
```
Subsequent calls (cache reuse):
```python
# Same system + cached data, different query.
# The cached blocks must match the first call exactly to get a cache hit.
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a data analyst helping with business intelligence...",
            "cache_control": {"type": "ephemeral"},
        },
        {
            "type": "text",
            "text": f"Historical data:\n{toon_encoded_historical_data}",
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[
        {"role": "user", "content": "What's the forecast?"}  # Different query
    ],
)
# 90% savings on the cached system + data tokens!
```
Frequently Asked Questions
Does TOON work with all Claude models?
Yes, absolutely. TOON works with all Claude models:
- ✅ Claude 3.5 Haiku
- ✅ Claude 3.5 Sonnet (recommended for most use cases)
- ✅ Claude Sonnet 4
- ✅ Claude 3 Opus
- ✅ Claude Opus 4 (highest reasoning)
Accuracy with TOON is higher than JSON (73.9% vs 69.7% on data retrieval tasks).
How does prompt caching work exactly?
Prompt caching stores prompt segments on Anthropic's servers:
- First call: you include the `cache_control` parameter on text blocks. Cost: $3.75/1M tokens (a 25% premium to write the cache). The cached segment is stored for 5 minutes.
- Subsequent calls: send the identical prefix again. Cost: $0.30/1M tokens for the cached portion (a 90% discount). A cache hit requires an exact match of the cached content.
- Cache expiry: the 5-minute lifetime is refreshed on every cache hit; after 5 minutes of inactivity the cache is evicted.
Can I cache TOON data specifically?
Yes! Wrap the TOON-formatted data in a text block with `cache_control`:
```python
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"Analyze this data:\n\n{toon_data}",
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        }
    ],
)
```
Since TOON is more concise, the cached block itself is smaller, so both the one-time cache write and every subsequent cache read cost less than they would with JSON.
What about accuracy? Does TOON affect Claude's performance?
No. Accuracy actually improves with TOON:
| Format | Claude Accuracy |
|---|---|
| TOON | 73.9% |
| JSON (compact) | 70.7% |
| JSON (formatted) | 69.7% |
| YAML | 69.0% |
TOON's explicit structure (array lengths, field headers) helps Claude parse data more reliably.
What's the minimum prompt size for caching?
Anthropic requires a minimum of 1,024 tokens per cached segment on Sonnet and Opus models (2,048 on Haiku). Smaller prompts cannot be cached.
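A quick way to check whether a prompt clears that threshold before you add `cache_control` is the SDK's token counter; a sketch (the helper name and the placeholder message are mine):

```python
from anthropic import Anthropic

client = Anthropic()

def is_cacheable(system_text: str, model: str = "claude-3-5-sonnet-20241022") -> bool:
    """Rough check that a system prompt clears the 1,024-token cache minimum."""
    count = client.messages.count_tokens(
        model=model,
        system=system_text,
        messages=[{"role": "user", "content": "ping"}],  # the endpoint needs one message
    )
    # input_tokens includes the placeholder message, so this slightly
    # overestimates; that's fine for a threshold check.
    return count.input_tokens >= 1024
```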
Recommended Implementation Order
Week 1: Test & Validate
- ✅ Install the `toon-format` package
- ✅ Convert 5 existing Claude prompts to TOON
- ✅ Measure the token count reduction (target: 50-60%)
- ✅ Verify Claude's accuracy (it should match or improve)
- ✅ Calculate your potential annual savings
Week 2: Add Caching
- ✅ Identify reusable system prompts (3,000+ tokens)
- ✅ Implement `cache_control` on system prompts
- ✅ Test cache hits (verify `cache_read_input_tokens > 0`)
- ✅ Measure the combined TOON + caching savings (target: 80-90%)
Week 3-4: Deploy Gradually
- ✅ Update 10% of production requests to TOON + caching
- ✅ Monitor costs in the Anthropic dashboard
- ✅ Track accuracy metrics
- ✅ Scale to 100% as confidence builds
Month 2+: Optimize
- ✅ Apply the Batch API to non-urgent requests (an additional 50% discount)
- ✅ Fine-tune TOON delimiters (tab vs comma; see the sketch after this list)
- ✅ Trim redundant wording from your system prompt
- ✅ Monitor cache hit rates
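For the delimiter experiment, the reference TOON implementation exposes a `delimiter` option; assuming the Python port mirrors it (check the package docs for the exact keyword), a sketch that measures both variants against the token counter:

```python
from anthropic import Anthropic
from toon_format import encode

client = Anthropic()
rows = [{"id": 1, "title": "Getting Started"}, {"id": 2, "title": "Advanced Tips"}]

# Tabs sometimes tokenize more cheaply than commas; measure both on your own data.
# NOTE: the `delimiter` keyword is assumed from the reference TypeScript implementation.
for delim in (",", "\t"):
    toon = encode({"rows": rows}, indent=1, delimiter=delim)
    count = client.messages.count_tokens(
        model="claude-3-5-sonnet-20241022",
        messages=[{"role": "user", "content": toon}],
    )
    print(repr(delim), count.input_tokens)
```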
Conclusion: Claude + TOON + Caching = Unbeatable Savings
The combination of TOON format and Anthropic's prompt caching is the most powerful cost optimization available for Claude API users:
- ✅ TOON: 50-60% token reduction through format optimization
- ✅ Prompt caching: 90% savings on cached tokens
- ✅ Combined: 85-95% total input-token cost reduction
- ✅ Batch API: an additional 50% discount for non-urgent work
- ✅ Multiple benefits: faster responses, improved accuracy, and major cost savings
Quick Start Checklist
- Install: `pip install anthropic toon-format`
- Test: convert one prompt to TOON and add caching
- Measure: compare before/after token costs
- Deploy: roll out to your highest-volume use cases
- Monitor: track savings in the Anthropic console
Real-World Impact
For organizations using Claude API:
- Low volume (< 1,000 calls/day): $50-200/month savings
- Medium volume (1,000-10,000 calls/day): $500-5,000/month savings
- High volume (10,000+ calls/day): $5,000-50,000+/month savings
Conservative estimate: Most organizations save $1,000-10,000+ annually by implementing TOON + caching.
Next Steps
Ready to optimize your Claude API costs? Here are some helpful resources:
- Try our free JSON to TOON converter tool - See the token savings instantly
- What is TOON Format? - Learn the basics of TOON
- TOON Format for Python - Python implementation guide
- TOON Format for ChatGPT - OpenAI optimization guide
- How to Convert JSON to TOON - Step-by-step conversion guide
- TOON Documentation - Complete syntax reference
- More TOON Articles - Explore our blog for tips and tutorials