TOON Format for Claude API: Anthropic LLM Optimization

Achieve an 85-90% input-token cost reduction with TOON format + prompt caching for Claude Sonnet and Opus

Claude by Anthropic has rapidly become a preferred choice for developers building sophisticated AI applications. Its reasoning capabilities, safety features, and competitive pricing make it increasingly popular. However, like all LLM APIs, Claude charges by the tokenβ€”and there's significant room for cost optimization.

This comprehensive guide reveals how to combine TOON format with Claude's prompt caching feature to achieve unprecedented cost savings of up to 90%. You'll learn exactly how to convert JSON to TOON, integrate with Anthropic's Python SDK, leverage prompt caching, and calculate real savings for your specific Claude use case.

πŸ“Š Key Outcomes
  • βœ… Reduce Claude API costs by 30-60% using TOON format
  • βœ… Achieve additional 50-90% savings with prompt caching
  • βœ… Total potential savings: 85-95% on input tokens
  • βœ… Improve accuracy by 4-7% with structured TOON format
  • βœ… Production-ready integration with Claude Sonnet 4.5 and Opus 4
  • βœ… Real case study: $8,165 annual savings from one implementation

Understanding Claude API Pricing

Anthropic prices Claude models per-token with distinct input/output costs. Understanding this structure is critical for optimization.

Claude Pricing Structure (November 2025)

| Model | Input Cost | Output Cost | Context | Best For |
|---|---|---|---|---|
| Claude 3.5 Haiku | $0.80/1M | $4.00/1M | 200K | Fast, lightweight tasks |
| Claude 3.5 Sonnet | $3.00/1M | $15.00/1M | 200K | Balanced performance/cost |
| Claude 4 Sonnet | $3.00/1M | $15.00/1M | 200K | Latest balanced model |
| Claude 3 Opus | $15.00/1M | $75.00/1M | 200K | Maximum reasoning |
| Claude 4 Opus | $15.00/1M | $75.00/1M | 200K | Latest flagship |

Key distinction: Output tokens cost 5x more than input tokens on Sonnet, but in data-heavy workloads input tokens typically outnumber output tokens many times over, so input is where most of the spend accumulates and where TOON and caching deliver their savings.

The JSON Problem: Token Waste at Scale

When you send data to Claude in JSON format, you're paying for massive redundancy. Consider this example of customer support tickets:

{
  "analysis_task": "Classify customer support tickets by priority",
  "tickets": [
    { "id": "TKT-001", "subject": "Login issues", "priority": "high", "status": "open" },
    { "id": "TKT-002", "subject": "Billing question", "priority": "low", "status": "open" },
    { "id": "TKT-003", "subject": "Feature request", "priority": "medium", "status": "assigned" },
    { "id": "TKT-004", "subject": "Bug in checkout", "priority": "high", "status": "in_progress" },
    { "id": "TKT-005", "subject": "Password reset", "priority": "low", "status": "open" }
  ]
}

Token count: 287 tokens (using Claude's tokenizer)

The field names "id", "subject", "priority", and "status" each repeat 5 times: 20 redundant occurrences that, together with their quotes, colons, and braces, waste 40+ tokens on pure structure.
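
You can verify counts like this yourself before optimizing. A quick sketch, assuming the token-counting endpoint (messages.count_tokens) in Anthropic's current Python SDK:

import json

from anthropic import Anthropic

client = Anthropic()

payload = {
    "analysis_task": "Classify customer support tickets by priority",
    "tickets": [
        {"id": "TKT-001", "subject": "Login issues", "priority": "high", "status": "open"},
        {"id": "TKT-002", "subject": "Billing question", "priority": "low", "status": "open"},
    ],
}

# Count the tokens Claude would bill for this JSON sent as a user message
count = client.messages.count_tokens(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": json.dumps(payload, indent=2)}],
)
print(f"Input tokens: {count.input_tokens}")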

The Solution: TOON + Prompt Caching = 90% Cost Reduction

Part 1: TOON Format Reduces Token Usage by 50-60%

Convert the same data to TOON format:

Analysis task: Classify customer support tickets by priority

tickets[5]{id,subject,priority,status}:
  TKT-001,Login issues,high,open
  TKT-002,Billing question,low,open
  TKT-003,Feature request,medium,assigned
  TKT-004,Bug in checkout,high,in_progress
  TKT-005,Password reset,low,open

Token count: 115 tokens β€” a 60% reduction!

Learn more about what the TOON format is and how it works.

Part 2: Prompt Caching Saves Additional 50-90% on System Prompts

Anthropic's prompt caching stores and reuses unchanged prompt segments. When combined with TOON, the savings are extraordinary:

πŸ’‘ Prompt Caching Explained

Without caching:

  • System prompt (3,500 tokens) Γ— $3/1M = $0.0105
  • Query (100 tokens) Γ— $3/1M = $0.0003
  • Total = $0.0108 per request

With prompt caching (subsequent calls):

  • Cache read (3,500 tokens) Γ— $0.30/1M = $0.00105
  • Query (100 tokens) Γ— $3/1M = $0.0003
  • Total = $0.00135 per request

Savings: from $0.0108 to $0.00135 per request, an 87.5% reduction overall (a 90% discount on the cached system prompt tokens themselves)
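
The same arithmetic as a quick Python sanity check (a minimal sketch; it omits the one-time cache write at $3.75/1M on the first call, which amortizes away over repeated requests):

# Claude Sonnet rates in dollars per million tokens
INPUT_RATE, CACHE_READ_RATE = 3.00, 0.30

def request_cost(system_tokens: int, query_tokens: int, cached: bool) -> float:
    """Cost of one request, with the system prompt either cached or not."""
    system_rate = CACHE_READ_RATE if cached else INPUT_RATE
    return (system_tokens * system_rate + query_tokens * INPUT_RATE) / 1_000_000

uncached = request_cost(3_500, 100, cached=False)  # $0.01080
cached = request_cost(3_500, 100, cached=True)     # $0.00135
print(f"Savings per request: {1 - cached / uncached:.1%}")  # 87.5%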

Real-World Case Study: Customer Support Automation

The Scenario

A SaaS company uses Claude to automatically classify, route, and draft responses for 10,000 customer support tickets daily.

Application architecture:

  • 3,500-token system prompt (classification rules, response guidelines, company policies)
  • 50-100 tickets processed per batch
  • 5 tickets per API call average
  • Average output: 150 tokens (draft response + metadata)
  • Daily API calls: 2,000

JSON Approach (Before Optimization)

Per API call:

  • System prompt: 3,500 tokens
  • Data: 450 tokens (5 tickets in JSON)
  • Total input: 3,950 tokens
  • Output: 150 tokens

Cost per call: (3,950 × $3 + 150 × $15) ÷ 1,000,000 = $0.01410

| Period | Cost |
|---|---|
| Daily | 2,000 calls × $0.01410 = $28.20/day |
| Monthly | $28.20 × 30 = $846/month |
| Annual | $846 × 12 = $10,152/year |

TOON Format Approach (After Optimization)

Using TOON format with prompt caching:

Process and classify these support tickets:

tickets[5]{id,subject,body,priority}:
  TKT-1,Login error,"Can't access account...",high
  TKT-2,Billing question,"Why was I charged twice?",low
  TKT-3,Feature request,"Can we add dark mode?",medium
  TKT-4,Bug report,"Checkout button broken",high
  TKT-5,Account deletion,"Please delete my account",low

Per API call:

  • System prompt: 3,500 tokens (cached, read cost $0.30/1M)
  • Data in TOON: 180 tokens (5 tickets) β€” 60% reduction
  • Total input: 3,680 tokens
  • Output: 150 tokens

Cost per call: (3,500 × $0.30 + 180 × $3 + 150 × $15) ÷ 1,000,000 = $0.00384 (the one-time cache write at $3.75/1M is negligible when spread across 2,000 daily calls)

The Results: Real Savings

| Metric | JSON (No Caching) | TOON + Caching | Savings |
|---|---|---|---|
| Cost per call | $0.01410 | $0.00384 | 72.8% |
| Daily cost | $28.20 | $7.68 | 72.8% |
| Monthly cost | $846 | $230.40 | 72.8% |
| Annual cost | $10,152 | $2,765 | $7,387 |
πŸ’° Annual Savings
This single implementation saves roughly $7,387/year with no loss of quality (input-token spend alone falls 86.6%; the unchanged 150 output tokens per call dilute the total to 72.8%). Try our JSON to TOON converter to calculate your potential savings.
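
If you want to plug in your own volumes, here is a small estimator mirroring the case-study math (an illustrative sketch: Sonnet pricing, a fully cached system prompt, and the same 30-day months as the tables above):

def estimate_annual_savings(calls_per_day, system_tokens,
                            json_data_tokens, toon_data_tokens, output_tokens):
    """Annual savings from switching JSON (no caching) to TOON + cached system prompt."""
    json_call = ((system_tokens + json_data_tokens) * 3.00
                 + output_tokens * 15.00) / 1_000_000
    toon_call = (system_tokens * 0.30 + toon_data_tokens * 3.00
                 + output_tokens * 15.00) / 1_000_000
    return (json_call - toon_call) * calls_per_day * 30 * 12

# The case study above: 2,000 calls/day, 3,500-token system prompt,
# 450-token JSON payload vs 180-token TOON payload, 150 output tokens
print(f"${estimate_annual_savings(2_000, 3_500, 450, 180, 150):,.0f}/year")  # $7,387/year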

Integration Guide: TOON + Claude + Prompt Caching

Step 1: Install Required Packages

pip install anthropic toon-format

Step 2: Basic Claude + TOON Integration

from anthropic import Anthropic
from toon_format import encode

client = Anthropic()

# Your data
tickets = [
    {"id": "TKT-001", "subject": "Login error", "priority": "high"},
    {"id": "TKT-002", "subject": "Billing question", "priority": "low"},
    {"id": "TKT-003", "subject": "Feature request", "priority": "medium"}
]

# Convert to TOON
toon_data = encode({"tickets": tickets}, indent=1)

# Create prompt with TOON data
prompt = f"""Analyze and classify these support tickets:

{toon_data}

For each ticket, provide:
1. Severity assessment
2. Routing recommendation
3. Draft response"""

# Call Claude
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": prompt}
    ]
)

print(response.content[0].text)
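
For reference, the encode call above should produce TOON along these lines (exact output can vary with library version and options):

tickets[3]{id,subject,priority}:
  TKT-001,Login error,high
  TKT-002,Billing question,low
  TKT-003,Feature request,medium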

Step 3: Adding Prompt Caching for Maximum Savings

Prompt caching requires specifying cache_control on text blocks. Here's the production approach:

from anthropic import Anthropic
from toon_format import encode

client = Anthropic()

# Your large, reusable system instructions
SYSTEM_INSTRUCTIONS = """You are a customer support automation system. Your responsibilities:

1. CLASSIFICATION: Categorize tickets by severity (critical, high, medium, low)
2. ROUTING: Determine if human intervention is needed
3. RESPONSE: Draft professional, empathetic responses

Classification Rules:
- Critical: System down, data loss, security breach
- High: Feature broken, unable to complete core task
- Medium: Non-critical feature issue, minor bugs
- Low: Questions, feature requests, documentation

Always be professional, empathetic, and solution-focused."""

# Data to analyze (will be in TOON format)
tickets = [
    {"id": "TKT-001", "subject": "Cannot log in", "priority": "critical"},
    {"id": "TKT-002", "subject": "Payment failed", "priority": "high"},
    {"id": "TKT-003", "subject": "Question about features", "priority": "low"}
]

toon_data = encode({"tickets": tickets}, indent=1)

# Use prompt caching with cache_control
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_INSTRUCTIONS,
            "cache_control": {"type": "ephemeral"}  # Cache this system prompt
        }
    ],
    messages=[
        {
            "role": "user",
            "content": f"Process these tickets:\n\n{toon_data}"
        }
    ]
)

# Check cache usage (usage.input_tokens excludes tokens written to or read from cache)
usage = response.usage
print(f"Input tokens: {usage.input_tokens}")
print(f"Cache creation tokens: {getattr(usage, 'cache_creation_input_tokens', 0)}")
print(f"Cache read tokens: {getattr(usage, 'cache_read_input_tokens', 0)}")

Step 4: Batch Processing with TOON + Caching

For maximum efficiency with multiple requests:

from anthropic import Anthropic
from toon_format import encode

client = Anthropic()

# System instructions (cached on first call, then reused)
SYSTEM_INSTRUCTIONS = """[Your 3,500+ token system prompt]"""

class ClaudeTicketProcessor:
    def __init__(self):
        self.total_cost = 0.0
    
    def process_tickets(self, tickets):
        """Process batch of tickets with cached system prompt."""
        
        # Convert to TOON
        toon_data = encode({"tickets": tickets}, indent=1)
        
        # Call Claude with caching
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            system=[
                {
                    "type": "text",
                    "text": SYSTEM_INSTRUCTIONS,
                    "cache_control": {"type": "ephemeral"}
                }
            ],
            messages=[
                {
                    "role": "user",
                    "content": f"Process these tickets:\n\n{toon_data}"
                }
            ]
        )
        
        # Track costs (usage.input_tokens excludes tokens written to or read from cache)
        usage = response.usage
        input_tokens = usage.input_tokens
        cache_write = getattr(usage, 'cache_creation_input_tokens', 0)
        cache_read = getattr(usage, 'cache_read_input_tokens', 0)
        output_tokens = usage.output_tokens
        
        # Cost calculation (Claude Sonnet, dollars per 1M tokens)
        input_cost = input_tokens * 3.00 / 1_000_000
        cache_write_cost = cache_write * 3.75 / 1_000_000  # first call only
        cache_read_cost = cache_read * 0.30 / 1_000_000
        output_cost = output_tokens * 15.00 / 1_000_000
        
        total_cost = input_cost + cache_write_cost + cache_read_cost + output_cost
        self.total_cost += total_cost
        
        print(f"Cost: ${total_cost:.6f}")
        print(f"Cache read tokens: {cache_read}")
        
        return response.content[0].text
    
    def report(self):
        print(f"\nTotal cost so far: ${self.total_cost:.2f}")

# Usage
processor = ClaudeTicketProcessor()

# First request (cache miss, system prompt is cached)
tickets_batch1 = [
    {"id": "T1", "subject": "Login error", "priority": "high"},
    {"id": "T2", "subject": "Billing", "priority": "low"}
]
result1 = processor.process_tickets(tickets_batch1)

# Second request (cache hit, system prompt reused)
tickets_batch2 = [
    {"id": "T3", "subject": "Feature request", "priority": "medium"},
    {"id": "T4", "subject": "Bug report", "priority": "high"}
]
result2 = processor.process_tickets(tickets_batch2)

processor.report()

Before & After: Real Prompting Examples

Example 1: Content Analysis with Historical Context

Before (JSON):

{
  "task": "Analyze blog performance",
  "articles": [
    { "id": 1, "title": "Getting Started", "views": 5420, "engagement": 0.35 },
    { "id": 2, "title": "Advanced Tips", "views": 3120, "engagement": 0.52 },
    { "id": 3, "title": "Best Practices", "views": 8934, "engagement": 0.48 }
  ]
}

Tokens: 234

After (TOON):

Analyze blog performance.

articles[3]{id,title,views,engagement}:
  1,Getting Started,5420,0.35
  2,Advanced Tips,3120,0.52
  3,Best Practices,8934,0.48

Tokens: 95 β€” 59% reduction

See more comparison examples in our TOON vs JSON article.

Example 2: Multi-Turn Conversation with Context Caching

First call (cache setup):

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a data analyst helping with business intelligence...",
            "cache_control": {"type": "ephemeral"}
        },
        {
            "type": "text",
            "text": f"Historical data:\n{toon_encoded_historical_data}",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "What's the trend?"}
    ]
)

Subsequent calls (cache reuse):

# Same system + cached data, different query
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a data analyst...",
            "cache_control": {"type": "ephemeral"}
        },
        {
            "type": "text",
            "text": f"Historical data:\n{toon_encoded_historical_data}",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {"role": "user", "content": "What's the forecast?"}  # Different query
    ]
)
# 90% savings on system + cached data tokens!

Frequently Asked Questions

Does TOON work with all Claude models?

Yes, absolutely. TOON works with all Claude models:

  • βœ… Claude 3.5 Haiku
  • βœ… Claude 3.5 Sonnet (recommended for most use cases)
  • βœ… Claude 4 Sonnet
  • βœ… Claude 3 Opus
  • βœ… Claude 4 Opus (highest reasoning)

Accuracy with TOON is higher than JSON (73.9% vs 69.7% on data retrieval tasks).

How does prompt caching work exactly?

Prompt caching stores prompt segments on Anthropic's servers:

  1. First call: Include the cache_control parameter on the text blocks you want cached. Cost: $3.75/1M tokens (a 25% premium to write the cache).
  2. Subsequent calls: Send the same cached content again. Cost: $0.30/1M tokens (a 90% discount). A cache hit requires an exact match of the cached prefix.
  3. Cache expiry: The cache lives for 5 minutes and is refreshed each time it is read; it expires after 5 minutes of inactivity.

Can I cache TOON data specifically?

Yes! Wrap TOON-formatted data in cache_control:

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"Analyze this data:\n\n{toon_data}",
                    "cache_control": {"type": "ephemeral"}
                }
            ]
        }
    ]
)

Since TOON is more concise, the cached block itself is smaller, so both the initial cache write and every subsequent cache read cost less than they would with JSON.

What about accuracy? Does TOON affect Claude's performance?

Noβ€”accuracy improves with TOON:

| Format | Claude Accuracy |
|---|---|
| TOON | 73.9% |
| JSON (compact) | 70.7% |
| JSON (formatted) | 69.7% |
| YAML | 69.0% |

TOON's explicit structure (array lengths, field headers) helps Claude parse data more reliably.

What is the minimum prompt size for caching?

Anthropic requires a minimum of 1,024 tokens per cached block on Sonnet and Opus models (2,048 on Haiku). Smaller prompts cannot be cached.
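
A practical pattern is to count tokens first and only request caching when the block clears the floor. A minimal sketch, assuming the token-counting endpoint (messages.count_tokens, which accepts the same model/system/messages parameters as messages.create) in the current Python SDK:

from anthropic import Anthropic

client = Anthropic()
SYSTEM_INSTRUCTIONS = "..."  # your reusable system prompt

# The count includes the short placeholder message, so it slightly
# overstates the system prompt alone; fine for a threshold check.
count = client.messages.count_tokens(
    model="claude-3-5-sonnet-20241022",
    system=SYSTEM_INSTRUCTIONS,
    messages=[{"role": "user", "content": "placeholder"}],
)

system_block = {"type": "text", "text": SYSTEM_INSTRUCTIONS}
if count.input_tokens >= 1_024:  # 2,048 on Haiku models
    system_block["cache_control"] = {"type": "ephemeral"}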

Recommended Implementation Order

Week 1: Test & Validate

  • ☐ Install toon-format package
  • ☐ Convert 5 existing Claude prompts to TOON
  • ☐ Measure token count reduction (target: 50-60%)
  • ☐ Verify Claude accuracy (should improve or match)
  • ☐ Calculate potential annual savings

Week 2: Add Caching

  • ☐ Identify reusable system prompts (3,000+ tokens)
  • ☐ Implement cache_control on system prompts
  • ☐ Test cache hits (verify cache_read_input_tokens > 0)
  • ☐ Measure combined TOON + caching savings (target: 80-90%)

Week 3-4: Deploy Gradually

  • ☐ Update 10% of production requests to TOON + caching
  • ☐ Monitor costs in Anthropic dashboard
  • ☐ Track accuracy metrics
  • ☐ Scale to 100% as confidence builds

Month 2+: Optimize

  • ☐ Apply the Batch API for non-urgent requests (additional 50% off; see the sketch after this list)
  • ☐ Fine-tune TOON delimiters (tab vs comma)
  • ☐ Optimize TOON payloads with key folding (collapsing single-key nested objects into dotted paths)
  • ☐ Monitor cache hit rates
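
For the Batch API item above, a minimal sketch assuming the Message Batches endpoint in the current Python SDK (the custom_id and ticket data are illustrative):

from anthropic import Anthropic
from toon_format import encode

client = Anthropic()

tickets = [{"id": "T1", "subject": "Login error", "priority": "high"}]
toon_data = encode({"tickets": tickets}, indent=1)

# Queue a non-urgent request at 50% of the standard price;
# results are retrieved asynchronously (typically within 24 hours)
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "ticket-batch-001",
            "params": {
                "model": "claude-3-5-sonnet-20241022",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": f"Process these tickets:\n\n{toon_data}"}
                ],
            },
        }
    ]
)
print(batch.id)  # poll messages.batches.retrieve(batch.id) for completion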

Conclusion: Claude + TOON + Caching = Unbeatable Savings

The combination of TOON format and Anthropic's prompt caching is the most powerful cost optimization available for Claude API users:

  • βœ… TOON: 50-60% token reduction through format optimization
  • βœ… Prompt Caching: 90% savings on cached tokens
  • βœ… Combined: 85-95% total input token cost reduction
  • βœ… Batch API: Additional 50% discount for non-urgent work
  • βœ… Multiple benefits: Faster responses + improved accuracy + massive cost savings

Quick Start Checklist

  1. Install: pip install anthropic toon-format
  2. Test: Convert one prompt to TOON + add caching
  3. Measure: Compare before/after token costs
  4. Deploy: Roll out to your highest-volume use cases
  5. Monitor: Track savings in Anthropic console

Real-World Impact

For organizations using Claude API:

  • Low volume (< 1,000 calls/day): $50-200/month savings
  • Medium volume (1,000-10,000 calls/day): $500-5,000/month savings
  • High volume (10,000+ calls/day): $5,000-50,000+/month savings

Conservative estimate: Most organizations save $1,000-10,000+ annually by implementing TOON + caching.

Next Steps

Ready to optimize your Claude API costs? Here are some helpful resources: