What is TOON Format?

A Complete Guide to Token-Efficient Data Serialization for Large Language Models

Token-Oriented Object Notation (TOON) is a compact, human-readable serialization format designed specifically for passing structured data to Large Language Models with significantly reduced token usage. If you're working with LLMs and passing large datasets in your prompts, converting JSON to TOON can cut token costs by 30-60% compared to sending the JSON directly.

In this comprehensive guide, we'll explore what TOON format is, why it was created, how the TOON syntax works, and when you should use this efficient TOON serialization format for your LLM applications.

What is TOON?

TOON stands for Token-Oriented Object Notation. It's a data serialization format optimized for LLM input that combines the best aspects of several formats:

  • YAML's indentation-based structure for nested objects (eliminating braces)
  • CSV's tabular format for uniform data rows (declaring fields once)
  • Minimal syntax that removes redundant punctuation

TOON is intended as a translation layer: you use JSON programmatically in your application, then convert to TOON when passing data to an LLM. This keeps your codebase clean while optimizing token usage where it matters most. Try our free JSON to TOON converter to see the efficiency gains instantly.

💡 Key Insight
TOON's sweet spot is uniform arrays of objects – multiple fields per row, same structure across items. Think database query results, analytics data, API responses, etc.

Why TOON? The Problem with JSON

As AI becomes more accessible and context windows grow larger, developers are passing more data to LLMs. However, LLM tokens still cost money, and standard JSON is verbose and token-expensive.

The Token Cost Problem

Consider this simple JSON example representing user data:

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" },
    { "id": 3, "name": "Charlie", "role": "user" }
  ]
}

Token count: ~125 tokens (GPT-5 o200k_base tokenizer)

Notice the repetition? The keys "id", "name", and "role" appear three times – once for each user. This redundancy compounds dramatically with larger datasets.

The TOON Solution

TOON conveys the same information with fewer tokens:

users[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Charlie,user

Token count: ~54 tokens

Savings: 57% fewer tokens!

TOON declares the field names once in the header ({id,name,role}) and then lists only the values as rows. This pattern becomes even more powerful at scale.
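
If you want to check these numbers on your own data, a small sketch like the one below works. It assumes the tiktoken package and the toon_format encoder introduced in the Getting Started section later in this guide:

import json
import tiktoken
from toon_format import encode

# o200k_base is the tokenizer referenced in the counts above
enc = tiktoken.get_encoding("o200k_base")

data = {"users": [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
    {"id": 3, "name": "Charlie", "role": "user"},
]}

json_tokens = len(enc.encode(json.dumps(data, indent=2)))
toon_tokens = len(enc.encode(encode(data)))
print(f"JSON: {json_tokens} tokens, TOON: {toon_tokens} tokens")
print(f"Savings: {1 - toon_tokens / json_tokens:.0%}")

Exact counts will differ slightly by model and tokenizer, which is why the figures quoted in this guide are approximate.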

How TOON Works

TOON uses a simple but powerful approach to reduce token usage while maintaining readability and structure.

1. Tabular Arrays (The Core Optimization)

When TOON encounters an array of objects with:

  • Identical keys across all objects
  • Only primitive values (no nested objects/arrays)

It automatically converts it to tabular format:

array_name[count]{field1,field2,field3}:
  value1,value2,value3
  value1,value2,value3

This is where the token savings come from – keys are declared once instead of repeating for every row.
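
To make the rule concrete, here is an illustrative encoder sketch in Python. It is not the official implementation – it handles only the uniform, flat case and skips the quoting and escaping rules the real library applies:

def encode_tabular(name, items):
    # Every object must have the same keys, in the same order,
    # and only primitive values (str, int, float, bool, None).
    keys = list(items[0].keys())
    assert all(list(item.keys()) == keys for item in items)
    assert all(isinstance(v, (str, int, float, bool)) or v is None
               for item in items for v in item.values())

    header = f"{name}[{len(items)}]{{{','.join(keys)}}}:"
    rows = ["  " + ",".join(str(item[k]) for k in keys) for item in items]
    return "\n".join([header] + rows)

print(encode_tabular("users", [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]))
# users[2]{id,name,role}:
#   1,Alice,admin
#   2,Bob,user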

2. Indentation-Based Structure

For nested objects, TOON uses indentation (like YAML) instead of curly braces:

user:
  name: Alice
  profile:
    age: 30
    city: New York

vs JSON:

{
  "user": {
    "name": "Alice",
    "profile": {
      "age": 30,
      "city": "New York"
    }
  }
}
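
With the toon_format package from the Getting Started section below (assuming its encode API behaves as shown there), producing the indented form is a one-liner:

from toon_format import encode

data = {"user": {"name": "Alice", "profile": {"age": 30, "city": "New York"}}}
print(encode(data))
# Should print the indented TOON shown above:
# user:
#   name: Alice
#   profile:
#     age: 30
#     city: New York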

3. Minimal Quoting

TOON only quotes strings when necessary (for example, when they contain delimiters or colons, or could be mistaken for numbers or booleans). This eliminates the unnecessary quote characters that consume tokens in JSON.
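
As a rough illustration of that rule (simplified, and not the official quoting logic), a quoting check might look like this:

def needs_quotes(value: str) -> bool:
    # Quote if the string could be misread as a literal or a number,
    # or if it contains structural characters or surrounding whitespace.
    looks_like_literal = value in ("true", "false", "null")
    looks_like_number = value.lstrip("-").replace(".", "", 1).isdigit()
    has_special = any(ch in value for ch in (",", ":", "\n")) or value != value.strip()
    return looks_like_literal or looks_like_number or has_special

print(needs_quotes("Alice"))         # False -> Alice
print(needs_quotes("Hello, world"))  # True  -> "Hello, world"
print(needs_quotes("42"))            # True  -> "42" (would otherwise read as a number)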

Key Features

💸 Token-Efficient

Typically 30–60% fewer tokens than JSON, with even greater savings for large uniform datasets.

🤿 LLM-Friendly

Explicit lengths and field declarations enable better validation and parsing by LLMs.

๐Ÿฑ Minimal Syntax

Removes redundant punctuation – no unnecessary braces, brackets, or quotes.

📝 Human-Readable

Indentation-based structure makes data easy to read and understand at a glance.

🧺 Lossless

Drop-in replacement for JSON – all data converts back perfectly without loss.

⚡ Fast Conversion

Lightweight encoder/decoder with minimal overhead – convert on the fly.

Real-World Benchmarks

Token counts are measured using the GPT-5 o200k_base tokenizer. Savings are calculated against formatted JSON (2-space indentation). Actual savings vary by model and tokenizer.

GitHub Repositories (100 repos)

TOON: 8,745 tokens
vs JSON: 15,145 tokens (-42.3%)
vs JSON compact: 11,455 tokens (-23.7%)
vs YAML: 13,129 tokens (-33.4%)

Daily Analytics (365 days)

TOON: 4,507 tokens
vs JSON: 10,977 tokens (-58.9%)
vs JSON compact: 7,013 tokens (-35.7%)
vs YAML: 8,810 tokens (-48.8%)

⚠️ Important
These benchmarks use datasets optimized for TOON's strengths (uniform tabular data). Real-world performance depends on your data structure. Deeply nested or non-uniform data may not benefit as much.

When to Use TOON

✅ TOON is Perfect For:

  • Database Query Results: Rows of uniform data from SQL queries
  • Analytics Data: Time-series metrics, logs, usage statistics
  • API Responses: List of products, users, orders, etc.
  • E-commerce Data: Product catalogs, inventory lists
  • CSV-like Data: Any tabular data with consistent fields
  • High-Volume LLM Calls: When making hundreds/thousands of API requests

โŒ When to Stick with JSON:

  • Deeply Nested Objects: 3+ levels of nesting
  • Non-Uniform Data: Objects with varying fields
  • Small Datasets: Less than 10 items (overhead isn't worth it)
  • Programmatic Use: When you need native language support
  • Tool Compatibility: When working with tools that require JSON

💡 Best Practice
Use JSON in your application logic, then convert to TOON right before sending data to an LLM. This keeps your codebase maintainable while optimizing token usage.

Getting Started

Online Converter

The easiest way to try TOON is the free JSON to TOON converter mentioned above: paste your JSON and see the token savings instantly.

JavaScript/Node.js

Learn the complete JavaScript implementation, including Express.js integration, in our JavaScript Implementation Guide.

npm install @toon-format/toon

import { encode, decode } from '@toon-format/toon';

const data = {
  users: [
    { id: 1, name: 'Alice', role: 'admin' },
    { id: 2, name: 'Bob', role: 'user' }
  ]
};

const toon = encode(data);
console.log(toon);
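// Expected output (tabular form, as described above):
// users[2]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user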

Python

pip install toon-format

from toon_format import encode, decode

data = {
    'users': [
        {'id': 1, 'name': 'Alice', 'role': 'admin'},
        {'id': 2, 'name': 'Bob', 'role': 'user'}
    ]
}

toon = encode(data)
print(toon)
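
Because the format is lossless, you can round-trip your data to confirm nothing is altered (assuming the encode/decode API behaves as shown above):

from toon_format import encode, decode

original = {
    'users': [
        {'id': 1, 'name': 'Alice', 'role': 'admin'},
        {'id': 2, 'name': 'Bob', 'role': 'user'}
    ]
}

assert decode(encode(original)) == original  # lossless round-trip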

Using with LLMs

Here's how to use TOON in your LLM prompts:

const prompt = `Analyze this user data:

${encode(userData)}

Provide insights on user roles and activity.`;
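
The same flow in Python, sketched here with the OpenAI SDK purely as an example – any LLM client works, and the model name is just a placeholder:

from openai import OpenAI
from toon_format import encode

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

user_data = {"users": [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]}

prompt = (
    "Analyze this user data:\n\n"
    f"{encode(user_data)}\n\n"
    "Provide insights on user roles and activity."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whichever model you call
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)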

Conclusion

TOON Format offers a powerful solution for reducing LLM token usage without sacrificing data structure or readability. By optimizing how we represent uniform tabular data, TOON can reduce your token costs by 30-60% compared to JSON.

Whether you're building AI applications, working with analytics data, or making frequent LLM API calls, TOON provides a practical way to optimize costs while maintaining data integrity.

Next Steps

Ready to get started with TOON? Here are some helpful resources: