TOON Format: The Complete Technical Guide
Master Token-Oriented Object Notation Syntax, Tabular Arrays, and Encoding Rules
The TOON format (Token-Oriented Object Notation) is an innovative data serialization standard designed to drastically reduce token usage when working with Large Language Models (LLMs). By optimizing how structured data is represented, TOON format achieves 30-60% token reduction compared to JSON, translating to significant cost savings and improved performance for AI applications.
This comprehensive technical guide explores TOON format syntax, tabular array detection, custom delimiters, quoting rules, and implementation patterns. Whether you're a developer optimizing LLM costs or a data engineer handling large datasets, this guide will help you master the TOON serialization format.
What is TOON Format?
TOON format is a human-readable data serialization format that combines the conciseness of CSV with the readability of YAML, specifically optimized for LLM token efficiency. The format's principal innovation is detecting and compactly expressing "tabular" data—arrays of uniform objects—without repetitive syntax overhead.
Core Design Principles
- Token-First Optimization: Every design decision prioritizes reducing token count
- Lossless Conversion: Perfect bidirectional conversion with JSON
- Human Readability: Clean, scannable structure for developers
- Machine Efficiency: Easy for LLMs to parse and understand
- Minimal Syntax: Only essential punctuation and formatting
TOON format serves as a translation layer: use JSON in your application logic, then convert JSON to TOON when passing data to LLMs. This approach keeps your codebase maintainable while optimizing where it matters most—at the LLM API boundary.
TOON Format Syntax Breakdown
Understanding TOON syntax is key to leveraging its power. The format uses different encoding strategies based on data structure.
1. Tabular Arrays (Core Optimization)
The heart of TOON's efficiency is its tabular array format. When TOON detects an array of uniform objects, it converts them into a table-like structure:
users[3]{id,name,role}:
1,Alice,admin
2,Bob,user
3,Charlie,user
Syntax Components:
users- Property name[3]- Array length (number of items){id,name,role}- Field header (declared once):- Header terminator- Subsequent lines - Delimiter-separated values (one row per object)
• Identical keys in the same order
• Primitive values only (no nested objects/arrays)
If these conditions aren't met, TOON falls back to list format.
2. List Arrays (Non-Uniform Data)
When array elements aren't uniform, TOON uses list format with the - prefix:
items[3]:
- type: book
title: 1984
- type: movie
title: Inception
- type: game
title: Chess
3. Primitive Arrays (Inline Format)
Arrays of primitives (strings, numbers, booleans) are encoded inline:
tags[4]: javascript,python,react,node
4. Nested Objects (Indentation-Based)
TOON uses indentation for nested objects, eliminating braces:
user:
name: Alice
profile:
age: 30
city: New York
preferences:
theme: dark
notifications: true
5. Simple Key-Value Pairs
Basic properties use colon separation:
name: Alice
age: 30
active: true
score: 98.5
Tabular Array Detection: The Core Algorithm
The tabular array detection algorithm is where TOON achieves its massive token savings. This is the most important optimization in the entire TOON format specification.
Detection Logic
For an array to qualify as tabular, the TOON encoder performs these checks:
- Type Check: Verify all elements are objects (not primitives, arrays, or null)
- Key Uniformity: Extract keys from first object, then verify every other object has identical keys in the same order
- Value Type Check: Confirm all values are primitives (strings, numbers, booleans, null)
JavaScript Implementation Example
_isTabularArray(arr) {
if (!Array.isArray(arr) || arr.length === 0) return false;
const firstElement = arr[0];
if (typeof firstElement !== 'object' || Array.isArray(firstElement)) {
return false;
}
const firstKeys = Object.keys(firstElement).sort();
for (const element of arr) {
// Check all objects have identical keys
const elementKeys = Object.keys(element).sort();
if (!this._keysEqual(firstKeys, elementKeys)) return false;
// Check all values are primitives
for (const value of Object.values(element)) {
if (typeof value === 'object' && value !== null) return false;
}
}
return true;
}
Token Savings Explained
Consider a dataset with 100 user objects, each having 5 fields. In JSON:
- Keys repeat 100 times:
"id","name","email","role","active" - Curly braces: 200 characters (opening and closing for each object)
- Commas and quotes: hundreds of additional characters
In TOON format:
- Keys declared once:
{id,name,email,role,active} - No braces for objects
- Minimal quotes (only when necessary)
- Result: 50-60% token reduction
Custom Delimiters in TOON Format
TOON format supports flexible delimiters to optimize for different data types and tokenizer behaviors.
Supported Delimiters
Comma (Default)
items[2]{id,name}:
1,Alice
2,Bob
Best for: Most general use cases
Tab Delimiter
items[2 ]{id name}:
1 Alice
2 Bob
Best for: Single-token efficiency with some tokenizers
Pipe Delimiter
items[2|]{id|name}:
1|Alice
2|Bob
Best for: Data containing many commas
Delimiter Configuration
Set delimiter when encoding:
// JavaScript
encode(data, { delimiter: '\t' })
# Python
encode(data, delimiter='\t')
Choosing the Right Delimiter
- Use comma: Default choice, widely understood
- Use tab: When you've tested that tabs tokenize more efficiently with your specific LLM
- Use pipe: When your data contains many commas (addresses, descriptions, etc.)
TOON Format Quoting and Escape Rules
One of TOON's key optimizations is minimal quoting. Strings are only quoted when necessary to avoid ambiguity.
When TOON Adds Quotes
TOON quotes a string value if it:
- Is empty:
"" - Has leading or trailing whitespace:
" text " - Contains the active delimiter:
"Smith, John"(with comma delimiter) - Contains a colon:
"key: value" - Contains quotes or backslashes:
"He said \"hello\"" - Looks like a boolean:
"true","false" - Looks like a number:
"42","3.14" - Looks like null:
"null" - Starts with list syntax:
"- item" - Looks like structural syntax:
"[5]","{key}"
Examples
// No quotes needed
name: Alice
city: NewYork
status: active
// Quotes required
name: "Smith, John" // Contains comma
description: "Hello: World" // Contains colon
value: "123" // Looks like number
flag: "true" // Looks like boolean
empty: "" // Empty string
spaced: " text " // Leading/trailing space
Escape Sequences
Within quoted strings, TOON uses standard escape sequences:
\\- Backslash\"- Double quote\n- Newline\r- Carriage return\t- Tab
message: "Line 1\nLine 2"
quote: "She said \"hello\""
path: "C:\\Users\\Documents"
TOON Format Encoding Options
TOON encoders support various options to customize output format.
Available Options
Indentation (Default: 2 spaces)
// JavaScript
encode(data, { indent: 4 })
# Python
encode(data, indent=4)
Controls nesting indentation. Use 2 or 4 spaces for readability.
Delimiter (Default: comma)
encode(data, { delimiter: '\t' }) // Tab
encode(data, { delimiter: '|' }) // Pipe
Sets the delimiter for tabular arrays.
Length Marker (Default: false)
encode(data, { lengthMarker: '#' })
Output: items[#3]{id,name}:
Adds explicit marker before array count for emphasis.
Combined Options Example
const toon = encode(data, {
indent: 2,
delimiter: '\t',
lengthMarker: '#'
});
// Output:
// users[#100 ]{id name role}:
// 1 Alice admin
// 2 Bob user
// ...
TOON Format Token Savings by Data Type
Token reduction varies significantly based on data structure. Here's what to expect:
| Scenario | Token Reduction | Reason |
|---|---|---|
| Uniform tabular (100+ rows) | 50-60% | Keys declared once, CSV-like rows |
| Analytics time-series | 35-60% | Highly uniform structure |
| GitHub repositories | 40-50% | Moderate uniformity |
| Small datasets (<10 items) | 20-30% | Overhead from minimal syntax |
| Deeply nested objects | 10-20% | Indentation replaces braces |
| Mixed/non-uniform arrays | 0-10% | Falls back to list format |
TOON Format Implementation Guide
Implementing TOON encoding follows a systematic decision tree based on data type.
Encoder Architecture
function encode(value, options = {}) {
const encoder = new ToonEncoder({
indent: options.indent ?? 2,
delimiter: options.delimiter ?? ',',
lengthMarker: options.lengthMarker ?? false,
});
return encoder.encode(value);
}
class ToonEncoder {
encode(value) {
// 1. Type dispatch
if (value === null) return 'null';
if (typeof value !== 'object') return this._encodePrimitive(value);
if (Array.isArray(value)) return this._encodeArray(value);
return this._encodeObject(value);
}
_encodeArray(arr) {
// 2. Check if tabular
if (this._isTabularArray(arr)) {
return this._encodeTabularArray(arr);
}
// 3. Check if all primitives
if (arr.every(item => typeof item !== 'object')) {
return this._encodePrimitiveArray(arr);
}
// 4. Fall back to list format
return this._encodeListArray(arr);
}
}
Type Normalization
TOON handles non-JSON types consistently:
Dateobjects → ISO string in quotesBigInt→ Decimal string (quoted if > safe integer)NaN,Infinity→nullundefined→null- Functions, symbols →
null
JavaScript Example
For a complete JavaScript implementation guide with Node.js and Express.js integration, see our TOON Format JavaScript Implementation Guide.
import { encode } from '@toon-format/toon';
const data = {
users: [
{ id: 1, name: 'Alice', role: 'admin', active: true },
{ id: 2, name: 'Bob', role: 'user', active: false }
],
metadata: {
version: 1,
created: '2025-01-01'
}
};
const toon = encode(data);
console.log(toon);
// Output:
// users[2]{id,name,role,active}:
// 1,Alice,admin,true
// 2,Bob,user,false
// metadata:
// version: 1
// created: 2025-01-01
Python Example
from toon_format import encode
data = {
'users': [
{'id': 1, 'name': 'Alice', 'role': 'admin'},
{'id': 2, 'name': 'Bob', 'role': 'user'}
],
'count': 2
}
toon = encode(data)
print(toon)
TOON Format Use Cases
TOON format excels in specific scenarios where token efficiency matters most.
âś… Optimal Use Cases
LLM Prompt Engineering
Passing data context to LLMs while minimizing token costs. Perfect for:
- User lists for analysis
- Product catalogs for recommendations
- Historical data for trend analysis
- Configuration data for AI agents
API Response Optimization
Reducing bandwidth for data analytics APIs:
- Time-series analytics (daily/hourly metrics)
- Database query results
- E-commerce order history
- IoT sensor data streams
Embeddings & Feature Stores
Efficient serialization of high-dimensional arrays:
- Feature vectors for ML models
- Embedding databases
- Vector similarity search results
Open Data Portals
Publishing large tabular datasets:
- Government data releases
- Research datasets
- Public API documentation
❌ When Not to Use TOON
- Deeply nested objects: 3+ levels of nesting (minimal benefit)
- Non-uniform data: Objects with varying fields
- Very small datasets: Less than 10 items (overhead not worth it)
- Native tool integration: When tools require JSON specifically
- Programmatic parsing: Use JSON for application logic
Frequently Asked Questions
How does TOON format differ from CSV?
TOON combines CSV's tabular efficiency with support for nested objects, mixed data types, and hierarchical structure. CSV can only represent flat tables.
Is TOON format standardized?
TOON is an open specification with reference implementations in JavaScript, Python, and other languages. The format is actively developed and community-driven.
Can I use TOON with any LLM?
Yes! TOON works with GPT, Claude, Llama, and all other LLMs. The format is tokenizer-agnostic, though specific savings vary by model.
How do I convert existing JSON to TOON?
Use our free JSON to TOON converter tool or install the TOON library for your language (npm: @toon-format/toon, pip: toon-format).
Does TOON support comments?
Currently, TOON does not support inline comments. This keeps the format minimal and parser implementation simple.
What about TOON file extensions?
TOON files typically use the .toon extension, though .txt works too. MIME type is text/x-toon (unofficial).
Conclusion: Mastering TOON Format
The TOON format represents a significant advancement in data serialization for the LLM era. By understanding tabular array detection, custom delimiters, quoting rules, and encoding options, you can leverage TOON to achieve 30-60% token reduction compared to JSON.
Key takeaways:
- Tabular arrays are TOON's superpower—uniform data sees the biggest savings
- Custom delimiters let you optimize for your specific tokenizer
- Minimal quoting eliminates unnecessary characters
- Type-based encoding ensures efficient representation of all data structures
- Use as a translation layer—JSON in your app, TOON at the LLM boundary
Next Steps
Ready to start using TOON format? Here are your next steps:
- Try our free JSON to TOON converter - Test with your real data
- How to Convert JSON to TOON - Step-by-step tutorial
- TOON vs JSON Comparison - Detailed format analysis
- LLM Token Optimization Guide - Reduce AI costs
- TOON Documentation - Complete syntax reference
- More TOON Articles - Tutorials and best practices
Start Converting JSON to TOON
See instant token savings with our free online converter tool.