TOON Format: The Complete Technical Guide

Master Token-Oriented Object Notation Syntax, Tabular Arrays, and Encoding Rules

TOON (Token-Oriented Object Notation) is a data serialization format designed to drastically reduce token usage when working with Large Language Models (LLMs). By optimizing how structured data is represented, TOON achieves a 30-60% token reduction compared to JSON, translating into significant cost savings and improved performance for AI applications.

This comprehensive technical guide explores TOON format syntax, tabular array detection, custom delimiters, quoting rules, and implementation patterns. Whether you're a developer optimizing LLM costs or a data engineer handling large datasets, this guide will help you master the TOON serialization format.

đź’ˇ Quick Start
New to TOON? Try our free JSON to TOON converter to see instant token savings on your data. You can also read our What is TOON Format article for a beginner-friendly introduction.

What is TOON Format?

TOON format is a human-readable data serialization format that combines the conciseness of CSV with the readability of YAML, specifically optimized for LLM token efficiency. The format's principal innovation is detecting and compactly expressing "tabular" data—arrays of uniform objects—without repetitive syntax overhead.

Core Design Principles

  • Token-First Optimization: Every design decision prioritizes reducing token count
  • Lossless Conversion: Perfect bidirectional conversion with JSON
  • Human Readability: Clean, scannable structure for developers
  • Machine Efficiency: Easy for LLMs to parse and understand
  • Minimal Syntax: Only essential punctuation and formatting

TOON format serves as a translation layer: use JSON in your application logic, then convert JSON to TOON when passing data to LLMs. This approach keeps your codebase maintainable while optimizing where it matters most—at the LLM API boundary.

TOON Format Syntax Breakdown

Understanding TOON syntax is key to leveraging its power. The format uses different encoding strategies based on data structure.

1. Tabular Arrays (Core Optimization)

The heart of TOON's efficiency is its tabular array format. When TOON detects an array of uniform objects, it converts them into a table-like structure:

users[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Charlie,user

Syntax Components:

  • users - Property name
  • [3] - Array length (number of items)
  • {id,name,role} - Field header (declared once)
  • : - Header terminator
  • Subsequent lines - Delimiter-separated values (one row per object)
⚠️ Tabular Array Requirements
For TOON to use tabular format, all array objects must have:
• Identical keys in the same order
• Primitive values only (no nested objects/arrays)
If these conditions aren't met, TOON falls back to list format.
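To make the header and row structure concrete, here is a minimal sketch that builds the tabular form for the users example above by hand (illustrative only, not the official encoder):

```javascript
// Sketch: assembling the tabular header and rows for the users example.
const users = [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' },
  { id: 3, name: 'Charlie', role: 'user' },
];

const keys = Object.keys(users[0]);                       // {id,name,role}
const header = `users[${users.length}]{${keys.join(',')}}:`;
const rows = users.map(u => '  ' + keys.map(k => u[k]).join(','));
const toon = [header, ...rows].join('\n');

console.log(toon);
// users[3]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user
//   3,Charlie,user
```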

2. List Arrays (Non-Uniform Data)

When array elements aren't uniform, TOON uses list format with the - prefix:

items[3]:
  - type: book
    title: 1984
  - type: movie
    title: Inception
  - type: game
    title: Chess

3. Primitive Arrays (Inline Format)

Arrays of primitives (strings, numbers, booleans) are encoded inline:

tags[4]: javascript,python,react,node

4. Nested Objects (Indentation-Based)

TOON uses indentation for nested objects, eliminating braces:

user:
  name: Alice
  profile:
    age: 30
    city: New York
    preferences:
      theme: dark
      notifications: true
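The indentation rule can be sketched with a tiny recursive encoder (a hypothetical helper that handles plain objects with primitive leaves; the real encoder also handles arrays):

```javascript
// Illustrative recursive encoder for nested objects with primitive leaves.
function encodeObject(obj, depth = 0) {
  const pad = '  '.repeat(depth);
  return Object.entries(obj)
    .map(([key, value]) =>
      value !== null && typeof value === 'object'
        ? `${pad}${key}:\n${encodeObject(value, depth + 1)}`  // nested: recurse, deeper indent
        : `${pad}${key}: ${value}`)                            // primitive: key: value
    .join('\n');
}

console.log(encodeObject({
  user: { name: 'Alice', profile: { age: 30, city: 'New York' } },
}));
// user:
//   name: Alice
//   profile:
//     age: 30
//     city: New York
```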

5. Simple Key-Value Pairs

Basic properties use colon separation:

name: Alice
age: 30
active: true
score: 98.5

Tabular Array Detection: The Core Algorithm

The tabular array detection algorithm is where TOON achieves its massive token savings. This is the most important optimization in the entire TOON format specification.

Detection Logic

For an array to qualify as tabular, the TOON encoder performs these checks:

  1. Type Check: Verify all elements are objects (not primitives, arrays, or null)
  2. Key Uniformity: Extract keys from first object, then verify every other object has identical keys in the same order
  3. Value Type Check: Confirm all values are primitives (strings, numbers, booleans, null)

JavaScript Implementation Example

_isTabularArray(arr) {
  if (!Array.isArray(arr) || arr.length === 0) return false;

  const first = arr[0];
  if (first === null || typeof first !== 'object' || Array.isArray(first)) {
    return false;
  }

  // Keys must match in declaration order, so no sorting
  const firstKeys = Object.keys(first);

  for (const element of arr) {
    if (element === null || typeof element !== 'object' || Array.isArray(element)) {
      return false;
    }

    // Check all objects have identical keys in the same order
    const keys = Object.keys(element);
    if (keys.length !== firstKeys.length ||
        !keys.every((key, i) => key === firstKeys[i])) {
      return false;
    }

    // Check all values are primitives (null is allowed)
    for (const value of Object.values(element)) {
      if (typeof value === 'object' && value !== null) return false;
    }
  }

  return true;
}

Token Savings Explained

Consider a dataset with 100 user objects, each having 5 fields. In JSON:

  • Keys repeat 100 times: "id", "name", "email", "role", "active"
  • Curly braces: 200 characters (opening and closing for each object)
  • Commas and quotes: hundreds of additional characters

In TOON format:

  • Keys declared once: {id,name,email,role,active}
  • No braces for objects
  • Minimal quotes (only when necessary)
  • Result: 50-60% token reduction
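A rough way to see the overhead difference is to compare raw character counts for a uniform dataset. This is a quick sketch, not a tokenizer benchmark; real token savings depend on the target model's tokenizer:

```javascript
// Character-count comparison: JSON vs. hand-built TOON tabular form.
const users = Array.from({ length: 100 }, (_, i) => (
  { id: i, name: `user${i}`, role: 'member' }
));

const json = JSON.stringify(users);

const keys = Object.keys(users[0]);
const toon = [
  `users[${users.length}]{${keys.join(',')}}:`,          // keys declared once
  ...users.map(u => '  ' + keys.map(k => u[k]).join(',')), // CSV-like rows
].join('\n');

console.log(json.length, toon.length);
console.log(`~${Math.round((1 - toon.length / json.length) * 100)}% fewer characters`);
```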

Custom Delimiters in TOON Format

TOON format supports flexible delimiters to optimize for different data types and tokenizer behaviors.

Supported Delimiters

Comma (Default)

items[2]{id,name}:
  1,Alice
  2,Bob

Best for: Most general use cases

Tab Delimiter

items[2	]{id	name}:
  1	Alice
  2	Bob

Best for: Single-token efficiency with some tokenizers

Pipe Delimiter

items[2|]{id|name}:
  1|Alice
  2|Bob

Best for: Data containing many commas

Delimiter Configuration

Set delimiter when encoding:

// JavaScript
encode(data, { delimiter: '\t' })

# Python
encode(data, delimiter='\t')

Choosing the Right Delimiter

  • Use comma: Default choice, widely understood
  • Use tab: When you've tested that tabs tokenize more efficiently with your specific LLM
  • Use pipe: When your data contains many commas (addresses, descriptions, etc.)
đź’ˇ Tokenizer Testing
Different LLM tokenizers handle delimiters differently. Test with your specific model's tokenizer to find optimal delimiter. GPT models often treat tabs as single tokens.
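The trade-off is easy to see in a sketch of delimiter-aware value encoding (an illustrative helper, not the official encoder): values containing the active delimiter must be quoted, so switching to pipe removes the quoting for comma-heavy data:

```javascript
// Sketch: values containing the active delimiter get quoted.
function encodeRow(values, delimiter) {
  return values
    .map(v => {
      const s = String(v);
      return s.includes(delimiter) ? `"${s}"` : s;
    })
    .join(delimiter);
}

console.log(encodeRow(['Smith, John', 'Boston, MA'], ','));
// "Smith, John","Boston, MA"   <- comma delimiter forces quotes
console.log(encodeRow(['Smith, John', 'Boston, MA'], '|'));
// Smith, John|Boston, MA       <- pipe delimiter needs none
```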

TOON Format Quoting and Escape Rules

One of TOON's key optimizations is minimal quoting. Strings are only quoted when necessary to avoid ambiguity.

When TOON Adds Quotes

TOON quotes a string value if it:

  • Is empty: ""
  • Has leading or trailing whitespace: " text "
  • Contains the active delimiter: "Smith, John" (with comma delimiter)
  • Contains a colon: "key: value"
  • Contains quotes or backslashes: "He said \"hello\""
  • Looks like a boolean: "true", "false"
  • Looks like a number: "42", "3.14"
  • Looks like null: "null"
  • Starts with list syntax: "- item"
  • Looks like structural syntax: "[5]", "{key}"

Examples

// No quotes needed
name: Alice
city: NewYork
status: active

// Quotes required
name: "Smith, John"        // Contains comma
description: "Hello: World" // Contains colon
value: "123"               // Looks like number
flag: "true"               // Looks like boolean
empty: ""                  // Empty string
spaced: " text "           // Leading/trailing space
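These rules can be approximated in a single predicate. This is an illustrative sketch; the official spec implementation may differ in edge cases:

```javascript
// Approximation of TOON's quoting decision for string values.
function needsQuotes(s, delimiter = ',') {
  return (
    s === '' ||
    s !== s.trim() ||                          // leading/trailing whitespace
    s.includes(delimiter) ||                   // contains active delimiter
    s.includes(':') ||
    s.includes('"') || s.includes('\\') ||
    s === 'true' || s === 'false' || s === 'null' ||
    !Number.isNaN(Number(s)) ||                // looks like a number
    s.startsWith('- ') ||                      // looks like list syntax
    /^\[.*\]$|^\{.*\}$/.test(s)                // looks like structural syntax
  );
}

console.log(needsQuotes('Alice'));       // false
console.log(needsQuotes('Smith, John')); // true
console.log(needsQuotes('42'));          // true
console.log(needsQuotes('true'));        // true
```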

Escape Sequences

Within quoted strings, TOON uses standard escape sequences:

  • \\ - Backslash
  • \" - Double quote
  • \n - Newline
  • \r - Carriage return
  • \t - Tab
message: "Line 1\nLine 2"
quote: "She said \"hello\""
path: "C:\\Users\\Documents"
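A minimal escaping helper applying these sequences might look like this (a sketch with a hypothetical name, not the reference implementation; note that backslashes must be replaced first so later replacements aren't double-escaped):

```javascript
// Sketch: escape a string for a quoted TOON value (backslashes first).
function escapeQuoted(s) {
  return '"' + s
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r')
    .replace(/\t/g, '\\t') + '"';
}

console.log(escapeQuoted('Line 1\nLine 2'));       // "Line 1\nLine 2"
console.log(escapeQuoted('C:\\Users\\Documents')); // "C:\\Users\\Documents"
```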

TOON Format Encoding Options

TOON encoders support various options to customize output format.

Available Options

Indentation (Default: 2 spaces)

// JavaScript
encode(data, { indent: 4 })

# Python
encode(data, indent=4)

Controls nesting indentation. Use 2 or 4 spaces for readability.

Delimiter (Default: comma)

encode(data, { delimiter: '\t' })  // Tab
encode(data, { delimiter: '|' })   // Pipe

Sets the delimiter for tabular arrays.

Length Marker (Default: false)

encode(data, { lengthMarker: '#' })

Output: items[#3]{id,name}:

Adds explicit marker before array count for emphasis.

Combined Options Example

const toon = encode(data, {
  indent: 2,
  delimiter: '\t',
  lengthMarker: '#'
});

// Output:
// users[#100	]{id	name	role}:
//   1	Alice	admin
//   2	Bob	user
//   ...

TOON Format Token Savings by Data Type

Token reduction varies significantly based on data structure. Here's what to expect:

| Scenario | Token Reduction | Reason |
|---|---|---|
| Uniform tabular (100+ rows) | 50-60% | Keys declared once, CSV-like rows |
| Analytics time-series | 35-60% | Highly uniform structure |
| GitHub repositories | 40-50% | Moderate uniformity |
| Small datasets (<10 items) | 20-30% | Header overhead amortized over few rows |
| Deeply nested objects | 10-20% | Indentation replaces braces |
| Mixed/non-uniform arrays | 0-10% | Falls back to list format |
📊 Benchmark Methodology
Token counts were measured using the GPT-5 o200k_base tokenizer. Actual savings depend on your specific data structure and target LLM. Test with your real data using our converter tool.

TOON Format Implementation Guide

Implementing TOON encoding follows a systematic decision tree based on data type.

Encoder Architecture

function encode(value, options = {}) {
  const encoder = new ToonEncoder({
    indent: options.indent ?? 2,
    delimiter: options.delimiter ?? ',',
    lengthMarker: options.lengthMarker ?? false,
  });
  return encoder.encode(value);
}

class ToonEncoder {
  encode(value) {
    // 1. Type dispatch
    if (value === null) return 'null';
    if (typeof value !== 'object') return this._encodePrimitive(value);
    if (Array.isArray(value)) return this._encodeArray(value);
    return this._encodeObject(value);
  }

  _encodeArray(arr) {
    // 2. Check if tabular
    if (this._isTabularArray(arr)) {
      return this._encodeTabularArray(arr);
    }
    
    // 3. Check if all primitives
    if (arr.every(item => item === null || typeof item !== 'object')) {
      return this._encodePrimitiveArray(arr);
    }
    
    // 4. Fall back to list format
    return this._encodeListArray(arr);
  }
}

Type Normalization

TOON handles non-JSON types consistently:

  • Date objects → ISO string in quotes
  • BigInt → Decimal string (quoted if > safe integer)
  • NaN, Infinity → null
  • undefined → null
  • Functions, symbols → null
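A simplified version of this normalization step could look like the following sketch (illustrative; it omits the "quote BigInt beyond safe integer" detail):

```javascript
// Sketch of TOON's normalization rules for non-JSON values.
function normalize(value) {
  if (value === undefined) return null;
  if (typeof value === 'function' || typeof value === 'symbol') return null;
  if (typeof value === 'number' && !Number.isFinite(value)) return null; // NaN, Infinity
  if (value instanceof Date) return value.toISOString();                 // ISO string
  if (typeof value === 'bigint') return value.toString();                // decimal string
  return value;
}

console.log(normalize(NaN));                 // null
console.log(normalize(new Date(0)));         // 1970-01-01T00:00:00.000Z
console.log(normalize(123456789012345678n)); // 123456789012345678
```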

JavaScript Example

For a complete JavaScript implementation guide with Node.js and Express.js integration, see our TOON Format JavaScript Implementation Guide.

import { encode } from '@toon-format/toon';

const data = {
  users: [
    { id: 1, name: 'Alice', role: 'admin', active: true },
    { id: 2, name: 'Bob', role: 'user', active: false }
  ],
  metadata: {
    version: 1,
    created: '2025-01-01'
  }
};

const toon = encode(data);
console.log(toon);

// Output:
// users[2]{id,name,role,active}:
//   1,Alice,admin,true
//   2,Bob,user,false
// metadata:
//   version: 1
//   created: 2025-01-01

Python Example

from toon_format import encode

data = {
    'users': [
        {'id': 1, 'name': 'Alice', 'role': 'admin'},
        {'id': 2, 'name': 'Bob', 'role': 'user'}
    ],
    'count': 2
}

toon = encode(data)
print(toon)

TOON Format Use Cases

TOON format excels in specific scenarios where token efficiency matters most.

âś… Optimal Use Cases

LLM Prompt Engineering

Passing data context to LLMs while minimizing token costs. Perfect for:

  • User lists for analysis
  • Product catalogs for recommendations
  • Historical data for trend analysis
  • Configuration data for AI agents

API Response Optimization

Reducing bandwidth for data analytics APIs:

  • Time-series analytics (daily/hourly metrics)
  • Database query results
  • E-commerce order history
  • IoT sensor data streams

Embeddings & Feature Stores

Efficient serialization of high-dimensional arrays:

  • Feature vectors for ML models
  • Embedding databases
  • Vector similarity search results

Open Data Portals

Publishing large tabular datasets:

  • Government data releases
  • Research datasets
  • Public API documentation

❌ When Not to Use TOON

  • Deeply nested objects: 3+ levels of nesting (minimal benefit)
  • Non-uniform data: Objects with varying fields
  • Very small datasets: Less than 10 items (overhead not worth it)
  • Native tool integration: When tools require JSON specifically
  • Programmatic parsing: Use JSON for application logic

Frequently Asked Questions

How does TOON format differ from CSV?

TOON combines CSV's tabular efficiency with support for nested objects, mixed data types, and hierarchical structure. CSV can only represent flat tables.

Is TOON format standardized?

TOON is an open specification with reference implementations in JavaScript, Python, and other languages. The format is actively developed and community-driven.

Can I use TOON with any LLM?

Yes! TOON works with GPT, Claude, Llama, and all other LLMs. The format is tokenizer-agnostic, though specific savings vary by model.

How do I convert existing JSON to TOON?

Use our free JSON to TOON converter tool or install the TOON library for your language (npm: @toon-format/toon, pip: toon-format).

Does TOON support comments?

Currently, TOON does not support inline comments. This keeps the format minimal and parser implementation simple.

What about TOON file extensions?

TOON files typically use the .toon extension, though .txt works too. MIME type is text/x-toon (unofficial).

Conclusion: Mastering TOON Format

The TOON format represents a significant advancement in data serialization for the LLM era. By understanding tabular array detection, custom delimiters, quoting rules, and encoding options, you can leverage TOON to achieve 30-60% token reduction compared to JSON.

Key takeaways:

  • Tabular arrays are TOON's superpower—uniform data sees the biggest savings
  • Custom delimiters let you optimize for your specific tokenizer
  • Minimal quoting eliminates unnecessary characters
  • Type-based encoding ensures efficient representation of all data structures
  • Use as a translation layer—JSON in your app, TOON at the LLM boundary

Next Steps

Ready to start using TOON format? Here are your next steps:

Start Converting JSON to TOON

See instant token savings with our free online converter tool.