Token

Simple Definition

A token is the basic unit of text that AI language models process. When you send a message to an AI, it doesn’t read word by word — it breaks your text into tokens first.

In English text, a token is typically a chunk of about 3–4 characters. Common words are often a single token; rarer words may be split into several tokens.

How Tokenization Works

The text “ChatGPT is great” might be split into tokens like:

  • “Chat” + “G” + “PT” + “ is” + “ great”

Or possibly: “ChatGPT” + “ is” + “ great”

Tokenization depends on the model’s vocabulary and the specific tokenizer used.
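To make the dependence on vocabulary concrete, here is a minimal sketch of a toy tokenizer that greedily matches the longest vocabulary entry at each position. The vocabulary and the function name toy_tokenize are invented for illustration. Production tokenizers typically use learned subword schemes such as byte-pair encoding, so this shows the idea only, not how any particular model works.

```python
# Toy vocabulary, invented for illustration only.
# Real vocabularies are learned from data and are far larger.
TOY_VOCAB = {"Chat", "G", "PT", " is", " great"}


def toy_tokenize(text, vocab):
    """Split text by greedy longest match against vocab.

    Characters that match no vocabulary entry become single-character tokens.
    """
    tokens = []
    max_len = max(len(entry) for entry in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until something matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


print(toy_tokenize("ChatGPT is great", TOY_VOCAB))
# Adding "ChatGPT" to the vocabulary changes the split.
print(toy_tokenize("ChatGPT is great", TOY_VOCAB | {"ChatGPT"}))
```

With the smaller vocabulary the text splits into five tokens; once “ChatGPT” is a vocabulary entry, the same text splits into three.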

Why Tokens Matter for Users

1. Context window limits — AI models can only process a certain number of tokens at once. Long conversations or documents need to fit within this limit.

2. API pricing — AI APIs charge by the token (both input and output tokens). Understanding token counts helps you estimate costs.

3. Response length — when you ask an AI to “keep it under 200 words,” you’re really controlling token count indirectly.
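The context window limit in point 1 is often handled by dropping the oldest messages until the conversation fits. The sketch below assumes a hypothetical helper trim_history and a rough word-based counter; neither is part of any real API, and a real application would count with the model's own tokenizer.

```python
import math


def rough_count(text):
    """Rough estimate: about 1.3 tokens per English word (not an exact count)."""
    return math.ceil(len(text.split()) * 1.3)


def trim_history(messages, max_tokens, count_tokens=rough_count):
    """Keep the most recent messages whose estimated total fits in max_tokens."""
    kept = []
    total = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "Hello there, how are you today?",
    "I am fine, thanks for asking.",
    "Summarize our chat so far.",
]
print(trim_history(history, max_tokens=16))
```

Here the budget of 16 estimated tokens leaves room for only the two most recent messages, so the oldest one is dropped.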

Rough Token Estimates

Text                  Approximate Tokens
1 word                ~1.3 tokens
1 sentence            ~15–20 tokens
1 paragraph           ~60–80 tokens
1 page                ~500–600 tokens
1 book (300 pages)    ~150,000 tokens
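These rules of thumb can be turned into quick estimators. The sketch below uses two hypothetical helper functions, one based on the ~1.3 tokens per word figure and one based on an assumed average of 4 characters per token. Both are approximations for English text and can differ noticeably from a real tokenizer's count.

```python
def estimate_tokens_by_words(text, tokens_per_word=1.3):
    """Estimate tokens from the word count (English rule of thumb)."""
    return round(len(text.split()) * tokens_per_word)


def estimate_tokens_by_chars(text, chars_per_token=4):
    """Estimate tokens from the character count (assumed ~4 characters per token)."""
    return round(len(text) / chars_per_token)


sample = "word " * 1000  # 1,000 short words, 5,000 characters
print(estimate_tokens_by_words(sample))  # word-based estimate
print(estimate_tokens_by_chars(sample))  # character-based estimate
```

The two estimates rarely agree exactly; treating them as a range is usually safer than trusting either one.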

Input vs. Output Tokens

  • Input tokens — everything you send to the model (your prompt, conversation history, documents)
  • Output tokens — the response the model generates

APIs typically charge for both, with output tokens often costing more.
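A cost estimate follows directly from the two token counts. The sketch below assumes prices are quoted per million tokens; the function name and the prices are placeholders, not figures from any real provider.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Return the estimated cost in the same currency as the prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder prices for illustration only.
cost = estimate_cost(
    input_tokens=2_000,
    output_tokens=500,
    input_price_per_million=1.00,
    output_price_per_million=4.00,
)
print(f"Estimated cost: {cost:.4f}")
```

In this made-up example the 500 output tokens cost as much as the 2,000 input tokens, because the output price is four times higher.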

Related Terms

  • Context Window — the maximum number of tokens a model can handle at once
  • LLM — language models that process text as tokens
  • Inference — the process where tokens are generated
  • Temperature — the setting that affects token selection during generation

Frequently Asked Questions

How many tokens is 1000 words?

Roughly 1,300–1,500 tokens. A common rule of thumb is that 1 token ≈ 0.75 words in English, so 1,000 words ≈ 1,333 tokens.
