Tokens (LLM)

A token is the unit a language model reads and writes: a chunk of text (often a word piece) produced by the model's tokenizer. Pricing, context limits, and speed are all measured in tokens, not words or characters.

also: token · LLM tokens · tokenization · BPE

English rule of thumb: ≈ 0.75 words per token (1k tokens ≈ 750 words); code and JSON tokenize worse.

Tokenizers split text into sub-word pieces using a scheme like byte-pair encoding, so common words are one token and rarer words break into several. A useful rule of thumb for English is about 4 characters or 0.75 words per token, which makes 1,000 tokens roughly 750 words, but code, JSON, and non-English text tokenize less efficiently and cost more per word.
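The rule of thumb can be turned into a quick estimator. The sketch below is only an approximation for English prose: the function names are hypothetical, the two ratios are the ones quoted above, and taking the larger of the two estimates is an assumption made here so that budgets err on the high side. It does not run a real tokenizer, so code, JSON, and non-English text will usually come out higher than it predicts.

```python
import math

# Rule-of-thumb ratios for typical English text (approximations, not tokenizer output).
CHARS_PER_TOKEN = 4.0
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Rough token estimate for English prose; not a real tokenizer."""
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) / WORDS_PER_TOKEN
    # Assumption: take the larger estimate so budgets err on the safe side.
    return math.ceil(max(by_chars, by_words))


def estimate_words(token_count: int) -> int:
    """Approximate number of English words that fit in a token budget."""
    return int(token_count * WORDS_PER_TOKEN)
```

For anything billed or close to a context limit, count with the provider's actual tokenizer instead of this estimate.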

Everything that matters commercially is denominated in tokens. You pay per million input and output tokens (output is usually several times more expensive), the context window is a token count, and latency scales with tokens generated. Because you are billed for both what you send and what you get back, trimming prompts, caching repeated context, and capping output length are the direct cost levers.
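To make the cost levers concrete, here is a minimal sketch of per-request pricing. Everything in it is illustrative: the `Pricing` class, the `request_cost` function, and the prices themselves are hypothetical, and the discounted rate for cached input is an assumption about how a provider might bill repeated context, not any specific provider's scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD; not any provider's real rates."""
    input_per_million: float
    output_per_million: float
    cached_input_per_million: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Cost of one request; cached_tokens is the share of input billed at the cached rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    return (
        fresh_tokens * pricing.input_per_million
        + cached_tokens * pricing.cached_input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


# Illustrative numbers only: output priced at 5x input, cached input at 10% of input.
EXAMPLE = Pricing(input_per_million=2.0, output_per_million=10.0,
                  cached_input_per_million=0.2)

baseline = request_cost(EXAMPLE, input_tokens=2_000, output_tokens=1_000)
trimmed_prompt = request_cost(EXAMPLE, input_tokens=1_500, output_tokens=1_000)
capped_output = request_cost(EXAMPLE, input_tokens=2_000, output_tokens=500)
cached_context = request_cost(EXAMPLE, input_tokens=2_000, output_tokens=1_000,
                              cached_tokens=1_500)
```

Under these assumed prices the baseline request costs about $0.014. Cutting 500 prompt tokens brings it to about $0.013, caching 1,500 of the input tokens to about $0.0113, and cutting 500 output tokens to about $0.009. The ordering depends entirely on the price ratio and the shape of the request, so rerun the numbers with your own provider's rates.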

Free tool: AI Prompt Token & Cost Inspector. Count a prompt's tokens with a real tokenizer and price it across Claude, GPT-4o, and Gemini.


Questions & answers

How many tokens is a word?

For typical English, a token is about three quarters of a word, so 1,000 tokens is roughly 750 words or 4,000 characters. The ratio is worse for code, JSON, punctuation-heavy text, and non-English languages, which split into more tokens per word.

Why are tokens priced separately for input and output?

Generating output runs the model forward one token at a time, which is more compute-intensive than reading the input prompt, so providers charge several times more per output token. That is why capping response length and reusing context save more than trimming the prompt alone.
