Tokens in LLMs: How the Model Sees Text

Author: 04e5cc8b-58ac-4bdc-bdee-661bbb
Published: 04.06.2026
Reading time: 2 min
Level: Beginner

When you send a request to a language model, it doesn’t see words or letters. It sees tokens: chunks of text, typically a few characters each. Understanding tokens helps you save money and write better prompts.

What Is a Token

A token is the unit of text a model works with. It’s neither a character nor a word — something in between. The model is trained to predict the next token based on all previous ones. That’s how it “generates” a response — one token at a time.
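
To make “one token at a time” concrete, here is a toy sketch of the generation loop. The NEXT_TOKEN table and the generate function are made up for illustration. A real model scores every token in its vocabulary using the whole preceding context, while this toy only looks at the last token.

```python
# Toy next-token table (hypothetical): maps the last token to the next one.
NEXT_TOKEN = {"<start>": "The", "The": "cat", "cat": "sat", "sat": "<end>"}

def generate(max_tokens):
    """Produce tokens one at a time until <end> or the token limit."""
    tokens = []
    current = "<start>"
    while len(tokens) < max_tokens:
        current = NEXT_TOKEN[current]  # "predict" the next token
        if current == "<end>":
            break
        tokens.append(current)
    return tokens

print(generate(10))  # ['The', 'cat', 'sat']
print(generate(2))   # ['The', 'cat']
```

The limit in this sketch plays the same role as the output limit on a real request: generation stops when an end marker appears or the limit is reached.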

Rough rules:
- 1 English word ≈ 1–1.5 tokens
- 1 non-Latin word ≈ 2–3 tokens (non-Latin scripts are more expensive)
- 1 token ≈ 4 characters in English
- Indented code uses more tokens than it appears to, because whitespace is tokenized too

Examples (approximate; exact counts depend on the model’s tokenizer):
- Hello → 1 token
- Привет → 2 tokens
- Hello, World! → 4 tokens
- 100 lines of Python code → ~500–800 tokens
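
The “1 token ≈ 4 characters” rule can be turned into a quick estimator. This is only a heuristic sketch for English text: estimate_tokens and its chars_per_token parameter are names invented here, and the result can be well off for code or non-Latin scripts.

```python
import math

def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate from character count (English-text heuristic)."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)

print(estimate_tokens("Understanding tokens will help you save money."))
```

Treat the number as a ballpark for budgeting, not as what you will be billed for.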

Why Non-English Text Costs More

Models are trained predominantly on English text, so English words more often map cleanly to single vocabulary entries. Words in other languages, especially those written in non-Latin scripts, are frequently split into multiple pieces. Practical implication: a system prompt in English is often 1.5–2× cheaper than the same prompt in such a language.
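
One contributing factor you can check yourself: many tokenizers work on UTF-8 bytes, and non-Latin characters take more bytes than ASCII letters. The sketch below only measures bytes. It is not Claude’s tokenizer, and how any particular model actually splits text is an assumption you would need to confirm with a real token count.

```python
def utf8_bytes_per_char(text):
    """Average number of UTF-8 bytes per character in text."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)

for word in ["Hello", "Привет"]:
    size = len(word.encode("utf-8"))
    print(word, size, "bytes,", utf8_bytes_per_char(word), "bytes per character")
```

More bytes per character leaves the tokenizer more raw material to split, which is consistent with the higher token counts above.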

Input and Output Tokens

Anthropic charges per token, with separate rates for input and output:

- Input tokens: everything you sent (system prompt + full conversation history + the new message)
- Output tokens: everything the model responded with

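import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(model="claude-sonnet-4-6", max_tokens=256, messages=[{"role": "user", "content": "Hi"}])
# After each request, the response reports token usage:
print(response.usage.input_tokens)   # tokens sent
print(response.usage.output_tokens)  # tokens in the response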

A typical chat exchange is roughly 200–500 input + 300–800 output tokens. At thousands of requests, the numbers add up.
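
Because the full history is resent every time, input tokens grow with each turn. Here is a small sketch of that effect. The function name and the token sizes are made-up illustration values, and it assumes every user message and reply has the same length.

```python
def input_tokens_per_turn(system_tokens, user_tokens, reply_tokens, turns):
    """Input size of each request when the full history is resent every turn."""
    sizes = []
    history = 0  # tokens from earlier user messages and model replies
    for _ in range(turns):
        sizes.append(system_tokens + history + user_tokens)
        history += user_tokens + reply_tokens
    return sizes

sizes = input_tokens_per_turn(system_tokens=100, user_tokens=50, reply_tokens=150, turns=3)
print(sizes)       # [150, 350, 550]
print(sum(sizes))  # 1050
```

The third request already sends more than three times as many input tokens as the first, even though each new message is the same size.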

Tokens and Money

Anthropic pricing (2026, USD per 1M tokens; rates change, so check the current price list before budgeting):

Model	Input	Output
claude-haiku-4-5	$0.80 / 1M	$4 / 1M
claude-sonnet-4-6	$3 / 1M	$15 / 1M
claude-opus-4-7	$15 / 1M	$75 / 1M

A typical Sonnet request costs roughly $0.005–0.014. For example, 500 input + 800 output tokens comes to 500 × $3/1M + 800 × $15/1M ≈ $0.0135. A thousand requests ≈ $5–14.
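
The arithmetic is easy to script. This sketch copies the prices from the table above (they may be out of date) and uses the upper end of the exchange sizes mentioned earlier. PRICES and request_cost are names invented for this example.

```python
# Prices in USD per 1M tokens as (input, output), copied from the table above.
PRICES = {
    "claude-haiku-4-5": (0.80, 4.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-7": (15.00, 75.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Cost of one request in USD."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

cost = request_cost("claude-sonnet-4-6", input_tokens=500, output_tokens=800)
print(f"${cost:.4f} per request, ${cost * 1000:.2f} per 1,000 requests")
```

Note that output tokens dominate the bill here: 800 output tokens cost eight times as much as 500 input tokens.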

Context Window

Every model has a context window: the maximum number of tokens in a single request (input + output combined). For claude-sonnet-4-6 that’s 200,000 tokens, which is about 150,000 English words, or the length of two or three novels.

As conversation history grows, it fills the context window. Strategies:
- Truncate old messages (keep only the last N)
- Summarize history via a separate request
- Store only key facts rather than verbatim exchanges
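
The first strategy is the simplest to sketch. The helper below is a hypothetical example, not part of the SDK. It assumes messages are dicts with "role" and "content" keys, and that the kept history should open with a user message.

```python
def keep_last_messages(messages, max_messages):
    """Keep the most recent messages, starting from a user turn."""
    if max_messages <= 0:
        return []
    trimmed = messages[-max_messages:]
    # Drop leading assistant turns so the kept history opens with a user message.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed

history = [
    {"role": "user", "content": "What is a list?"},
    {"role": "assistant", "content": "An ordered collection."},
    {"role": "user", "content": "And a tuple?"},
    {"role": "assistant", "content": "An immutable list."},
    {"role": "user", "content": "Which is faster?"},
]
print(keep_last_messages(history, 3))
```

Truncation is cheap but forgets everything it drops. Summaries or stored key facts cost an extra request but keep older context available.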

Counting Tokens in Advance

The SDK lets you count tokens before sending the real request. The call goes to a token-counting endpoint and does not generate a response:

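import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.count_tokens(
    model="claude-sonnet-4-6",
    system="You are a Python tutor.",
    messages=[{"role": "user", "content": "What is a decorator?"}],
)
print(response.input_tokens)  # input token count for this prompt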

Use this to verify a prompt won’t exceed limits before sending it.
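
Once you have a count, the check itself is a simple comparison. Since the window covers input and output together, this sketch also reserves room for the reply. fits_in_context is a hypothetical helper, and the 200,000 default is the figure this article quotes for claude-sonnet-4-6.

```python
CONTEXT_WINDOW = 200_000  # figure quoted in this article for claude-sonnet-4-6

def fits_in_context(input_tokens, max_output_tokens, context_window=CONTEXT_WINDOW):
    """True if the prompt plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window

print(fits_in_context(input_tokens=150_000, max_output_tokens=4_096))  # True
print(fits_in_context(input_tokens=198_000, max_output_tokens=4_096))  # False
```

If the check fails, that is the moment to truncate or summarize the history before sending.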

LLMs Are Predictors, Not Knowledge Bases

The key insight: an LLM doesn’t look up answers in a database. It predicts the most likely next token. That’s why:
- The model can “hallucinate” — generating plausible but incorrect facts
- temperature=0 produces more stable answers: less randomness in token selection
- The same question at temperature=1 can give a different answer each time
- More relevant context = the model “sees” more = better understanding of the task
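
To see why temperature changes stability, here is a minimal sketch of the standard temperature-scaled softmax. The scores are made up, and treating temperature 0 as “always pick the top token” is a simplifying assumption for this example. How a given API handles it internally may differ.

```python
import math

def token_probabilities(logits, temperature):
    """Turn raw token scores into probabilities at a given temperature."""
    if temperature <= 0:
        # Treat temperature 0 as greedy: all probability on the top score.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (1.0, 0.5, 0.0):
    print(t, [round(p, 3) for p in token_probabilities(scores, t)])
```

Lower temperatures push probability toward the top-scoring token, so repeated runs agree more often. Higher temperatures spread it out, so sampled answers vary.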
