
Tokens in LLMs: How the Model Sees Text

Author: 04e5cc8b-58ac-4bdc-bdee-661bbb
Published: 04.06.2026
Reading time: 2 min
Level: Beginner

When you send a request to a language model, it doesn’t see words or letters. It sees tokens — chunks of text roughly a few characters each. Understanding tokens will help you save money and write better prompts.

What Is a Token

A token is the unit of text a model works with. It’s neither a character nor a word — something in between. The model is trained to predict the next token based on all previous ones. That’s how it “generates” a response — one token at a time.
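The one-token-at-a-time loop can be sketched with a toy example. The lookup table below is a hypothetical stand-in for a real model: it only looks at the last token, whereas a real model conditions on all previous tokens and has a vocabulary of many thousands of entries.

```python
# Hypothetical next-token table standing in for a real model's prediction.
NEXT_TOKEN = {
    "The": " cat",
    " cat": " sat",
    " sat": " down",
    " down": "<end>",
}

def generate(prompt_tokens, max_new_tokens=10):
    """Append one predicted token at a time until <end> or the limit is hit."""
    tokens = list(prompt_tokens)  # expects a non-empty prompt
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1], "<end>")
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens

print("".join(generate(["The"])))
```

Each pass through the loop corresponds to one output token, which is why longer responses take longer and cost more.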

Rough rules:
- 1 English word ≈ 1–1.5 tokens
- 1 word in a non-Latin script (for example, Russian) ≈ 2–3 tokens, so such text costs more
- 1 token ≈ 4 characters in English
- Code often takes more tokens than it looks, because indentation and punctuation count too
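For a quick budget check, the "4 characters per token" rule can be turned into a small helper. This is only a rough sketch: `estimate_tokens` and its `chars_per_token` parameter are illustrative names, and the result is a heuristic for English text, not what the model's tokenizer would actually report.

```python
import math

def estimate_tokens(text, chars_per_token=4):
    """Rough estimate based on the ~4 characters per token rule for English."""
    return math.ceil(len(text) / chars_per_token)

snippet = "def add(a, b):\n        return a + b\n"
print(estimate_tokens("Understanding tokens will help you save money."))
print(estimate_tokens(snippet))  # indentation and newlines count as characters too
```

For text in non-Latin scripts, a smaller `chars_per_token` value would be closer to the rules above.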

Examples (exact counts depend on the tokenizer):
- Hello → 1 token
- Привет → 2 tokens
- Hello, World! → 4 tokens
- 100 lines of Python code → ~500–800 tokens

Why Non-English Text Costs More

Models and their tokenizers are trained predominantly on English text, so English words more often map cleanly to single vocabulary entries. Words in other languages are frequently split into multiple pieces. Practical implication: a system prompt in English is typically 1.5–2× cheaper than the same prompt in another language.
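One contributing factor can be seen without any tokenizer at all. Assuming a byte-level tokenizer (an assumption; the article does not say which kind the model uses), text starts out as UTF-8 bytes, and Cyrillic letters take two bytes each where Latin letters take one. The helper name below is illustrative.

```python
def utf8_bytes_per_char(text):
    """Average number of UTF-8 bytes per character in the text."""
    return len(text.encode("utf-8")) / len(text)

for word in ["Hello", "Привет"]:
    print(word, len(word), "chars,", len(word.encode("utf-8")), "bytes")
```

More raw bytes per word means more pieces to merge, and fewer of those merges are in the vocabulary when the training text was mostly English.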

Input and Output Tokens

Anthropic charges by tokens:

- Input tokens: everything you sent (system prompt + full conversation history + the new message)
- Output tokens: everything the model responded with

# After each request (response is the object returned by client.messages.create):
print(response.usage.input_tokens)   # tokens sent
print(response.usage.output_tokens)  # tokens in the response

A typical chat exchange is roughly 200–500 input + 300–800 output tokens. At thousands of requests, the numbers add up.
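Because the full conversation history is resent as input on every request, input tokens grow with each turn. The sketch below simulates that growth; the function name and the default sizes (50 system, 100 user, 400 reply tokens) are made-up illustration values, not measurements.

```python
def cumulative_input_tokens(turns, system_tokens=50, user_tokens=100, reply_tokens=400):
    """Total input tokens billed over `turns` requests when history is resent each time."""
    total = 0
    history = 0  # tokens of all earlier user messages and model replies
    for _ in range(turns):
        total += system_tokens + history + user_tokens
        history += user_tokens + reply_tokens
    return total

for turns in (1, 5, 20):
    print(turns, cumulative_input_tokens(turns))
```

The total grows roughly quadratically with the number of turns, which is why long chats cost far more than the same number of independent requests.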

Tokens and Money

Anthropic pricing (2026, USD per 1M tokens):

Model               Input    Output
claude-haiku-4-5    $0.80    $4
claude-sonnet-4-6   $3       $15
claude-opus-4-7     $15      $75

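At the typical exchange size above (200–500 input and 300–800 output tokens), a Sonnet request costs roughly $0.005–0.014: input is 200–500 × $3 / 1M ≈ $0.0006–0.0015, and output is 300–800 × $15 / 1M ≈ $0.0045–0.012. A thousand requests ≈ $5–14.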
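The arithmetic can be wrapped in a small helper. This is a sketch: the `PRICES` dictionary is copied by hand from the table in this article (it is not fetched from any API and will go stale when prices change), and `request_cost` is an illustrative name.

```python
# USD per 1M tokens as (input, output), copied from the article's table.
PRICES = {
    "claude-haiku-4-5": (0.80, 4.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-7": (15.00, 75.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Cost of one request in USD at the hard-coded prices above."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

low = request_cost("claude-sonnet-4-6", 200, 300)
high = request_cost("claude-sonnet-4-6", 500, 800)
print(f"${low:.4f} to ${high:.4f} per request")
```

In a real application the token counts would come from response.usage rather than from estimates.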

Context Window

Every model has a context window: the maximum number of tokens it can handle in a single request (input + output combined). For claude-sonnet-4-6 that’s 200,000 tokens, about 150,000 English words, or roughly two average-length novels.

As conversation history grows, it fills the context window. Strategies:
- Truncate old messages (keep only the last N)
- Summarize history via a separate request
- Store only key facts rather than verbatim exchanges
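The first strategy can be sketched as a function that keeps only the most recent messages that fit a token budget. Assumptions: messages are dicts with a string "content" field, sizes are estimated with the rough 4-characters-per-token rule rather than a real tokenizer, and `trim_history` is an illustrative name.

```python
def trim_history(messages, max_tokens, chars_per_token=4):
    """Keep the newest messages whose estimated total size fits in max_tokens."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = -(-len(message["content"]) // chars_per_token)  # ceiling division
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept

history = [
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
]
print(len(trim_history(history, max_tokens=20)))
```

A production version would likely also make sure the trimmed history still starts with a user message, and might combine trimming with a summary of what was dropped.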

Counting Tokens in Advance

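The SDK lets you count input tokens before sending the actual message. The count is a lightweight call to a token-counting endpoint, so nothing is generated:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.count_tokens(
    model="claude-sonnet-4-6",
    system="You are a Python tutor.",
    messages=[{"role": "user", "content": "What is a decorator?"}]
)
print(response.input_tokens)  # input token count for this request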

Use this to verify a prompt won’t exceed limits before sending it.
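A minimal sketch of that check, assuming the 200,000-token window quoted above and that the output budget is whatever you plan to pass as max_tokens. The function name `fits_context` is illustrative.

```python
def fits_context(input_tokens, max_output_tokens, context_window=200_000):
    """True if the input plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window

print(fits_context(input_tokens=150_000, max_output_tokens=4_096))
print(fits_context(input_tokens=198_000, max_output_tokens=4_096))
```

Here input_tokens would come from the count_tokens call, so the check costs no generation.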

LLMs Are Predictors, Not Knowledge Bases

The key insight: an LLM doesn’t look up answers in a database. It predicts the most likely next token. That’s why:
- The model can “hallucinate” — generating plausible but incorrect facts
- temperature=0 produces more stable answers: the model almost always picks the most likely token
- The same question at temperature=1 can give a different answer each time
- More relevant context = the model “sees” more = usually a better understanding of the task
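The effect of temperature can be illustrated with the common softmax-with-temperature formulation. This is a sketch under assumptions: the article does not say how the model applies temperature internally, and the logits below are made-up scores for three candidate tokens.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]  # temperature must be > 0 here
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature approaches 0, almost all probability lands on the top-scoring token, which is why low-temperature answers are more repeatable. A temperature of exactly 0 is not handled by this sketch; it is usually treated as always picking the top token.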
