A system prompt is an instruction for the model that the developer sets behind the scenes. The end user normally never sees it, yet it shapes the assistant's behavior throughout the entire conversation.
Why You Need a System Prompt
Without a system prompt, Claude responds as a “general-purpose helpful assistant.” With one, it becomes a Python tutor, a strict code reviewer, a sarcastic character, a JSON parser, or a security specialist. The same question — different answers.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a Senior Python developer. Review code strictly. Find at least 3 issues.",
    messages=[{"role": "user", "content": "Review my code: ..."}],
)
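To make the "same question, different answers" idea concrete, here is a minimal sketch that prepares one question under several system prompts. It only builds the keyword arguments and makes no API call; the persona names, the prompt wording, and the build_request helper are hypothetical and not part of the SDK.

```python
# Hypothetical personas; the wording of each prompt is illustrative only.
PERSONAS = {
    "tutor": "You are a Python tutor for beginners. Explain with everyday analogies.",
    "reviewer": "You are a Senior Python developer. Review code strictly.",
    "json": "You reply only with valid JSON and no extra text.",
}


def build_request(persona: str, question: str, model: str = "claude-sonnet-4-6") -> dict:
    """Return keyword arguments for client.messages.create (no API call is made)."""
    return {
        "model": model,
        "max_tokens": 1024,
        "system": PERSONAS[persona],
        "messages": [{"role": "user", "content": question}],
    }


question = "What does a list comprehension do?"
# One request per persona: identical user message, different system prompt.
request_kwargs = {name: build_request(name, question) for name in PERSONAS}
```

With a configured client, each entry could be sent as client.messages.create(**request_kwargs["tutor"]); only the system field differs between the requests.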
Good vs Bad System Prompts
❌ "Be a helpful assistant."
✓ "You are a Python tutor for beginners.
You explain concepts through everyday analogies.
You provide short code examples.
You only answer questions about Python.
If a question is off-topic — gently steer the conversation back."
Principles of a good system prompt:
1. Role — who you are, your experience and personality
2. Task — what you do and what you don’t do
3. Format — how to structure the response
4. Constraints — what falls outside your scope
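The four principles above can be treated as four fields that are joined into one prompt. The following is a small sketch under that assumption; build_system_prompt and the "Constraint:" prefix are hypothetical conventions chosen for illustration, not something the API requires.

```python
def build_system_prompt(role, task, response_format, constraints):
    """Join role, task, format, and constraints into one prompt, one item per line."""
    parts = [role, task, response_format]
    parts += [f"Constraint: {c}" for c in constraints]
    # Skip empty fields so a missing part does not leave a blank line.
    return "\n".join(p.strip() for p in parts if p.strip())


prompt = build_system_prompt(
    role="You are a Python tutor for beginners.",
    task="You explain concepts through everyday analogies and short code examples.",
    response_format="Answer in at most three short paragraphs.",
    constraints=[
        "Only answer questions about Python.",
        "Gently steer off-topic questions back.",
    ],
)
```

Keeping the parts separate makes it easy to see which of the four is missing when a prompt behaves unexpectedly.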
temperature — How “Creative” the Response Is
temperature (0.0–1.0) controls the randomness of next-token selection:
client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    temperature=0.0,  # lowest randomness; output is still not guaranteed to be identical
    messages=[...]
)
| temperature | Use Case |
|---|---|
| 0.0 | Code, facts, JSON, structured output |
| 0.3–0.5 | Technical explanations |
| 0.7–0.9 | Creative tasks, brainstorming |
| 1.0 | Maximum variability |
With temperature=0.0, the same question produces nearly the same answer every time. With temperature=1.0 — a different variation each time.
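To see why a low temperature makes answers repeatable, it helps to look at the textbook softmax-with-temperature scheme. The sketch below is a generic illustration with a toy vocabulary and made-up scores; it is not a description of how Claude samples internally, and the function names are hypothetical.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Convert raw scores to probabilities after temperature scaling."""
    if temperature <= 0:
        # Limit case: all probability mass goes to the highest score (greedy choice).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng=random):
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = apply_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


tokens = ["cat", "dog", "fish"]  # toy vocabulary (hypothetical)
logits = [2.0, 1.0, 0.5]         # toy scores (hypothetical)
cold = apply_temperature(logits, 0.0)  # everything on the top-scoring token
warm = apply_temperature(logits, 1.0)  # plain softmax, alternatives stay likely
```

Lowering the temperature sharpens the distribution toward the top-scoring token, while values near 1.0 leave the alternatives with a real chance of being picked.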
max_tokens — a Ceiling, Not a Target
max_tokens is a hard maximum, not a target length. Setting max_tokens=100 stops generation after at most 100 tokens, mid-sentence if necessary; the model can still finish earlier on its own.
# Always check stop_reason:
if message.stop_reason == "max_tokens":
    # the response was cut off; increase the limit
    pass
elif message.stop_reason == "end_turn":
    # the model finished on its own
    pass
Rule of thumb: set max_tokens to 2–3x the expected response length.
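One way to act on a truncated response is to retry with a larger limit. The sketch below assumes a callable that accepts max_tokens and returns an object with a stop_reason attribute; create_with_retry and fake_create are hypothetical helpers, and fake_create only stands in for the real API so the example runs offline.

```python
from types import SimpleNamespace


def create_with_retry(create, max_tokens, growth=2, attempts=3):
    """Call create(max_tokens=...) and raise the limit while the reply is truncated."""
    for _ in range(attempts):
        message = create(max_tokens=max_tokens)
        if message.stop_reason != "max_tokens":
            return message, max_tokens
        max_tokens *= growth  # truncated: try again with a higher ceiling
    raise RuntimeError("response still truncated after all attempts")


# Stand-in for the API (hypothetical): pretends the full answer needs 300 tokens.
def fake_create(max_tokens):
    reason = "end_turn" if max_tokens >= 300 else "max_tokens"
    return SimpleNamespace(stop_reason=reason)


message, used_limit = create_with_retry(fake_create, max_tokens=100)
```

With the real SDK, something like functools.partial(client.messages.create, model=..., messages=...) could be passed as create. Keep in mind that every retry is a full, billed request, so a generous initial limit is usually cheaper than repeated attempts.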
System Prompts and Tokens
The system prompt is counted in input_tokens for every request. A long system prompt (500 tokens) × 1,000 requests = 500,000 extra input tokens.
To reduce costs, use prompt caching, which stores the system prompt server-side. Mark the block to cache with cache_control:
client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "Very long system prompt...",
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[...]
)
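On a cache hit, the cost of cached tokens drops by ~90%. Writing to the cache costs somewhat more than regular input, and prompts below a model-specific minimum length are not cached at all.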
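The arithmetic from this section can be written out as a rough estimate. The sketch uses the figures from the text (500 tokens, 1,000 requests, cache reads at about 10% of the normal price); the 1.25 write multiplier, the assumption that every request after the first hits the cache, and both function names are assumptions made for illustration. It also ignores any minimum prompt length for caching.

```python
def system_prompt_tokens(prompt_tokens, requests):
    """Input tokens spent on the system prompt without caching."""
    return prompt_tokens * requests


def cached_equivalent_tokens(prompt_tokens, requests, read_rate=0.10, write_rate=1.25):
    """Billing-equivalent tokens when the first request writes the cache
    and every later request reads it (rates are assumptions, see above)."""
    if requests == 0:
        return 0.0
    write = prompt_tokens * write_rate
    reads = prompt_tokens * (requests - 1) * read_rate
    return write + reads


plain = system_prompt_tokens(500, 1000)       # 500 tokens x 1,000 requests
cached = cached_equivalent_tokens(500, 1000)  # one cache write, 999 cache reads
savings = 1 - cached / plain                  # fraction of system-prompt cost saved
```

Under these assumptions the saving lands just below 90%, because the single cache write is slightly more expensive than a normal request. In practice cache entries expire, so the real figure depends on how closely the requests follow each other.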
The Role System
The Messages API works with three kinds of input:
- system: sets the role and behavior (developer-controlled); passed as the top-level system parameter, not as an entry in messages
- user: messages from the user
- assistant: model responses (can be used for few-shot examples)
messages = [
    # Few-shot: demonstrate the desired response format
    {"role": "user", "content": "2+2"},
    {"role": "assistant", "content": "4"},
    {"role": "user", "content": "What is a decorator?"},
]
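Turns are expected to alternate, starting with a user message: user → assistant → user → … (consecutive turns with the same role may be merged by the API rather than rejected).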
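The few-shot pattern and the alternating order can be built and checked locally before a request is sent. The helpers below are a hypothetical sketch (few_shot_messages and roles_alternate are not SDK functions), and the extra "3*3" example pair is made up for illustration.

```python
def few_shot_messages(examples, question):
    """Turn (input, output) pairs plus a final question into a messages list."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


def roles_alternate(messages):
    """True if the list starts with a user turn and roles alternate after that."""
    expected = "user"
    for message in messages:
        if message["role"] != expected:
            return False
        expected = "assistant" if expected == "user" else "user"
    return bool(messages)  # an empty list does not count as valid


messages = few_shot_messages([("2+2", "4"), ("3*3", "9")], "What is a decorator?")
```

Generating the list from pairs keeps every example answer attached to its question, so the order cannot drift when examples are added or removed.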