System Prompts: Defining the Model's Role

Author: 04e5cc8b-58ac-4bdc-bdee-661bbb
Published: 04.06.2026
Reading time: 2 min
Level: Beginner

A system prompt is an instruction for the model that the developer sets. End users normally do not see it, yet it shapes the assistant's behavior throughout the entire conversation.

Why You Need a System Prompt

Without a system prompt, Claude responds as a “general-purpose helpful assistant.” With one, it becomes a Python tutor, a strict code reviewer, a sarcastic character, a JSON parser, or a security specialist. The same question — different answers.

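import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a Senior Python developer. Review code strictly. Find at least 3 issues.",
    messages=[{"role": "user", "content": "Review my code: ..."}]
)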
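To make "same question, different answers" concrete, the sketch below builds one request per role without sending anything. `build_request`, `ROLES`, and the sample question are hypothetical names introduced for illustration; the model name and parameters are copied from the example above.

```python
# Hypothetical helper: one user question paired with different system prompts.
ROLES = {
    "tutor": "You are a Python tutor for beginners. Explain with everyday analogies.",
    "reviewer": "You are a Senior Python developer. Review code strictly.",
}


def build_request(system_prompt: str, question: str) -> dict:
    """Return keyword arguments in the shape client.messages.create() accepts."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "system": system_prompt,
        "messages": [{"role": "user", "content": question}],
    }


question = "What does a list comprehension do?"
request_params = {name: build_request(prompt, question) for name, prompt in ROLES.items()}

for name, params in request_params.items():
    # Only the system field differs between the two requests.
    print(name, "->", params["system"])
    # With a configured client: message = client.messages.create(**params)
```

Keeping the request as a plain dictionary makes it easy to compare prompts side by side before spending tokens on real calls.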

Good vs Bad System Prompts

❌ "Be a helpful assistant."

✓ "You are a Python tutor for beginners.
   You explain concepts through everyday analogies.
   You provide short code examples.
   You only answer questions about Python.
   If a question is off-topic — gently steer the conversation back."

Principles of a good system prompt:
1. Role — who you are, your experience and personality
2. Task — what you do and what you don’t do
3. Format — how to structure the response
4. Constraints — what falls outside your scope
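The four principles can be treated as four slots to fill in. The sketch below is one possible way to assemble them; `build_system_prompt` and its parameter names are hypothetical, and the sample text reuses the tutor prompt from above.

```python
def build_system_prompt(role: str, task: str, response_format: str, constraints: str) -> str:
    """Join the four parts into one system prompt, one part per line."""
    parts = [role, task, response_format, constraints]
    # Drop empty parts so an omitted section does not leave a blank line.
    return "\n".join(part.strip() for part in parts if part.strip())


system_prompt = build_system_prompt(
    role="You are a Python tutor for beginners.",
    task="You explain concepts through everyday analogies and only answer questions about Python.",
    response_format="You provide short code examples.",
    constraints="If a question is off-topic, gently steer the conversation back.",
)
print(system_prompt)
```

Writing the parts separately makes a missing one easy to spot, for example a prompt that names a role but never says what is out of scope.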

temperature — How “Creative” the Response Is

temperature (0.0–1.0) controls the randomness of next-token selection:

client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    temperature=0.0,     # least random; output is still not guaranteed to be identical
    messages=[...]
)

temperature   Use case
0.0           Code, facts, JSON, structured output
0.3–0.5       Technical explanations
0.7–0.9       Creative tasks, brainstorming
1.0           Maximum variability

With temperature=0.0, the same question produces nearly the same answer every time. With temperature=1.0 — a different variation each time.
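To see why a lower temperature makes answers more repeatable, here is a minimal sketch of the textbook mechanism: scores are divided by the temperature before being turned into probabilities. The function name and the scores are made up for illustration, and the article does not specify how the API applies temperature internally.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Limit case: all probability mass goes to the highest score (greedy choice).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.0, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature falls, the top candidate takes a larger share of the probability, so sampling picks it more often.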

max_tokens — a Ceiling, Not a Target

max_tokens is a hard maximum, not a target length. With max_tokens=100, generation stops after at most 100 output tokens, mid-sentence if necessary; an answer that needs fewer tokens simply ends earlier.

# Always check stop_reason:
if message.stop_reason == "max_tokens":
    # the response was cut off — increase the limit
    pass
elif message.stop_reason == "end_turn":
    # the model finished on its own
    pass

Rule of thumb: set max_tokens to 2–3x the expected response length.
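One way to act on a truncated response is to retry with a larger limit. The sketch below assumes a callable that takes max_tokens and returns an object with a stop_reason attribute; `call_with_growing_limit` and `fake_call` are hypothetical names, and `fake_call` stands in for the real API so the example runs offline.

```python
from types import SimpleNamespace
from typing import Callable


def call_with_growing_limit(call: Callable[[int], object], start: int, cap: int):
    """Call `call(max_tokens)`, doubling the limit while the reply is truncated."""
    max_tokens = start
    while True:
        message = call(max_tokens)
        if message.stop_reason != "max_tokens" or max_tokens >= cap:
            return message, max_tokens
        max_tokens = min(max_tokens * 2, cap)


def fake_call(max_tokens: int):
    """Stand-in for the API: pretends the full answer needs 300 tokens."""
    reason = "end_turn" if max_tokens >= 300 else "max_tokens"
    return SimpleNamespace(stop_reason=reason)


message, used_limit = call_with_growing_limit(fake_call, start=100, cap=1024)
print(message.stop_reason, used_limit)  # end_turn 400
```

With a real client, `call` could be a small function that forwards its argument as max_tokens to client.messages.create. Every retry is a full new request, so the cap keeps the cost bounded.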

System Prompts and Tokens

The system prompt is counted in input_tokens for every request. A long system prompt (500 tokens) × 1,000 requests = 500,000 extra input tokens.

To reduce costs, use prompt caching — Anthropic caches the system prompt server-side:

client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "Very long system prompt...",
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[...]
)

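On a cache hit, the cached tokens are billed at roughly 10% of the normal input price, a saving of about 90%. The request that first writes the cache costs somewhat more than an uncached one, cached entries expire after a short period without use, and prompts shorter than a model-specific minimum are not cached at all, so caching pays off for long prompts that are reused frequently.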
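A rough back-of-the-envelope comparison can show the scale of the saving. In the sketch below the function names, the 2,000-token prompt, and the 1,000 requests are illustrative; the 25% premium for writing the cache is an assumption, and every request after the first is assumed to hit the cache.

```python
def uncached_input_tokens(prompt_tokens: int, requests: int) -> int:
    """System-prompt tokens billed at the full input rate without caching."""
    return prompt_tokens * requests


def cached_token_equivalent(
    prompt_tokens: int,
    requests: int,
    read_multiplier: float = 0.10,   # about 90% cheaper on a cache hit
    write_multiplier: float = 1.25,  # assumption: writing the cache costs a premium
) -> float:
    """Full-price-equivalent tokens if the first request writes the cache and the rest hit it."""
    if requests <= 0:
        return 0.0
    first = prompt_tokens * write_multiplier
    rest = prompt_tokens * read_multiplier * (requests - 1)
    return first + rest


plain = uncached_input_tokens(2_000, 1_000)
cached = cached_token_equivalent(2_000, 1_000)
print(plain)  # 2000000
print(round(cached))
```

The result counts only the system prompt; user messages and output tokens are billed as usual. If requests arrive too far apart for the cache to stay warm, the saving shrinks accordingly.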

The Role System

In the Messages API, a conversation is built from three parts. Only user and assistant are roles inside messages; system is a separate top-level parameter:

  1. system: sets the role and behavior (developer-controlled)
  2. user: messages from the user
  3. assistant: model responses (can be used for few-shot examples)

messages = [
    # Few-shot: demonstrate the desired response format
    {"role": "user", "content": "2+2"},
    {"role": "assistant", "content": "4"},
    {"role": "user", "content": "What is a decorator?"},
]

Roles should alternate, starting with a user message: user → assistant → user → …
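The alternation pattern is easy to check locally before sending a request. The helper below is a hypothetical pre-flight check, not a copy of any server-side validation; it reuses the few-shot messages from the example above.

```python
def check_alternation(messages: list[dict]) -> None:
    """Raise ValueError unless roles go user, assistant, user, ... starting with user."""
    expected = "user"
    for index, message in enumerate(messages):
        role = message.get("role")
        if role != expected:
            raise ValueError(f"message {index}: expected role {expected!r}, got {role!r}")
        expected = "assistant" if expected == "user" else "user"


few_shot = [
    {"role": "user", "content": "2+2"},
    {"role": "assistant", "content": "4"},
    {"role": "user", "content": "What is a decorator?"},
]
check_alternation(few_shot)  # passes silently

try:
    check_alternation(few_shot + [{"role": "user", "content": "And a generator?"}])
except ValueError as error:
    print(error)  # message 3: expected role 'assistant', got 'user'
```

A check like this is most useful when the message list is assembled programmatically, for instance when few-shot pairs are loaded from a file.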
