Anthropic Effort Parameter
Control how many tokens Claude uses when responding with the effort parameter, trading off between response thoroughness and token efficiency.
Overview
The effort parameter allows you to control how eager Claude is about spending tokens when responding to requests. This gives you the ability to trade off between response thoroughness and token efficiency, all with a single model.
Supported models:

- Claude 4.6 (Opus 4.6, Sonnet 4.6) — `output_config` is a stable API feature, no beta header needed. Opus 4.6 also supports `effort="max"`.
- Claude Opus 4.5 — requires the `effort-2025-11-24` beta header (automatically added by LiteLLM).

LiteLLM automatically maps `reasoning_effort` → `output_config={"effort": ...}` for all supported models.
How Effort Works
By default, Claude uses maximum effort—spending as many tokens as needed for the best possible outcome. By lowering the effort level, you can instruct Claude to be more conservative with token usage, optimizing for speed and cost while accepting some reduction in capability.
Tip: Setting effort to "high" produces exactly the same behavior as omitting the effort parameter entirely.
The effort parameter affects all tokens in the response, including:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
This approach has two major advantages:
- It doesn't require thinking to be enabled in order to use it.
- It can affect all token spend including tool calls. For example, lower effort would mean Claude makes fewer tool calls.
This gives a much greater degree of control over efficiency.
Effort Levels

| Level | Description | Typical use case |
|---|---|---|
| `max` | Maximum capability beyond `high` — Claude uses even more tokens for the most thorough outcome. Only supported by Claude Opus 4.6. | The hardest reasoning problems, complex multi-step research |
| `high` | Maximum capability — Claude uses as many tokens as needed for the best possible outcome. Equivalent to not setting the parameter. | Complex reasoning, difficult coding problems, agentic tasks |
| `medium` | Balanced approach with moderate token savings. | Agentic tasks that require a balance of speed, cost, and performance |
| `low` | Most efficient — significant token savings with some capability reduction. | Simpler tasks that need the best speed and lowest costs, such as subagents |
Quick Start

Using LiteLLM SDK
**Python**

```python
import litellm

# Works with Claude 4.6 models (no beta header needed)
response = litellm.completion(
    model="anthropic/claude-sonnet-4-6",
    messages=[{
        "role": "user",
        "content": "Analyze the trade-offs between microservices and monolithic architectures"
    }],
    reasoning_effort="medium"  # Automatically mapped to output_config
)

print(response.choices[0].message.content)

# Also works with Claude Opus 4.5 (beta header auto-injected)
response = litellm.completion(
    model="anthropic/claude-opus-4-5-20251101",
    messages=[{
        "role": "user",
        "content": "Analyze the trade-offs between microservices and monolithic architectures"
    }],
    reasoning_effort="medium"
)
```

**TypeScript** (using the Anthropic SDK directly)

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Claude 4.6 — output_config is a stable API feature (no beta header)
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 4096,
  messages: [{
    role: "user",
    content: "Analyze the trade-offs between microservices and monolithic architectures"
  }],
  output_config: {
    effort: "medium"
  }
});

console.log(response.content[0].text);
```
Using LiteLLM Proxy

```shell
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [{
      "role": "user",
      "content": "Analyze the trade-offs between microservices and monolithic architectures"
    }],
    "reasoning_effort": "medium"
  }'
```
Direct Anthropic API Call

**Claude 4.6 (stable)**

```shell
# Claude 4.6 — no beta header needed
curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 4096,
    "messages": [{
      "role": "user",
      "content": "Analyze the trade-offs between microservices and monolithic architectures"
    }],
    "output_config": {
      "effort": "medium"
    }
  }'
```

**Claude Opus 4.5 (beta)**

```shell
# Claude Opus 4.5 — requires beta header
curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "anthropic-beta: effort-2025-11-24" \
  --header "content-type: application/json" \
  --data '{
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 4096,
    "messages": [{
      "role": "user",
      "content": "Analyze the trade-offs between microservices and monolithic architectures"
    }],
    "output_config": {
      "effort": "medium"
    }
  }'
```
Model Compatibility

The effort parameter is supported by:

- Claude Opus 4.6 (`claude-opus-4-6`) — supports `high`, `medium`, `low`, and `max`
- Claude Sonnet 4.6 (`claude-sonnet-4-6`) — supports `high`, `medium`, `low`
- Claude Opus 4.5 (`claude-opus-4-5-20251101`) — supports `high`, `medium`, `low`

`effort="max"` is only available on Claude Opus 4.6. Using it with other models will raise a validation error.
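If you route requests across several models, it can be convenient to enforce these constraints client-side before sending a request. The helper below is a hypothetical sketch (neither `validate_effort` nor `SUPPORTED_EFFORT` exists in LiteLLM or the Anthropic SDK); it simply mirrors the compatibility list above:

```python
# Hypothetical client-side guard mirroring the API's validation rules.
SUPPORTED_EFFORT = {
    "claude-opus-4-6": {"high", "medium", "low", "max"},
    "claude-sonnet-4-6": {"high", "medium", "low"},
    "claude-opus-4-5-20251101": {"high", "medium", "low"},
}

def validate_effort(model: str, effort: str) -> None:
    """Raise ValueError if the model/effort combination is invalid."""
    base = model.split("/")[-1]  # strip a provider prefix like "anthropic/"
    levels = SUPPORTED_EFFORT.get(base)
    if levels is None:
        raise ValueError(f"{base} does not support the effort parameter")
    if effort not in levels:
        raise ValueError(f"{base} does not support effort={effort!r}")

validate_effort("anthropic/claude-opus-4-6", "max")      # OK
# validate_effort("anthropic/claude-sonnet-4-6", "max")  # would raise ValueError
```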
When Should I Adjust the Effort Parameter?

- Use high effort (the default) when you need Claude's best work—complex reasoning, nuanced analysis, difficult coding problems, or any task where quality is the top priority.
- Use medium effort as a balanced option when you want solid performance without the full token expenditure of high effort.
- Use low effort when you're optimizing for speed (because Claude answers with fewer tokens) or cost—for example, simple classification tasks, quick lookups, or high-volume use cases where marginal quality improvements don't justify additional latency or spend.
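One way to apply these guidelines in an application is a small routing table keyed by task category. The mapping below is purely illustrative (the categories and the `effort_for_task` helper are this document's examples, not an official recommendation), and the returned value would be passed as `reasoning_effort`:

```python
# Illustrative mapping from task category to effort level.
EFFORT_BY_TASK = {
    "classification": "low",
    "routing": "low",
    "extraction": "low",
    "agentic": "medium",
    "reasoning": "high",
    "coding": "high",
}

def effort_for_task(category: str) -> str:
    """Pick an effort level for a task category, defaulting to high."""
    return EFFORT_BY_TASK.get(category, "high")

effort_for_task("routing")  # -> "low"
```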
Effort with Tool Use
When using tools, the effort parameter affects both the explanations around tool calls and the tool calls themselves. Lower effort levels tend to:
- Combine multiple operations into fewer tool calls
- Make fewer tool calls
- Proceed directly to action
Example with tools:
```python
import litellm

response = litellm.completion(
    model="anthropic/claude-sonnet-4-6",
    messages=[{
        "role": "user",
        "content": "Check the weather in multiple cities"
    }],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }],
    reasoning_effort="low"  # Mapped to output_config — will make fewer tool calls
)
```
Effort with Extended Thinking
The effort parameter works seamlessly with extended thinking. When both are enabled, effort controls the token budget across all response types:
```python
import litellm

response = litellm.completion(
    model="anthropic/claude-sonnet-4-6",
    messages=[{
        "role": "user",
        "content": "Solve this complex problem"
    }],
    reasoning_effort="medium"  # Mapped to adaptive thinking + output_config for 4.6 models
)
```
Best Practices

- Start with the default (high) for new tasks, then experiment with lower effort levels if you're looking to optimize costs.
- Use medium effort for production agentic workflows where you need a balance of quality and efficiency.
- Reserve low effort for high-volume, simple tasks like classification, routing, or data extraction where speed matters more than nuanced responses.
- Monitor token usage to understand the actual savings from different effort levels for your specific use cases.
- Test with your specific prompts, as the impact of effort levels can vary with task complexity.
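For the monitoring point above, the relevant signal is the output-token count in the response's usage object. A minimal sketch of the comparison (the `output_token_savings` helper is hypothetical, written for this example):

```python
def output_token_savings(baseline_tokens: int, reduced_tokens: int) -> float:
    """Percentage of output tokens saved relative to a baseline run."""
    if baseline_tokens <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline_tokens - reduced_tokens) / baseline_tokens

# e.g. a high-effort run used 2000 output tokens, a low-effort run used 800:
print(f"{output_token_savings(2000, 800):.1f}% fewer output tokens")  # -> 60.0%
```

In practice you would feed in `response.usage.completion_tokens` from two runs of the same prompt at different effort levels.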
Provider Support

The effort parameter is supported across all Anthropic-compatible providers:

- Standard Anthropic API: ✅ Supported (Claude 4.6, Opus 4.5)
- Azure Anthropic / Microsoft Foundry: ✅ Supported (Claude 4.6, Opus 4.5)
- Amazon Bedrock: ✅ Supported (Claude 4.6, Opus 4.5)
- Google Cloud Vertex AI: ✅ Supported (Claude 4.6, Opus 4.5)
LiteLLM automatically handles:

- Parameter mapping: `reasoning_effort` → `output_config={"effort": ...}` for all supported models
- Beta header injection (`effort-2025-11-24`) for Claude Opus 4.5 only (not needed for 4.6 models)
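Conceptually, the translation LiteLLM performs looks like the sketch below. This is a simplified illustration of the two bullets above, not LiteLLM's actual implementation (the `map_effort_params` function is invented for this example):

```python
def map_effort_params(model: str, reasoning_effort: str) -> dict:
    """Simplified sketch: reasoning_effort -> Anthropic request fields.

    Returns the extra body field and headers an Anthropic /v1/messages
    call would need for the given model.
    """
    request = {"output_config": {"effort": reasoning_effort}, "headers": {}}
    if "claude-opus-4-5" in model:
        # Only Opus 4.5 still gates effort behind a beta header.
        request["headers"]["anthropic-beta"] = "effort-2025-11-24"
    return request

map_effort_params("claude-sonnet-4-6", "medium")
# -> {"output_config": {"effort": "medium"}, "headers": {}}
```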
Usage and Pricing

Token usage with different effort levels is tracked in the standard usage object. Lower effort levels result in fewer output tokens, which directly reduces costs:

```python
import litellm

response = litellm.completion(
    model="anthropic/claude-opus-4-5-20251101",
    messages=[{"role": "user", "content": "Analyze this"}],
    output_config={"effort": "low"}
)

print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
```
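To turn the token counts into a dollar figure, multiply by your model's output-token rate. The helper and the rate below are placeholders (substitute the current per-million-output-token price for your model; $15 here is an assumption for illustration only):

```python
def output_cost(completion_tokens: int, usd_per_mtok: float) -> float:
    """Estimate output-token spend; usd_per_mtok is your model's
    per-million-output-token price (placeholder value used below)."""
    return completion_tokens * usd_per_mtok / 1_000_000

# With a placeholder rate of $15 per million output tokens:
print(f"${output_cost(1200, 15.0):.4f}")  # -> $0.0180
```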
Troubleshooting

Beta header not being added (Claude Opus 4.5)
LiteLLM automatically adds the effort-2025-11-24 beta header for Claude Opus 4.5 when reasoning_effort or output_config is provided.
Note: Claude 4.6 models do NOT need a beta header — output_config is a stable API feature for these models.
If you're not seeing the header for Opus 4.5:
- Ensure you're using the `reasoning_effort` parameter
- Verify the model is Claude Opus 4.5
- Check that your LiteLLM version supports this feature
Invalid effort value error

Accepted values: `"high"`, `"medium"`, `"low"`, and `"max"` (Opus 4.6 only). Any other value will raise a validation error:

```python
# ❌ This will raise an error
output_config={"effort": "very_low"}

# ✅ Use one of the valid values
output_config={"effort": "low"}

# ❌ This will raise an error (max only works on Opus 4.6)
litellm.completion(model="anthropic/claude-sonnet-4-6", reasoning_effort="max", ...)

# ✅ max is only for Opus 4.6
litellm.completion(model="anthropic/claude-opus-4-6", reasoning_effort="max", ...)
```
Model not supported
The effort parameter is supported by Claude Opus 4.6, Sonnet 4.6, and Opus 4.5. Using it with other models may result in the parameter being ignored or an error.
Related Features
- Extended Thinking - Control Claude's reasoning process
- Tool Use - Enable Claude to use tools and functions
- Programmatic Tool Calling - Let Claude write code that calls tools
- Prompt Caching - Cache prompts to reduce costs