Anthropic Effort Parameter
Control how many tokens Claude uses when responding with the effort parameter, trading off between response thoroughness and token efficiency.
Overview
The effort parameter allows you to control how eager Claude is about spending tokens when responding to requests. This gives you the ability to trade off between response thoroughness and token efficiency, all with a single model.
Supported models:

- Claude 4.6 (Opus 4.6, Sonnet 4.6) — `output_config` is a stable API feature, no beta header needed. Opus 4.6 also supports `effort="max"`.
- Claude Opus 4.5 — requires the `effort-2025-11-24` beta header (automatically added by LiteLLM).

LiteLLM automatically maps `reasoning_effort` → `output_config={"effort": ...}` for all supported models.
How Effort Works
By default, Claude uses maximum effort—spending as many tokens as needed for the best possible outcome. By lowering the effort level, you can instruct Claude to be more conservative with token usage, optimizing for speed and cost while accepting some reduction in capability.
Tip: Setting effort to "high" produces exactly the same behavior as omitting the effort parameter entirely.
The effort parameter affects all tokens in the response, including:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
This approach has two major advantages:
- It doesn't require thinking to be enabled in order to use it.
- It can affect all token spend including tool calls. For example, lower effort would mean Claude makes fewer tool calls.
This gives a much greater degree of control over efficiency.
Effort Levels

| Level | Description | Typical use case |
|---|---|---|
| `max` | Maximum capability beyond `high` — Claude uses even more tokens for the most thorough outcome. Only supported by Claude Opus 4.6. | The hardest reasoning problems, complex multi-step research |
| `high` | Maximum capability — Claude uses as many tokens as needed for the best possible outcome. Equivalent to not setting the parameter. | Complex reasoning, difficult coding problems, agentic tasks |
| `medium` | Balanced approach with moderate token savings. | Agentic tasks that require a balance of speed, cost, and performance |
| `low` | Most efficient — significant token savings with some capability reduction. | Simpler tasks that need the best speed and lowest costs, such as subagents |
Quick Start

Using LiteLLM SDK
**Python**

```python
import litellm

# Works with Claude 4.6 models (no beta header needed)
response = litellm.completion(
    model="anthropic/claude-sonnet-4-6",
    messages=[{
        "role": "user",
        "content": "Analyze the trade-offs between microservices and monolithic architectures"
    }],
    reasoning_effort="medium"  # Automatically mapped to output_config
)

print(response.choices[0].message.content)

# Also works with Claude Opus 4.5 (beta header auto-injected)
response = litellm.completion(
    model="anthropic/claude-opus-4-5-20251101",
    messages=[{
        "role": "user",
        "content": "Analyze the trade-offs between microservices and monolithic architectures"
    }],
    reasoning_effort="medium"
)
```

**TypeScript** (using the Anthropic SDK directly)

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Claude 4.6 — output_config is a stable API feature (no beta header)
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 4096,
  messages: [{
    role: "user",
    content: "Analyze the trade-offs between microservices and monolithic architectures"
  }],
  output_config: {
    effort: "medium"
  }
});

console.log(response.content[0].text);
```
Using LiteLLM Proxy

```shell
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [{
      "role": "user",
      "content": "Analyze the trade-offs between microservices and monolithic architectures"
    }],
    "reasoning_effort": "medium"
  }'
```
Direct Anthropic API Call

**Claude 4.6 (stable)**

```shell
# Claude 4.6 — no beta header needed
curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 4096,
    "messages": [{
      "role": "user",
      "content": "Analyze the trade-offs between microservices and monolithic architectures"
    }],
    "output_config": {
      "effort": "medium"
    }
  }'
```

**Claude Opus 4.5 (beta)**

```shell
# Claude Opus 4.5 — requires beta header
curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "anthropic-beta: effort-2025-11-24" \
  --header "content-type: application/json" \
  --data '{
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 4096,
    "messages": [{
      "role": "user",
      "content": "Analyze the trade-offs between microservices and monolithic architectures"
    }],
    "output_config": {
      "effort": "medium"
    }
  }'
```
Model Compatibility

The effort parameter is supported by:

- Claude Opus 4.6 (`claude-opus-4-6`) — supports `high`, `medium`, `low`, and `max`
- Claude Sonnet 4.6 (`claude-sonnet-4-6`) — supports `high`, `medium`, `low`
- Claude Opus 4.5 (`claude-opus-4-5-20251101`) — supports `high`, `medium`, `low`

`effort="max"` is only available on Claude Opus 4.6. Using it with other models will raise a validation error.
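If you route requests across several models, it can be convenient to enforce these constraints client-side before sending a request. The helper below is a hypothetical sketch (neither `validate_effort` nor `SUPPORTED_EFFORT` exists in LiteLLM or the Anthropic SDK); it simply mirrors the compatibility list above:

```python
# Hypothetical client-side guard mirroring the API's validation rules.
SUPPORTED_EFFORT = {
    "claude-opus-4-6": {"high", "medium", "low", "max"},
    "claude-sonnet-4-6": {"high", "medium", "low"},
    "claude-opus-4-5-20251101": {"high", "medium", "low"},
}

def validate_effort(model: str, effort: str) -> None:
    """Raise ValueError if the model/effort combination is invalid."""
    base = model.split("/")[-1]  # strip a provider prefix like "anthropic/"
    levels = SUPPORTED_EFFORT.get(base)
    if levels is None:
        raise ValueError(f"{base} does not support the effort parameter")
    if effort not in levels:
        raise ValueError(f"{base} does not support effort={effort!r}")

validate_effort("anthropic/claude-opus-4-6", "max")      # OK
# validate_effort("anthropic/claude-sonnet-4-6", "max")  # would raise ValueError
```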
When Should I Adjust the Effort Parameter?

- Use high effort (the default) when you need Claude's best work—complex reasoning, nuanced analysis, difficult coding problems, or any task where quality is the top priority.
- Use medium effort as a balanced option when you want solid performance without the full token expenditure of high effort.
- Use low effort when you're optimizing for speed (because Claude answers with fewer tokens) or cost—for example, simple classification tasks, quick lookups, or high-volume use cases where marginal quality improvements don't justify additional latency or spend.
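One way to apply these guidelines in an application is a small routing table keyed by task category. The mapping below is purely illustrative (the categories and the `effort_for_task` helper are this document's examples, not an official recommendation), and the returned value would be passed as `reasoning_effort`:

```python
# Illustrative mapping from task category to effort level.
EFFORT_BY_TASK = {
    "classification": "low",
    "routing": "low",
    "extraction": "low",
    "agentic": "medium",
    "reasoning": "high",
    "coding": "high",
}

def effort_for_task(category: str) -> str:
    """Pick an effort level for a task category, defaulting to high."""
    return EFFORT_BY_TASK.get(category, "high")

effort_for_task("routing")  # -> "low"
```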
Effort with Tool Use
When using tools, the effort parameter affects both the explanations around tool calls and the tool calls themselves. Lower effort levels tend to:
- Combine multiple operations into fewer tool calls
- Make fewer tool calls
- Proceed directly to action
Example with tools:
```python
import litellm

response = litellm.completion(
    model="anthropic/claude-sonnet-4-6",
    messages=[{
        "role": "user",
        "content": "Check the weather in multiple cities"
    }],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }],
    reasoning_effort="low"  # Mapped to output_config — will make fewer tool calls
)
```
Effort with Extended Thinking
The effort parameter works seamlessly with extended thinking. When both are enabled, effort controls the token budget across all response types:
```python
import litellm

response = litellm.completion(
    model="anthropic/claude-sonnet-4-6",
    messages=[{
        "role": "user",
        "content": "Solve this complex problem"
    }],
    reasoning_effort="medium"  # Mapped to adaptive thinking + output_config for 4.6 models
)
```
Best Practices

- Start with the default (high) for new tasks, then experiment with lower effort levels if you're looking to optimize costs.
- Use medium effort for production agentic workflows where you need a balance of quality and efficiency.
- Reserve low effort for high-volume, simple tasks like classification, routing, or data extraction where speed matters more than nuanced responses.
- Monitor token usage to understand the actual savings from different effort levels for your specific use cases.
- Test with your specific prompts, as the impact of effort levels can vary with task complexity.
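For the monitoring point above, the relevant signal is the output-token count in the response's usage object. A minimal sketch of the comparison (the `output_token_savings` helper is hypothetical, written for this example):

```python
def output_token_savings(baseline_tokens: int, reduced_tokens: int) -> float:
    """Percentage of output tokens saved relative to a baseline run."""
    if baseline_tokens <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline_tokens - reduced_tokens) / baseline_tokens

# e.g. a high-effort run used 2000 output tokens, a low-effort run used 800:
print(f"{output_token_savings(2000, 800):.1f}% fewer output tokens")  # -> 60.0%
```

In practice you would feed in `response.usage.completion_tokens` from two runs of the same prompt at different effort levels.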
Provider Support

The effort parameter is supported across all Anthropic-compatible providers:

- Standard Anthropic API: ✅ Supported (Claude 4.6, Opus 4.5)
- Azure Anthropic / Microsoft Foundry: ✅ Supported (Claude 4.6, Opus 4.5)
- Amazon Bedrock: ✅ Supported (Claude 4.6, Opus 4.5)
- Google Cloud Vertex AI: ✅ Supported (Claude 4.6, Opus 4.5)
LiteLLM automatically handles:

- Parameter mapping: `reasoning_effort` → `output_config={"effort": ...}` for all supported models
- Beta header injection (`effort-2025-11-24`) for Claude Opus 4.5 only (not needed for 4.6 models)
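Conceptually, the translation LiteLLM performs looks like the sketch below. This is a simplified illustration of the two bullets above, not LiteLLM's actual implementation (the `map_effort_params` function is invented for this example):

```python
def map_effort_params(model: str, reasoning_effort: str) -> dict:
    """Simplified sketch: reasoning_effort -> Anthropic request fields.

    Returns the extra body field and headers an Anthropic /v1/messages
    call would need for the given model.
    """
    request = {"output_config": {"effort": reasoning_effort}, "headers": {}}
    if "claude-opus-4-5" in model:
        # Only Opus 4.5 still gates effort behind a beta header.
        request["headers"]["anthropic-beta"] = "effort-2025-11-24"
    return request

map_effort_params("claude-sonnet-4-6", "medium")
# -> {"output_config": {"effort": "medium"}, "headers": {}}
```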
Usage and Pricing

Token usage with different effort levels is tracked in the standard usage object. Lower effort levels result in fewer output tokens, which directly reduces costs:

```python
import litellm

response = litellm.completion(
    model="anthropic/claude-opus-4-5-20251101",
    messages=[{"role": "user", "content": "Analyze this"}],
    output_config={"effort": "low"}
)

print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
```
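To turn the token counts into a dollar figure, multiply by your model's output-token rate. The helper and the rate below are placeholders (substitute the current per-million-output-token price for your model; $15 here is an assumption for illustration only):

```python
def output_cost(completion_tokens: int, usd_per_mtok: float) -> float:
    """Estimate output-token spend; usd_per_mtok is your model's
    per-million-output-token price (placeholder value used below)."""
    return completion_tokens * usd_per_mtok / 1_000_000

# With a placeholder rate of $15 per million output tokens:
print(f"${output_cost(1200, 15.0):.4f}")  # -> $0.0180
```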
Troubleshooting

Beta header not being added (Claude Opus 4.5)
LiteLLM automatically adds the effort-2025-11-24 beta header for Claude Opus 4.5 when reasoning_effort or output_config is provided.
Note: Claude 4.6 models do NOT need a beta header — output_config is a stable API feature for these models.
If you're not seeing the header for Opus 4.5:
- Ensure you're using the `reasoning_effort` parameter
- Verify the model is Claude Opus 4.5
- Check that your LiteLLM version supports this feature
Invalid effort value error

Accepted values: `"high"`, `"medium"`, `"low"`, and `"max"` (Opus 4.6 only). Any other value will raise a validation error:

```python
# ❌ This will raise an error
output_config={"effort": "very_low"}

# ✅ Use one of the valid values
output_config={"effort": "low"}

# ❌ This will raise an error (max only works on Opus 4.6)
litellm.completion(model="anthropic/claude-sonnet-4-6", reasoning_effort="max", ...)

# ✅ max is only for Opus 4.6
litellm.completion(model="anthropic/claude-opus-4-6", reasoning_effort="max", ...)
```
Model not supported
The effort parameter is supported by Claude Opus 4.6, Sonnet 4.6, and Opus 4.5. Using it with other models may result in the parameter being ignored or an error.
Related Features
- Extended Thinking - Control Claude's reasoning process
- Tool Use - Enable Claude to use tools and functions
- Programmatic Tool Calling - Let Claude write code that calls tools
- Prompt Caching - Cache prompts to reduce costs