Getting a 529 overloaded_error from the Claude API means Anthropic's servers are temporarily at capacity, not that you've hit your rate limit. The fix is different from the one for a 429 error, and rate-limit remedies do not help with a capacity problem. Here's what to do.
What Is Claude API Error 529?
HTTP 529 with error type overloaded_error is Anthropic's way of telling you the API is temporarily overloaded across all users. This is a server-side capacity issue, not an account-level rate limit.
The full error response looks like this:
{
  "type": "error",
  "error": {
    "type": "overloaded_error",
    "message": "Overloaded"
  },
  "request_id": "req_011CSHoEeqs5C35K2UUqR7Fy"
}
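To act on this distinction in code, the error response can be classified before choosing a remedy. The following is a minimal sketch, not part of the SDK: `classify_error` is a hypothetical helper name, and it assumes you have the raw HTTP status code and response body as a string. It relies on the `overloaded_error` type shown above and on `rate_limit_error`, the error type that accompanies a 429.

```python
import json


def classify_error(status_code: int, body: str) -> str:
    """Return 'overloaded', 'rate_limited', or 'other' for an API error response."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}

    error = payload.get("error")
    error_type = error.get("type") if isinstance(error, dict) else None

    # 529 / overloaded_error: server-side capacity, so back off and retry.
    if status_code == 529 or error_type == "overloaded_error":
        return "overloaded"
    # 429 / rate_limit_error: account-level limit, so slow your own request rate.
    if status_code == 429 or error_type == "rate_limit_error":
        return "rate_limited"
    return "other"
```

A caller could branch on the returned string, for example retrying with backoff on `"overloaded"` and throttling its own request rate on `"rate_limited"`.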
The key distinction from other Claude API errors:
Why You're Getting 529 Right Now
529 errors spike during three situations:
First thing to do: check status.claude.com to see if there's an active incident. If there is, wait it out — no code fix will help during a live outage.
The Wrong Way to Fix 529 (What Not to Do)
Most developers instinctively do one of these — all of them make things worse:
The Correct Fix: Exponential Backoff with Jitter
The official Anthropic recommendation is to retry with exponential backoff. Here's an implementation in Python and TypeScript:
Python Fix
import anthropic
import time
import random

# The SDK also retries on its own (2 times by default). Pass max_retries=0
# here if this loop should be the only retry layer.
client = anthropic.Anthropic()


def call_claude_with_retry(messages, model="claude-sonnet-4-6", max_retries=5):
    """Call the Messages API, retrying only on 529 with exponential backoff."""
    for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model=model,
                max_tokens=1024,
                messages=messages
            )
            return response
        except anthropic.APIStatusError as e:
            if e.status_code == 529:
                if attempt == max_retries - 1:
                    raise  # Give up after max retries
                # Exponential backoff (1s, 2s, 4s, ... capped at 60s) plus jitter
                base_delay = min(60, 1 * (2 ** attempt))
                jitter = random.uniform(0, 0.75)
                wait_time = base_delay + jitter
                print(f"Claude API overloaded (529). Attempt {attempt + 1}/{max_retries}. Waiting {wait_time:.1f}s...")
                time.sleep(wait_time)
            else:
                raise  # Don't retry other errors
    return None  # Only reached if max_retries is 0 or less
TypeScript/JavaScript Fix
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function callClaudeWithRetry(
  messages: Anthropic.MessageParam[],
  model = "claude-sonnet-4-6",
  maxRetries = 5
) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await client.messages.create({
        model,
        max_tokens: 1024,
        messages,
      });
      return response;
    } catch (error) {
      // The TypeScript SDK exposes HTTP errors as Anthropic.APIError with a `status` field.
      if (error instanceof Anthropic.APIError && error.status === 529) {
        if (attempt === maxRetries - 1) throw error; // Give up after max retries
        // Exponential backoff (1s, 2s, 4s, ... capped at 60s) plus jitter
        const baseDelay = Math.min(60000, 1000 * Math.pow(2, attempt));
        const jitter = Math.floor(Math.random() * 750);
        const waitTime = baseDelay + jitter;
        console.log(
          `Claude API overloaded (529). Attempt ${attempt + 1}/${maxRetries}. Waiting ${waitTime}ms...`
        );
        await new Promise((resolve) => setTimeout(resolve, waitTime));
      } else {
        throw error; // Don't retry other errors
      }
    }
  }
}
The Backoff Schedule
With the code above, retries happen at these approximate intervals:
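The intervals follow directly from the backoff formula in the retry code above: a base delay of 1 second that doubles per attempt and is capped at 60 seconds, plus up to 0.75 seconds of jitter. The sketch below recomputes that schedule; `backoff_schedule` is a hypothetical helper written for illustration, and its defaults mirror the constants in the Python example. With five attempts there are four waits, of roughly 1, 2, 4, and 8 seconds, because the final failed attempt re-raises instead of sleeping.

```python
def backoff_schedule(max_retries=5, base=1.0, cap=60.0, max_jitter=0.75):
    """Return (failed_attempt, min_wait, max_wait) in seconds for each wait."""
    schedule = []
    # The last attempt re-raises instead of sleeping, so there are max_retries - 1 waits.
    for attempt in range(max_retries - 1):
        base_delay = min(cap, base * (2 ** attempt))
        schedule.append((attempt + 1, base_delay, base_delay + max_jitter))
    return schedule


for failed_attempt, low, high in backoff_schedule():
    print(f"after failed attempt {failed_attempt}: wait {low:.2f}s to {high:.2f}s")
```

Summing the rows gives a total wait of between 15 and 18 seconds before the function gives up, not counting the time the requests themselves take.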
Using the Official Anthropic SDK (Recommended)
If you're using the official anthropic Python or TypeScript SDK, it already retries 529 errors automatically (two retries by default). You can raise the limit when creating the client:
# Python: the SDK retries 529 automatically
client = anthropic.Anthropic(
    max_retries=3,  # Default is 2
)
// TypeScript
const client = new Anthropic({
  maxRetries: 3, // Default is 2
});
The SDK uses exponential backoff with jitter by default. For most applications this is sufficient — you only need custom retry logic if you need more control over the backoff schedule or retry budget.
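As an illustration of what a custom retry budget might look like, here is a minimal sketch that caps the total time spent waiting rather than the number of attempts. Everything in it is an assumption made for the example: `retry_with_budget`, its parameters, and the injectable `sleep` and `clock` arguments (included so the loop can be tested without real delays) are hypothetical and not part of the Anthropic SDK.

```python
import random
import time


def retry_with_budget(call, is_retryable, budget_seconds=30.0, base=1.0, cap=60.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Retry `call` with exponential backoff until it succeeds or the time budget is spent."""
    deadline = clock() + budget_seconds
    attempt = 0
    while True:
        try:
            return call()
        except Exception as exc:
            if not is_retryable(exc):
                raise
            delay = min(cap, base * (2 ** attempt)) + random.uniform(0, 0.75)
            # Give up if the next wait would overrun the overall budget.
            if clock() + delay > deadline:
                raise
            sleep(delay)
            attempt += 1
```

With the Python SDK, `is_retryable` could be something like `lambda e: getattr(e, "status_code", None) == 529`, and `call` a zero-argument function wrapping `client.messages.create(...)`.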
Reducing 529 Frequency in Production
Beyond retrying, these changes reduce how often you hit 529:
1. Limit Concurrent Requests
If your app fires 50 parallel Claude requests at once during a traffic spike, you're contributing to the overload. Use a semaphore or queue to cap concurrency:
import asyncio
import anthropic

client = anthropic.AsyncAnthropic()
semaphore = asyncio.Semaphore(5)  # Max 5 concurrent requests


async def limited_call(messages):
    async with semaphore:
        return await client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=messages
        )
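To see the cap in action without calling the API, the same semaphore pattern can be run against a simulated request. This is a sketch for illustration only: `run_limited` and `fake_request` are hypothetical names, and `asyncio.sleep` stands in for the network call. It launches 20 tasks at once and records the peak number that were in flight together, which never exceeds the limit.

```python
import asyncio


async def run_limited(num_tasks: int = 20, limit: int = 5) -> int:
    """Run num_tasks simulated requests and return the peak number in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # Stand-in for the real API call
            in_flight -= 1

    # All tasks start together, but only `limit` can hold the semaphore at once.
    await asyncio.gather(*(fake_request() for _ in range(num_tasks)))
    return peak


peak = asyncio.run(run_limited())
print(f"peak concurrent requests: {peak}")
```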
2. Use the Batch API for Non-Urgent Work
The Message Batches API processes requests asynchronously and is less susceptible to 529 errors than the synchronous Messages API. Use it for bulk processing, nightly jobs, and any task where the user doesn't need an immediate response.
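A batch submission might look like the sketch below. It assumes a recent version of the Python SDK that exposes `client.messages.batches.create`, and that `client` is an `anthropic.Anthropic()` instance passed in by the caller. The helper names and the `job-N` ID scheme are hypothetical; any unique `custom_id` string lets you match results back to inputs.

```python
def build_batch_requests(prompts, model="claude-sonnet-4-6", max_tokens=1024):
    """Turn a list of prompt strings into Message Batches request entries."""
    return [
        {
            "custom_id": f"job-{i}",  # Hypothetical ID scheme; must be unique per batch
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]


def submit_batch(client, prompts):
    """Submit the prompts as one batch; `client` is assumed to be anthropic.Anthropic()."""
    return client.messages.batches.create(requests=build_batch_requests(prompts))
```

Results arrive asynchronously, so the caller polls the batch or checks back later instead of holding a connection open per request.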
3. Use Streaming for Long Requests
Long synchronous requests (large max_tokens) are more likely to hit overload timeouts. Switch to streaming:
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Write a long analysis..."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
4. Enable Prompt Caching
If you repeatedly send large system prompts (e.g. a 10,000 token context document), prompt caching lets the API reuse the already-processed prefix instead of reprocessing it on every request, which lowers your contribution to API pressure.
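A request that marks a large system prompt as cacheable could be built as in the sketch below. `build_cached_request` is a hypothetical helper; the part that matters is the `cache_control` field with type `ephemeral` on the system content block, which marks the end of the prefix to cache.

```python
def build_cached_request(document, question, model="claude-sonnet-4-6", max_tokens=1024):
    """Build Messages API keyword arguments with the large document marked for caching."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": document,
                # Everything up to and including this block becomes the cached prefix.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Only the question varies between requests, so the prefix can be reused.
        "messages": [{"role": "user", "content": question}],
    }
```

The result can be passed straight to the client, for example `client.messages.create(**build_cached_request(doc, "Summarize section 2"))`, where `doc` holds the document text. The cached prefix has to be identical across requests for the cache to be hit.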
529 in Claude Code Specifically
If you're seeing repeated 529 errors inside Claude Code (the CLI tool), the situation is slightly different. Claude Code already retries 529 automatically before surfacing the error to you. So if you're seeing the error message, Claude Code has already attempted several retries and they all failed.
In this case:
How to Report 529 to Anthropic Support
If 529 errors persist for more than 30 minutes despite clean status page readings, escalate to Anthropic support. Include:
The request_id is the most important piece — it lets Anthropic engineers locate your exact request in their logs.
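One way to make sure the request_id is on hand is to capture it whenever a 529 occurs. The sketch below pulls it out of an error body shaped like the example response earlier in this article. `build_support_report` is a hypothetical helper, and the fields other than `request_id` are assumptions about what is useful to record, not a list required by Anthropic.

```python
import json
from datetime import datetime, timezone


def build_support_report(status_code: int, body: str, model: str) -> dict:
    """Extract the details worth logging from a raw error response body."""
    payload = json.loads(body)
    error = payload.get("error") or {}
    return {
        "request_id": payload.get("request_id"),  # The key identifier for support
        "status_code": status_code,
        "error_type": error.get("type"),
        "model": model,
        # Timestamp recorded locally when the error was observed (an assumption).
        "observed_at_utc": datetime.now(timezone.utc).isoformat(),
    }
```

Logging this dictionary for every failed request means the IDs are already collected by the time an escalation is needed.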
Sources: Anthropic API Error Reference · Claude Status Page. This article was produced with AI assistance and reviewed for accuracy.