Rate Limits | Pyramid AI | Developer Docs

The Pyramid API enforces per-key rate limits to ensure fair usage and system stability.

Default limits

Limit	Default	Scope
Requests per minute	100	Per API key
Monthly request cap	Varies by plan	Per organization
Monthly token cap	Varies by plan	Per organization

Custom rate limits can be configured per key during provisioning.

Rate limit headers

Every API response includes rate limit information:

Header	Description	Example
`X-RateLimit-Limit`	Max requests per minute for this key	`100`
`X-RateLimit-Remaining`	Requests remaining in current window	`97`
`X-RateLimit-Reset`	Unix timestamp when the window resets	`1713100800`
`Retry-After`	Seconds to wait (only on `429` responses)	`42`
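
As an illustration of reading these headers, here is a minimal sketch that parses them into a small structure. The names `RateLimitInfo` and `parse_rate_limit` are hypothetical helpers, not part of the Pyramid API, and the sketch assumes all three `X-RateLimit-*` headers are present as integer strings, as in the table above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitInfo:
    limit: int      # X-RateLimit-Limit: max requests per minute for this key
    remaining: int  # X-RateLimit-Remaining: requests left in the current window
    reset_at: int   # X-RateLimit-Reset: Unix timestamp when the window resets

    def seconds_until_reset(self, now: float) -> float:
        """Seconds until the window resets, never negative."""
        return max(0.0, self.reset_at - now)


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitInfo:
    """Build a RateLimitInfo from response headers (hypothetical helper)."""
    return RateLimitInfo(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        reset_at=int(headers["X-RateLimit-Reset"]),
    )
```

With the `requests` library, `parse_rate_limit(response.headers)` should work regardless of header casing, since `response.headers` is case-insensitive. A plain `dict` needs the exact header names shown here.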

Handling rate limits

When you exceed the rate limit, the API returns a `429` status code:

{
  "success": false,
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded. Retry after 42 seconds."
  },
  "request_id": "..."
}

Recommended retry strategy

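Wait for the number of seconds given in the `Retry-After` header, and fall back to exponential backoff if the header is missing:

import time
import requests

def call_api(url, headers, body, max_retries=3):
    """POST to the API, retrying on 429 responses."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=body)

        if response.status_code != 429:
            return response

        # Prefer the server's Retry-After value (in seconds); otherwise
        # back off exponentially: 1s, 2s, 4s, ...
        retry_after = response.headers.get("Retry-After")
        delay = int(retry_after) if retry_after else 2 ** attempt

        # Skip the sleep after the final attempt; we are about to give up.
        if attempt < max_retries - 1:
            time.sleep(delay)

    raise RuntimeError("Max retries exceeded")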

Quota limits

Each organization has monthly caps on total requests and LLM tokens. When either quota is exceeded, the API returns a `402` status code:

{
  "success": false,
  "error": {
    "code": "QUOTA_EXCEEDED",
    "message": "Monthly token quota exceeded. Contact your account manager to increase limits."
  },
  "request_id": "..."
}

Quota usage resets on the 1st of each month at 00:00 UTC.
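
Because a quota error clears only at the monthly reset or after a limit increase, a client will probably want to treat `402` differently from `429`. The sketch below computes the next reset time from the rule stated above and marks only `429` as automatically retryable. `next_quota_reset` and `should_retry` are hypothetical helper names, and `now` is assumed to be a timezone-aware `datetime`.

```python
from datetime import datetime, timezone

RATE_LIMITED_STATUS = 429  # short-lived: wait, then retry


def next_quota_reset(now: datetime) -> datetime:
    """Return the next 1st-of-month 00:00 UTC strictly after `now`.

    `now` is assumed to be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)


def should_retry(status_code: int) -> bool:
    """Retry automatically only on rate limiting, not on quota errors (402)."""
    return status_code == RATE_LIMITED_STATUS
```

For example, `next_quota_reset(datetime.now(timezone.utc))` gives the moment at which a `QUOTA_EXCEEDED` error would be expected to clear without a limit increase.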

Tips for staying within limits

Batch operations: use batch endpoints (e.g., `POST /agent/batches`) instead of many individual calls
Cache responses: cache results client-side where appropriate
Use idempotency keys: prevent duplicate processing on retries (see Error Handling)
Monitor usage: check the `X-RateLimit-Remaining` header proactively
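
One way to act on the remaining-request count proactively is to slow down as the window runs low rather than waiting for a `429`. The following is a minimal sketch of that idea, not an official client feature. `pacing_delay` and its `threshold` parameter are hypothetical, and the inputs are assumed to come from the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers.

```python
def pacing_delay(remaining: int, reset_at: int, now: float, threshold: int = 10) -> float:
    """Seconds to pause before sending the next request.

    Returns 0 while more than `threshold` requests remain. At or below the
    threshold, spreads the remaining requests evenly over the time left in
    the current window.
    """
    if remaining > threshold:
        return 0.0
    time_left = max(0.0, reset_at - now)
    if remaining <= 0:
        return time_left  # nothing left: wait for the window to reset
    return time_left / remaining
```

A caller might run `time.sleep(pacing_delay(remaining, reset_at, time.time()))` before each request. The threshold of 10 is an arbitrary starting point and would need tuning for the key's actual limit.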