Rate Limits
Request quotas, headers, and retry strategies
The Pyramid API enforces per-key rate limits to ensure fair usage and system stability.
Custom rate limits can be configured per key during provisioning.
Every API response includes rate limit information:
When you exceed the rate limit, the API returns a 429 status code:
Use exponential backoff with the Retry-After header:
Organizations have monthly caps on total requests and LLM tokens. When a quota is exceeded, the API returns 402:
Quota usage resets on the 1st of each month at 00:00 UTC.
POST /agent/batches) instead of many individual callsX-RateLimit-Remaining headers proactively