Rate Limits

Request quotas, headers, and retry strategies

The Pyramid API enforces per-key rate limits and per-organization monthly quotas to ensure fair usage and system stability.

Default limits

| Limit | Default | Scope |
|---|---|---|
| Requests per minute | 100 | Per API key |
| Monthly request cap | Varies by plan | Per organization |
| Monthly token cap | Varies by plan | Per organization |

Custom rate limits can be configured per key during provisioning.
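One way to stay under the per-minute limit is to throttle on the client before sending. The sketch below is a minimal sliding-window limiter, assuming a single-threaded client and the default of 100 requests per minute; `SlidingWindowLimiter` and its parameters are hypothetical and not part of any Pyramid SDK.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` requests in any `window`-second span.

    Hypothetical client-side helper; not thread-safe.
    """

    def __init__(self, max_requests=100, window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of requests inside the current window

    def wait_time(self):
        """Seconds to wait before another request may be sent (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have fallen out of the window
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window - (now - self.sent[0])

    def acquire(self):
        """Block until a request slot is free, then record the request."""
        delay = self.wait_time()
        while delay > 0:
            time.sleep(delay)
            delay = self.wait_time()
        self.sent.append(self.clock())
```

Calling `limiter.acquire()` before each request keeps any 60-second span at or below the limit, which also keeps a fixed per-minute window within it. If a key was provisioned with a custom limit, pass that value as `max_requests`.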

Rate limit headers

Every API response includes rate limit information:

| Header | Description | Example |
|---|---|---|
| X-RateLimit-Limit | Max requests per minute for this key | 100 |
| X-RateLimit-Remaining | Requests remaining in current window | 97 |
| X-RateLimit-Reset | Unix timestamp when the window resets | 1713100800 |
| Retry-After | Seconds to wait (only on 429 responses) | 42 |
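These headers can be read into a small structure after each response. This is a sketch, assuming the values are integer strings as in the examples above; `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: int                  # X-RateLimit-Limit: max requests per minute
    remaining: int              # X-RateLimit-Remaining: requests left in window
    reset_at: int               # X-RateLimit-Reset: Unix timestamp of window reset
    retry_after: Optional[int]  # Retry-After: seconds to wait (429 only)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Extract the documented rate limit headers from a response."""
    # Header names are case-insensitive in HTTP; normalize for plain dicts
    h = {k.lower(): v for k, v in headers.items()}
    retry_after = h.get("retry-after")
    return RateLimitInfo(
        limit=int(h["x-ratelimit-limit"]),
        remaining=int(h["x-ratelimit-remaining"]),
        reset_at=int(h["x-ratelimit-reset"]),
        retry_after=int(retry_after) if retry_after is not None else None,
    )
```

With the `requests` library, `parse_rate_limit_headers(response.headers)` works directly, since its header mapping supports `.items()`.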

Handling rate limits

When you exceed the rate limit, the API returns a 429 status code:

```json
{
  "success": false,
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded. Retry after 42 seconds."
  },
  "request_id": "..."
}
```
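Because the error body carries a `request_id`, it can help to keep that value when raising an error so it can be quoted when troubleshooting. A minimal sketch, assuming the envelope shape shown above; `PyramidAPIError` and `error_from_response` are hypothetical names.

```python
import json


class PyramidAPIError(Exception):
    """Hypothetical exception wrapping the documented error envelope."""

    def __init__(self, status, code, message, request_id):
        super().__init__(f"{status} {code}: {message} (request_id={request_id})")
        self.status = status
        self.code = code
        self.message = message
        self.request_id = request_id


def error_from_response(status, body):
    """Build a PyramidAPIError from an HTTP status and a JSON error body."""
    payload = json.loads(body)
    error = payload.get("error") or {}
    return PyramidAPIError(
        status=status,
        code=error.get("code"),
        message=error.get("message"),
        request_id=payload.get("request_id"),
    )
```

With `requests`, this would be called as `error_from_response(response.status_code, response.text)`.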

Recommended retry strategy

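Retry 429 responses with exponential backoff, waiting for the Retry-After value whenever the API provides it:

```python
import random
import time

import requests


def call_api(url, headers, body, max_retries=3, base_delay=1.0):
    """POST a request, retrying 429 responses up to max_retries times."""
    for attempt in range(max_retries + 1):
        response = requests.post(url, headers=headers, json=body)

        if response.status_code != 429:
            return response

        if attempt == max_retries:
            break  # out of retries; don't sleep again

        # Prefer the server's Retry-After; otherwise back off exponentially with jitter
        retry_after = response.headers.get("Retry-After")
        if retry_after is not None:
            delay = int(retry_after)
        else:
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
        time.sleep(delay)

    raise RuntimeError("Max retries exceeded")
```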
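This API documents Retry-After in seconds. HTTP itself also allows the header to carry an HTTP-date, so a more defensive client might accept both forms in case a proxy or gateway rewrites it. A sketch under that assumption; `parse_retry_after` is a hypothetical helper built on the standard library's `email.utils.parsedate_to_datetime`.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return seconds to wait from a Retry-After value, or None if unusable.

    Accepts delay-seconds ("42") or an HTTP-date ("Sun, 14 Apr 2024 13:20:00 GMT").
    """
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # unparseable; older Pythons raise TypeError
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date already in the past means the client may retry immediately
    return max(0.0, (when - now).total_seconds())
```

In the retry loop, `parse_retry_after(response.headers.get("Retry-After"))` could replace the `int(...)` conversion of Retry-After, falling back to backoff when it returns `None`.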

Quota limits

Organizations have monthly caps on total requests and LLM tokens. When a quota is exceeded, the API returns 402:

```json
{
  "success": false,
  "error": {
    "code": "QUOTA_EXCEEDED",
    "message": "Monthly token quota exceeded. Contact your account manager to increase limits."
  },
  "request_id": "..."
}
```
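Unlike a 429, a 402 will not clear within a minute, so a retry loop should not treat the two alike. A minimal sketch of that distinction, using only the status and error codes documented on this page; the function name and the returned action strings are hypothetical.

```python
from typing import Optional


def classify_limit_error(status_code: int, error_code: Optional[str]) -> str:
    """Map a response to a handling action for the two limit errors above.

    "retry": per-minute rate limiting (429 / RATE_LIMITED)
    "stop":  exhausted monthly quota (402 / QUOTA_EXCEEDED)
    "other": anything else, left to general error handling
    """
    if status_code == 429 or error_code == "RATE_LIMITED":
        # The per-minute window resets soon, so waiting and retrying can succeed
        return "retry"
    if status_code == 402 or error_code == "QUOTA_EXCEEDED":
        # Retries keep failing until the monthly reset or a limit increase
        return "stop"
    return "other"
```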

Quota usage resets on the 1st of each month at 00:00 UTC.
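A client that stops on QUOTA_EXCEEDED may want to report when usage resets. Assuming the reset happens exactly at 00:00 UTC on the 1st, as stated above, the next reset time can be computed with the standard library; `next_quota_reset` is a hypothetical helper.

```python
from datetime import datetime, timezone


def next_quota_reset(now: datetime) -> datetime:
    """Return the next 1st-of-month 00:00 UTC strictly after `now` (timezone-aware)."""
    # Convert first so clients in other time zones get the right month near boundaries
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)
```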

Tips for staying within limits

  • Batch operations: use batch endpoints (e.g., POST /agent/batches) instead of many individual calls
  • Cache responses: reuse results client-side where appropriate instead of repeating identical requests
  • Use idempotency keys: prevent duplicate processing when retrying (see Error Handling)
  • Monitor usage: check the X-RateLimit-Remaining header proactively and slow down before it reaches zero
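Proactive monitoring can be as simple as pausing until the window resets once the remaining count gets low. A sketch, assuming X-RateLimit-Reset is a Unix timestamp in seconds and the client clock is roughly in sync with the server; `seconds_to_pause` is a hypothetical helper and the threshold of 5 is an arbitrary choice.

```python
import time
from typing import Mapping, Optional


def seconds_to_pause(headers: Mapping[str, str], threshold: int = 5,
                     now: Optional[float] = None) -> float:
    """Seconds to wait before the next request, based on rate limit headers.

    Returns the time until X-RateLimit-Reset once X-RateLimit-Remaining is at or
    below `threshold`; returns 0.0 otherwise or if the headers are missing.
    """
    h = {k.lower(): v for k, v in headers.items()}
    try:
        remaining = int(h["x-ratelimit-remaining"])
        reset_at = int(h["x-ratelimit-reset"])
    except (KeyError, ValueError):
        return 0.0
    if remaining > threshold:
        return 0.0
    now = time.time() if now is None else now
    return max(0.0, float(reset_at - now))
```

After each response, `time.sleep(seconds_to_pause(response.headers))` spreads requests out instead of running into a 429.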