Streaming & Async Patterns

Real-time responses and background processing

Why streaming?

Some API calls return almost immediately: creating a project takes milliseconds. Others, such as asking the AI a complex question, can take several seconds. Streaming lets you show results as they’re generated instead of waiting for the complete response.

Chat streaming

The Chat endpoint (POST /api/v2/chat) supports Server-Sent Events (SSE) for real-time streaming. Set stream: true in your request:

curl -X POST https://api.pyramid-ai.com/api/v2/chat \
  -H "Authorization: Bearer pai_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "...",
    "messages": [{"role": "user", "content": "What are the safety requirements?"}],
    "stream": true
  }'

The response arrives as a stream of events:

data: {"type": "text", "content": "Based on"}
data: {"type": "text", "content": " the project"}
data: {"type": "text", "content": " safety plan,"}
data: {"type": "text", "content": " all workers"}
...
data: {"type": "done", "sources": [...], "usage": {...}}

Each "text" event carries a small piece of the response, and the final "done" event carries the sources and usage data. Your UI can append each piece as it arrives, which creates a typing effect.
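The event sequence above can be folded into a complete answer with a small parser. The following is a minimal sketch in Python, assuming each event arrives as a single `data:` line carrying a JSON object with the `type` and `content` fields shown in the sample stream. The helper name `collect_answer` is hypothetical and not part of any SDK.

```python
import json


def collect_answer(lines):
    """Assemble streamed chat events into (answer_text, done_event).

    Assumes one JSON object per 'data:' line, as in the sample stream.
    collect_answer is an illustrative helper, not an official API.
    """
    parts = []
    done_event = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        event = json.loads(line[len("data:"):])
        if event.get("type") == "text":
            parts.append(event["content"])
        elif event.get("type") == "done":
            done_event = event
            break  # the "done" event closes the stream
    return "".join(parts), done_event


sample = [
    'data: {"type": "text", "content": "Based on"}',
    'data: {"type": "text", "content": " the project"}',
    'data: {"type": "done", "sources": [], "usage": {}}',
]
answer, final = collect_answer(sample)
print(answer)  # Based on the project
```

Returning the "done" event alongside the text keeps the sources and usage data available to the caller once the stream ends.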

When to use streaming

| Use case | Streaming? |
| --- | --- |
| Chatbot UI showing responses in real time | Yes |
| Server-to-server integration logging full responses | No (use stream: false) |
| Mobile app showing AI typing indicator | Yes |
| Automated pipeline processing answers | No |

Handling SSE in JavaScript

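const response = await fetch('/api/v2/chat', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer pai_live_YOUR_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    project_id: '...',
    messages: [{ role: 'user', content: 'What are the safety requirements?' }],
    stream: true,
  }),
});

if (!response.ok) {
  throw new Error(`Chat request failed with status ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // { stream: true } keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });

  // A chunk can end mid-line, so keep the unfinished tail for the next read
  const lines = buffer.split('\n');
  buffer = lines.pop();

  for (const line of lines) {
    if (!line.startsWith('data:')) continue;
    const event = JSON.parse(line.slice(5).trim());
    // Update the UI here; this example only logs each parsed event
    console.log(event);
  }
}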
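Network chunks do not necessarily line up with event boundaries: a single read may end in the middle of a line, or even in the middle of a multi-byte character. The sketch below shows one way to buffer for that in Python. It assumes one JSON object per `data:` line, as in the sample stream earlier; `SSEBuffer` is a hypothetical helper, not part of any SDK.

```python
import codecs
import json


class SSEBuffer:
    """Turn arbitrary byte chunks into parsed 'data:' events.

    Hypothetical helper: assumes one JSON object per 'data:' line.
    """

    def __init__(self):
        # The incremental decoder holds back incomplete multi-byte sequences.
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._pending = ""

    def feed(self, chunk: bytes):
        """Add a chunk and return the events it completed."""
        self._pending += self._decoder.decode(chunk)
        # Everything before the last newline is complete; keep the tail.
        *complete, self._pending = self._pending.split("\n")
        events = []
        for line in complete:
            line = line.strip()
            if line.startswith("data:"):
                events.append(json.loads(line[len("data:"):]))
        return events


buf = SSEBuffer()
raw = 'data: {"type": "text", "content": "café"}\n'.encode("utf-8")
# Split inside the two-byte 'é' to mimic an awkward network boundary.
cut = raw.index("é".encode("utf-8")) + 1
first = buf.feed(raw[:cut])   # no complete line yet
second = buf.feed(raw[cut:])  # the line is now complete
print(first, second)
```

Each call to `feed` returns only the events that became complete with that chunk, so the caller can update the UI immediately without tracking partial lines itself.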

Asynchronous operations

Some operations run in the background because they take too long for a synchronous response:

| Operation | Pattern | How to check status |
| --- | --- | --- |
| Document processing | Fire-and-forget | Poll status with GET /documents/{id} |
| Batch queries | Background job | Poll status with GET /agent/batches/{id} |
| Compliance runs | Background job | Poll status with GET /check/runs/{id} |
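The status endpoints above can be collected in one lookup so that polling code does not hard-code paths. This is a small sketch. It assumes the paths are relative to the same `https://api.pyramid-ai.com/api/v2` prefix used by the chat example; the operation keys and the `status_url` helper are hypothetical names chosen for illustration.

```python
# Assumed prefix, taken from the chat example's URL.
BASE_URL = "https://api.pyramid-ai.com/api/v2"

# Status paths from the table above; the dictionary keys are illustrative.
STATUS_PATHS = {
    "document": "/documents/{id}",
    "batch": "/agent/batches/{id}",
    "compliance_run": "/check/runs/{id}",
}


def status_url(operation: str, resource_id: str) -> str:
    """Build the status URL to poll for a background operation."""
    if operation not in STATUS_PATHS:
        raise ValueError(f"Unknown operation: {operation!r}")
    return BASE_URL + STATUS_PATHS[operation].format(id=resource_id)


print(status_url("batch", "abc123"))
```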

Polling pattern

For background operations, poll the status endpoint until the job completes:

import time
import requests

def wait_for_completion(url, headers, max_wait=300):
    """Poll until status is terminal (completed, failed, or partial)."""
    start = time.time()

    while time.time() - start < max_wait:
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()  # Surface HTTP errors before reading the body
        data = response.json()["data"]

        if data["status"] in ("completed", "failed", "partial"):
            return data

        time.sleep(5)  # Check every 5 seconds

    raise TimeoutError("Operation did not complete in time")

Recommended polling intervals

| Operation | Poll interval | Typical duration |
| --- | --- | --- |
| Document processing | Every 5 seconds | 10–60 seconds |
| Batch queries | Every 10 seconds | 1–30 minutes (depends on batch size) |
| Compliance runs | Every 10 seconds | 1–10 minutes (depends on requirement count) |

Don’t poll more frequently than every 2 seconds. Excessive polling wastes your rate limit quota without meaningfully improving responsiveness.
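The intervals and the 2-second floor can be encoded once and reused across operations. The following is a sketch under stated assumptions, not a definitive client. The `poll` helper, the operation keys, and the 10-second fallback for unlisted operations are hypothetical. The terminal statuses match those used in the polling example on this page. The status fetcher, sleep function, and clock are passed in, so the loop can be exercised without a network connection.

```python
import time

MIN_INTERVAL = 2  # seconds; the floor stated in the guidance above

# Recommended intervals in seconds; the keys are illustrative names.
POLL_INTERVALS = {"document": 5, "batch": 10, "compliance_run": 10}

TERMINAL = ("completed", "failed", "partial")


def poll(fetch_status, operation, max_wait=300,
         sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal status.

    fetch_status is any callable returning a dict with a "status" key.
    Unknown operations fall back to 10 seconds (an assumption).
    """
    interval = max(MIN_INTERVAL, POLL_INTERVALS.get(operation, 10))
    deadline = clock() + max_wait
    while True:
        data = fetch_status()
        if data["status"] in TERMINAL:
            return data
        if clock() + interval > deadline:
            raise TimeoutError("Operation did not complete in time")
        sleep(interval)


# Dry run with a fake status source and a sleep that only records waits.
statuses = iter(["processing", "processing", "completed"])
waits = []
result = poll(lambda: {"status": next(statuses)}, "document",
              sleep=waits.append)
print(result, waits)
```

In real use, `fetch_status` would wrap an authenticated GET against the relevant status endpoint and return the parsed job data.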

Synchronous vs asynchronous at a glance

| Endpoint | Sync/Async | Response time |
| --- | --- | --- |
| POST /projects | Sync | Instant |
| POST /documents (reserve) | Sync | Instant |
| POST /documents/process | Async (triggers background job) | Seconds to minutes |
| POST /agent/query | Sync | 3–15 seconds |
| POST /chat | Sync (streamable) | 3–30 seconds |
| POST /agent/batches | Async | Minutes to hours |
| POST /check/verify | Sync | 5–30 seconds |
| POST /check/runs (create) | Sync | Instant |
| POST /check/runs/{id}/trigger | Async | Minutes |