Streaming & Async Patterns | Pyramid AI

Why streaming?

Some API calls return immediately (creating a project takes milliseconds). Others — like asking the AI a complex question — can take several seconds. Streaming lets you show results as they’re generated instead of waiting for the complete response.

Chat streaming

The Chat endpoint (POST /api/v2/chat) supports Server-Sent Events (SSE) for real-time streaming. Set stream: true in your request:

curl -X POST https://api.pyramid-ai.com/api/v2/chat \
  -H "Authorization: Bearer pai_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "...",
    "messages": [{"role": "user", "content": "What are the safety requirements?"}],
    "stream": true
  }'

The response arrives as a stream of events:

data: {"type": "text", "content": "Based on"}
data: {"type": "text", "content": " the project"}
data: {"type": "text", "content": " safety plan,"}
data: {"type": "text", "content": " all workers"}
...
data: {"type": "done", "sources": [...], "usage": {...}}

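Each `text` event carries a small piece of the response in its `content` field, and the final `done` event carries the `sources` and `usage` for the whole answer. Your UI can append each piece as it arrives, which creates a typing effect.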
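The event format above can be consumed with a few lines of Python. This is a minimal sketch, not an official client: the function names `iter_sse_events` and `collect_answer` are hypothetical, and the empty `sources` and `usage` values in the sample are placeholders because the real payloads are elided in the example output.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of every `data:` line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload:
            yield json.loads(payload)


def collect_answer(lines: Iterable[str]) -> Tuple[str, dict]:
    """Join all `text` chunks and return them with the final `done` event."""
    parts: List[str] = []
    done_event: dict = {}
    for event in iter_sse_events(lines):
        if event.get("type") == "text":
            parts.append(event.get("content", ""))
        elif event.get("type") == "done":
            done_event = event
            break  # nothing is expected after the done event
    return "".join(parts), done_event


# Sample lines modelled on the stream shown above (placeholder sources/usage).
sample = [
    'data: {"type": "text", "content": "Based on"}',
    'data: {"type": "text", "content": " the project"}',
    '',
    'data: {"type": "done", "sources": [], "usage": {}}',
]
answer, final = collect_answer(sample)
print(answer)  # Based on the project
```

If you call the API with the `requests` library and `stream=True`, the lines could come from `response.iter_lines(decode_unicode=True)`; that wiring is an assumption about your client code, not something the API prescribes.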

When to use streaming

Use case	Streaming?
Chatbot UI showing responses in real time	Yes
Server-to-server integration logging full responses	No (use `stream: false`)
Mobile app showing AI typing indicator	Yes
Automated pipeline processing answers	No

Handling SSE in JavaScript

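const response = await fetch('/api/v2/chat', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer pai_live_YOUR_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    project_id: '...',
    messages: [{ role: 'user', content: 'What are the safety requirements?' }],
    stream: true,
  }),
});

if (!response.ok) {
  throw new Error(`Chat request failed with status ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // { stream: true } keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });

  // A chunk can end mid-line, so only process complete lines
  const lines = buffer.split('\n');
  buffer = lines.pop();

  for (const line of lines) {
    const payload = line.startsWith('data:') ? line.slice(5).trim() : '';
    if (!payload) continue;
    // Update the UI here; this example just logs each parsed event
    console.log(JSON.parse(payload));
  }
}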

Asynchronous operations

Some operations run in the background because they take too long for a synchronous response:

Operation	Pattern	How to check status
Document processing	Fire-and-forget	`GET /documents/{id}` — poll `status`
Batch queries	Background job	`GET /agent/batches/{id}` — poll `status`
Compliance runs	Background job	`GET /check/runs/{id}` — poll `status`

Polling pattern

For background operations, poll the status endpoint until the job completes:

import time
import requests

TERMINAL_STATUSES = ("completed", "failed", "partial")

def wait_for_completion(url, headers, max_wait=300, interval=5):
    """Poll until status is terminal (completed, failed, or partial)."""
    start = time.monotonic()

    while time.monotonic() - start < max_wait:
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
        data = response.json()["data"]

        if data["status"] in TERMINAL_STATUSES:
            return data

        time.sleep(interval)  # Check every 5 seconds by default

    raise TimeoutError("Operation did not complete in time")
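To exercise the polling loop without hitting the network, one option is to pass the status fetcher in as a callable. The sketch below is a variant of the helper above under that assumption: `poll_until_terminal` and its `fetch`, `sleep`, and `clock` parameters are hypothetical names, and `"processing"` is a made-up non-terminal status used only for the demonstration.

```python
import time
from typing import Callable

TERMINAL_STATUSES = ("completed", "failed", "partial")


def poll_until_terminal(
    fetch: Callable[[], dict],
    interval: float = 5.0,
    max_wait: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() until it reports a terminal status or max_wait elapses."""
    deadline = clock() + max_wait
    while True:
        data = fetch()
        if data.get("status") in TERMINAL_STATUSES:
            return data
        # Give up if the next check would land after the deadline.
        if clock() + interval > deadline:
            raise TimeoutError("Operation did not complete in time")
        sleep(interval)


# Fake fetcher that reports a terminal status on the third call.
responses = iter([
    {"status": "processing"},  # hypothetical non-terminal status
    {"status": "processing"},
    {"status": "completed"},
])
result = poll_until_terminal(lambda: next(responses), interval=0.0, sleep=lambda s: None)
print(result["status"])  # completed
```

In real use, `fetch` would wrap the HTTP call, for example a lambda that performs the `requests.get` and returns the `data` object from the JSON body.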

Recommended polling intervals

Operation	Poll interval	Typical duration
Document processing	Every 5 seconds	10–60 seconds
Batch queries	Every 10 seconds	1–30 minutes (depends on batch size)
Compliance runs	Every 10 seconds	1–10 minutes (depends on requirement count)

Don’t poll more frequently than every 2 seconds. Excessive polling wastes your rate limit quota without meaningfully improving responsiveness.
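One way to apply these recommendations in client code is a small lookup that enforces the 2-second floor. This is an illustrative sketch: the dictionary keys, the `poll_interval` function, and the 10-second fallback for unknown operations are assumptions, not part of the API.

```python
from typing import Optional

MIN_POLL_SECONDS = 2.0  # floor from the guidance above

# Recommended intervals from the table above; the key names are illustrative.
RECOMMENDED_INTERVALS = {
    "document_processing": 5.0,
    "batch_query": 10.0,
    "compliance_run": 10.0,
}


def poll_interval(operation: str, requested: Optional[float] = None) -> float:
    """Return the interval to use, never faster than MIN_POLL_SECONDS."""
    if requested is None:
        # Assumed fallback: use the slower 10-second interval for unknown operations.
        requested = RECOMMENDED_INTERVALS.get(operation, 10.0)
    return max(MIN_POLL_SECONDS, requested)


print(poll_interval("document_processing"))             # 5.0
print(poll_interval("batch_query", requested=0.5))      # 2.0
```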

Synchronous vs asynchronous at a glance

Endpoint	Sync/Async	Response time
`POST /projects`	Sync	Instant
`POST /documents` (reserve)	Sync	Instant
`POST /documents/process`	Async (triggers background job)	Seconds to minutes
`POST /agent/query`	Sync	3–15 seconds
`POST /chat`	Sync (streamable)	3–30 seconds
`POST /agent/batches`	Async	Minutes to hours
`POST /check/verify`	Sync	5–30 seconds
`POST /check/runs` (create)	Sync	Instant
`POST /check/runs/{id}/trigger`	Async	Minutes