
Document Processing

The three-step upload flow and what happens behind the scenes

Overview

Getting a document into Pyramid AI is a three-step process. This design keeps uploads fast and reliable — even for large files — by separating the upload from the processing.

┌─────────────┐      ┌──────────────┐      ┌─────────────────┐
│ 1. Reserve  │ ───> │  2. Upload   │ ───> │   3. Process    │
│ (API call)  │      │ (direct PUT) │      │   (API call)    │
└─────────────┘      └──────────────┘      └─────────────────┘
Returns:             File goes             Starts:
- document_id        directly to           - Text extraction
- upload_url         cloud storage         - Chunking
- expiry time                              - Indexing
Step 1: Reserve an upload slot

Tell the API you’re about to upload a file. You provide the file name, type, and size. The API returns a presigned upload URL — a temporary, secure link where you’ll send the actual file.

curl -X POST https://api.pyramid-ai.com/api/v2/documents \
  -H "Authorization: Bearer pai_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file_name": "safety-plan.pdf",
    "content_type": "application/pdf",
    "size_bytes": 2048576,
    "project_id": "YOUR_PROJECT_ID"
  }'

Response:

{
  "success": true,
  "data": {
    "id": "doc_abc123",
    "status": "awaiting_upload",
    "upload": {
      "url": "https://storage.supabase.co/...",
      "method": "PUT",
      "recommended_content_type": "application/pdf",
      "expires_at": "2026-05-22T14:00:00Z"
    }
  }
}
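
The reserve call can also be sketched in Python. This is a minimal sketch using only the standard library: the endpoint, headers, and field names come from the curl example and response above, while the helper names `build_reserve_payload` and `reserve_upload` and the fallback content type are illustrative assumptions, not part of any Pyramid AI SDK. Reading the size from disk keeps `size_bytes` in step with the file that is actually uploaded.

```python
import json
import mimetypes
import os
import urllib.request

API_BASE = "https://api.pyramid-ai.com/api/v2"  # base URL from the curl example


def build_reserve_payload(path, project_id):
    """Build the reserve request body from a file on disk (hypothetical helper)."""
    content_type, _ = mimetypes.guess_type(path)
    return {
        "file_name": os.path.basename(path),
        # Fallback for unknown extensions is an assumption, not documented behavior.
        "content_type": content_type or "application/octet-stream",
        "size_bytes": os.path.getsize(path),
        "project_id": project_id,
    }


def reserve_upload(path, project_id, api_key):
    """POST the payload and return (document_id, upload_url, expires_at)."""
    request = urllib.request.Request(
        f"{API_BASE}/documents",
        data=json.dumps(build_reserve_payload(path, project_id)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    data = body["data"]
    return data["id"], data["upload"]["url"], data["upload"]["expires_at"]
```

Keeping the payload builder separate from the network call makes it possible to check the declared name, type, and size before anything is sent.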

The upload URL expires after 2 hours. If it expires before you upload, call the reserve endpoint again to get a fresh URL.
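
Before starting a long upload, it can help to check how much time the URL has left. The following is a minimal sketch that parses the `expires_at` value from the reserve response; the function name `upload_url_is_usable` and the 60-second safety margin are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def upload_url_is_usable(expires_at, now=None, margin_seconds=60):
    """Return True if the presigned URL still has at least `margin_seconds` left."""
    # The response uses a trailing "Z" for UTC; replace it so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return expiry - now >= timedelta(seconds=margin_seconds)
```

If this returns False, calling the reserve endpoint again yields a fresh URL.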

Why presigned URLs?

The file goes directly from your system to cloud storage — it never passes through the Pyramid API server. This means:

  • No file size limits from the API layer (Vercel’s 4.5 MB body limit is bypassed)
  • Faster uploads — direct connection to storage
  • More reliable — fewer hops means fewer failure points

Step 2: Upload the file

Send the actual file bytes to the presigned URL using a PUT request. This is a direct upload to cloud storage — no API key needed (the URL itself contains the authorization).

curl -X PUT "PRESIGNED_URL_FROM_STEP_1" \
  -H "Content-Type: application/pdf" \
  --data-binary @safety-plan.pdf

Use --data-binary (not -d) so the file bytes are sent unmodified. With -d, curl strips carriage returns and newlines from the file, which corrupts binary files like PDFs.

In code (JavaScript)

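// uploadUrl is the data.upload.url value returned by the reserve call (Step 1).
const file = document.getElementById('fileInput').files[0];

const response = await fetch(uploadUrl, {
  method: 'PUT',
  headers: { 'Content-Type': file.type },
  body: file,
});
if (!response.ok) {
  throw new Error(`Upload failed with status ${response.status}`);
}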

In code (Python)

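import requests

# upload_url is the data.upload.url value returned by the reserve call (Step 1).
with open('safety-plan.pdf', 'rb') as f:
    response = requests.put(
        upload_url,
        data=f,
        headers={'Content-Type': 'application/pdf'},
    )
response.raise_for_status()  # raise if storage rejected the upload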

Step 3: Trigger processing

Once the file is uploaded, tell the API to start processing it. You can submit multiple document IDs at once.

curl -X POST https://api.pyramid-ai.com/api/v2/documents/process \
  -H "Authorization: Bearer pai_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"document_ids": ["doc_abc123"]}'

Response:

{
  "success": true,
  "data": {
    "processed": ["doc_abc123"],
    "skipped": [],
    "failed": []
  }
}

Result     Meaning
processed  Successfully started processing
skipped    Already processed, or not in the right state
failed     Something went wrong (error message included)
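
A caller usually wants to separate the documents that started processing from those that need attention. The sketch below splits a response of the shape shown above into its three lists. The function name `split_process_result` is hypothetical, and because the exact shape of the `failed` entries is not shown here, they are passed through untouched.

```python
def split_process_result(response):
    """Return (started, skipped, failed) lists from a process response body."""
    if not response.get("success"):
        # How an unsuccessful response is structured is an assumption.
        raise RuntimeError("process request was not successful")
    data = response["data"]
    return data.get("processed", []), data.get("skipped", []), data.get("failed", [])


response = {
    "success": True,
    "data": {"processed": ["doc_abc123"], "skipped": [], "failed": []},
}
started, skipped, failed = split_process_result(response)
print(f"started={started} skipped={skipped} failed={failed}")
```

Only the IDs in `started` are worth polling for completion; the other two lists call for inspection first.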

What happens during processing

Behind the scenes, the processing pipeline does the following:

  1. Text extraction
     The system reads the file and extracts all text content. For PDFs, this includes OCR (optical character recognition) for scanned documents.

  2. Chunking
     The extracted text is split into overlapping chunks — typically a few paragraphs each. Overlap ensures context isn’t lost at chunk boundaries.

  3. Embedding
     Each chunk is converted into a mathematical representation (a vector embedding) that captures its meaning. This is what allows semantic search — finding relevant content by meaning, not just keywords.

  4. Indexing
     The embeddings are stored in a searchable index. When you ask a question, the AI compares your question’s embedding against all chunk embeddings to find the most relevant passages.
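
To make the chunking and search steps concrete, here is a toy sketch. It is not the pipeline's actual implementation: the word-based splitting, the chunk and overlap sizes, the hand-written vectors, and the use of cosine similarity as the comparison are all assumptions chosen for illustration.

```python
import math


def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into word chunks; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(query_vector, chunk_vectors):
    """Return chunk indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vector, v) for v in chunk_vectors]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

With `chunk_size=5` and `overlap=2`, each chunk repeats the last two words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is short enough.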

Checking processing status

Poll the document endpoint to check when processing is complete:

curl https://api.pyramid-ai.com/api/v2/documents/doc_abc123 \
  -H "Authorization: Bearer pai_live_YOUR_KEY"

Watch the status field:

Status           Next action
awaiting_upload  Upload the file (Step 2)
processing       Wait and poll again
completed        Ready to query
failed           Check error_message, re-upload if needed

Processing typically takes 10–60 seconds for a standard PDF. Large documents (100+ pages) or scanned PDFs with OCR may take longer.
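
A polling loop with a growing delay avoids hammering the API while a document is processed. The sketch below is one way to write it: `fetch_status` is a caller-supplied function that returns the current `status` string (for example by calling the GET endpoint above), and the function name, timeout, and delay values are assumptions rather than documented recommendations.

```python
import time


def wait_for_document(fetch_status, document_id, timeout=300.0,
                      initial_delay=2.0, max_delay=15.0, sleep=time.sleep):
    """Poll until the document reaches a terminal status or the timeout elapses."""
    delay = initial_delay
    waited = 0.0
    while True:
        status = fetch_status(document_id)
        if status in ("completed", "failed"):
            return status
        if waited >= timeout:
            raise TimeoutError(f"{document_id} still {status!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # back off between polls
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. On a `failed` result, the caller would then read `error_message` from the document.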

Error handling

Common processing failures:

Error                       Cause                                                        Fix
size_mismatch               Uploaded file size doesn’t match declared size_bytes         Re-reserve with correct size
content_type_mismatch       File content doesn’t match declared content_type             Ensure the file matches the MIME type
extraction_failed           Unable to extract text (corrupted file, password-protected)  Re-upload a valid file
upload_url_creation_failed  Storage service temporarily unavailable                      Retry the reserve step
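
Client code can turn this table into a lookup so that each failure maps to a next step. This is a sketch under assumptions: the error names and fixes are copied from the table, but matching the name as a substring of `error_message` is a guess about how the API reports it, and `classify_failure` is a hypothetical helper.

```python
# Error names and fixes are copied from the table above. Matching the name as a
# substring of error_message is an assumption about how the API reports it.
KNOWN_FAILURES = {
    "size_mismatch": "Re-reserve with correct size",
    "content_type_mismatch": "Ensure the file matches the MIME type",
    "extraction_failed": "Re-upload a valid file",
    "upload_url_creation_failed": "Retry the reserve step",
}

# Only this failure is described as temporary, so it is the only one worth
# retrying without changing the request or the file.
RETRY_UNCHANGED = {"upload_url_creation_failed"}


def classify_failure(error_message):
    """Return (error_name, suggested_fix, retry_unchanged) for an error message."""
    for name, fix in KNOWN_FAILURES.items():
        if name in error_message:
            return name, fix, name in RETRY_UNCHANGED
    return None, None, False
```

An unrecognized message returns `(None, None, False)`, which a caller can treat as a signal to surface the raw message to a person.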

Batch uploads

To upload many files at once, repeat the reserve and upload steps (Steps 1 and 2) for each file, then call process once with all document IDs:

# Reserve 3 files, upload each one, then process all at once
curl -X POST https://api.pyramid-ai.com/api/v2/documents/process \
  -H "Authorization: Bearer pai_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document_ids": ["doc_1", "doc_2", "doc_3"]
  }'

This is more efficient than processing one at a time, as the pipeline can parallelize the work.
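
The batch flow can be sketched as a small orchestrator. The three callables are caller-supplied stand-ins for Steps 1 to 3, and both the function name `upload_batch` and the choice to skip a file whose reserve or upload fails are assumptions, not documented behavior.

```python
def upload_batch(paths, reserve, upload, process):
    """Reserve and upload each file, then trigger processing once for all of them.

    `reserve(path)` returns (document_id, upload_url), `upload(path, url)` sends
    the bytes, and `process(document_ids)` calls the process endpoint.
    """
    document_ids = []
    upload_errors = {}
    for path in paths:
        try:
            document_id, upload_url = reserve(path)
            upload(path, upload_url)
        except Exception as error:  # keep going so one bad file does not stop the batch
            upload_errors[path] = str(error)
            continue
        document_ids.append(document_id)
    # Skip the process call entirely when nothing was uploaded.
    result = process(document_ids) if document_ids else None
    return document_ids, upload_errors, result
```

Only documents that were uploaded successfully are sent to the process call, so a single failed upload does not hold back the rest of the batch.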