MCP Server
The Pyramid AI MCP server exposes your organization’s document corpus over the Model Context Protocol — an open standard for connecting AI assistants to external tools and data. Any MCP-aware client (Claude, Cursor, or a custom agent built on the MCP SDK) can search your documents and get back grounded answers with source links, without you writing a line of integration code.
Run hybrid semantic + keyword search and get back ranked documents with their source URLs.
Get a finished, cited answer synthesized from the most relevant passages in your corpus.
Every result carries a source link, so your agent can always cite where an answer came from.
MCP access is gated per organization. If your key returns 403 FEATURE_NOT_ENABLED, ask your
Pyramid AI account manager to enable the MCP server for your org.
Endpoint
- Transport: Streamable HTTP, stateless. One POST per JSON-RPC exchange, no session to manage.
- Only POST is served; GET and DELETE return a JSON-RPC 405.
Connect in under 5 minutes
Get an API key
Create a key in the Pyramid AI platform, or use one you
already have. Production keys (pai_live_*) work against api.pyramid-ai.com; staging keys
(pai_test_*) against api-staging.pyramid-ai.com. See Authentication for
key formats, environments, and rotation.
Store your API key securely — it is shown only once at creation time and cannot be retrieved later.
Add the server to your client
Point your MCP client at the endpoint and pass your key as a bearer token. See Connect your client below for the exact config for Claude, Cursor, and the SDK.
Authentication
The MCP endpoint uses the same bearer API-key auth as every other Pyramid AI API route; there is no
OAuth flow. Send your key in the Authorization header. Because the transport is Streamable
HTTP, your client must also advertise both application/json and text/event-stream in Accept.
Responses come back as a one-shot SSE frame (event: message / data: <json-rpc>). Most MCP
clients and SDKs handle this framing for you.
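For clients that do not use an MCP SDK, the header and framing rules above can be sketched as follows. This is a minimal illustration, not the platform's reference code: the function names buildHeaders and parseSseFrame are hypothetical, and the parser assumes the reply is a single frame whose JSON-RPC message sits on its data line(s), as described above.

```typescript
type McpHeaders = { Authorization: string; "Content-Type": string; Accept: string };
type JsonRpcMessage = { jsonrpc: string; id?: number | string | null; result?: unknown; error?: unknown };

/** Build the headers for one POST to the MCP endpoint (hypothetical helper). */
function buildHeaders(apiKey: string): McpHeaders {
  return {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
    // Both content types are advertised, even though the reply is one SSE frame.
    Accept: "application/json, text/event-stream",
  };
}

/** Pull the JSON-RPC message out of a one-shot SSE frame (hypothetical helper). */
function parseSseFrame(body: string): JsonRpcMessage {
  const dataLines = body
    .split(/\r?\n/)
    .filter((line) => line.startsWith("data:"))
    // Drop the field name and the single optional space that follows it.
    .map((line) => line.slice("data:".length).replace(/^ /, ""));
  if (dataLines.length === 0) {
    throw new Error("No data line found in SSE frame");
  }
  // In the SSE format, consecutive data lines are joined with newlines.
  return JSON.parse(dataLines.join("\n")) as JsonRpcMessage;
}
```

SDK clients should not need either helper, since the transport layer handles both steps.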
Connect your client
Claude Code
Claude Desktop / Cursor
Add the server to your mcpServers config (claude_desktop_config.json for Claude Desktop,
.cursor/mcp.json for Cursor):
Custom agent (TypeScript SDK)
MCP Inspector (visual debugging)
Choose transport Streamable HTTP, set the URL to the endpoint, and add the
Authorization: Bearer … header.
Tools
The server exposes two tools. Both accept the same arguments — the difference is whether you want raw hits to reason over yourself, or a finished, cited answer.
Shared arguments
search_documents
Hybrid (semantic vector + full-text BM25) search over the corpus, returning ranked documents with their source links. The tool result is a JSON payload:
Always surface results[].url to the user — it is the canonical source link. When hasResults
is false, tell the user nothing relevant was found and ask them to refine; do not answer
from prior knowledge.
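The handling rule above might look like this in an agent. It is a sketch under assumptions: only hasResults and results[].url are taken from the payload described here, while the title field and the renderSearchResults name are hypothetical and used for illustration only.

```typescript
// `hasResults` and `results[].url` come from the documented payload;
// `title` is a hypothetical field added for illustration.
interface SearchHit {
  url: string;
  title?: string;
}

interface SearchPayload {
  hasResults: boolean;
  results: SearchHit[];
}

/** Turn a search_documents payload into user-facing text that always shows source links. */
function renderSearchResults(payload: SearchPayload): string {
  if (!payload.hasResults || payload.results.length === 0) {
    // Do not fall back to prior knowledge; ask the user to refine instead.
    return "No relevant documents were found. Please refine the query.";
  }
  return payload.results
    .map((hit, i) => `[${i + 1}] ${hit.title ?? "Untitled"}: ${hit.url}`)
    .join("\n");
}
```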
answer_question
Runs the same hybrid search, then synthesizes a complete written answer from the top passages,
with bracketed [n] citation markers. The answer is atomic — it arrives whole after ~20–25s
(MCP tool results cannot stream).
- citations is the subset of sources the answer actually references via its [n] markers.
- sources lists every passage the search retrieved, whether or not it was cited.
- The payload is also returned as structuredContent for SDK clients that prefer structured parsing.
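To make the relationship between citations and sources concrete, the sketch below recovers the cited subset from the [n] markers in an answer string. The server already returns citations, so this is only an illustration. It assumes that marker [n] refers to the n-th entry of sources (1-based), which is not stated above, and both function names are hypothetical.

```typescript
/** Distinct citation numbers in an answer, in order of first appearance. */
function extractCitationNumbers(answer: string): number[] {
  const marker = /\[(\d+)\]/g;
  const numbers: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = marker.exec(answer)) !== null) {
    const n = Number(match[1]);
    if (numbers.indexOf(n) === -1) numbers.push(n);
  }
  return numbers;
}

/**
 * Split sources into cited and uncited.
 * Assumption: marker [n] points at sources[n - 1].
 */
function splitSources<T>(sources: T[], answer: string): { cited: T[]; uncited: T[] } {
  const citedNumbers = extractCitationNumbers(answer);
  const cited: T[] = [];
  const uncited: T[] = [];
  sources.forEach((source, index) => {
    (citedNumbers.indexOf(index + 1) !== -1 ? cited : uncited).push(source);
  });
  return { cited, uncited };
}
```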
Scope & broadening
A narrow scope is a relevance hint, not a hard filter. When you pass a narrow technical
scope, the search runs over the broad technical pool so a wrong category guess can’t hide the
right document — and the response sets broadened: true with your original choice preserved in
requestedScope. Results may therefore include other document types.
- answer_question will prefer the requested document type and add a caveat in the answer when it relies on another.
- press_releases and everything are exact; they are never broadened.
A hasResults: false response means nothing ranked as relevant; it does not prove the
document doesn’t exist. Refine the query rather than concluding the corpus lacks coverage.
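A client can surface broadening to the user with a small check like the one below. The broadened and requestedScope fields are the ones described above; the scopeNote helper and its wording are hypothetical.

```typescript
// Only the two fields relevant to broadening are modeled here.
interface ScopeInfo {
  broadened?: boolean;
  requestedScope?: string;
}

/** Return a user-facing note when the search ran over a broader pool, else null. */
function scopeNote(payload: ScopeInfo): string | null {
  if (payload.broadened !== true) return null;
  const requested = payload.requestedScope
    ? `"${payload.requestedScope}"`
    : "the requested scope";
  return `Search was broadened beyond ${requested}; results may include other document types.`;
}
```

Showing the note alongside the results keeps the user aware that a hit of a different document type is expected behavior, not a search error.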
Progress updates
MCP tool results don’t stream, but answer_question emits coarse notifications/progress messages
while it works, provided your call opts in with a progressToken in the request _meta.
Stages: Searching the document corpus… (10/100) → Retrieved N documents — generating answer…
(40/100) → Answer ready. (100/100). SDK clients receive these via the onprogress callback on
callTool. Budget a 60-second client-side timeout for answer_question.
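For a raw JSON-RPC client, opting in to progress and rendering the updates could be sketched as below. The _meta.progressToken placement and the progress, total, and message fields follow the MCP progress convention; the helper names, the token value, and the query argument name are assumptions made for illustration.

```typescript
// Shape of the params carried by a notifications/progress message.
interface ProgressParams {
  progressToken: string | number;
  progress: number;
  total?: number;
  message?: string;
}

// Client-side timeout budget suggested above for answer_question.
const ANSWER_TIMEOUT_MS = 60000;

/** Build a tools/call request for answer_question that opts in to progress. */
function buildAnswerQuestionCall(
  id: number,
  args: Record<string, unknown>, // argument names are not specified here; pass your own
  progressToken: string,
) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call",
    params: {
      name: "answer_question",
      arguments: args,
      _meta: { progressToken },
    },
  };
}

/** Render one progress update, for example "40% Retrieved 5 documents". */
function formatProgress(p: ProgressParams): string {
  const position =
    p.total !== undefined && p.total > 0
      ? `${Math.round((p.progress / p.total) * 100)}%`
      : String(p.progress);
  return p.message ? `${position} ${p.message}` : position;
}
```

When sending the request with fetch, ANSWER_TIMEOUT_MS can be applied through AbortSignal.timeout(ANSWER_TIMEOUT_MS) on runtimes that support it.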
Raw JSON-RPC (curl)
For testing or non-SDK clients, call the JSON-RPC API directly. The Accept header must advertise
both application/json and text/event-stream.
List the available tools with "method": "tools/list".
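As a rough equivalent of a raw call, a tools/list request can be assembled for fetch as shown below. The endpoint URL is a deliberate placeholder (substitute the real one from the Endpoint section), and buildRpcRequest and the PYRAMID_API_KEY variable name are hypothetical.

```typescript
// Placeholder only: replace with the MCP endpoint URL for your environment.
const MCP_ENDPOINT = "https://api.example.invalid/mcp";

interface RpcFetchInit {
  method: "POST";
  headers: { Authorization: string; "Content-Type": string; Accept: string };
  body: string;
}

/** Assemble the fetch options for one JSON-RPC exchange (hypothetical helper). */
function buildRpcRequest(
  apiKey: string,
  rpcMethod: string,
  params?: Record<string, unknown>,
  id: number = 1,
): RpcFetchInit {
  const envelope: Record<string, unknown> = { jsonrpc: "2.0", id, method: rpcMethod };
  if (params !== undefined) envelope.params = params;
  return {
    method: "POST", // the only HTTP method the endpoint serves
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(envelope),
  };
}

// Usage (performs a network call, so it is left commented out):
// const res = await fetch(MCP_ENDPOINT, buildRpcRequest(process.env.PYRAMID_API_KEY!, "tools/list"));
// console.log(await res.text()); // raw SSE frame: "event: message" / "data: {...}"
```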