1. Overview
MCP latency is a UX problem and a token-cost problem. Both are solvable.
2. Caching strategies
KV cache: Cloudflare KV for read-mostly resources.
Edge cache: cache proxied REST responses at the edge.
In-memory: a process-local LRU for hot keys.
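The in-memory layer can be sketched as a small LRU built on `Map`'s insertion-order guarantee; the class name, capacity, and key scheme below are illustrative assumptions, not part of any MCP SDK:

```typescript
// Minimal process-local LRU built on Map's insertion order.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    // Re-insert to mark the key as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // Evict the least recently used key (first in insertion order).
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }
}

const hot = new LruCache<string, string>(2);
hot.set("user:1", "alice");
hot.set("user:2", "bob");
hot.get("user:1");          // touch user:1 so user:2 becomes least recent
hot.set("user:3", "carol"); // evicts user:2
```

A few dozen lines like this is often enough for hot keys; reach for KV or the edge cache only when entries must survive the process or be shared across instances.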
3. Streaming responses
SSE transport pays off when a tool produces results iteratively (e.g. listing 500 posts): stream pages as they arrive instead of buffering the full response.
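Page-at-a-time streaming can be sketched with an async generator; `fetchPage`, the `Post` shape, and the page size are hypothetical stand-ins for a real data source and transport:

```typescript
interface Post { id: number; title: string; }

// Stand-in for a DB/API call; pretend there are 500 posts total.
async function fetchPage(offset: number, limit: number): Promise<Post[]> {
  const total = 500;
  const page: Post[] = [];
  for (let i = offset; i < Math.min(offset + limit, total); i++) {
    page.push({ id: i, title: `post ${i}` });
  }
  return page;
}

// Yield one page at a time so the transport can flush each chunk
// instead of holding all 500 posts in memory.
async function* streamPosts(pageSize = 100): AsyncGenerator<Post[]> {
  let offset = 0;
  while (true) {
    const page = await fetchPage(offset, pageSize);
    if (page.length === 0) return;
    yield page;
    offset += page.length;
  }
}
```

The consumer (`for await (const page of streamPosts())`) sees the first page after one backend round trip instead of five, which is the whole latency win.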
4. Background queues
Move slow side-effects (transcoding, embedding, refresh) to a queue. Return a job id from the tool immediately, and expose a jobs.status tool for polling.
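A minimal sketch of the job-id pattern, assuming an in-memory job store; the id scheme and tool names are illustrative, and `queueMicrotask` stands in for a real worker or queue (e.g. a Cloudflare Queue consumer):

```typescript
type JobState = "queued" | "running" | "done" | "failed";
interface Job { id: string; state: JobState; result?: unknown; }

const jobs = new Map<string, Job>();
let nextId = 0;

// Tool handler: enqueue the slow work and return a job id immediately.
function startTranscode(input: string): { jobId: string } {
  const id = `job-${nextId++}`;
  jobs.set(id, { id, state: "queued" });
  // queueMicrotask stands in for handing off to a real background worker.
  queueMicrotask(() => {
    const job = jobs.get(id)!;
    job.state = "running";
    job.result = `transcoded:${input}`; // the slow work would happen here
    job.state = "done";
  });
  return { jobId: id };
}

// jobs.status tool: poll for completion instead of blocking the original call.
function jobsStatus(jobId: string): Job | undefined {
  return jobs.get(jobId);
}

const { jobId } = startTranscode("video.mp4");
```

The tool call returns in microseconds with `{ jobId }`, and the client polls jobs.status until the state flips to `done` or `failed`.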
5. Token efficiency
Return ids instead of full objects. Paginate by default. Compress JSON (no whitespace, no fields the model will not read). Every token saved is a cent saved at scale.
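The ids-plus-cursor idea can be sketched as follows; the `Post` shape, page size, cursor format, and a companion posts.get tool for hydrating individual items are all illustrative assumptions:

```typescript
interface Post { id: number; title: string; body: string; }

// Fake dataset: 250 posts, each with a large body the model rarely needs.
const posts: Post[] = Array.from({ length: 250 }, (_, i) => ({
  id: i,
  title: `post ${i}`,
  body: "x".repeat(500),
}));

// Return only ids and a cursor; a separate posts.get tool (not shown)
// would hydrate the few items the model actually asks about.
function listPostIds(
  cursor = 0,
  limit = 100,
): { ids: number[]; nextCursor: number | null } {
  const page = posts.slice(cursor, cursor + limit);
  const next = cursor + limit < posts.length ? cursor + limit : null;
  return { ids: page.map((p) => p.id), nextCursor: next };
}

const lean = JSON.stringify(listPostIds());      // ids + cursor, no whitespace
const fat = JSON.stringify(posts.slice(0, 100)); // full objects, for comparison
```

Here `lean` is a few hundred bytes where `fat` is tens of kilobytes; that ratio is roughly the token ratio the model pays on every listing call.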