
MCP performance optimization

Cache aggressively. Stream when it helps. Queue what's slow. Strip every spare byte from the agent's context window.


1. Overview

MCP latency is a UX problem and a token-cost problem. Both are solvable.

2. Caching strategies

  • KV cache: Cloudflare KV for read-mostly resources.
  • Edge cache: cache REST proxies at the edge.
  • In-memory: a process-local LRU for hot keys, sketched below.
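
The in-memory tier is the easiest to sketch. Here is a minimal process-local LRU with a TTL; the 500-entry cap, the 60-second TTL, and the fetchResource upstream call are illustrative assumptions, not part of any MCP SDK. A Map works as an LRU because it preserves insertion order.

```ts
// Minimal process-local LRU with TTL. A Map preserves insertion order,
// so the first key in iteration order is the least recently used.
class LruCache<V> {
  private map = new Map<string, { value: V; expires: number }>();

  constructor(private maxEntries: number, private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) {
      this.map.delete(key); // expired
      return undefined;
    }
    // Re-insert to mark as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    if (this.map.size >= this.maxEntries) {
      // Evict the least recently used entry.
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
    this.map.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}

const cache = new LruCache<string>(500, 60_000); // illustrative: 500 keys, 60 s TTL

// Hypothetical upstream call (DB query, REST fetch, ...).
async function fetchResource(uri: string): Promise<string> {
  return `contents of ${uri}`;
}

// Read-through helper a resource handler can call.
async function getResource(uri: string): Promise<string> {
  const hit = cache.get(uri);
  if (hit !== undefined) return hit;
  const fresh = await fetchResource(uri);
  cache.set(uri, fresh);
  return fresh;
}
```

The same read-through shape applies to the KV and edge tiers; only the store behind get and set changes.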

3. Streaming responses

SSE transport pays off when a tool returns iteratively (e.g. listing 500 posts). Stream pages instead of buffering.
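
One way to structure that, sketched below against an assumed cursor-paginated REST API (api.example.com) with a generic send callback standing in for the transport's per-event write: pull one page at a time with an async generator and flush each page as its own SSE data frame.

```ts
interface Post {
  id: string;
  title: string;
}

// Yield one page at a time from a hypothetical cursor-paginated API.
async function* listPosts(pageSize = 50): AsyncGenerator<Post[]> {
  let cursor: string | undefined;
  do {
    const url = new URL("https://api.example.com/posts"); // assumed endpoint
    url.searchParams.set("limit", String(pageSize));
    if (cursor) url.searchParams.set("cursor", cursor);
    const res = await fetch(url);
    const body = (await res.json()) as { posts: Post[]; nextCursor?: string };
    yield body.posts;
    cursor = body.nextCursor;
  } while (cursor);
}

// The transport supplies `send` (for SSE, one `data:` frame per call).
async function streamPosts(send: (chunk: string) => void): Promise<void> {
  for await (const page of listPosts()) {
    send(JSON.stringify(page)); // flush each page; nothing is buffered
  }
}
```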

4. Background queues

Move slow side-effects (transcoding, embedding, cache refresh) to a queue. Return a job id from the tool and expose a jobs.status tool the agent can poll.
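
A minimal in-process sketch of that job-id pattern follows. The jobs Map, the transcode work function, and the job shape are all illustrative assumptions; a production server would back this with a durable queue (Cloudflare Queues, BullMQ, SQS) rather than process memory.

```ts
type JobState = "queued" | "running" | "done" | "failed";

interface Job {
  id: string;
  state: JobState;
  result?: string;
  error?: string;
}

const jobs = new Map<string, Job>(); // illustrative; use a durable store in production

// Tool handler: enqueue the slow work, return a job id immediately.
function startTranscode(input: string): { jobId: string } {
  const id = crypto.randomUUID(); // global in modern Node and browsers
  const job: Job = { id, state: "queued" };
  jobs.set(id, job);

  // Fire-and-forget: the tool call does not await the work.
  void (async () => {
    job.state = "running";
    try {
      job.result = await transcode(input);
      job.state = "done";
    } catch (err) {
      job.state = "failed";
      job.error = String(err);
    }
  })();

  return { jobId: id };
}

// `jobs.status` tool handler: the agent polls with the id it was given.
function jobStatus(jobId: string): Job | { error: string } {
  return jobs.get(jobId) ?? { error: `unknown job ${jobId}` };
}

// Stand-in for the real slow side-effect (transcoding, embedding, refresh).
async function transcode(input: string): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, 5_000));
  return `transcoded:${input}`;
}
```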

5. Token efficiency

Return ids instead of full objects. Paginate by default. Serialize JSON compactly and drop null or unrequested fields. Every token saved is a cent saved at scale.
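
As a sketch of what that trimming can look like, assuming a hypothetical RawPost shape: embedded objects collapse to id references, results are paginated at a fixed page size, and null or empty fields are dropped before serialization.

```ts
// Hypothetical upstream shape with heavy embedded data.
interface RawPost {
  id: string;
  title: string;
  author: { id: string; name: string; bio: string };
  body: string;
  tags: string[] | null;
}

// What the agent actually needs: ids, not embedded objects.
interface SlimPost {
  id: string;
  title: string;
  authorId: string;
  tags?: string[];
}

const PAGE_SIZE = 20; // paginate by default

function slim(posts: RawPost[], page = 0): { items: SlimPost[]; nextPage?: number } {
  const start = page * PAGE_SIZE;
  const items = posts.slice(start, start + PAGE_SIZE).map((p) => {
    const out: SlimPost = { id: p.id, title: p.title, authorId: p.author.id };
    if (p.tags && p.tags.length > 0) out.tags = p.tags; // drop null/empty fields
    return out;
  });
  const nextPage = start + PAGE_SIZE < posts.length ? page + 1 : undefined;
  return nextPage === undefined ? { items } : { items, nextPage };
}

const samplePosts: RawPost[] = [
  { id: "p1", title: "Hello", author: { id: "u1", name: "Ann", bio: "..." }, body: "...", tags: null },
];

// JSON.stringify without an indent argument already emits the compact form.
const payload = JSON.stringify(slim(samplePosts));
```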

Fast servers ship fast agents

Cache. Stream. Queue. Compress.
