1. Overview
MCP latency is a UX problem and a token-cost problem. Both are solvable.
2. Caching strategies
KV cache: Cloudflare KV for read-mostly resources.
Edge cache: cache proxied REST responses at the edge.
In-memory: a process-local LRU for hot keys.
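The in-memory layer can be sketched as a small LRU built on `Map`'s insertion-order guarantee; the class name, capacity, and key scheme below are illustrative assumptions, not part of any MCP SDK:

```typescript
// Minimal process-local LRU built on Map's insertion order.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    // Re-insert to mark the key as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // Evict the least recently used key (first in insertion order).
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }
}

const hot = new LruCache<string, string>(2);
hot.set("user:1", "alice");
hot.set("user:2", "bob");
hot.get("user:1");          // touch user:1 so user:2 becomes least recent
hot.set("user:3", "carol"); // evicts user:2
```

A few dozen lines like this is often enough for hot keys; reach for KV or the edge cache only when entries must survive the process or be shared across instances.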
3. Streaming responses
SSE transport pays off when a tool produces results iteratively (e.g. listing 500 posts): stream pages as they arrive instead of buffering the full response.
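Page-at-a-time streaming can be sketched with an async generator; `fetchPage`, the `Post` shape, and the page size are hypothetical stand-ins for a real data source and transport:

```typescript
interface Post { id: number; title: string; }

// Stand-in for a DB/API call; pretend there are 500 posts total.
async function fetchPage(offset: number, limit: number): Promise<Post[]> {
  const total = 500;
  const page: Post[] = [];
  for (let i = offset; i < Math.min(offset + limit, total); i++) {
    page.push({ id: i, title: `post ${i}` });
  }
  return page;
}

// Yield one page at a time so the transport can flush each chunk
// instead of holding all 500 posts in memory.
async function* streamPosts(pageSize = 100): AsyncGenerator<Post[]> {
  let offset = 0;
  while (true) {
    const page = await fetchPage(offset, pageSize);
    if (page.length === 0) return;
    yield page;
    offset += page.length;
  }
}
```

The consumer (`for await (const page of streamPosts())`) sees the first page after one backend round trip instead of five, which is the whole latency win.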
4. Background queues
Move slow side-effects (transcoding, embedding, refresh) to a queue. Return a job id from the tool immediately, and expose a jobs.status tool for polling.
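A minimal sketch of the job-id pattern, assuming an in-memory job store; the id scheme and tool names are illustrative, and `queueMicrotask` stands in for a real worker or queue (e.g. a Cloudflare Queue consumer):

```typescript
type JobState = "queued" | "running" | "done" | "failed";
interface Job { id: string; state: JobState; result?: unknown; }

const jobs = new Map<string, Job>();
let nextId = 0;

// Tool handler: enqueue the slow work and return a job id immediately.
function startTranscode(input: string): { jobId: string } {
  const id = `job-${nextId++}`;
  jobs.set(id, { id, state: "queued" });
  // queueMicrotask stands in for handing off to a real background worker.
  queueMicrotask(() => {
    const job = jobs.get(id)!;
    job.state = "running";
    job.result = `transcoded:${input}`; // the slow work would happen here
    job.state = "done";
  });
  return { jobId: id };
}

// jobs.status tool: poll for completion instead of blocking the original call.
function jobsStatus(jobId: string): Job | undefined {
  return jobs.get(jobId);
}

const { jobId } = startTranscode("video.mp4");
```

The tool call returns in microseconds with `{ jobId }`, and the client polls jobs.status until the state flips to `done` or `failed`.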
5. Token efficiency
Return ids instead of full objects. Paginate by default. Compress JSON (no whitespace, no fields the model will not read). Every token saved is a cent saved at scale.
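The ids-plus-cursor idea can be sketched as follows; the `Post` shape, page size, cursor format, and a companion posts.get tool for hydrating individual items are all illustrative assumptions:

```typescript
interface Post { id: number; title: string; body: string; }

// Fake dataset: 250 posts, each with a large body the model rarely needs.
const posts: Post[] = Array.from({ length: 250 }, (_, i) => ({
  id: i,
  title: `post ${i}`,
  body: "x".repeat(500),
}));

// Return only ids and a cursor; a separate posts.get tool (not shown)
// would hydrate the few items the model actually asks about.
function listPostIds(
  cursor = 0,
  limit = 100,
): { ids: number[]; nextCursor: number | null } {
  const page = posts.slice(cursor, cursor + limit);
  const next = cursor + limit < posts.length ? cursor + limit : null;
  return { ids: page.map((p) => p.id), nextCursor: next };
}

const lean = JSON.stringify(listPostIds());      // ids + cursor, no whitespace
const fat = JSON.stringify(posts.slice(0, 100)); // full objects, for comparison
```

Here `lean` is a few hundred bytes where `fat` is tens of kilobytes; that ratio is roughly the token ratio the model pays on every listing call.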