Tool-search tradeoffs

mcpgw can synthesize a tools/search surface in front of upstreams that don’t provide one. This is a deliberate behavioral override of the MCP protocol: agents see one virtual tool, mcp_search, instead of N upstream tools. The tradeoff favors token cost over discoverability.


What changes

| Without synthesize | With synthesize |
|---|---|
| Agent receives full tool catalogue (~N×120 tokens) on every tools/list | Agent receives one tool definition (~50 tokens) |
| Agent picks a tool name directly from the list | Agent must call mcp_search first, then tools/call with the discovered name |
| 1 round trip for tool execution | 2 round trips when the agent doesn’t already know the tool name |

When this hurts

1. Tightly-bound clients. Agents pre-compiled against specific tool names — for example, always calling fs_read to read a file — will keep doing that. Synthesize mode does not break those calls. They still pass through to the upstream. But they don’t benefit either, because they never read the synthesized tools/list and never call mcp_search. Token savings only materialize for clients that read tools/list and react accordingly.

2. Search ranking is keyword-only. mcpgw scores tools by token overlap between the query and the tool’s name plus description. There is no semantic similarity. Tokens are alphanumeric substrings — hyphens and underscores are treated as token boundaries, so fs-read and fs_read both tokenize to ["fs", "read"]. A query like "look up a file" misses a tool whose description says only "Read a file from local disk" because the token "look" doesn’t appear. Document tools with verbs the agent’s likely to use.

3. Cache staleness window. The index refreshes at tool_search.refresh_interval (default 60s). A tool added to an upstream is invisible to mcp_search for up to one full refresh interval. Shorter intervals reduce the staleness window at the cost of more tools/list requests against upstreams.

4. Discovery security surface. mcp_search returns the same tool definitions the upstream would return for a tools/list call. If you have authentication policies that gate which tools different clients see, synthesize mode does not enforce them: every authenticated caller to the gateway can search the full index. Scope-based filtering is not implemented. If you need per-client tool visibility, either run a separate gateway instance per audience or restrict mcp_search access at the auth layer (require a specific OAuth scope).

5. Mode changes require restart. Switching between passthrough and synthesize is not hot-reloadable. SIGHUP refreshes the index in synthesize mode — it re-fetches tools/list from each upstream and replaces the index atomically — but it does not alter the gateway’s routing policy. Any mode change requires a full process restart.
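
The keyword ranking described in item 2 can be sketched in a few lines of Python. This is an illustrative sketch, not mcpgw's implementation: the function names tokenize and overlap_score are hypothetical, and lowercasing is an assumption. The text above does not say whether mcpgw drops stop words or applies a minimum score, so the sketch does neither.

```python
import re


def tokenize(text: str) -> list:
    """Split text into alphanumeric tokens.

    Hyphens, underscores, spaces, and punctuation all act as token
    boundaries. Lowercasing is an assumption of this sketch.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, name: str, description: str) -> int:
    """Count distinct query tokens found in the tool's name or description."""
    tool_tokens = set(tokenize(name)) | set(tokenize(description))
    return len(set(tokenize(query)) & tool_tokens)


print(tokenize("fs-read"))  # ['fs', 'read']
print(tokenize("fs_read"))  # ['fs', 'read']

description = "Read a file from local disk"
# The verb matches, so three tokens overlap: read, a, file.
print(overlap_score("read a file", "fs_read", description))     # 3
# "look" and "up" match nothing; only "a" and "file" overlap.
print(overlap_score("look up a file", "fs_read", description))  # 2
```

In this sketch the query "look up a file" earns credit only for the generic tokens "a" and "file", so fs_read gets no edge over any other tool whose description mentions a file. A description that includes the verbs agents tend to use gives the ranking something specific to match.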


When this helps

High tool-count upstreams. The token reduction is proportional to N. For an upstream exposing 200 tools, a tools/list response is ~24,000 tokens. With synthesize mode, tools/list returns ~50 tokens (the mcp_search stub), and the agent loads only the matched tools’ definitions on demand. At N=200 with one search returning 5 results, the per-task cost drops from ~24,000 tokens to ~650 tokens — roughly 97% reduction. At N=10, the savings are marginal and the extra round trip may not be worth it.

Agents written for tool-search-style discovery. Claude Desktop, Claude Code, and other clients that read tools/list and adapt at runtime benefit immediately. Agents with hardcoded tool names do not.

Single-tenant gateways. Synthesize mode exposes the full upstream tool index to every authenticated caller. In a single-tenant deployment — one gateway per customer, or one gateway serving a homogeneous agent fleet — this is not a concern.


Token reduction math

A typical tool definition (name + description + schema) is roughly 120–200 tokens. The figures below use the low end, 120 tokens per definition. With N upstream tools:

  • Without synthesize: tools/list returns ~120N tokens to the agent on every call.
  • With synthesize: tools/list returns ~50 tokens (the mcp_search definition). The agent calls mcp_search once, receiving ~120 tokens per result. If the agent makes one search returning 5 results, the total is 50 + 600 = 650 tokens.
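
The arithmetic above can be checked with a short Python sketch. The constants are the approximate figures from this section (50 tokens for the stub, 120 tokens per definition); the function names are hypothetical and exist only for illustration.

```python
STUB_TOKENS = 50       # approximate size of the mcp_search definition
TOKENS_PER_TOOL = 120  # low end of the 120-200 token range


def passthrough_cost(n_tools: int) -> int:
    """Tokens returned by one tools/list call without synthesize."""
    return n_tools * TOKENS_PER_TOOL


def synthesize_cost(results_per_search: int, searches: int = 1) -> int:
    """Tokens for the stub plus every definition returned by mcp_search."""
    return STUB_TOKENS + searches * results_per_search * TOKENS_PER_TOOL


def reduction(n_tools: int, results_per_search: int, searches: int = 1) -> float:
    """Fraction of tokens saved relative to passthrough."""
    before = passthrough_cost(n_tools)
    after = synthesize_cost(results_per_search, searches)
    return 1 - after / before


for n in (200, 10):
    before, after = passthrough_cost(n), synthesize_cost(5)
    print(f"N={n}: {before} -> {after} tokens, saves {before - after}")
# N=200: 24000 -> 650 tokens, saves 23350
# N=10: 1200 -> 650 tokens, saves 550
```

Under the same assumptions, a second search returning 5 results raises the synthesize cost to 1,250 tokens, which already exceeds the 1,200-token passthrough cost at N=10. That is why the savings only become compelling at high tool counts.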

For agents that don’t use mcp_search (because they were trained on specific tool names), savings are zero. Synthesize mode intercepts tools/list only at the gateway. It cannot change what an upstream sends to agents that bypass the gateway and call the upstream directly; those agents still receive the full schema on their tools/list call.


Architecture note

The synthesizer runs above the policy engine in proxy.Handler.ServeHTTP. Synthetic responses are not subject to the operator’s tools/call deny rules: mcp_search is the gateway’s own response, not an upstream call. The policy engine never sees a synthesized request.

If an operator wants to restrict mcp_search access, the only supported mechanism in the current release is authentication — require a specific OAuth scope or a dedicated API key. Leaving synthesize mode off for restricted tenants and exposing a separate gateway instance with synthesize enabled for the broader fleet is the recommended pattern for mixed-audience deployments.
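
The layering described in this note can be sketched as a small Python dispatcher. It is a simplified model, not the code in proxy.Handler.ServeHTTP: the Request type, the handle function, and the scope name tools:search are all hypothetical, and placing the scope check directly in front of mcp_search is an assumption about where the auth layer sits.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional

SEARCH_TOOL = "mcp_search"
SEARCH_SCOPE = "tools:search"  # hypothetical scope name, for illustration only


@dataclass
class Request:
    method: str                      # e.g. "tools/list" or "tools/call"
    tool: Optional[str] = None       # tool name for tools/call
    scopes: FrozenSet[str] = frozenset()


def handle(req: Request, policy_allows: Callable[[Request], bool]) -> str:
    """Return the name of the layer that answered the request."""
    is_search = req.method == "tools/call" and req.tool == SEARCH_TOOL
    if req.method == "tools/list" or is_search:
        # Synthesized requests are answered here; the policy engine
        # below is never consulted for them.
        if is_search and SEARCH_SCOPE not in req.scopes:
            return "rejected:auth"
        return "synthesizer"
    if not policy_allows(req):
        return "rejected:policy"
    return "upstream"


consulted = []


def deny_everything(req: Request) -> bool:
    """A deny-all policy that records which tools it was asked about."""
    consulted.append(req.tool)
    return False


search = Request("tools/call", SEARCH_TOOL, frozenset({SEARCH_SCOPE}))
print(handle(search, deny_everything))                            # synthesizer
print(handle(Request("tools/call", "fs_read"), deny_everything))  # rejected:policy
print(consulted)                                                  # ['fs_read']
```

Even with a deny-all policy, the mcp_search call succeeds and the policy function is consulted only for fs_read. In this model the scope check is the only control that can block a search, which matches the point above that authentication is the one supported way to restrict mcp_search.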


Future work (not yet shipped)

  • Vector-similarity ranking for mcp_search instead of pure keyword matching
  • Per-client filtered indexes based on OAuth scope
  • Negative cache for upstream failure, so a flaky upstream doesn’t drop its tools from the index on every refresh cycle