Rate-limit a tool

Problem: a tool is expensive, dangerous in volume, or both — fs_write, db_query, send_email. You want to cap how often any single session can fire it.

Solution: add a rule with action: rate_limit. mcpgw keeps one token bucket per (rule, session) pair, keyed off the Mcp-Session-Id header.

Recipe

policy:
  rules:
    - id: rl-fs-write
      action: rate_limit
      when: { tool_name: "fs_write" }
      tokens_per_second: 10        # steady-state allowance
      burst: 20                    # peak burst before the bucket starts gating

tokens_per_second is the refill rate; burst is the bucket capacity. Each request consumes one token. When the bucket is empty, requests are rejected with HTTP 429 and JSON-RPC error -32003 rate_limited until enough time has passed for at least one token to refill.

This is separate from top-level input_rate_limit, which protects the gateway before body read and parse. Use input_rate_limit for request-flood protection; use policy rate_limit rules for per-tool quotas.
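The token-bucket behaviour described above can be pictured with a short Python sketch. This is an illustration only, not mcpgw's implementation: the class name TokenBucket, the allow method, and the assumption that a bucket starts full are all hypothetical.

```python
class TokenBucket:
    """Illustrative token bucket; not mcpgw's actual code."""

    def __init__(self, tokens_per_second, burst, now=0.0):
        self.rate = tokens_per_second      # refill rate
        self.capacity = burst              # bucket capacity
        self.tokens = float(burst)         # assumption: the bucket starts full
        self.last = now

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0             # each request consumes one token
            return True
        return False                       # the gateway would answer 429 / -32003 here


bucket = TokenBucket(tokens_per_second=10, burst=20)
results = [bucket.allow(now=0.0) for _ in range(25)]
print(results.count(True), results.count(False))        # 20 5
print(sum(bucket.allow(now=0.5) for _ in range(10)))    # 5 (tokens refilled in 0.5 s)
```

Passing the clock in as an argument keeps the sketch deterministic; a real limiter would read a monotonic clock instead.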

Picking values

  • One write every few seconds: tokens_per_second: 0.2, burst: 3
  • Modest API quota (e.g. 60/minute): tokens_per_second: 1, burst: 10
  • High-throughput protection only: tokens_per_second: 100, burst: 200

tokens_per_second accepts fractional values. 0.0001 is one token every 10,000 seconds (~2.8 hours), or about 8.6 tokens per day, which is useful for tool calls you only want to allow a few times per day.
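To sanity-check a candidate value, it helps to convert tokens_per_second into the interval between tokens and the sustained daily allowance. The helper names below are made up for this sketch; the arithmetic is just 1 / rate and rate × 86,400.

```python
SECONDS_PER_DAY = 86_400


def seconds_per_token(tokens_per_second):
    """Time the bucket needs to refill a single token."""
    return 1.0 / tokens_per_second


def tokens_per_day(tokens_per_second):
    """Sustained allowance per session per day, ignoring the initial burst."""
    return tokens_per_second * SECONDS_PER_DAY


for rate in (0.2, 1, 100, 0.0001):
    print(f"{rate}: one token every {seconds_per_token(rate):.2f} s, "
          f"about {tokens_per_day(rate):.1f} per day")
```

For 0.0001 this works out to one token every 10,000 seconds.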

Per-(rule, session) semantics

Two sessions calling the same rate-limited tool have independent buckets. A burst from session-A does not consume session-B’s allowance. This means:

  • Each connected agent gets its own quota.
  • A single misbehaving agent cannot starve the others.
  • Total system throughput is tokens_per_second × number_of_active_sessions, not a global cap.

If you want a global cap, deploy a second mcpgw with a tighter rule in front, or wait for a future release (which will add scope: global).
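The per-(rule, session) keying can be sketched as a dictionary of buckets. Everything here (Bucket, bucket_key, check, the starts-full assumption, the exact-case header lookup) is hypothetical and simplified; it only illustrates why one session's burst leaves another session's allowance untouched.

```python
class Bucket:
    """Minimal token bucket, assumed to start full (illustrative only)."""

    def __init__(self, rate, burst):
        self.rate, self.capacity = rate, burst
        self.tokens, self.last = float(burst), 0.0

    def allow(self, now):
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


buckets = {}  # (rule_id, session_id) -> Bucket


def bucket_key(rule_id, headers):
    # A request without the header falls into the shared empty session.
    return (rule_id, headers.get("Mcp-Session-Id", ""))


def check(rule_id, headers, now, rate=10, burst=20):
    key = bucket_key(rule_id, headers)
    if key not in buckets:
        buckets[key] = Bucket(rate, burst)
    return buckets[key].allow(now)


session_a = {"Mcp-Session-Id": "session-A"}
session_b = {"Mcp-Session-Id": "session-B"}
a_results = [check("rl-fs-write", session_a, now=0.0) for _ in range(25)]
b_first = check("rl-fs-write", session_b, now=0.0)
print(a_results.count(True), b_first)  # 20 True
```

Session A has drained its bucket, yet session B's first call still succeeds because it is looked up under a different key.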

Verifying

SESSION="rl-test"
for i in $(seq 1 25); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST http://localhost:7332/mcp \
    -H "Content-Type: application/json" \
    -H "Mcp-Session-Id: $SESSION" \
    -d "{\"jsonrpc\":\"2.0\",\"id\":$i,\"method\":\"tools/call\",\"params\":{\"name\":\"fs_write\",\"arguments\":{\"path\":\"/tmp/x\",\"content\":\"y\"}}}"
done

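You should see 200 for roughly the first 20 requests (the burst value from the recipe), then 429s once the bucket is drained. The bucket keeps refilling at 10 tokens per second while the loop runs, so a few more than 20 requests may succeed. If you see no 429s at all, raise the request count or temporarily lower tokens_per_second.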
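How many 200s to expect depends on how quickly the loop fires. The following sketch estimates it for evenly spaced requests. The function name and the starts-full assumption are illustrative, and real curl timings will vary.

```python
def expected_successes(n_requests, interval_s, tokens_per_second=10, burst=20):
    """Count how many of n evenly spaced requests a full bucket would admit."""
    tokens = float(burst)
    allowed = 0
    for i in range(n_requests):
        if i > 0:
            # Refill for the gap since the previous request, capped at burst.
            tokens = min(burst, tokens + interval_s * tokens_per_second)
        if tokens >= 1.0:
            tokens -= 1.0
            allowed += 1
    return allowed


print(expected_successes(25, 0.0))    # 20: all requests arrive at once
print(expected_successes(25, 0.01))   # 22: 10 ms apart, a little refill sneaks in
print(expected_successes(25, 0.2))    # 25: 200 ms apart, the bucket never empties
```

With the recipe's values, a loop that takes more than a few tens of milliseconds per request may show only a handful of 429s, which is why a larger request count gives a clearer signal.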

Pitfalls

  • The bucket is keyed by the Mcp-Session-Id header. Clients that do not send a session id share a single bucket per rule (the empty session). If your client does not send the header, set one in your reverse proxy.
  • X-Forwarded-For is intentionally ignored by input_rate_limit, the separate IP-based gateway throttle. See Explanation: rate-limit identity for the full story when running behind a load balancer.
  • A rate-limit denial is a policy refusal, not a transport error. Client retry-with-backoff logic should handle HTTP 429 (JSON-RPC error -32003) explicitly rather than treating it like a connection failure.
  • First match wins. A wildcard redact rule placed above your rate-limit rule will redact-and-forward instead of throttling. Order matters.
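For the retry pitfall above, a client-side sketch might look like the following. It assumes a hypothetical send callable that performs one request and returns the HTTP status code. The function name, the defaults, and the exponential schedule are illustrative choices, not anything mcpgw prescribes.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry only on HTTP 429; return the last status code seen."""
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status                      # success, or an error that is not a rate limit
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)   # 0.5 s, 1 s, 2 s, ...
    return status                              # still rate limited after all attempts


# Usage with a fake transport and a recorded (not actually slept) delay list.
responses = iter([429, 429, 200])
delays = []
print(call_with_backoff(lambda: next(responses), sleep=delays.append), delays)
# 200 [0.5, 1.0]
```

Pick a base_delay at least as long as one refill interval (1 / tokens_per_second); anything shorter retries before a token can be available.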