Rate-limit a tool
Problem: a tool is expensive, dangerous in volume, or both (for example fs_write, db_query, or send_email). You want to cap how often any single session can fire it.
Solution: add a rule with action: rate_limit. mcpgw keeps one token bucket per (rule, session) pair, keyed off the Mcp-Session-Id header.
Recipe
policy:
  rules:
    - id: rl-fs-write
      action: rate_limit
      when: { tool_name: "fs_write" }
      tokens_per_second: 10   # steady-state allowance
      burst: 20               # peak burst before the bucket starts gating
tokens_per_second is the refill rate; burst is the bucket capacity. Each request consumes one token. When the bucket is empty, requests are rejected with HTTP 429 and JSON-RPC error -32003 rate_limited until enough time has passed for a token to refill.
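To make the refill and burst behavior concrete, here is a minimal token-bucket sketch in Python. It is not mcpgw's implementation: the class name, the method name, and the assumption that a bucket starts full are all illustrative.

```python
import time
from typing import Optional


class TokenBucket:
    """Illustrative token bucket; NOT mcpgw's actual implementation."""

    def __init__(self, tokens_per_second: float, burst: float, now: Optional[float] = None):
        self.rate = tokens_per_second   # refill rate
        self.capacity = burst           # maximum tokens the bucket can hold
        self.tokens = burst             # assumption: the bucket starts full
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; False means the request would get a 429."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# 25 requests arriving at the same instant against the recipe's values.
bucket = TokenBucket(tokens_per_second=10, burst=20, now=0.0)
results = [bucket.allow(now=0.0) for _ in range(25)]
print(results.count(True), results.count(False))  # 20 5
```

Passing an explicit `now` keeps the example deterministic; a real caller would omit it and let the monotonic clock drive the refill.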
This is separate from top-level input_rate_limit, which protects the gateway before body read and parse. Use input_rate_limit for request-flood protection; use policy rate_limit rules for per-tool quotas.
Picking values
- One write every few seconds: tokens_per_second: 0.2, burst: 3
- Modest API quota (e.g. 60/minute): tokens_per_second: 1, burst: 10
- High-throughput protection only: tokens_per_second: 100, burst: 200
tokens_per_second accepts fractional values. 0.0001 is one token every 10,000 seconds (about 2.8 hours), which is useful for tool calls you only want to allow a few times per day.
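The arithmetic behind these values is simple enough to script. The two helpers below are hypothetical conveniences, not part of mcpgw: one converts a quota into a refill rate, the other converts a refill rate into the wait for one token.

```python
def tokens_per_second_for(calls: float, per_seconds: float) -> float:
    """Refill rate that sustains `calls` requests every `per_seconds` seconds."""
    return calls / per_seconds


def seconds_per_token(tokens_per_second: float) -> float:
    """Time for one token to refill once the bucket is empty."""
    return 1.0 / tokens_per_second


print(tokens_per_second_for(60, 60))      # rate for a 60/minute quota
print(seconds_per_token(0.2))             # seconds between writes at 0.2 tokens/s
print(seconds_per_token(0.0001) / 3600)   # hours per token at 0.0001 tokens/s
```

burst is a separate choice: it sets how many calls can go through back to back before the steady-state rate takes over.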
Per-(rule, session) semantics
Two sessions calling the same rate-limited tool have independent buckets. A burst from session-A does not consume session-B’s allowance. This means:
- Each connected agent gets its own quota.
- A single misbehaving agent cannot starve the others.
- Total system throughput is tokens_per_second × number_of_active_sessions, not a global cap.
If you want a global cap, deploy a second mcpgw with a tighter rule in front, or wait for a future release (which will add scope: global).
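The independence of buckets can be illustrated with a dictionary keyed by (rule id, session id). This is a sketch of the semantics only, with refill left out for brevity; the function name and data structure are assumptions, not mcpgw internals.

```python
from collections import defaultdict

BURST = 3  # small capacity to keep the demo short

# One token count per (rule id, session id); each new key starts at BURST.
buckets = defaultdict(lambda: float(BURST))


def allow(rule_id: str, session_id: str) -> bool:
    """Consume one token from the bucket belonging to this rule and session."""
    key = (rule_id, session_id)
    if buckets[key] >= 1:
        buckets[key] -= 1
        return True
    return False


a = [allow("rl-fs-write", "session-A") for _ in range(5)]
b = [allow("rl-fs-write", "session-B") for _ in range(2)]
print(a)  # [True, True, True, False, False]
print(b)  # [True, True]
```

session-A exhausts its own bucket, while session-B still has its full allowance.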
Verifying
SESSION="rl-test"
for i in $(seq 1 25); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST http://localhost:7332/mcp \
    -H "Content-Type: application/json" \
    -H "Mcp-Session-Id: $SESSION" \
    -d "{\"jsonrpc\":\"2.0\",\"id\":$i,\"method\":\"tools/call\",\"params\":{\"name\":\"fs_write\",\"arguments\":{\"path\":\"/tmp/x\",\"content\":\"y\"}}}"
done
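You should see 200 for roughly the first 20 requests (the burst value in the recipe above), then 429s as the bucket drains. Because the bucket keeps refilling at tokens_per_second while the loop runs, a few more than 20 requests may succeed.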
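How many of the 25 requests succeed depends on how fast the loop runs, since tokens refill between requests. The simulation below estimates that count for evenly spaced requests. It assumes a bucket that starts full and follows the token-bucket rules described earlier; it models the behavior and does not talk to mcpgw.

```python
def count_allowed(requests: int, interval_s: float,
                  tokens_per_second: float, burst: float) -> int:
    """Simulate evenly spaced requests against a bucket that starts full."""
    tokens = burst
    allowed = 0
    for i in range(requests):
        if i > 0:
            # Refill for the gap since the previous request, capped at burst.
            tokens = min(burst, tokens + interval_s * tokens_per_second)
        if tokens >= 1.0:
            tokens -= 1.0
            allowed += 1
    return allowed


print(count_allowed(25, 0.0, 10, 20))    # 20: no time passes, only the burst gets through
print(count_allowed(25, 0.02, 10, 20))   # 24: 20 ms per request lets four tokens refill
```

To see a clear run of 429s in practice, send more requests than burst plus the tokens that refill while the loop runs, or test against a rule with a lower tokens_per_second.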
Pitfalls
- The bucket is keyed by the Mcp-Session-Id header. Clients that do not send a session id share a single bucket per rule (the empty session). If your client does not send the header, set one in your reverse proxy.
- X-Forwarded-For is intentionally ignored by input_rate_limit, the separate IP-based gateway throttle. See Explanation: rate-limit identity for the full story when running behind a load balancer.
- Rate-limit denial counts as policy refusal, not as a transport error. Client retry-with-backoff logic should treat HTTP 429 specifically.
- First match wins. A wildcard redact rule placed above your rate-limit rule will redact-and-forward instead of throttling. Order matters.
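On the client side, handling HTTP 429 specifically might look like the sketch below. The helper name, the retry parameters, and the exact shape of the error body are assumptions for illustration; only the 429 status and the -32003 rate_limited error come from this page. The `send` callable stands in for whatever HTTP client you use.

```python
import random
import time
from typing import Callable, Tuple

RATE_LIMITED = 429


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Retry only on HTTP 429; any other response is returned immediately."""
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != RATE_LIMITED:
            break
        # Exponential backoff with jitter gives the bucket time to refill.
        delay = base_delay_s * (2 ** (attempt - 1))
        sleep(delay + random.uniform(0, delay / 2))
        status, body = send()
    return status, body


# Demo with a fake sender: two rate-limited replies, then success.
# The error body shape here is assumed, not taken from mcpgw.
limited = (429, {"error": {"code": -32003, "message": "rate_limited"}})
responses = iter([limited, limited, (200, {"result": "ok"})])
status, body = call_with_backoff(lambda: next(responses), sleep=lambda s: None)
print(status)  # 200
```

Choose base_delay_s with the rule's refill rate in mind: at tokens_per_second: 0.2, retrying sooner than five seconds after the bucket empties cannot succeed.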