This document explains how mcpgw is structured and what happens to a request from arrival to response. The goal is to give you enough mental model to predict behavior, debug surprises, and make sound design calls when configuring it.

If you want to look up a specific value, use the Reference instead. This page is for understanding.

The shape of mcpgw

mcpgw is a single Go binary, statically linked and shipped as a distroless image. It does one thing: act as a programmable HTTP-to-HTTP proxy for JSON-RPC traffic that follows the Model Context Protocol.

There is no database, no admin API, no control plane, no configuration UI. The gateway reads YAML at startup, listens on a port, and forwards or rejects requests according to that YAML. State is limited to:

The compiled policy engine (in memory, swappable on SIGHUP)
HTTP connection pools to upstreams (in memory)
Token-bucket counters per (rule, session) (in memory)
The append-only audit log (on disk)
The OTLP exporter buffer (in memory, drained continuously)

This shape is deliberate. The cost of adding a control plane is that you now have a control plane to operate, secure, upgrade, and back up. mcpgw is sized to be one of many forwarding hops, deployed close to where MCP clients live, configured by the same toolchain that configures the rest of your infrastructure.

Request lifecycle

A request passes through nine phases. Each phase has a single, well-defined termination behavior, and there is no shared mutable state that one phase can leave behind to confuse another.

1. Throttle     → optional input_rate_limit token bucket (pre-parse)
2. Accept       → TCP/TLS handshake; read up to 16 MiB
3. Authenticate → optional API-key check
4. Parse        → JSON-RPC envelope; extract id, method, params
5. Identify     → session id, client IP, request id
6. Decide       → policy engine: allow / deny / redact / rate_limit
7. Route        → choose upstream by tool matchers or default_upstream
8. Forward      → POST to upstream; stream response back
9. Audit/Trace  → write JSONL line, finalize span, export

Each phase can terminate the request. If phase 4 rejects (parse error), phases 6–8 do not run, but phase 9 still records the rejection. There is exactly one audit line per request, regardless of where the request died.
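The "exactly one audit line" guarantee is easiest to see as a chain of phases in which any phase may return an error and the final audit step runs unconditionally. The minimal Go sketch below models that with a deferred call. The `phase` type, the `handle` function, and the default `"allow"` decision are hypothetical, and for simplicity every phase after the failing one is skipped.

```go
package main

import (
	"errors"
	"fmt"
)

// errParse stands in for a phase-4 failure (hypothetical value).
var errParse = errors.New("parse_error")

// phase is one lifecycle step; a non-nil error terminates the request.
type phase struct {
	name string
	run  func() error
}

// handle runs phases in order and returns the names of those that ran.
// The deferred call plays the role of phase 9: it fires exactly once per
// request, whether every phase succeeded or one of them stopped early.
func handle(phases []phase, audit func(decision string)) (ran []string) {
	decision := "allow"
	defer func() { audit(decision) }()
	for _, p := range phases {
		ran = append(ran, p.name)
		if err := p.run(); err != nil {
			decision = err.Error() // e.g. "parse_error"
			return ran
		}
	}
	return ran
}

func main() {
	ok := func() error { return nil }
	phases := []phase{
		{"throttle", ok}, {"accept", ok}, {"authenticate", ok},
		{"parse", func() error { return errParse }},
		{"identify", ok}, {"decide", ok}, {"route", ok}, {"forward", ok},
	}
	var lines []string
	ran := handle(phases, func(d string) { lines = append(lines, d) })
	fmt.Println(ran)   // [throttle accept authenticate parse]
	fmt.Println(lines) // [parse_error]
}
```

Using `defer` means no early-return path can forget the audit write, which is the property the lifecycle relies on.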

Phase 1 — Throttle (optional)

If input_rate_limit.enabled is true, the gateway checks a token bucket keyed by RemoteAddr (the TCP peer). Empty bucket → HTTP 429, JSON-RPC -32003 rate_limited, audit line with decision: "input_rate_limited". This bucket is independent of the policy action rate_limit and exists to protect the gateway from request floods before body read, auth, parse, or policy evaluation. Disabled by default.

Phase 2 — Accept

http.Server accepts the connection. If TLS is configured, the handshake happens here. The body is read into memory with a hard cap (16 MiB). Bodies larger than the cap are rejected before parse with body_too_large.

The 16 MiB cap is a practical limit. Real MCP traffic is dominated by tool argument JSON, which is typically a few KB. The cap exists to bound memory usage in adversarial scenarios.

Phase 3 — Authenticate

If auth.enabled is true, mcpgw reads the configured auth header before parsing the JSON-RPC envelope. Missing, invalid, and expired keys return HTTP 401 with JSON-RPC -32005 unauthorized. Failed auth still emits an audit line with auth_result set.

If auth is disabled, this phase is a no-op and requests pass through unauthenticated; in that case, deploy mcpgw behind your existing auth proxy.

Phase 4 — Parse

The body is JSON-decoded into a JSON-RPC envelope. mcpgw validates jsonrpc: "2.0", the presence of method, and (for tools/call) the structure of params.name and params.arguments.

A parse failure returns -32700 parse_error and an audit line with decision: "parse_error". No span is emitted — the request has no mcp.method to attribute it to. (This was a deliberate v1.1 decision; v1.0 emitted an “unknown” span which polluted dashboards.)
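The validation rules above translate directly into `encoding/json` structs. In this Go sketch the type and function names are hypothetical, `arguments` is required to be a JSON object when present (an assumption about what "structure" means here), and every failure is reported as parse_error for simplicity; whether mcpgw distinguishes invalid-request cases is not stated on this page.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// envelope is the subset of a JSON-RPC 2.0 request the gateway inspects.
type envelope struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Method  string          `json:"method"`
	Params  json.RawMessage `json:"params"`
}

// toolCallParams: decoding into a map rejects non-object arguments.
type toolCallParams struct {
	Name      string                     `json:"name"`
	Arguments map[string]json.RawMessage `json:"arguments"`
}

var errParse = errors.New("parse_error")

// parse decodes and validates the envelope and, for tools/call, returns
// the tool name used later by the policy engine and router.
func parse(body []byte) (*envelope, string, error) {
	var env envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, "", errParse
	}
	if env.JSONRPC != "2.0" || env.Method == "" {
		return nil, "", errParse
	}
	if env.Method != "tools/call" {
		return &env, "", nil
	}
	var p toolCallParams
	// Missing params (nil RawMessage) also fails to unmarshal here.
	if err := json.Unmarshal(env.Params, &p); err != nil || p.Name == "" {
		return nil, "", errParse
	}
	return &env, p.Name, nil
}

func main() {
	_, tool, err := parse([]byte(`{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"fs.read","arguments":{"path":"/tmp"}}}`))
	fmt.Println(tool, err) // fs.read <nil>
	_, _, err = parse([]byte(`{"jsonrpc":"1.0","method":"x"}`))
	fmt.Println(err) // parse_error
}
```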

Phase 5 — Identify

mcpgw extracts:

session_id from the Mcp-Session-Id header (empty if absent)
client_ip from RemoteAddr (the TCP peer’s host)
request_id newly minted as a ULID

It deliberately ignores X-Forwarded-For, Forwarded, and CF-Connecting-IP. See Rate-limit identity for the security argument.

Phase 6 — Decide

The policy engine is consulted with (method, tool_name). The engine walks the rule list top-down and returns the first match.

For tools/call, the engine evaluates each rule’s tool matcher:

Exact match → rule fires
Prefix, glob, regex, or tool_name_in match → rule fires
"*" wildcard → rule fires
Otherwise → continue

For non-tools/call methods (initialize, tools/list, etc.), no rules apply in v1. The engine returns “no decision” and the request proceeds to routing.

The engine is immutable per request. A SIGHUP mid-request swaps the engine pointer atomically; the in-flight request keeps the engine it started with. This is why hot-reload is non-disruptive.

If the rule says deny or rate_limit_blocked, phases 7–8 are skipped and the gateway returns the appropriate error directly.

If the rule says redact, the regex set is applied to the request body and the rewritten body proceeds to routing.

Phase 7 — Route

mcpgw chooses an upstream:

For tools/call: walk routes[] and return the first whose tool matcher matches. Routes support tool_name, tool_prefix, tool_glob, tool_regex, and tool_name_in.
For methods without a tool name (initialize, tools/list, etc.): use default_upstream.
If neither matches: return -32601 no_route (HTTP 404).

The routed upstream’s logical name becomes the mcp.upstream span attribute and the audit log’s upstream field.
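The routing rules can be sketched as a second first-match walk. This Go example is simplified: the hypothetical `route` struct models only `tool_name` and `tool_prefix`, and it assumes that a tools/call matching no route returns no_route rather than falling back to default_upstream. That fallback behavior is an assumption; confirm it in the Reference before relying on it.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// route is a hypothetical, simplified routes[] entry.
type route struct {
	upstream   string
	toolName   string
	toolPrefix string
}

var errNoRoute = errors.New("no_route") // -32601, HTTP 404

// pickUpstream returns the logical upstream name, which becomes the
// mcp.upstream span attribute and the audit log's upstream field.
func pickUpstream(routes []route, defaultUpstream, method, tool string) (string, error) {
	if method == "tools/call" {
		for _, r := range routes {
			if (r.toolName != "" && r.toolName == tool) ||
				(r.toolPrefix != "" && strings.HasPrefix(tool, r.toolPrefix)) {
				return r.upstream, nil
			}
		}
		return "", errNoRoute // assumption: no default_upstream fallback
	}
	if defaultUpstream != "" {
		return defaultUpstream, nil
	}
	return "", errNoRoute
}

func main() {
	routes := []route{{upstream: "github", toolPrefix: "gh."}}
	fmt.Println(pickUpstream(routes, "core", "tools/call", "gh.issues")) // github <nil>
	fmt.Println(pickUpstream(routes, "core", "tools/list", ""))          // core <nil>
	fmt.Println(pickUpstream(routes, "", "initialize", ""))              // empty upstream, no_route
}
```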

Phase 8 — Forward

The request is POSTed to the upstream’s URL. mcpgw uses a per-upstream http.Client with explicit pooling: 5s dial, 10s TLS handshake, 30s response-header timeout, 90s idle, 32 idle conns per host. HTTP/2 is enabled.

mcpgw does not transform MCP response bodies. Whatever JSON-RPC the upstream returns, success or error, is forwarded verbatim. If the response is not MCP at all (for example, an HTML error page from a misconfigured upstream), the client instead receives HTTP 502 with upstream_protocol_error and a well-formed error body.

Errors at this phase (upstream_unreachable, upstream_timeout) are surfaced to the client as JSON-RPC errors, not as raw HTTP errors. The MCP client should be able to interpret them as “MCP server is broken” rather than “the network is broken.”

Phase 9 — Audit and trace

After the response is written (or the error is decided), mcpgw:

Writes one line to audit.jsonl with the full decision context.
Finalizes the OTel span and queues it for OTLP export.

These two writes are independent. An audit-write failure does not affect the response (already sent); an OTLP buffer overflow does not affect the audit log. Both are append-only and lossy in the catastrophic case — by design, since neither should ever block the request path.

Hot-reload model

The policy engine is held in an atomic.Pointer[policy.Engine]. On SIGHUP:

Re-read mcpgw.yaml.
Validate the entire file. If validation fails, log the error and keep the existing engine.
Compile a new policy.Engine.
Store the new pointer.

Subsequent requests load the pointer and use the new engine. In-flight requests keep the old engine because they captured the pointer at request entry.

The atomicity is essential: there is no window where a request could see a half-loaded engine. Either the new engine is fully active, or the old one is.

The same model applies to audit settings (path, max size, compression) and TLS cert/key.

telemetry, upstreams, routes, and listen are not hot-reloadable in v1. Changing them requires a restart. The reasoning is that swapping connection pools mid-flight is easy to get wrong and hard to test; we’d rather ask operators to do a rolling restart than ship a subtle bug.

Telemetry pipeline

Spans are produced by go.opentelemetry.io/otel’s tracer SDK and exported via OTLP/HTTP to the configured endpoint. The exporter is wrapped in a BatchSpanProcessor with 5s flush, 1024 queue, and a 4-worker drain.

mcpgw does not multiplex telemetry. There is one OTLP destination — yours. The telemetry.operator block is reserved for a future opt-in vendor-side telemetry pipeline (anonymized metrics for product analytics, off by default); today it is parsed but inert.

If the OTLP receiver is unreachable, mcpgw logs a warning every minute and continues serving traffic. The BatchSpanProcessor’s queue eventually fills and starts dropping; the export side is always lossy in the catastrophic case so it cannot back-pressure the request path.

Audit pipeline

Audit lines are written through a buffered slog-backed JSONL writer. The buffer flushes on each line for safety; the underlying file is opened with O_APPEND so concurrent writes from a single process are atomic at line granularity.

Rotation is rename-then-open: the active file is renamed, then a new one is created at the original path, then writes resume. The renamed file is gzipped asynchronously by a worker goroutine.

The local file is canonical. Optional audit.sinks[] ship copies asynchronously to durable destinations (S3 — Object Lock supported, GCS, Kafka, generic HTTPS webhook). Sinks are best-effort — if a sink falls behind or fails, the local file still has every line. Sink failures land in slog.Error and never back-pressure the request hot path. Sinks are SIGHUP-reloadable.

For Kubernetes deployments where local disks are ephemeral, configure at least one sink so the audit log survives pod replacement. See How-to: ship audit to S3, GCS, Kafka, or a SIEM webhook.

What is not in mcpgw

This list is as important as what is:

No DB. No state survives a restart except what’s on disk (audit, config, license).
No admin API. Configuration is YAML; reloading is SIGHUP. There is no POST /admin/rules.
No multi-tenancy. One mcpgw instance serves one logical tenant. Multi-tenant deployments run multiple instances.
No RBAC. mcpgw authenticates a request to a key (auth.keys[].id), not a user or role. Map keys to clients/agents at your auth proxy or assign one key per service. Per-key tool scoping is reserved for a future release.
No client identity model. The Mcp-Session-Id header is opaque; it identifies sessions, not users. If you need user-level audit, set the header to a stable user id at your auth boundary.
No quotas across instances. Rate-limit buckets are in-memory per process. Sharded mcpgw deployments have sharded buckets. Use a sticky LB if this matters.

These omissions are deliberate. Every feature you do not have is a feature you do not need to operate, secure, or test.

Where this leads

The most useful thing to take away: mcpgw is a leaf in your platform, not a hub. It does proxying and policy. It does not own identity, secrets, deployment, or observability — it forwards to systems you already run. The simplicity of the gateway is what allows it to be deployed close to where MCP traffic lives without becoming the next thing you have to operate.

For “but how would I do X?” questions, the answer is usually “in a system you already have.” For “what does mcpgw do that I can’t easily do?” the answer is “parse MCP, apply policy, emit standardized telemetry, and audit every decision — all in one box, all in seconds.”

Policy model — why first-match-wins
Rate-limit identity — why XFF is ignored
Configuration reference — every knob

Architecture and request lifecycle