How OAuth works in mcpgw

mcpgw is an OAuth 2.1 Resource Server. It does not issue tokens. It verifies tokens issued by your Authorization Server (Auth0, Okta, Keycloak, etc.) and gates access to the upstream MCP servers.


The triangle

  MCP Client
      |
      | 1. Fetch token (client credentials, PKCE, etc.)
      v
  Authorization Server  ---------> JWKS endpoint
  (Auth0 / Okta / etc.)            (public key material)
      |                                    ^
      | 2. JWT access token                | 3. Verify signature
      v                                    |
  MCP Client  ----[Bearer token]---->  mcpgw  ------>  Upstream MCP Server

The MCP client gets a token from the AS, then presents it to mcpgw on every request. mcpgw never sees client secrets or issues tokens itself.


Token validation pipeline

Each inbound request at POST /mcp follows this sequence:

  1. Header extraction. mcpgw reads the Authorization header and strips the Bearer prefix.

  2. Token shape check (looksLikeJWT). The raw credential is tested for three dot-separated base64url segments, a total length ≥ 100 characters, and no characters outside [A-Za-z0-9._-]. Tokens that pass → OAuth verifier. Tokens that fail → API key store (if configured). If the selected path has no verifier or store configured, the request is rejected with 401 and audit auth_result invalid.

  3. JWKS fetch (cached). The verifier fetches the AS’s JWKS from auth.oauth.jwks_url. The keyset is cached in memory for jwks_cache_ttl (default 5m) and refreshed in the background by a goroutine bound to the process’s run context.

  4. Signature verification. jwt.Parse from github.com/lestrrat-go/jwx/v2 selects the JWK by kid header and verifies the signature. An unknown kid fails with invalid_token; this is the expected failure mode when JWKS rotation has published a new key before the cache has refreshed.

  5. Standard claims. iss must match auth.oauth.issuer, aud must contain auth.oauth.audience, and exp must be in the future, allowing for leeway (default 30s).

  6. Scope enforcement. Each scope listed in auth.oauth.required_scopes must appear in the token’s scope claim (space-separated string, RFC 9068) or scp claim (string or string array, used by Azure AD and Okta). Missing scope → 401 with error="insufficient_scope".

  7. Identity extraction. On success, client_id is read from the client_id claim (RFC 9068 §2.2). If absent, azp (authorized party) is tried as a fallback — Keycloak and some other AS emit this instead. The resolved value and scopes are stored for audit and telemetry.
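
Steps 6 and 7 can be sketched with the standard library alone. This is an illustration under stated assumptions, not mcpgw's code: the function names (tokenScopes, missingScopes, resolveClientID) are hypothetical, claims are modelled as the map[string]any that encoding/json produces, and splitting a string-valued scp claim on whitespace is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenScopes collects scopes from the "scope" claim (space-separated string)
// and the "scp" claim (string or array of strings).
func tokenScopes(claims map[string]any) map[string]bool {
	have := map[string]bool{}
	if s, ok := claims["scope"].(string); ok {
		for _, v := range strings.Fields(s) {
			have[v] = true
		}
	}
	switch scp := claims["scp"].(type) {
	case string: // assumption: treated as space-separated, like "scope"
		for _, v := range strings.Fields(scp) {
			have[v] = true
		}
	case []any: // encoding/json decodes JSON arrays as []any
		for _, v := range scp {
			if s, ok := v.(string); ok {
				have[s] = true
			}
		}
	}
	return have
}

// missingScopes returns the required scopes that the token does not carry.
func missingScopes(required []string, claims map[string]any) []string {
	have := tokenScopes(claims)
	var missing []string
	for _, r := range required {
		if !have[r] {
			missing = append(missing, r)
		}
	}
	return missing
}

// resolveClientID prefers client_id and falls back to azp.
func resolveClientID(claims map[string]any) string {
	if id, ok := claims["client_id"].(string); ok && id != "" {
		return id
	}
	if azp, ok := claims["azp"].(string); ok {
		return azp
	}
	return ""
}

func main() {
	claims := map[string]any{
		"scp": []any{"mcp:read"},
		"azp": "keycloak-client",
	}
	fmt.Println(missingScopes([]string{"mcp:read", "mcp:write"}, claims)) // [mcp:write]
	fmt.Println(resolveClientID(claims))                                  // keycloak-client
}
```

A non-empty result from missingScopes corresponds to the insufficient_scope outcome in step 6.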

Verification errors are matched with errors.Is and mapped to an audit result and a WWW-Authenticate error code as follows:

  errors.Is sentinel    Audit auth_result     WWW-Authenticate error
  ErrExpired            expired               invalid_token
  ErrAudience           bad_audience          invalid_token
  ErrIssuer             bad_issuer            invalid_token
  ErrInvalidToken       invalid               invalid_token
  ErrInsufficient       insufficient_scope    insufficient_scope
  ErrNoToken            missing               (no error param)
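
The mapping can be expressed as a single errors.Is switch. The sketch below is illustrative only: the sentinel names come from the mapping above, but their values, the classify function, and the default branch (treating unknown errors as invalid) are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in sentinels; the names mirror the mapping above, the messages are made up.
var (
	ErrExpired      = errors.New("token expired")
	ErrAudience     = errors.New("bad audience")
	ErrIssuer       = errors.New("bad issuer")
	ErrInvalidToken = errors.New("invalid token")
	ErrInsufficient = errors.New("insufficient scope")
	ErrNoToken      = errors.New("no token")
)

// classify maps a verification error to the audit auth_result and the
// WWW-Authenticate error code. An empty code means "no error param".
func classify(err error) (authResult, wwwError string) {
	switch {
	case errors.Is(err, ErrExpired):
		return "expired", "invalid_token"
	case errors.Is(err, ErrAudience):
		return "bad_audience", "invalid_token"
	case errors.Is(err, ErrIssuer):
		return "bad_issuer", "invalid_token"
	case errors.Is(err, ErrInsufficient):
		return "insufficient_scope", "insufficient_scope"
	case errors.Is(err, ErrNoToken):
		return "missing", ""
	default:
		// Assumption: anything else is handled like ErrInvalidToken.
		return "invalid", "invalid_token"
	}
}

func main() {
	// errors.Is sees through wrapping, so callers can add context freely.
	wrapped := fmt.Errorf("verify: %w", ErrExpired)
	result, code := classify(wrapped)
	fmt.Println(result, code) // expired invalid_token
}
```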

Discovery (RFC 9728)

When auth.oauth.enabled is true, mcpgw registers a GET handler at /.well-known/oauth-protected-resource (overridable via metadata_path). The document it serves:

{
  "resource": "https://gw.acme.com/mcp",
  "authorization_servers": ["https://idp.acme.com/"],
  "bearer_methods_supported": ["header"],
  "scopes_supported": ["mcp:read"]
}

scopes_supported is omitted when required_scopes is empty. resource is derived from auth.oauth.public_url when set; otherwise from listen + /mcp. The document is served without authentication and cached by clients for up to 1 hour (Cache-Control: public, max-age=3600).
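
A minimal sketch of how such a document could be built and served, assuming a plain net/http handler. The type and function names (resourceMetadata, metadataJSON, metadataHandler) are hypothetical, and copying required_scopes directly into scopes_supported is an assumption. The omitempty tag is what drops scopes_supported when the list is empty.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// resourceMetadata mirrors the fields of the document shown above.
type resourceMetadata struct {
	Resource               string   `json:"resource"`
	AuthorizationServers   []string `json:"authorization_servers"`
	BearerMethodsSupported []string `json:"bearer_methods_supported"`
	ScopesSupported        []string `json:"scopes_supported,omitempty"`
}

// metadataJSON renders the document; a nil or empty scope list is omitted.
func metadataJSON(resource, issuer string, requiredScopes []string) string {
	b, err := json.Marshal(resourceMetadata{
		Resource:               resource,
		AuthorizationServers:   []string{issuer},
		BearerMethodsSupported: []string{"header"},
		ScopesSupported:        requiredScopes,
	})
	if err != nil {
		panic(err) // not reachable for this struct of strings
	}
	return string(b)
}

// metadataHandler serves the pre-rendered document with the cache header.
func metadataHandler(doc string) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Header().Set("Cache-Control", "public, max-age=3600")
		fmt.Fprint(w, doc)
	}
}

func main() {
	doc := metadataJSON("https://gw.acme.com/mcp", "https://idp.acme.com/", []string{"mcp:read"})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/.well-known/oauth-protected-resource", nil)
	metadataHandler(doc)(rec, req)
	fmt.Println(rec.Header().Get("Cache-Control"))
	fmt.Println(rec.Body.String())
}
```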

When a request fails authentication, mcpgw’s WWW-Authenticate response header points at this document:

WWW-Authenticate: Bearer realm="mcpgw",
  resource_metadata="https://gw.acme.com/.well-known/oauth-protected-resource",
  error="invalid_token"

A fresh client that has never seen this gateway can read resource_metadata, fetch the document, find the AS URL under authorization_servers, and start the authorization flow — without any out-of-band configuration.
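
Both sides of this exchange can be sketched briefly: the gateway composing the challenge, and a client pulling resource_metadata back out. The function names are hypothetical, the header is emitted on a single line (the example above is wrapped for readability), and the regular expression is a deliberately simple stand-in for a full auth-param parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// challenge builds a Bearer challenge. An empty errCode omits the error
// parameter, as in the missing-token case.
func challenge(metadataURL, errCode string) string {
	h := `Bearer realm="mcpgw", resource_metadata="` + metadataURL + `"`
	if errCode != "" {
		h += `, error="` + errCode + `"`
	}
	return h
}

// resourceMetadataRe captures the quoted value of resource_metadata.
var resourceMetadataRe = regexp.MustCompile(`resource_metadata="([^"]*)"`)

// metadataURLFromChallenge returns the discovery URL, or "" if absent.
func metadataURLFromChallenge(header string) string {
	m := resourceMetadataRe.FindStringSubmatch(header)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	h := challenge("https://gw.acme.com/.well-known/oauth-protected-resource", "invalid_token")
	fmt.Println(h)
	fmt.Println(metadataURLFromChallenge(h))
}
```

From the extracted URL, a client would fetch the document and read authorization_servers to locate the AS.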

The resource_metadata parameter in the challenge is the mechanism described in Anthropic, “Building Agents That Reach Production Systems with MCP” as “Standardized OAuth with CIMD.” It reduces surprise re-auth prompts on first-time flows.


Why mcpgw is RS-only

mcpgw intentionally does not act as an Authorization Server. The reasoning:

  • Operators already have an AS. Every company running Auth0, Okta, Entra ID, or Keycloak already has token issuance, refresh-token storage, client credential management, and MFA policies. Duplicating that inside mcpgw would require operators to manage two credential stores.
  • Issuing tokens means storing secrets. Client secrets and private signing keys require secure storage, rotation tooling, and audit trails that belong in dedicated identity infrastructure, not in a transparent proxy.
  • RS-only keeps mcpgw deployable anywhere. Adding OAuth to an existing mcpgw deployment that fronts any MCP server requires only config changes. No changes to the upstream, no new infrastructure — just point mcpgw at your existing AS.

Coexistence with API keys

Both mechanisms can be active simultaneously. The dispatch key is token shape, not configuration order:

  • Three dot-separated base64url segments, ≥ 100 characters total → OAuth verifier
  • Everything else → API key store

This is useful for migrations: existing API-key clients keep working while new OAuth clients are onboarded. There is no config knob for precedence; shape is deterministic.

One edge case: an opaque API key that happens to look like a compact JWS (three dot-separated base64url segments, ≥ 100 chars) will be routed to the OAuth verifier and rejected. In practice, mcpgw-generated keys (mcpg_live_...) do not match this shape.
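
A shape check along these lines could look like the following. It is a sketch of the rule described in this section, not the actual looksLikeJWT source; in particular, rejecting empty segments is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeJWT reports whether tok has the shape of a compact JWS:
// at least 100 characters and exactly three dot-separated segments
// drawn from the base64url alphabet.
func looksLikeJWT(tok string) bool {
	if len(tok) < 100 {
		return false
	}
	parts := strings.Split(tok, ".")
	if len(parts) != 3 {
		return false
	}
	for _, p := range parts {
		if p == "" { // assumption: empty segments are rejected
			return false
		}
		for _, c := range p {
			ok := (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
				(c >= '0' && c <= '9') || c == '-' || c == '_'
			if !ok {
				return false
			}
		}
	}
	return true
}

func main() {
	seg := strings.Repeat("a", 40)
	fmt.Println(looksLikeJWT(seg + "." + seg + "." + seg)) // true: routed to the OAuth verifier
	fmt.Println(looksLikeJWT("mcpg_live_abc123"))          // false: routed to the API key store
}
```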


Hot reload

The OAuth verifier is held in an atomic.Pointer[oauth.Verifier]. On SIGHUP:

  1. mcpgw re-reads and validates mcpgw.yaml.
  2. If auth.oauth.enabled is true, a new Verifier is constructed — which primes the JWKS cache with one synchronous fetch.
  3. If the prime succeeds, SwapOAuthVerifier stores the new pointer atomically.
  4. If the prime fails (JWKS endpoint unreachable), the reload is aborted with slog.Error("oauth reload failed") and the existing verifier stays in place.

In-flight requests capture the verifier pointer at request entry and complete with the old verifier. New requests after the swap use the new one. There is no window where a request sees a half-loaded verifier.
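
The swap pattern can be illustrated with sync/atomic directly. In this sketch, Verifier and Gateway are stand-in types and reload is a hypothetical method (the text above names the real swap function SwapOAuthVerifier); only the atomic.Pointer usage is the point.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Verifier is a stand-in for oauth.Verifier.
type Verifier struct{ Issuer string }

// Gateway holds the current verifier behind an atomic pointer.
type Gateway struct {
	verifier atomic.Pointer[Verifier]
}

var errJWKSUnreachable = errors.New("jwks endpoint unreachable")

// reload builds a new verifier and stores it only if construction
// (including the JWKS prime) succeeded. On failure the old one stays.
func (g *Gateway) reload(build func() (*Verifier, error)) error {
	v, err := build()
	if err != nil {
		return err
	}
	g.verifier.Store(v)
	return nil
}

// handle loads the pointer once at request entry and uses that value for
// the rest of the request, so a concurrent swap cannot affect it.
func (g *Gateway) handle() string {
	v := g.verifier.Load()
	if v == nil {
		return ""
	}
	return v.Issuer
}

func main() {
	g := &Gateway{}
	_ = g.reload(func() (*Verifier, error) {
		return &Verifier{Issuer: "https://idp.acme.com/"}, nil
	})
	err := g.reload(func() (*Verifier, error) { return nil, errJWKSUnreachable })
	fmt.Println(err)        // jwks endpoint unreachable
	fmt.Println(g.handle()) // https://idp.acme.com/
}
```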

The JWKS cache refresh goroutine is bound to the verifier’s context (the process’s run context on startup, or the reload invocation’s context on reload). Replacing a verifier does not cancel its context, so the old verifier’s goroutine keeps running until the context that was passed to NewVerifier is done. In normal operation that is when the process shuts down.

public_url and metadata_path configure HTTP mux routes registered at startup. Mux routes cannot be changed at runtime; changes to these two fields require a full restart.


Failure modes operators should know

JWKS endpoint unreachable at startup

oauth.NewVerifier performs one synchronous JWKS fetch (“prime”) before returning. If this fails, main logs slog.Error("oauth setup") and exits with code 78. This is intentional fail-fast behaviour: a gateway that cannot verify tokens must not serve traffic. The discovery document cannot be served either, since the mux was never registered.

JWKS rotation

When the AS rotates keys, it publishes a new JWK with a new kid and (ideally) keeps the old key live for a grace period. mcpgw’s cache refreshes at jwks_cache_ttl intervals (default 5m). Tokens signed with the new key before the next cache refresh fail with invalid_token. The window is bounded by jwks_cache_ttl — operators with strict rotation requirements can set this to 1m or shorter, at the cost of more traffic to the JWKS endpoint.

Clock skew

leeway (default 30s) expands the exp and nbf acceptance windows in both directions. A 30s leeway means a token 30 seconds past exp is still accepted. Larger values improve tolerance for clock drift between AS and gateway; smaller values tighten expiry guarantees. 30s is a safe default for deployments where the AS and mcpgw share NTP synchronisation.
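
The effect of leeway on the acceptance window can be shown with a few lines of time arithmetic. This is a sketch of the rule as described here, with hypothetical names (validAt, defaultLeeway, issuedAt); the exact boundary behaviour of the JWT library mcpgw uses may differ at the edge instant.

```go
package main

import (
	"fmt"
	"time"
)

const defaultLeeway = 30 * time.Second

// issuedAt is an arbitrary fixed instant used by the example.
var issuedAt = time.Unix(1_700_000_000, 0)

// validAt reports whether a token is accepted at now. The window runs
// from nbf-leeway to exp+leeway, both ends inclusive in this sketch.
func validAt(now, nbf, exp time.Time, leeway time.Duration) bool {
	if now.After(exp.Add(leeway)) {
		return false // expired beyond tolerance
	}
	if now.Before(nbf.Add(-leeway)) {
		return false // not yet valid, even with tolerance
	}
	return true
}

func main() {
	nbf := issuedAt
	exp := issuedAt.Add(5 * time.Minute)

	fmt.Println(validAt(exp.Add(30*time.Second), nbf, exp, defaultLeeway)) // true: 30s past exp
	fmt.Println(validAt(exp.Add(31*time.Second), nbf, exp, defaultLeeway)) // false: 31s past exp
}
```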

0.0.0.0 listen + no public_url

When listen is 0.0.0.0:PORT or :PORT and public_url is not set, publicResourceURL constructs the metadata resource field as http://0.0.0.0:PORT/mcp. mcpgw logs slog.Warn("oauth metadata advertises listen address") at startup. The gateway continues running, but MCP clients fetching the discovery document will see an unresolvable address. Production deployments must set auth.oauth.public_url.
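
The derivation described above might look roughly like this. The sketch is not the real publicResourceURL: the function name resourceURL is hypothetical, and it assumes public_url is a base URL to which /mcp is appended, which the text does not confirm. The second return value signals the case that triggers the startup warning.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// resourceURL returns the advertised resource URL and whether it was
// derived from a wildcard listen address.
func resourceURL(publicURL, listen string) (string, bool) {
	if publicURL != "" {
		// Assumption: public_url is a base URL without the /mcp path.
		return strings.TrimRight(publicURL, "/") + "/mcp", false
	}
	host, port, err := net.SplitHostPort(listen)
	if err != nil {
		// Not host:port; use the value as given.
		return "http://" + listen + "/mcp", false
	}
	wildcard := host == "" || host == "0.0.0.0"
	if host == "" {
		host = "0.0.0.0" // ":PORT" binds every interface
	}
	return "http://" + net.JoinHostPort(host, port) + "/mcp", wildcard
}

func main() {
	u, warn := resourceURL("", ":8080")
	fmt.Println(u, warn) // http://0.0.0.0:8080/mcp true

	u, warn = resourceURL("https://gw.acme.com", "0.0.0.0:8080")
	fmt.Println(u, warn) // https://gw.acme.com/mcp false
}
```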

Reload with unreachable JWKS

If a SIGHUP is received while the JWKS endpoint is down, the reload is skipped and the existing verifier stays in place. The operator sees slog.Error("oauth reload failed") in stderr/journald. This is the conservative choice: an unreachable IdP during reload should not take the gateway down.


What this is not

  • Not an Authorization Server. mcpgw never sees client secrets or issues tokens.
  • Not a token-introspection endpoint. RFC 7662 introspection is an AS-side feature; mcpgw validates signatures locally using the cached JWKS, not via a network call per request.
  • Not a sender-constrained token verifier. Token binding via DPoP (RFC 9449) or mTLS (RFC 8705) is not implemented in Phase 1.
  • Not OpenID Connect aware. mcpgw treats tokens as OAuth access tokens per RFC 9068. ID tokens (id_token) are not parsed or forwarded. The sub claim is extracted for Identity.Subject but not surfaced in audit or spans in Phase 1 (planned for Phase 2 CIMD work).