Why redact is regex-over-body
In mcpgw today, the redact[] action applies regex patterns to the raw JSON-RPC request body. The jsonpath field is parsed but rejected at startup. This page explains why we made that choice and what JSONPath scoping will look like when it lands in a future release.
The intuition
A redact rule should answer: “Find the secret in this request and replace it with [REDACTED].” There are two ways to specify “find the secret”:
- Regex over the body: “Match anything that looks like a Bearer token.” Works on the textual representation. Misses nothing that matches the pattern. Catches things that match the pattern but aren’t actually secrets (false positives).
- JSONPath: “Replace the value at $.params.arguments.auth.” Works on the parsed JSON tree. Misses secrets at any other path (false negatives).
Both have failure modes. Regex gives false positives (over-redacts). JSONPath gives false negatives (misses secrets the schema doesn’t anticipate).
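The two failure modes can be seen side by side in a small sketch. It is written in Python for brevity and is not mcpgw's implementation; the request body, the `redact_regex` and `redact_path` helpers, and the hard-coded path are all assumptions made for illustration.

```python
import json
import re

# Hypothetical JSON-RPC body: one real token at an unexpected path, plus
# harmless text that merely looks like a token.
body = json.dumps({
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "http_get",
        "arguments": {
            "auth": "none",
            "headers": {"Authorization": "Bearer abc123.def456"},
            "note": "docs say: send Bearer TOKEN_GOES_HERE",
        },
    },
})

BEARER = re.compile(r"Bearer [A-Za-z0-9._-]+")


def redact_regex(raw):
    """Regex over the raw text: position in the JSON tree is irrelevant."""
    return BEARER.sub("[REDACTED]", raw)


def redact_path(raw):
    """Path-scoped stand-in: only params.arguments.auth is replaced."""
    doc = json.loads(raw)
    doc["params"]["arguments"]["auth"] = "[REDACTED]"
    return json.dumps(doc)


by_regex = json.loads(redact_regex(body))["params"]["arguments"]
by_path = json.loads(redact_path(body))["params"]["arguments"]

# Regex catches the real token, and also over-redacts the documentation note.
print(by_regex["headers"]["Authorization"], "|", by_regex["note"])
# The path rule never looks at headers.Authorization, so the token survives.
print(by_path["headers"]["Authorization"])
```

The regex version redacts the placeholder in `note` (a false positive), while the path version leaves the real token in `headers.Authorization` untouched (a false negative).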
For v1, mcpgw deliberately picks the regex approach. The reasoning:
- MCP tool argument shapes are not stable enough yet. Tools come and go; argument schemas evolve. A JSONPath rule that worked yesterday may stop covering today’s new tool. A regex catches the secret regardless of where it lives.
- Operators write redact rules to defend against carelessness, not to enforce schema. The threat is “an agent or human accidentally pastes a credential into a tool argument.” The credential could land at any path. Regex catches all of them.
- Regex is simpler to write and audit. Operators already write regex daily for log filters, alert patterns, and CI checks. JSONPath is a separate skill set.
Why we reject the jsonpath field rather than ignore it
The temptation: accept jsonpath in the YAML, log a warning, and ignore it. This fails because the operator now believes their redaction is path-scoped when it isn’t. They write jsonpath: $.params.arguments.api_key, see no syntax error, and assume the gateway is doing what they asked. In production, secrets at other paths leak through unredacted because the field is silently discarded.
Failing closed at startup forces the operator to confront the limitation. They see “jsonpath is reserved for a future release, use a regex pattern” and rewrite the rule rather than running with silently degraded redaction.
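A minimal sketch of what failing closed could look like, in Python rather than the gateway's own code. `ConfigError`, `validate_redact_entry`, and `validate_rules` are hypothetical names, and the rules are assumed to be already parsed from YAML into dictionaries; only the error wording is taken from the text above.

```python
class ConfigError(ValueError):
    """Raised at startup so the gateway never runs with a rule it cannot honor."""


def validate_redact_entry(rule_id, entry):
    # Reject rather than warn and ignore: an accepted-but-unused field would
    # let the operator believe redaction is path-scoped.
    if "jsonpath" in entry:
        raise ConfigError(
            f"rule {rule_id!r}: jsonpath is reserved for a future release, "
            "use a regex pattern"
        )
    if "regex" not in entry:
        raise ConfigError(f"rule {rule_id!r}: redact entry needs a regex")


def validate_rules(rules):
    for rule in rules:
        for entry in rule.get("redact", []):
            validate_redact_entry(rule["id"], entry)


# Hypothetical parsed config containing a path-scoped entry.
rules = [{
    "id": "redact-auth-arg",
    "action": "redact",
    "redact": [{"jsonpath": "$.params.arguments.auth",
                "replacement": "[REDACTED]"}],
}]

try:
    validate_rules(rules)
    started = True
except ConfigError as err:
    started = False
    message = str(err)
```

The point of the design is that `started` can never be true for this config: the process refuses to run instead of running with weaker redaction than the YAML appears to promise.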
What this means in practice
The implication for rule writing:
# v1: regex over body
- id: redact-bearer
  action: redact
  when: { tool_name: "*" }
  redact:
    - regex: 'Bearer [A-Za-z0-9._-]+'
      replacement: "[REDACTED]"
This rule matches Bearer ... no matter where in the body it appears: in params.arguments.auth, in params.arguments.headers.Authorization, in params.context.note, even in a tool name (extremely unlikely but possible). False positives are accepted as the cost of catching all real positives.
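To make "no matter where in the body" concrete, here is a rough Python sketch that applies the same pattern and replacement to a serialized request. The request contents are invented for the example, and Python's `re` module stands in for whatever regex engine the gateway uses.

```python
import json
import re

# The pattern and replacement from the rule above.
PATTERN = re.compile(r"Bearer [A-Za-z0-9._-]+")
REPLACEMENT = "[REDACTED]"

# Hypothetical request with the same kind of secret at three different paths.
raw_body = json.dumps({
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "fetch",
        "arguments": {
            "auth": "Bearer aaa.bbb",
            "headers": {"Authorization": "Bearer ccc-ddd"},
        },
        "context": {"note": "retry with Bearer eee_fff"},
    },
})

# subn returns the rewritten text and the number of replacements made.
redacted_body, hits = PATTERN.subn(REPLACEMENT, raw_body)

# The character class contains no quote or backslash, so a match cannot run
# past the end of a JSON string and the result still parses.
parsed = json.loads(redacted_body)
print(hits)
```

All three occurrences are replaced in a single pass over the text, without the rule knowing anything about `params.arguments` or `params.context`.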
The most useful patterns to internalize:
- Anchor permissive regexes to identifiable prefixes. [A-Za-z0-9]{32,} matches any alphanumeric run of 32 or more characters, secret or not; sk-[A-Za-z0-9]{20,} matches OpenAI-shaped keys.
- Prefer over-redaction to under-redaction. A redacted tool name is harmless; a leaked credential is not.
- Order regexes from most-specific to least-specific. A general pattern that consumes a substring earlier prevents a specific pattern from matching later.
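The ordering advice can be illustrated with a short Python sketch. It assumes each pattern runs over the output of the previous one, and it gives the two patterns different replacement strings purely so the effect is visible; `apply_in_order` and the `[REDACTED-OPENAI-KEY]` label are hypothetical.

```python
import re

GENERAL = (re.compile(r"[A-Za-z0-9]{32,}"), "[REDACTED]")
SPECIFIC = (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED-OPENAI-KEY]")


def apply_in_order(text, rules):
    # Assumption: patterns are applied sequentially, each seeing the text
    # already rewritten by the patterns before it.
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


# A made-up key: the sk- prefix followed by 40 alphanumeric characters.
body = '{"api_key": "sk-' + "a" * 40 + '"}'

specific_first = apply_in_order(body, [SPECIFIC, GENERAL])
general_first = apply_in_order(body, [GENERAL, SPECIFIC])

print(specific_first)  # {"api_key": "[REDACTED-OPENAI-KEY]"}
print(general_first)   # {"api_key": "sk-[REDACTED]"}
```

With the general pattern first, the 40-character run is consumed before the specific pattern sees it, so the `sk-` prefix is left behind and the specific rule never fires.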
What a future release will add
JSONPath scoping is planned as a complement to regex, not a replacement:
# Planned for a future release; not currently accepted
- id: redact-auth-arg
  action: redact
  when: { tool_name: "*" }
  redact:
    - jsonpath: "$.params.arguments.auth"
      replacement: "[REDACTED]"
Path-scoped redaction is faster (no regex over the full body), more precise (no false positives), and easier to read. But it is also less robust to schema drift: if a tool’s auth argument moves from auth to Authorization, the rule misses it.
When that ships, the recommendation will be: use both. Path-scoped redaction for known-shape tools (your in-house MCP servers), regex-over-body as a backstop for unknown shapes (third-party tools).
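As a rough illustration of how the two layers could complement each other, the Python sketch below runs a path-scoped pass and then a regex backstop. This is not the planned mcpgw behavior or syntax: no JSONPath library is involved, a fixed key list stands in for `$.params.arguments.auth`, and `redact_at` and `redact_both` are hypothetical helpers.

```python
import json
import re

BACKSTOP = re.compile(r"Bearer [A-Za-z0-9._-]+")


def redact_at(doc, keys, replacement="[REDACTED]"):
    """Replace the value at a fixed key path, if that path exists."""
    node = doc
    for key in keys[:-1]:
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            return
    if isinstance(node, dict) and keys[-1] in node:
        node[keys[-1]] = replacement


def redact_both(raw):
    doc = json.loads(raw)
    # Known shape: redact the value whatever it looks like.
    redact_at(doc, ["params", "arguments", "auth"])
    # Unknown shapes: fall back to pattern matching over the serialized text.
    return BACKSTOP.sub("[REDACTED]", json.dumps(doc))


# The path rule handles a secret with no recognizable prefix...
known_shape = json.dumps({"params": {"arguments": {"auth": "hunter2"}}})
# ...and the regex covers the case where the argument has been renamed.
drifted = json.dumps({"params": {"arguments": {"Authorization": "Bearer zzz.yyy"}}})

print(redact_both(known_shape))
print(redact_both(drifted))
```

Each layer covers the other's blind spot: the path pass catches `hunter2`, which no token pattern would match, and the regex pass catches the renamed `Authorization` argument, which the path no longer reaches.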
Why not just ship JSONPath in v1?
Three reasons we deferred:
- The Go JSONPath ecosystem is fragmented. There are at least four implementations with subtle differences in [*], .., and predicate semantics. Picking the wrong one in v1 would leave us with a compatibility commitment we didn’t want.
- The semantics of redacting in a parsed tree are uglier than they look. “Replace this string node” is easy; “replace one element of this array” is medium; “replace the value of every key matching *token*” is its own design problem. Building this incrementally is fine; building it under a “must ship in v1” deadline isn’t.
- The regex-only path was already useful. Preliminary customer feedback showed that 80% of redact rules people wanted to write were “match Bearer / sk- / AKIA / common-PII patterns.” Regex covers those without needing JSONPath at all.
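To show why the "every key matching *token*" case is a design problem in its own right, here is one possible reading of it in Python. The function name `redact_keys`, the use of shell-style globbing from `fnmatch`, and every behavior noted in the comments are assumptions; a real design would have to decide each of them deliberately.

```python
from fnmatch import fnmatchcase

REDACTED = "[REDACTED]"


def redact_keys(node, key_glob):
    """Return a copy of node with the value of every matching key replaced."""
    if isinstance(node, dict):
        return {
            k: REDACTED if fnmatchcase(k, key_glob) else redact_keys(v, key_glob)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [redact_keys(item, key_glob) for item in node]
    return node


# Hypothetical tool arguments.
args = {
    "token": "t1",
    "refresh_token": "t2",
    "accessToken": "t3",                    # differs only in letter case
    "items": [{"token_info": {"ttl": 60}}],  # matching key holds an object
    "name": "x",
}

result = redact_keys(args, "*token*")
print(result)
```

Even this small version makes choices that someone will disagree with: matching is case-sensitive, so `accessToken` is left alone, and a matching key whose value is an object has its whole subtree replaced by a string, which changes the type the tool receives.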
A word on encryption
Redact is not encryption. The audit log proves redaction occurred; it does not let you recover the original. If you need reversible redaction (e.g., for compliance audits where the auditor needs to see real values), mcpgw is the wrong tool. Use a separate vault-backed flow that tokenizes or encrypts secrets before they enter the request path, with controlled detokenization or decryption at audit time.
mcpgw redact is a one-way operation. The original payload is in the gateway’s memory for the duration of the request and is then forgotten. The audit log records the decision, never the secret.