Gateway Middleware
Every request that passes through the gateway runs through a fixed middleware pipeline. Understanding the pipeline order helps you reason about policy evaluation timing, performance overhead, and how to tune each stage.
Pipeline order
Incoming request
│
▼
1. Rate limiter ← rejects if over rateLimit req/min
│
▼
2. PII detector ← scans request content for PII
│
▼
3. Injection scorer ← scores prompt injection risk (0–1)
│
▼
4. Policy evaluator ← evaluates request-phase policies
│ (block returns 400, redact modifies content)
▼
5. Provider proxy ← forwards to LLM provider
│
▼
6. Response PII ← scans response content for PII
│
▼
7. Policy evaluator ← evaluates response-phase policies
│
▼
8. Audit logger ← writes trace record
│
▼
Response to your app
Rate limiter
The rate limiter is always active. It tracks requests per IP address per minute:
gateway:
middleware:
rateLimit: 60 # 60 requests/min per IP (default)
Requests over the limit receive a 429 Too Many Requests response. Set rateLimit: 0 to disable rate limiting (not recommended in production).
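The per-IP, per-minute counting described above can be sketched as a sliding-window limiter. This is an illustrative sketch, not the gateway's actual implementation; the class and method names are invented:

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Per-IP sliding-window rate limiter (illustrative sketch)."""

    def __init__(self, limit_per_min=60):
        self.limit = limit_per_min
        self.windows = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now=None):
        """Return True if the request is within the limit, else False (429)."""
        now = time.monotonic() if now is None else now
        if self.limit == 0:
            return True  # rateLimit: 0 disables rate limiting
        window = self.windows[ip]
        # Drop timestamps that fell outside the 60-second window.
        while window and now - window[0] >= 60:
            window.popleft()
        if len(window) >= self.limit:
            return False
        window.append(now)
        return True
```

A sliding window avoids the burst-at-the-boundary problem of fixed one-minute buckets, where a client could send the full quota at the end of one bucket and again at the start of the next.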
PII detection
PII detection scans the request body for common personally identifiable information patterns:
gateway:
middleware:
pii: true # default
When PII is detected, the gateway:
- Tags the trace with pii_detected: true
- Evaluates any pii_detected policies you have configured (block, redact, or warn)
PII types detected: email addresses, phone numbers, credit card numbers, social security numbers, IP addresses, and passport numbers.
PII detection alone does not block or redact anything. To act on detections, you need a policy with conditionType: pii_detected and an appropriate action. Apply the foundational policy pack for sensible defaults.
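Pattern-based scanning of this kind can be sketched with a few regexes. These patterns are simplified illustrations covering a subset of the PII types listed above; production detectors use more robust patterns and validation (for example, Luhn checks on card numbers):

```python
import re

# Simplified patterns for a few of the PII types the gateway detects.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def detect_pii(text):
    """Return the set of PII type names found in the text."""
    return {name for name, pattern in PII_PATTERNS.items() if pattern.search(text)}
```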
Injection scoring
The injection scorer assigns a risk score between 0 and 1 to each request, where 1 is highest risk:
gateway:
middleware:
injection: true # default
The score is computed using a pattern-matching classifier trained on common prompt injection techniques (role-play overrides, system prompt leaks, instruction injection).
The score is available to policies via conditionType: injection_score. The foundational policy pack blocks requests scoring ≥ 0.7.
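A pattern-matching scorer of the kind described can be sketched as a weighted sum, capped to the 0–1 range. The patterns and weights below are invented for illustration; the gateway's real classifier is more sophisticated:

```python
import re

# Hypothetical pattern weights covering the techniques named above.
INJECTION_PATTERNS = [
    (re.compile(r"ignore (all|previous|prior) instructions", re.I), 0.6),  # instruction injection
    (re.compile(r"(reveal|print|show).{0,20}system prompt", re.I), 0.5),   # system prompt leak
    (re.compile(r"you are now", re.I), 0.3),                               # role-play override
]

def injection_score(text):
    """Sum the weights of matching patterns, capped to 1.0 (highest risk)."""
    score = sum(weight for pattern, weight in INJECTION_PATTERNS if pattern.search(text))
    return min(score, 1.0)
```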
Policy evaluator
Policies are evaluated in two phases:
Request phase — runs before the LLM call. A block action returns a 400 Bad Request to your application immediately; the LLM is never called. A redact action modifies the request content before forwarding.
Response phase — runs after the LLM responds. A redact action modifies the response content before it reaches your application.
Policy evaluation order within a phase follows creation order. A block action short-circuits subsequent policy checks.
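The evaluation semantics above (creation order, block short-circuits, redact rewrites content) can be sketched as follows. The policy dict shape is an assumption for illustration, not the SDK's actual schema:

```python
def evaluate_policies(policies, context, content):
    """Evaluate policies in creation order; a block short-circuits the rest.

    Each policy is assumed to look like:
      {"conditionType": "injection_score", "threshold": 0.7, "action": "block"}
      {"conditionType": "pii_detected", "action": "redact", "redact": fn}
    Returns ("block" | "allow", possibly-redacted content).
    """
    for policy in policies:
        cond = policy["conditionType"]
        if cond == "injection_score":
            triggered = context["injection_score"] >= policy["threshold"]
        elif cond == "pii_detected":
            triggered = context["pii_detected"]
        else:
            triggered = False
        if not triggered:
            continue
        if policy["action"] == "block":
            return "block", content  # short-circuit: later policies never run
        if policy["action"] == "redact":
            content = policy["redact"](content)  # rewrite before forwarding
    return "allow", content
```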
Provider proxy
The proxy stage forwards the (possibly modified) request to the target provider and waits for a response. The provider is selected based on the routing rules described in Gateway Providers.
The timeoutMs field on each provider config controls how long the proxy waits before returning a 504 Gateway Timeout.
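The timeout behavior can be sketched by running the provider call with a deadline and mapping an expired deadline to a 504. This is a minimal illustration; the function names are invented:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def proxy_request(call_provider, timeout_ms):
    """Forward a request to the provider; return 504 if timeoutMs elapses."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(call_provider)
        try:
            return 200, future.result(timeout=timeout_ms / 1000)
        except TimeoutError:
            return 504, None  # 504 Gateway Timeout
```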
Audit logger
The audit logger writes a JSON record for every completed request:
{
"traceId": "trace_xyz789",
"timestamp": "2026-04-04T14:22:00.312Z",
"agentName": "contract-summarizer",
"provider": "openai",
"model": "gpt-4o",
"status": 200,
"durationMs": 312,
"inputTokens": 142,
"outputTokens": 87,
"costUsd": 0.0041,
"piiDetected": false,
"injectionScore": 0.02,
"policiesTriggered": []
}
Configure the output destination with the audit setting:
gateway:
middleware:
audit: stdout # stdout | file | off
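The destination routing can be sketched as one JSON line per completed request. This is an illustrative sketch; the file path is a hypothetical choice, not the gateway's:

```python
import json

def write_audit_record(record, destination="stdout"):
    """Serialize one audit record as a JSON line and route it per the audit setting."""
    line = json.dumps(record)
    if destination == "stdout":
        print(line)
    elif destination == "file":
        with open("audit.log", "a") as f:  # hypothetical file path
            f.write(line + "\n")
    # destination == "off": the record is dropped
    return line
```

One JSON object per line (JSON Lines) keeps the log trivially streamable and greppable.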
Disabling middleware
You can disable individual middleware stages:
gateway:
middleware:
pii: false # No PII scanning
injection: false # No injection scoring
audit: off # No audit log
cache: false # No response caching
Disabling injection scoring or PII detection removes the signals that policy evaluation depends on: policies with injection_score or pii_detected conditions will never fire if the corresponding middleware is off.
Response caching
When cache: true, the gateway caches LLM responses for identical requests:
gateway:
middleware:
cache: true
Cache hits return the cached response immediately, skipping the provider proxy stage. This reduces latency and cost for repeated identical prompts. Cache keys are based on the full request body (model, messages, parameters).
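A cache key derived from the full request body can be sketched by hashing a canonical serialization. This is an illustration of the idea, not the gateway's exact key derivation:

```python
import hashlib
import json

def cache_key(request_body):
    """Derive a cache key from the full request body (model, messages, parameters).

    sort_keys canonicalizes the JSON so field ordering doesn't change the key.
    """
    canonical = json.dumps(request_body, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Because the key covers the entire body, any change to the model, messages, or sampling parameters produces a different key and therefore a cache miss.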
Related
- Gateway Configuration — Full rivano.yaml reference
- Gateway Providers — How the proxy selects a provider
- SDK Policies — Create policies that fire at specific pipeline stages
- CLI Policies — Apply the foundational policy pack