Injection Detection
Rivano scores every inbound request for prompt injection risk before forwarding it to an LLM provider. A score of 0.0 means no injection patterns detected; 1.0 means high-confidence injection attempt. Requests above the configured threshold are blocked with a 403 — no tokens are consumed.
How scoring works
The injection scorer analyzes the full messages array using a pattern library that covers:
- Role confusion — Attempts to override the system prompt or claim a privileged identity ("Ignore previous instructions...", "You are now...")
- Delimiter injection — Crafted sequences that exploit how prompts are assembled (`</s>`, `[INST]`, `###`, `---`)
- Data exfiltration patterns — Instructions to output secrets, credentials, or internal system state
- Indirect injection — Payloads embedded in retrieved documents or tool call results that attempt to redirect agent behavior
- Jailbreak patterns — Well-known jailbreak templates including DAN, AIM, and similar variants
Each pattern match contributes to the score. The final score is a weighted composite normalized to [0.0, 1.0].
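As a rough sketch of the idea (not Rivano's actual scorer — the weights, categories, and normalization below are illustrative assumptions), a weighted composite normalized to [0.0, 1.0] could look like:

```typescript
// Hypothetical weighted-composite scorer, for illustration only.
// Each matched pattern contributes its weight; the sum is squashed
// into [0.0, 1.0) so that many weak matches or one strong match
// both push the score toward 1.0.
interface PatternMatch {
  category: string; // e.g. "role_confusion", "delimiter_injection"
  weight: number;   // contribution of a single match in this category
}

function compositeScore(matches: PatternMatch[]): number {
  const raw = matches.reduce((sum, m) => sum + m.weight, 0);
  // 1 - e^(-raw) maps [0, Infinity) onto [0, 1).
  return Math.min(1, 1 - Math.exp(-raw));
}

// Example: two matches of different strength
const score = compositeScore([
  { category: 'role_confusion', weight: 0.9 },
  { category: 'delimiter_injection', weight: 0.4 },
]);
```

The exponential squash is one common choice for turning an unbounded sum of evidence into a bounded score; the real scorer may normalize differently.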
Score interpretation
| Score range | Risk level | Default behavior |
|---|---|---|
| 0.0 – 0.3 | Low | Pass through |
| 0.3 – 0.6 | Medium | Pass through (warn header added) |
| 0.6 – 0.7 | Elevated | Pass through (warn header added) |
| 0.7 – 1.0 | High | Blocked (default threshold) |
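Translated into code, the table above corresponds to a mapping like the following (a hypothetical helper for application-side use, not part of the Rivano SDK):

```typescript
type RiskLevel = 'low' | 'medium' | 'elevated' | 'high';

// Maps an injection score to the risk levels in the table above.
// Lower bounds are inclusive, upper bounds exclusive, except that
// 1.0 falls into 'high'.
function riskLevel(score: number): RiskLevel {
  if (score < 0.3) return 'low';
  if (score < 0.6) return 'medium';
  if (score < 0.7) return 'elevated';
  return 'high';
}
```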
The score for every request is recorded in the trace. You can view score distribution in the Observability tab.
Threshold tuning
The default block threshold is 0.7. You can tune this per-tenant by creating a policy with an injection_score condition:
```yaml
# rivano.yaml — raise the block threshold for a low-sensitivity agent
policies:
  - name: injection-block
    phase: request
    condition:
      type: injection_score
      threshold: 0.85
    action: block

  # Lower threshold for a high-security agent
  - name: injection-block-strict
    phase: request
    condition:
      type: injection_score
      threshold: 0.5
    action: block
```
Start with the default threshold and observe the score distribution for your traffic in Observability → Traces. Filter by injection_score > 0 to see scored requests. Adjust the threshold once you understand your baseline false-positive rate.
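One way to turn an observed baseline into a candidate threshold is to take a high percentile of the scores from your normal traffic. The helper below is a sketch under that assumption (the percentile choice and function name are illustrative, not Rivano guidance):

```typescript
// Given injection scores exported from your traces, suggest a block
// threshold at a high percentile of baseline traffic (default p99),
// so roughly 1% of baseline requests would have been blocked.
function suggestThreshold(scores: number[], percentile = 0.99): number {
  if (scores.length === 0) throw new Error('no scores provided');
  const sorted = [...scores].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(percentile * sorted.length));
  return sorted[idx];
}
```

Treat the result as a starting point: if your baseline traffic already contains real injection attempts, a pure percentile cut will overestimate a safe threshold.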
Policy integration
Create injection policies via the SDK:
```typescript
import Rivano from '@rivano/sdk';

const rivano = new Rivano({ apiKey: process.env.RIVANO_API_KEY! });

// Block high-confidence injection attempts
await rivano.policies.create({
  name: 'injection-block',
  phase: 'request',
  condition: {
    type: 'injection_score',
    threshold: 0.7,
  },
  action: 'block',
  enabled: true,
});

// Warn (add header) on medium-confidence attempts without blocking
await rivano.policies.create({
  name: 'injection-warn',
  phase: 'request',
  condition: {
    type: 'injection_score',
    threshold: 0.4,
  },
  action: 'warn',
  enabled: true,
});
```

Warn action behavior
When a policy action is warn, Rivano adds the header X-Rivano-Warning: injection_score_elevated to the response. The request still reaches the LLM provider. Your application can inspect this header and log or alert accordingly.
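For example, application code sitting in front of Rivano responses could check the header and alert. A minimal sketch using the standard Fetch API `Headers` type (the header name comes from the docs above; the logging is illustrative):

```typescript
// Inspect a proxied response's headers for Rivano's warn header.
// Returns true when an elevated injection score was flagged.
function checkInjectionWarning(headers: Headers): boolean {
  const warning = headers.get('X-Rivano-Warning');
  if (warning === 'injection_score_elevated') {
    console.warn('Rivano flagged elevated injection risk for this request');
    return true;
  }
  return false;
}
```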
Blocked request response
When a request is blocked by injection detection, Rivano returns:
```http
HTTP/1.1 403 Forbidden
Content-Type: application/json
X-Rivano-Policy: injection-block

{
  "error": "Request blocked by policy",
  "policy": "injection-block",
  "details": {
    "injection_score": 0.84,
    "threshold": 0.7
  }
}
```
The score and threshold are included in the response body so your application can surface a meaningful error to the user.
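Client-side, the 403 body can be turned into a user-facing message. A minimal sketch assuming the response shape shown above (the interface and function are hypothetical helpers, not part of the SDK):

```typescript
// Shape of Rivano's 403 body, as documented above.
interface BlockedResponse {
  error: string;
  policy: string;
  details: { injection_score: number; threshold: number };
}

// Build a message an application could surface when a request is
// blocked, including the score and threshold from the response body.
function blockedMessage(body: BlockedResponse): string {
  const { injection_score, threshold } = body.details;
  return `Request blocked by policy "${body.policy}" ` +
    `(score ${injection_score.toFixed(2)} >= threshold ${threshold.toFixed(2)}).`;
}
```

Whether to expose the raw score to end users is an application decision; for untrusted users you may prefer a generic message so attackers cannot probe the threshold.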
Related
- PII Detection — Detecting and redacting sensitive data
- Policies — Full condition and action reference
- Security Overview — Full pipeline architecture
- Traces API — Query injection scores programmatically