Enable Datadog tracing
Problem: you want every MCP call traced in Datadog APM with mcp.* attributes. Spans should show up in the same dashboards your other services use.
Solution: point mcpgw at your Datadog Agent’s OTLP/HTTP receiver. The Agent forwards spans to Datadog over its existing pipe — no separate ingestion path.
On the Datadog Agent — enable OTLP/HTTP
In datadog.yaml:
otlp_config:
  receiver:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
Or, with environment variables (Helm chart, container):
DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT=0.0.0.0:4318
Restart the Agent. Confirm the receiver is up:
curl -s -o /dev/null -w "%{http_code}\n" -X POST http://<agent>:4318/v1/traces
# Expect 415: the receiver is up but rejects a request that carries no supported Content-Type. Anything else means the receiver is mis-wired.
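The status check can be wrapped in a small helper so the result reads unambiguously in a script. This is a minimal sketch: `classify_otlp_probe` is a hypothetical name, and the only mapping taken from this guide is that 415 means the receiver is up. `000` is what curl prints for `%{http_code}` when it received no HTTP response at all.

```shell
# Hypothetical helper: turn the probe's HTTP status code into a verdict.
classify_otlp_probe() {
  case "$1" in
    415) echo "receiver up" ;;
    000) echo "no response: check host, port, and that the Agent restarted" ;;
    *)   echo "unexpected status $1: receiver is probably mis-wired" ;;
  esac
}

# Usage (replace <agent> with your Agent's host name):
# classify_otlp_probe "$(curl -s -o /dev/null -w '%{http_code}' -X POST 'http://<agent>:4318/v1/traces')"
```

The curl call is left as a comment so that sourcing the snippet never touches the network.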
On mcpgw — point at the receiver
telemetry:
  customer:
    enabled: true
    endpoint: http://datadog-agent:4318   # bare host:port; /v1/traces is appended
    service_name: mcpgw                   # appears in Datadog APM service list
    resource_attrs:                       # optional: arbitrary OTel resource attrs
      env: production
      version: v1.0.0
      team: platform
Restart the gateway (the telemetry endpoint is not hot-reloadable). Within ~10 seconds of the next request, spans appear in Datadog APM under service mcpgw.
What the spans look like
Every JSON-RPC call produces one span:
- Name: mcp.tools.call (or mcp.tools.list, mcp.resources.read, etc.)
- Kind: SERVER (inbound)
- Attributes: mcp.method, mcp.tool.name, mcp.session.id, mcp.transport, mcp.upstream, mcp.policy.decision, mcp.payload.bytes_in/out
Outbound calls to the upstream are recorded as CLIENT child spans, so you see end-to-end latency: client → gateway → upstream → gateway → client.
See Reference: telemetry for the full attribute table and enum values.
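To make the span shape concrete, and to exercise the Agent path without involving mcpgw, you can hand-build a single OTLP/JSON span and post it to the receiver. The sketch below is illustrative only: `build_test_span` is a hypothetical helper, the scope name `manual-test` and the attribute values (including `tools/list` for `mcp.method`) are guesses rather than captured gateway output, and only two of the attributes listed above are included. In OTLP, `kind: 2` is SERVER.

```shell
# Build a hand-made OTLP/JSON payload holding one SERVER span.
# All values are illustrative; this is not real mcpgw output.
build_test_span() {
  session_id="$1"
  trace_id=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')  # 32 hex chars
  span_id=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')    # 16 hex chars
  now=$(date +%s)                                          # seconds since epoch
  cat <<EOF
{"resourceSpans":[{"resource":{"attributes":[
  {"key":"service.name","value":{"stringValue":"mcpgw"}}]},
 "scopeSpans":[{"scope":{"name":"manual-test"},"spans":[{
  "traceId":"$trace_id","spanId":"$span_id",
  "name":"mcp.tools.list","kind":2,
  "startTimeUnixNano":"${now}000000000",
  "endTimeUnixNano":"${now}050000000",
  "attributes":[
   {"key":"mcp.method","value":{"stringValue":"tools/list"}},
   {"key":"mcp.session.id","value":{"stringValue":"$session_id"}}]}]}]}]}
EOF
}

# Usage (replace <agent> with your Agent's host name):
# build_test_span dd-verify | curl -s -X POST 'http://<agent>:4318/v1/traces' \
#   -H 'Content-Type: application/json' --data-binary @-
```

If a span built this way reaches Datadog but gateway spans do not, the problem is more likely on the mcpgw side than in the Agent.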
Verifying
# Make a tool call
curl -s -X POST http://localhost:7332/mcp \
  -H "Content-Type: application/json" \
  -H "Mcp-Session-Id: dd-verify" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
# Then in Datadog APM:
# - service:mcpgw operation:mcp.tools.list
# - filter @mcp.session.id:dd-verify
The span should appear within about 10 seconds.
Common dashboards to build
- Policy mix: pie chart of @mcp.policy.decision values. Surfaces the ratio of allow / deny / redact / rate_limit_blocked.
- Top denied tools: bar chart of @mcp.tool.name filtered by @mcp.policy.decision:deny. Tells you which tool names you might be over-restricting.
- Tool latency P95: mcp.tools.call p95 grouped by @mcp.tool.name. Surfaces slow upstreams.
- Sessions per tool: count(distinct @mcp.session.id) by @mcp.tool.name. Detects unusual fan-out.
- Error kinds: bar chart of @mcp.error.kind. upstream_timeout and upstream_unreachable are the ones you actually want to alert on.
Pitfalls
- endpoint accepts a bare host:port; mcpgw appends /v1/traces automatically. Override with the full URL only if you front the Agent with an OTLP-compatible proxy mounted at a non-default path.
- service_name corresponds to DD_SERVICE in Datadog terms. Use whatever name you want to see in APM. Consistency matters more than the value.
- Telemetry export fails open. If the OTLP receiver is unreachable, the gateway logs a warning every minute and continues serving traffic. Telemetry export will never block a request.
- Resource attribute names must match the Datadog mapping. env, version, and service.name are auto-recognized; arbitrary keys appear as @<key> in APM filters.
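The endpoint rule in the first pitfall can be sketched as a small function. This is an assumption about how the path is resolved, written to match the description here, not mcpgw's actual code; `resolve_traces_url` is a hypothetical name. An endpoint with no path gets /v1/traces appended, and an endpoint that already carries a path is used as given.

```shell
# Hypothetical sketch of the endpoint rule: append /v1/traces unless
# the endpoint already carries a path of its own.
resolve_traces_url() {
  endpoint="${1%/}"          # drop a single trailing slash, if any
  rest="${endpoint#*://}"    # strip the scheme (if present) before looking for a path
  case "$rest" in
    */*) printf '%s\n' "$endpoint" ;;            # has a path: use as given
    *)   printf '%s\n' "$endpoint/v1/traces" ;;  # bare host:port: append default path
  esac
}

# Usage:
# resolve_traces_url http://datadog-agent:4318
# resolve_traces_url http://otlp-proxy:8080/custom/v1/traces
```

Under these assumptions the first call resolves to the Agent's default traces path and the second is passed through untouched, which is the proxy case the pitfall describes. The host `otlp-proxy` is a made-up example.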
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| No spans in Datadog | OTLP receiver disabled | Enable otlp_config.receiver.protocols.http on the Agent |
| Spans appear under “unknown service” | service_name not set or empty | Set telemetry.customer.service_name |
| Datadog shows operation name only, no mcp.* attrs | Service name correct but custom resource attrs missing | Datadog version may need to allow custom attrs in span tags; see Datadog Agent v7.50+ |
| 5xx in mcpgw logs about telemetry | Receiver up but Agent can’t reach Datadog | Check Agent’s own status with datadog-agent status |
Related
- Reference: telemetry — semantic conventions
- Tutorial 1 — end-to-end first trace