Enable Datadog tracing

Problem: you want every MCP call traced in Datadog APM with mcp.* attributes. Spans should show up in the same dashboards your other services use.

Solution: point mcpgw at your Datadog Agent’s OTLP/HTTP receiver. The Agent forwards the spans to Datadog over the connection it already uses, so there is no separate ingestion path to set up.

On the Datadog Agent — enable OTLP/HTTP

In datadog.yaml:

otlp_config:
  receiver:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

Or, with environment variables (Helm chart, container):

DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT=0.0.0.0:4318

Restart the Agent. Confirm the receiver is up:

curl -s -o /dev/null -w "%{http_code}\n" -X POST http://<agent>:4318/v1/traces
# Expect 415: the receiver is up but rejects a request without an OTLP Content-Type. Anything else (000 = no connection) means it is mis-wired.
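The 415 check only proves that something answers on port 4318. To confirm that the receiver actually accepts trace data, you can post one span encoded as OTLP/JSON and expect HTTP 200. The following is a minimal sketch using only the Python standard library. The service name otlp-smoke-test, the span name and the helper names are illustrative choices, not part of mcpgw or Datadog.

```python
import json
import secrets
import time
import urllib.error
import urllib.request


def build_smoke_payload(service_name: str = "otlp-smoke-test") -> dict:
    """Return a one-span OTLP/JSON trace export request."""
    end_ns = time.time_ns()
    start_ns = end_ns - 5_000_000  # pretend the span took 5 ms
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "otlp-smoke-test"},
                "spans": [{
                    # OTLP/JSON encodes trace and span IDs as hex strings
                    "traceId": secrets.token_hex(16),
                    "spanId": secrets.token_hex(8),
                    "name": "smoke-test",
                    "kind": 2,  # SPAN_KIND_SERVER
                    # 64-bit timestamps are sent as decimal strings
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }


def send_smoke_span(agent: str, timeout: float = 5.0) -> int:
    """POST one span to http://<agent>:4318/v1/traces and return the HTTP status."""
    req = urllib.request.Request(
        f"http://{agent}:4318/v1/traces",
        data=json.dumps(build_smoke_payload()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx: the receiver answered but refused the data
```

Call send_smoke_span("datadog-agent") from a host that can reach the Agent. A return value of 200 means the span was accepted. A connection failure raises urllib.error.URLError. The test span appears in APM under its own service, so use a throwaway name.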

On mcpgw — point at the receiver

telemetry:
  customer:
    enabled: true
    endpoint: http://datadog-agent:4318      # base URL, no path; /v1/traces is appended
    service_name: mcpgw                       # appears in Datadog APM service list
    resource_attrs:                           # optional: arbitrary OTel resource attrs
      env: production
      version: v1.0.0
      team: platform

Restart the gateway (telemetry endpoint is not hot-reloadable). Within ~10 seconds of the next request, spans appear in Datadog APM under service mcpgw.
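The endpoint comment in the config states a simple resolution rule. The following hypothetical sketch illustrates it. It assumes that a bare host:port means plain HTTP, and that any URL that already has a non-root path (for example a proxy mount) is used unchanged. It illustrates the documented behavior and is not mcpgw's code.

```python
from urllib.parse import urlsplit


def traces_url(endpoint: str) -> str:
    """Resolve a telemetry endpoint setting to the URL spans are POSTed to."""
    if "://" not in endpoint:
        endpoint = "http://" + endpoint  # assumption: bare host:port means plain HTTP
    parts = urlsplit(endpoint)
    if parts.path in ("", "/"):
        # No path given: use the OTLP/HTTP default traces path.
        return f"{parts.scheme}://{parts.netloc}/v1/traces"
    # A path is present (e.g. a proxy mounted elsewhere): send there unchanged.
    return endpoint
```

For example, http://datadog-agent:4318 resolves to http://datadog-agent:4318/v1/traces. A proxy URL such as https://otel-proxy.internal/dd/v1/traces (hypothetical) is left alone.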

What the spans look like

Every JSON-RPC call produces one SERVER span:

  • Name: mcp.tools.call (or mcp.tools.list, mcp.resources.read, etc.)
  • Kind: SERVER (inbound)
  • Attributes: mcp.method, mcp.tool.name, mcp.session.id, mcp.transport, mcp.upstream, mcp.policy.decision, mcp.payload.bytes_in, mcp.payload.bytes_out, and mcp.error.kind on failed calls

Outbound calls to the upstream are recorded as CLIENT child spans, so you see end-to-end latency: client → gateway → upstream → gateway → client.
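To make the parent/child relationship concrete, the following sketch builds one SERVER span and its CLIENT child in OTLP/JSON shape. It then computes how much of the end-to-end time the gateway itself added. The span-name pattern (mcp. followed by the JSON-RPC method with / replaced by .) is inferred from the examples above. The client span name and the helper names are hypothetical.

```python
import secrets


def _attr(key: str, value: str) -> dict:
    """OTLP/JSON key-value pair with a string value."""
    return {"key": key, "value": {"stringValue": value}}


def gateway_span_pair(method, tool, session_id, start_ns, up_start_ns, up_end_ns, end_ns):
    """Return (server_span, client_span) for one proxied JSON-RPC call."""
    trace_id = secrets.token_hex(16)  # shared by both spans
    server_id = secrets.token_hex(8)
    server = {
        "traceId": trace_id,
        "spanId": server_id,
        "name": "mcp." + method.replace("/", "."),  # tools/call -> mcp.tools.call
        "kind": 2,  # SPAN_KIND_SERVER: inbound call from the MCP client
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        "attributes": [
            _attr("mcp.method", method),
            _attr("mcp.tool.name", tool),
            _attr("mcp.session.id", session_id),
        ],
    }
    client = {
        "traceId": trace_id,
        "spanId": secrets.token_hex(8),
        "parentSpanId": server_id,  # nests the upstream call under the server span
        "name": "upstream " + method,  # hypothetical name
        "kind": 3,  # SPAN_KIND_CLIENT: outbound call to the upstream
        "startTimeUnixNano": str(up_start_ns),
        "endTimeUnixNano": str(up_end_ns),
    }
    return server, client


def gateway_overhead_ms(server: dict, client: dict) -> float:
    """End-to-end time minus upstream time: what the gateway itself added."""
    total = int(server["endTimeUnixNano"]) - int(server["startTimeUnixNano"])
    upstream = int(client["endTimeUnixNano"]) - int(client["startTimeUnixNano"])
    return (total - upstream) / 1_000_000
```

Take a call that lasts 10 ms end to end, where the upstream hop runs from 1 ms to 9 ms. gateway_overhead_ms returns 2.0, meaning 2 ms were spent in the gateway.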

See Reference: telemetry for the full attribute table and enum values.

Verifying

# Make a tool call
curl -s -X POST http://localhost:7332/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Mcp-Session-Id: dd-verify" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'

# Then in Datadog APM:
# - service:mcpgw operation:mcp.tools.list
# - filter @mcp.session.id:dd-verify

The span should appear within about 10 seconds.

Common dashboards to build

  • Policy mix: pie chart of @mcp.policy.decision values. Surfaces the ratio of allow / deny / redact / rate_limit_blocked.
  • Top denied tools: bar chart of @mcp.tool.name filtered by @mcp.policy.decision:deny. Tells you which tool names you might be over-restricting.
  • Tool latency P95: mcp.tools.call p95 grouped by @mcp.tool.name. Surfaces slow upstreams.
  • Sessions per tool: count(distinct @mcp.session.id) by @mcp.tool.name. Detects unusual fan-out.
  • Error kinds: bar chart of @mcp.error.kind. upstream_timeout and upstream_unreachable are the ones you actually want to alert on.
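Each of these widgets is a small aggregation over span attributes. The following sketch recomputes all five offline from a list of flattened span records, which can help when sanity-checking a dashboard against raw data. The record shape is an assumption for illustration: one dict per span, with name, duration_ms and mcp.* keys. The p95 here is the nearest-rank percentile, and Datadog's own percentile computation may give slightly different values.

```python
from collections import Counter, defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) with integer math
    return ordered[rank - 1]


def dashboard_rollups(spans):
    """Recompute the five widgets from flattened span dicts (hypothetical shape)."""
    policy_mix = Counter(
        s["mcp.policy.decision"] for s in spans if "mcp.policy.decision" in s
    )
    top_denied = Counter(
        s["mcp.tool.name"] for s in spans
        if s.get("mcp.policy.decision") == "deny" and "mcp.tool.name" in s
    )
    latencies = defaultdict(list)
    sessions = defaultdict(set)
    for s in spans:
        tool = s.get("mcp.tool.name")
        if tool is None:
            continue  # skip spans without a tool name, e.g. tools/list
        if s["name"] == "mcp.tools.call":
            latencies[tool].append(s["duration_ms"])
        sessions[tool].add(s["mcp.session.id"])
    error_kinds = Counter(s["mcp.error.kind"] for s in spans if "mcp.error.kind" in s)
    return {
        "policy_mix": policy_mix,
        "top_denied_tools": top_denied.most_common(),
        "tool_latency_p95": {t: p95(v) for t, v in latencies.items()},
        "sessions_per_tool": {t: len(ids) for t, ids in sessions.items()},
        "error_kinds": error_kinds,
    }
```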

Pitfalls

  • endpoint takes a base URL with no path, such as http://datadog-agent:4318; mcpgw appends /v1/traces automatically. Give the full URL only if you front the Agent with an OTLP-compatible proxy mounted at a non-default path.
  • service_name is what Datadog calls DD_SERVICE (the service tag). Use whatever name you want to see in APM; keeping it consistent matters more than the exact value.
  • Telemetry export fails open. If the OTLP receiver is unreachable, the gateway logs a warning every minute and keeps serving traffic; exporting spans never blocks a request.
  • Resource attribute names must match the Datadog mapping. env, version, and service.name are auto-recognized; arbitrary keys appear as @<key> in APM filters.
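Fail-open export is commonly built from two parts: a bounded, non-blocking queue on the request path, and a background worker that does the network I/O. The following sketch shows that pattern, including a warning rate-limited to once a minute. It is a hypothetical illustration of the behavior described in the pitfalls, not mcpgw's implementation, and the class and method names are made up.

```python
import logging
import queue
import threading
import time

log = logging.getLogger("telemetry-sketch")


class FailOpenExporter:
    """Hypothetical sketch: span export that can never block or fail a request."""

    def __init__(self, export, maxsize=1024, warn_every_s=60.0, clock=time.monotonic):
        self._export = export            # callable(list_of_spans); may raise
        self._queue = queue.Queue(maxsize=maxsize)
        self._warn_every_s = warn_every_s
        self._clock = clock
        self._last_warn = None
        self.dropped = 0                 # approximate if submit() runs on many threads
        self.warnings = 0

    def submit(self, span):
        """Request path: enqueue without waiting; drop the span if the queue is full."""
        try:
            self._queue.put_nowait(span)
        except queue.Full:
            self.dropped += 1

    def flush(self):
        """Worker path: drain the queue and export one batch, swallowing failures."""
        batch = []
        while True:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return
        try:
            self._export(batch)
        except Exception as err:  # any export error is logged, never re-raised
            self.dropped += len(batch)
            now = self._clock()
            if self._last_warn is None or now - self._last_warn >= self._warn_every_s:
                self._last_warn = now
                self.warnings += 1
                log.warning("span export failed (%s); %d spans dropped so far", err, self.dropped)

    def run(self, stop: threading.Event, interval_s: float = 1.0):
        """Background loop: flush periodically until stop is set."""
        while not stop.is_set():
            self.flush()
            stop.wait(interval_s)
```

In use, the worker would run as threading.Thread(target=exporter.run, args=(stop,), daemon=True), and request handlers would call only submit(). Because of this split, a dead receiver costs each request at most one failed put_nowait.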

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| No spans in Datadog | OTLP receiver disabled | Enable otlp_config.receiver.protocols.http on the Agent |
| Spans appear under “unknown service” | service_name not set or empty | Set telemetry.customer.service_name |
| Datadog shows the operation name only, no mcp.* attributes | Service name is correct, but custom attributes are not surfaced as span tags | Older Agents may not expose custom attributes as span tags; see Datadog Agent v7.50+ |
| 5xx in mcpgw logs about telemetry | Receiver is up, but the Agent can’t reach Datadog | Check the Agent’s own status with datadog-agent status |