Observability
Three streams — logs, metrics, traces — plus the audit chain. W3C Trace Context throughout. OpenTelemetry-native.
MAP is observability-native. Every request emits four signals; every signal is correlatable. Logs, metrics, traces, and the audit chain share a correlation_id and propagate traceparent end-to-end.
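Since every signal carries a traceparent, it can help to see what that header contains. The following is a minimal sketch of parsing the four-field W3C Trace Context layout (`version-trace_id-parent_id-flags`); the type and function names are illustrative assumptions, not part of the MAP engine.

```rust
/// Parsed W3C Trace Context `traceparent` header (four-field layout).
/// Hypothetical helper type for illustration; not a MAP engine API.
#[derive(Debug, PartialEq, Eq)]
pub struct TraceParent {
    pub version: u8,
    pub trace_id: String,  // 32 lowercase hex characters
    pub parent_id: String, // 16 lowercase hex characters
    pub flags: u8,
}

impl TraceParent {
    /// True when the `sampled` flag (lowest bit of the flags byte) is set.
    pub fn is_sampled(&self) -> bool {
        self.flags & 0x01 == 0x01
    }
}

/// Checks that `s` is exactly `len` lowercase hex characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parses a `traceparent` value, returning `None` when it is malformed.
pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 {
        return None;
    }
    let (version, trace_id, parent_id, flags) = (parts[0], parts[1], parts[2], parts[3]);
    if !is_lower_hex(version, 2)
        || !is_lower_hex(trace_id, 32)
        || !is_lower_hex(parent_id, 16)
        || !is_lower_hex(flags, 2)
    {
        return None;
    }
    // All-zero trace and parent IDs are invalid, as is version `ff`.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    let version = u8::from_str_radix(version, 16).ok()?;
    if version == 0xff {
        return None;
    }
    Some(TraceParent {
        version,
        trace_id: trace_id.to_owned(),
        parent_id: parent_id.to_owned(),
        flags: u8::from_str_radix(flags, 16).ok()?,
    })
}
```

The sketch accepts only the four-field form and does not handle `tracestate` or any fields a future header version might append.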
The four signals
| Signal | Engine emitter | Target |
|---|---|---|
| Logs | tracing crate via tracing-subscriber | stdout / Loki |
| Metrics | metrics crate via Prometheus exporter | :9091/metrics |
| Traces | OpenTelemetry → OTLP exporter | Tempo / Honeycomb / etc. |
| Audit | AuditPipeline::record → MAX::audit_log_entry | PostgreSQL chain |
Logs
Structured JSON via the tracing crate. Every log line carries:
{
  "ts": "2026-05-20T14:23:45.123Z",
  "level": "INFO",
  "target": "map_core::engine",
  "message": "request dispatched",
  "correlation_id": "4f81b3a-...",
  "tenant_id": "org_acme",
  "protocol": "MARC",
  "operation": "reasoning_task",
  "stage": "router_invocation",
  "latency_ms": 1247,
  "traceparent": "00-..."
}
Configure level via RUST_LOG:
RUST_LOG=info,map_core=debug,hyper=warn
In production we run RUST_LOG=info and leave the trace level to be controlled dynamically via /admin/trace (gated by map.admin.observability).
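To make the RUST_LOG directive string above concrete, here is a simplified model of how a comma-separated filter resolves to a level for a given target: a bare level sets the default, and the longest matching target prefix wins. This is a sketch under that assumption and not the actual tracing-subscriber implementation, which also supports span and field filters; `Level`, `max_level_for`, and `enabled` are illustrative names.

```rust
/// Verbosity levels ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

fn parse_level(s: &str) -> Option<Level> {
    match s.to_ascii_lowercase().as_str() {
        "error" => Some(Level::Error),
        "warn" => Some(Level::Warn),
        "info" => Some(Level::Info),
        "debug" => Some(Level::Debug),
        "trace" => Some(Level::Trace),
        _ => None,
    }
}

/// Returns the most verbose level enabled for `target`, or `None` when no
/// directive applies. The longest matching prefix wins; a bare level such
/// as `info` acts as an empty prefix that matches every target.
pub fn max_level_for(directives: &str, target: &str) -> Option<Level> {
    let mut best: Option<(usize, Level)> = None;
    for directive in directives.split(',').map(str::trim).filter(|d| !d.is_empty()) {
        let (prefix, level_text) = match directive.split_once('=') {
            Some((t, l)) => (t.trim(), l.trim()),
            None => ("", directive),
        };
        // Skip directives this simplified model cannot parse.
        let Some(level) = parse_level(level_text) else { continue };
        if !target.starts_with(prefix) {
            continue;
        }
        let more_specific = best.map_or(true, |(len, _)| prefix.len() >= len);
        if more_specific {
            best = Some((prefix.len(), level));
        }
    }
    best.map(|(_, level)| level)
}

/// True when an event at `level` from `target` passes the filter.
pub fn enabled(directives: &str, target: &str, level: Level) -> bool {
    max_level_for(directives, target).map_or(false, |max| level <= max)
}
```

With the directive string shown above, `map_core::engine` resolves to debug, `hyper::client` to warn, and any other target falls back to info.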
Metrics
Prometheus exposition at :9091/metrics. Cardinality is bounded by quantizing latency and labeling only by tenant, protocol, operation, outcome.
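The cardinality bound can be illustrated with a small sketch: a continuous latency collapses into one of a fixed set of `le` buckets, so the series count is the product of the label-set sizes and the bucket count. The bucket boundaries below are hypothetical, chosen only for illustration; the engine's actual boundaries are not stated here.

```rust
/// Hypothetical histogram bucket upper bounds, in seconds.
/// These are illustrative values, not the engine's configured buckets.
pub const BUCKETS: [f64; 8] = [0.005, 0.025, 0.1, 0.25, 1.0, 2.5, 10.0, f64::INFINITY];

/// Returns the smallest bucket bound that contains `seconds`.
/// In a Prometheus histogram the observation also counts toward every
/// larger bucket, because buckets are cumulative.
pub fn bucket_le(seconds: f64) -> f64 {
    BUCKETS
        .iter()
        .copied()
        .find(|&bound| seconds <= bound)
        .unwrap_or(f64::INFINITY)
}

/// Upper bound on the number of `_bucket` series for one histogram,
/// given the number of distinct values per label.
pub fn bucket_series_upper_bound(
    tenants: usize,
    protocols: usize,
    operations: usize,
    outcomes: usize,
) -> usize {
    tenants * protocols * operations * outcomes * BUCKETS.len()
}
```

For example, a 1.247 s request lands in the `le="2.5"` bucket, and 50 tenants, 10 protocols, 20 operations, and 3 outcomes give at most 240,000 bucket series however many distinct latencies are observed.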
Core metrics
# Request counts (counter)
map_requests_total{tenant, protocol, operation, outcome}
# Latency histogram (histogram)
map_request_duration_seconds_bucket{tenant, protocol, operation, outcome, le}
map_request_duration_seconds_sum{...}
map_request_duration_seconds_count{...}
# Pipeline stage timing (histogram per stage)
map_stage_duration_seconds_bucket{stage, le}
# Refusal counters
map_refusals_total{stage, reason}
map_rate_limited_total{tenant, protocol}
map_capability_denied_total{tenant, protocol, operation}
map_circuit_open_total{protocol}
# MEAL dimensions (counter)
map_tokens_consumed_total{tenant, protocol, dimension}
map_seconds_consumed_total{tenant, protocol}
map_watts_consumed_total{tenant, protocol}
# Audit pipeline
map_audit_writes_total{event_type, outcome}
map_audit_buffer_bytes
map_audit_chain_head_age_seconds
# Plugin lifecycle
map_plugins_registered
map_plugins_pending
map_plugins_disabled
map_plugins_error
Pre-built dashboards
The repo ships Grafana dashboards in ops/grafana/:
- map-overview.json: request rate, latency, refusal rate by protocol
- map-pipeline.json: per-stage latency breakdown
- map-tenants.json: top tenants by consumption, runway, refusals
- map-economics.json: MEAL dimensions per tenant, MADE settlement volume
- map-audit.json: chain head, write rate, buffer health
- map-plugins.json: plugin states, load/unload events
Traces
OpenTelemetry-native. Each request is a root span. Each pipeline stage is a child span. Each downstream protocol invocation (when one protocol calls another) is a nested span.
[handle_request] ↑ duration: 1247ms
├ [stage: version_resolution] < 1ms
├ [stage: context_enrichment] < 1ms
├ [stage: rate_limiting] < 1ms
├ [stage: security_gating] < 1ms
├ [stage: circuit_breaking] < 1ms
├ [stage: load_balancing] < 1ms
├ [stage: router_invocation] 1240ms
│ └ [protocol: MARC, op: reasoning_task] 1240ms
│ ├ [protocol: MIND, op: query_knowledge] 24ms
│ ├ [protocol: MAVEN, op: attest] 128ms
│ └ [llm_call: gpt-4] 980ms
└ [stage: result_handling] 2ms
Configure the OTLP exporter:
MAP_OTEL_ENDPOINT=http://otel-collector.observability:4317
The OTLP exporter is built into the engine via opentelemetry-otlp. No sidecar required.
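One way to read the span tree above is to compute each span's self time, meaning its duration minus the durations of its direct children. The sketch below models the example trace with a hypothetical `Span` type (not an OpenTelemetry SDK type) and assumes child spans run sequentially; the sub-millisecond stages are omitted for brevity.

```rust
/// A finished span with its direct children. Illustrative type only.
#[derive(Debug)]
pub struct Span {
    pub name: &'static str,
    pub duration_ms: u64,
    pub children: Vec<Span>,
}

impl Span {
    pub fn leaf(name: &'static str, duration_ms: u64) -> Span {
        Span { name, duration_ms, children: Vec::new() }
    }

    /// Time spent in this span itself, excluding direct children.
    /// Assumes children do not overlap in time.
    pub fn self_time_ms(&self) -> u64 {
        let child_total: u64 = self.children.iter().map(|c| c.duration_ms).sum();
        self.duration_ms.saturating_sub(child_total)
    }

    /// Name and self time of the span with the largest self time in this subtree.
    pub fn slowest_self_time(&self) -> (&'static str, u64) {
        let mut best = (self.name, self.self_time_ms());
        for child in &self.children {
            let candidate = child.slowest_self_time();
            if candidate.1 > best.1 {
                best = candidate;
            }
        }
        best
    }
}

/// Rebuilds the example trace shown above.
pub fn example_trace() -> Span {
    let marc = Span {
        name: "protocol: MARC",
        duration_ms: 1240,
        children: vec![
            Span::leaf("protocol: MIND", 24),
            Span::leaf("protocol: MAVEN", 128),
            Span::leaf("llm_call: gpt-4", 980),
        ],
    };
    let router = Span {
        name: "stage: router_invocation",
        duration_ms: 1240,
        children: vec![marc],
    };
    Span {
        name: "handle_request",
        duration_ms: 1247,
        children: vec![router, Span::leaf("stage: result_handling", 2)],
    }
}
```

On the example trace, the MARC span has 108 ms of self time (1240 - 24 - 128 - 980), and the LLM call dominates with 980 ms.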
Audit chain
The fourth signal. Every decision — allow, refuse, or error — produces a hash-chained record via MAX. See Concepts → Audit and MAX protocol.
Audit is structurally distinct from the other three signals:
| Property | Logs/Metrics/Traces | Audit |
|---|---|---|
| Retention | Hours-to-days typical | Years (legal hold typical) |
| Integrity | Best-effort | Hash-chained, signed |
| Replayable | No | Yes (full replay) |
| Tamper-evident | No | Yes (chain head verification) |
You can lose all your logs and still reconstruct the institution's complete history from the audit chain. The audit chain is the constitutional record.
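The tamper-evidence property comes from each record committing to the hash of its predecessor. Below is a minimal sketch of that idea, not the MAX implementation: the record fields and function names are assumptions, signing is left out, and `DefaultHasher` stands in for a cryptographic hash such as SHA-256 purely to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One entry in a hash chain. Field names are illustrative assumptions.
#[derive(Debug, Clone)]
pub struct AuditRecord {
    pub correlation_id: String,
    pub decision: String, // "allow" | "refuse" | "error"
    pub prev_hash: u64,
    pub hash: u64,
}

/// Digest over the previous hash plus this record's content.
/// NOTE: DefaultHasher is NOT cryptographic; a real chain would use a
/// collision-resistant hash and sign the result.
fn digest(prev_hash: u64, correlation_id: &str, decision: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    prev_hash.hash(&mut hasher);
    correlation_id.hash(&mut hasher);
    decision.hash(&mut hasher);
    hasher.finish()
}

/// Appends a record and returns the new chain head.
pub fn append(chain: &mut Vec<AuditRecord>, correlation_id: &str, decision: &str) -> u64 {
    // The first record links to a fixed genesis value of 0.
    let prev_hash = chain.last().map_or(0, |r| r.hash);
    let hash = digest(prev_hash, correlation_id, decision);
    chain.push(AuditRecord {
        correlation_id: correlation_id.to_owned(),
        decision: decision.to_owned(),
        prev_hash,
        hash,
    });
    hash
}

/// Recomputes every link; any edited or reordered record breaks the chain.
pub fn verify(chain: &[AuditRecord]) -> bool {
    let mut expected_prev = 0u64;
    for record in chain {
        let recomputed = digest(record.prev_hash, &record.correlation_id, &record.decision);
        if record.prev_hash != expected_prev || record.hash != recomputed {
            return false;
        }
        expected_prev = record.hash;
    }
    true
}
```

Changing any stored field without recomputing every later hash makes `verify` return false, which is why publishing the chain head makes silent edits detectable.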
Correlation across signals
Every signal carries the same correlation_id. A recommended investigation workflow:
- Alert fires from a Grafana metric (e.g. map_capability_denied_total > 100/min)
- Click through to a Loki log query filtered by tenant + outcome=refused
- Pull the correlation_id from a representative log line
- Open the trace in Tempo (one click in Grafana)
- Click through to MAX::traceability_graph for the audit chain of the same request
The audit-chain head hash is in every response and every log; you can verify chain integrity independently.
OTel semantic conventions
MAP follows OTel semantic conventions plus MAP-specific attributes:
service.name = "map-engine"
service.version = "0.6.x"
# Per-span
map.protocol = "MARC"
map.operation = "reasoning_task"
map.tenant_id = "org_acme"
map.outcome = "success" | "refused" | "error"
map.refusal_reason = "..." # only on refusals
map.audit_head = "0x..." # the chain head produced
Sampling
The engine implements head-based sampling at the gateway:
MAP_TRACE_SAMPLE_RATE=0.01 # 1% of requests
MAP_TRACE_ALWAYS_ON=["map.macs.*", "map.moot.*"] # 100% for governance ops
MAP_TRACE_ALWAYS_ERROR=true # 100% of errors regardless of sample rate
Refusals and errors are always sampled. State-changing operations on governance protocols are always sampled. Read-heavy protocols default to 1%.
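The three sampling settings above combine into a single keep-or-drop decision, which can be sketched as follows. This is an illustrative model, not the engine's sampler: the struct, the prefix-style `*` matching, the operation names the patterns are matched against, and the use of the trace ID's low 64 bits for the ratio test are all assumptions. The failure flag is taken as an input; the sketch does not model when the engine learns it.

```rust
/// Hypothetical in-memory form of the sampling settings.
pub struct SamplingConfig {
    pub sample_rate: f64,       // MAP_TRACE_SAMPLE_RATE
    pub always_on: Vec<String>, // MAP_TRACE_ALWAYS_ON patterns
    pub always_error: bool,     // MAP_TRACE_ALWAYS_ERROR
}

/// Treats a trailing `*` as "any suffix"; otherwise requires an exact match.
fn matches_pattern(pattern: &str, name: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => name.starts_with(prefix),
        None => pattern == name,
    }
}

/// Decides whether to keep a trace.
/// `failed` covers refusals and errors; `trace_id_low` is the low 64 bits
/// of the trace ID, used so the decision is deterministic per trace.
pub fn should_sample(cfg: &SamplingConfig, name: &str, failed: bool, trace_id_low: u64) -> bool {
    if failed && cfg.always_error {
        return true;
    }
    if cfg.always_on.iter().any(|p| matches_pattern(p, name)) {
        return true;
    }
    if cfg.sample_rate >= 1.0 {
        return true;
    }
    // Keep the trace when its ID falls in the lowest `sample_rate`
    // fraction of the 64-bit ID space.
    let threshold = (cfg.sample_rate.clamp(0.0, 1.0) * u64::MAX as f64) as u64;
    trace_id_low < threshold
}
```

Deriving the decision from the trace ID rather than a random draw means every service that sees the same traceparent reaches the same verdict.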
Reading metrics inside a protocol module
use metrics::{counter, histogram};
async fn invoke(&self, op: &str, payload: Value, ctx: &InvokeContext)
    -> Result<Response, ProtocolError>
{
    // Time the whole operation, including the elided protocol work.
    let start = std::time::Instant::now();
    let result = /* ... */;
    // Label values must be owned or 'static, so the borrowed `op` is copied.
    histogram!("map_protocol_internal_duration_seconds",
        "protocol" => self.protocol_name(),
        "operation" => op.to_owned()
    ).record(start.elapsed().as_secs_f64());
    counter!("map_protocol_internal_op_total",
        "protocol" => self.protocol_name(),
        "operation" => op.to_owned(),
        "outcome" => if result.is_ok() { "ok" } else { "err" }
    ).increment(1);
    result
}
Most protocols rely on the engine's automatic instrumentation at Stage 8. Custom metrics are useful for protocols with multi-step internal flows (e.g., MACE records per-delegate timing during deliberation).
See also
- Deployment — running the stack
- Engine audit pipeline
- MAX protocol
- MOTET protocol
- MOMENT protocol