MAP Docs

Observability

Three streams — logs, metrics, traces — plus the audit chain. W3C Trace Context throughout. OpenTelemetry-native.

MAP is observability-native. Every request emits four signals; every signal is correlatable. Logs, metrics, traces, and the audit chain share a correlation_id and propagate traceparent end-to-end.
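To make the traceparent propagation concrete, here is a minimal sketch of parsing a W3C Trace Context `traceparent` header into its fields. It is not taken from the MAP engine: the `TraceParent` struct and `parse_traceparent` function are hypothetical names, and the sketch accepts only version `00` of the header for simplicity.

```rust
/// Fields of a version-00 `traceparent` header.
/// (Hypothetical type for illustration, not a MAP engine type.)
#[derive(Debug, PartialEq, Eq)]
pub struct TraceParent {
    pub trace_id: String,  // 32 lowercase hex characters
    pub parent_id: String, // 16 lowercase hex characters
    pub sampled: bool,     // bit 0 of the trace-flags byte
}

fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

fn all_zero(s: &str) -> bool {
    s.bytes().all(|b| b == b'0')
}

/// Parses `version-trace_id-parent_id-flags`; returns None on any violation.
pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let fields: Vec<&str> = header.trim().split('-').collect();
    if fields.len() != 4 {
        return None;
    }
    let (version, trace_id, parent_id, flags) = (fields[0], fields[1], fields[2], fields[3]);

    // Simplification: only version 00 is accepted here.
    if version != "00" {
        return None;
    }
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero trace and parent ids are invalid per the specification.
    if all_zero(trace_id) || all_zero(parent_id) {
        return None;
    }
    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id: trace_id.to_owned(),
        parent_id: parent_id.to_owned(),
        sampled: (flags & 0x01) == 0x01,
    })
}
```

A service that receives a valid header would typically keep the `trace_id`, generate a fresh parent id for its own span, and forward the updated header downstream.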

The four signals

Signal  | Engine emitter                               | Target
Logs    | tracing crate via tracing-subscriber         | stdout / Loki
Metrics | metrics crate via Prometheus exporter        | :9091/metrics
Traces  | OpenTelemetry → OTLP exporter                | Tempo / Honeycomb / etc.
Audit   | AuditPipeline::record → MAX::audit_log_entry | PostgreSQL chain

Logs

Structured JSON via the tracing crate. Every log line carries:

{
  "ts":        "2026-05-20T14:23:45.123Z",
  "level":     "INFO",
  "target":    "map_core::engine",
  "message":   "request dispatched",
  "correlation_id": "4f81b3a-...",
  "tenant_id": "org_acme",
  "protocol":  "MARC",
  "operation": "reasoning_task",
  "stage":     "router_invocation",
  "latency_ms": 1247,
  "traceparent": "00-..."
}

Configure level via RUST_LOG:

RUST_LOG=info,map_core=debug,hyper=warn

In production we run RUST_LOG=info and adjust the trace level dynamically via /admin/trace (gated by map.admin.observability).
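The directive string combines a default level with per-target overrides. The sketch below shows one way such a string can be resolved, with the longest matching target prefix winning. It is a simplified illustration, not the tracing-subscriber implementation: `Level`, `parse_level`, and `effective_level` are hypothetical names, and a bare directive is assumed to always be a level.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

fn parse_level(s: &str) -> Option<Level> {
    match s.trim().to_ascii_lowercase().as_str() {
        "error" => Some(Level::Error),
        "warn" => Some(Level::Warn),
        "info" => Some(Level::Info),
        "debug" => Some(Level::Debug),
        "trace" => Some(Level::Trace),
        _ => None,
    }
}

/// Resolves the level that applies to `target` under a directive string
/// such as "info,map_core=debug,hyper=warn".
pub fn effective_level(directives: &str, target: &str) -> Option<Level> {
    let mut default = None;
    // Longest matching prefix seen so far, with its level.
    let mut best: Option<(usize, Level)> = None;

    for d in directives.split(',').map(str::trim).filter(|d| !d.is_empty()) {
        match d.split_once('=') {
            // Bare directive: treated as the default level (simplification).
            None => default = parse_level(d).or(default),
            Some((prefix, lvl)) => {
                let Some(level) = parse_level(lvl) else { continue };
                let longer = best.map_or(true, |(len, _)| prefix.len() >= len);
                if target.starts_with(prefix) && longer {
                    best = Some((prefix.len(), level));
                }
            }
        }
    }
    best.map(|(_, level)| level).or(default)
}
```

With the directive string shown above, a log line from `map_core::engine` resolves to debug, one from `hyper::client` to warn, and everything else falls back to info.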

Metrics

Prometheus exposition at :9091/metrics. Cardinality is bounded by quantizing latency and labeling only by tenant, protocol, operation, outcome.
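As an illustration of why quantizing latency bounds cardinality, the following sketch models a cumulative histogram in the Prometheus style: each observation increments every bucket whose upper bound (`le`) it fits under, so the number of series depends on the bucket count rather than on how many distinct latencies occur. The `Histogram` type and the bucket bounds in the usage note are assumptions for illustration; the engine's real bucket layout is not stated here.

```rust
/// Cumulative histogram: bucket i counts observations <= bounds[i],
/// and one extra trailing bucket plays the role of le="+Inf".
pub struct Histogram {
    bounds: Vec<f64>, // ascending upper bounds, in seconds
    counts: Vec<u64>, // bounds.len() + 1 cumulative counters
    sum: f64,
}

impl Histogram {
    pub fn new(bounds: Vec<f64>) -> Self {
        let n = bounds.len();
        Self { bounds, counts: vec![0; n + 1], sum: 0.0 }
    }

    pub fn observe(&mut self, seconds: f64) {
        self.sum += seconds;
        for (i, le) in self.bounds.iter().enumerate() {
            if seconds <= *le {
                self.counts[i] += 1;
            }
        }
        // The +Inf bucket counts every observation.
        let last = self.counts.len() - 1;
        self.counts[last] += 1;
    }

    /// Cumulative count for bucket `i` (the last index is +Inf).
    pub fn bucket(&self, i: usize) -> u64 {
        self.counts[i]
    }

    /// Total observations, i.e. the `_count` series.
    pub fn count(&self) -> u64 {
        self.counts[self.counts.len() - 1]
    }

    /// Sum of observed values, i.e. the `_sum` series.
    pub fn sum(&self) -> f64 {
        self.sum
    }
}
```

With hypothetical bounds of 0.1, 0.5, 1.0, and 2.5 seconds, a 1.247 s request lands only in the 2.5 and +Inf buckets, while a 20 ms request is counted in all five.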

Core metrics

# Request counts (counter)
map_requests_total{tenant, protocol, operation, outcome}

# Latency histogram (histogram)
map_request_duration_seconds_bucket{tenant, protocol, operation, outcome, le}
map_request_duration_seconds_sum{...}
map_request_duration_seconds_count{...}

# Pipeline stage timing (histogram per stage)
map_stage_duration_seconds_bucket{stage, le}

# Refusal counters
map_refusals_total{stage, reason}
map_rate_limited_total{tenant, protocol}
map_capability_denied_total{tenant, protocol, operation}
map_circuit_open_total{protocol}

# MEAL dimensions (counter)
map_tokens_consumed_total{tenant, protocol, dimension}
map_seconds_consumed_total{tenant, protocol}
map_watts_consumed_total{tenant, protocol}

# Audit pipeline
map_audit_writes_total{event_type, outcome}
map_audit_buffer_bytes
map_audit_chain_head_age_seconds

# Plugin lifecycle
map_plugins_registered
map_plugins_pending
map_plugins_disabled
map_plugins_error

Pre-built dashboards

The repo ships Grafana dashboards in ops/grafana/:

  • map-overview.json — request rate, latency, refusal rate by protocol
  • map-pipeline.json — per-stage latency breakdown
  • map-tenants.json — top tenants by consumption, runway, refusals
  • map-economics.json — MEAL dimensions per tenant, MADE settlement volume
  • map-audit.json — chain head, write rate, buffer health
  • map-plugins.json — plugin states, load/unload events

Traces

OpenTelemetry-native. Each request is a root span. Each pipeline stage is a child span. Each downstream protocol invocation (when one protocol calls another) is a nested span.

[handle_request]                         ↑ duration: 1247ms
├ [stage: version_resolution]            < 1ms
├ [stage: context_enrichment]            < 1ms
├ [stage: rate_limiting]                 < 1ms
├ [stage: security_gating]               < 1ms
├ [stage: circuit_breaking]              < 1ms
├ [stage: load_balancing]                < 1ms
├ [stage: router_invocation]             1240ms
│ └ [protocol: MARC, op: reasoning_task] 1240ms
│   ├ [protocol: MIND, op: query_knowledge] 24ms
│   ├ [protocol: MAVEN, op: attest]      128ms
│   └ [llm_call: gpt-4]                  980ms
└ [stage: result_handling]               2ms
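When reading a trace like the one above, it helps to separate a span's own time from the time spent in its children. The sketch below models the router_invocation subtree and computes that difference. The `Span` type is hypothetical, and the calculation assumes child spans run sequentially rather than in parallel.

```rust
/// A minimal span model (hypothetical, not the OpenTelemetry SDK type).
pub struct Span {
    pub name: &'static str,
    pub duration_ms: u64,
    pub children: Vec<Span>,
}

impl Span {
    pub fn leaf(name: &'static str, duration_ms: u64) -> Span {
        Span { name, duration_ms, children: Vec::new() }
    }

    /// Time not attributed to any direct child.
    /// Assumes children do not overlap; saturates at zero otherwise.
    pub fn self_time_ms(&self) -> u64 {
        let child_total: u64 = self.children.iter().map(|c| c.duration_ms).sum();
        self.duration_ms.saturating_sub(child_total)
    }
}

/// Builds the router_invocation subtree from the trace shown above.
pub fn example_trace() -> Span {
    Span {
        name: "stage: router_invocation",
        duration_ms: 1240,
        children: vec![Span {
            name: "protocol: MARC, op: reasoning_task",
            duration_ms: 1240,
            children: vec![
                Span::leaf("protocol: MIND, op: query_knowledge", 24),
                Span::leaf("protocol: MAVEN, op: attest", 128),
                Span::leaf("llm_call: gpt-4", 980),
            ],
        }],
    }
}
```

Under the sequential assumption, the MARC span spends 1240 − (24 + 128 + 980) = 108 ms outside its children, while the router_invocation stage itself adds no time beyond the MARC call.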

Configure the OTLP exporter:

MAP_OTEL_ENDPOINT=http://otel-collector.observability:4317

The OTLP exporter is built into the engine via opentelemetry-otlp. No sidecar required.

Audit chain

The fourth signal. Every decision — allow, refuse, or error — produces a hash-chained record via MAX. See Concepts → Audit and MAX protocol.

Audit is structurally distinct from the other three signals:

Property       | Logs/Metrics/Traces   | Audit
Retention      | Hours-to-days typical | Years (legal hold typical)
Integrity      | Best-effort           | Hash-chained, signed
Replayable     | No                    | Yes (full replay)
Tamper-evident | No                    | Yes (chain head verification)

You can lose all your logs and still reconstruct the institution's complete history from the audit chain. The audit chain is the constitutional record.
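The tamper-evidence property can be sketched with a toy hash chain in which each record's hash covers the previous record's hash. This is an illustration only: the record layout is hypothetical, signing is omitted, and the standard library's `DefaultHasher` stands in for whatever hash MAX actually uses. `DefaultHasher` is NOT cryptographic, so a real chain would need a collision-resistant hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical audit record; the real MAX schema is not shown here.
#[derive(Debug, Clone)]
pub struct AuditRecord {
    pub payload: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Stand-in link hash. DefaultHasher is non-cryptographic and is used
/// only so that this sketch runs with the standard library alone.
fn link_hash(prev_hash: u64, payload: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev_hash.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

/// Appends a record whose hash covers the current chain head.
pub fn append(chain: &mut Vec<AuditRecord>, payload: &str) {
    let prev_hash = chain.last().map_or(0, |r| r.hash);
    chain.push(AuditRecord {
        payload: payload.to_owned(),
        prev_hash,
        hash: link_hash(prev_hash, payload),
    });
}

/// Returns the index of the first record that fails verification,
/// or None when the whole chain is intact.
pub fn first_broken(chain: &[AuditRecord]) -> Option<usize> {
    let mut expected_prev = 0u64;
    for (i, r) in chain.iter().enumerate() {
        let linked = r.prev_hash == expected_prev;
        let intact = r.hash == link_hash(r.prev_hash, &r.payload);
        if !linked || !intact {
            return Some(i);
        }
        expected_prev = r.hash;
    }
    None
}
```

Editing any record's payload changes the hash it should carry, so verification fails at that record. Rewriting the record's hash to match instead breaks the link to the next record, which is why checking the chain head is enough to detect tampering anywhere earlier.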

Correlation across signals

Every signal carries the same correlation_id. A recommended investigation path:

  1. Alert fires from a Grafana metric (e.g. map_capability_denied_total > 100/min)
  2. Click through to a Loki log query filtered by tenant + outcome=refused
  3. Pull the correlation_id from a representative log line
  4. Open the trace in Tempo (one click in Grafana)
  5. Click through to MAX::traceability_graph for the audit chain of the same request

The audit-chain head hash is in every response and every log; you can verify chain integrity independently.

OTel semantic conventions

MAP follows OTel semantic conventions plus MAP-specific attributes:

service.name      = "map-engine"
service.version   = "0.6.x"

# Per-span
map.protocol      = "MARC"
map.operation     = "reasoning_task"
map.tenant_id     = "org_acme"
map.outcome       = "success" | "refused" | "error"
map.refusal_reason = "..."  # only on refusals
map.audit_head    = "0x..."  # the chain head produced

Sampling

The engine implements head-based sampling at the gateway:

MAP_TRACE_SAMPLE_RATE=0.01   # 1% of requests
MAP_TRACE_ALWAYS_ON=["map.macs.*", "map.moot.*"]   # 100% for governance ops
MAP_TRACE_ALWAYS_ERROR=true  # 100% of errors regardless of sample rate

Refusals and errors are always sampled. State-changing operations on governance protocols are always sampled. Read-heavy protocols default to 1%.
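The three settings above can be read as a decision order: forced sampling for errors, forced sampling for always-on operations, then a ratio check. The sketch below is one possible reading, not the engine's sampler. `SamplerConfig`, `should_sample`, the operation names used in the usage note, and the deterministic trace-id threshold are all assumptions, and how the engine learns that a request failed at decision time is not specified here.

```rust
/// Hypothetical mirror of the three environment settings above.
pub struct SamplerConfig {
    pub sample_rate: f64,       // MAP_TRACE_SAMPLE_RATE
    pub always_on: Vec<String>, // MAP_TRACE_ALWAYS_ON patterns
    pub always_error: bool,     // MAP_TRACE_ALWAYS_ERROR
}

/// Supports exact names and a trailing `*` wildcard only (assumption).
fn matches_pattern(pattern: &str, op: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => op.starts_with(prefix),
        None => pattern == op,
    }
}

/// Decides whether a request is traced.
/// `trace_id_low` is the low 64 bits of the trace id, so every service
/// seeing the same trace id reaches the same decision.
pub fn should_sample(cfg: &SamplerConfig, op: &str, is_error: bool, trace_id_low: u64) -> bool {
    if is_error && cfg.always_error {
        return true;
    }
    if cfg.always_on.iter().any(|p| matches_pattern(p, op)) {
        return true;
    }
    let rate = cfg.sample_rate.clamp(0.0, 1.0);
    if rate >= 1.0 {
        return true;
    }
    // Keep the request when its id falls in the lowest `rate` share
    // of the 64-bit id space.
    let threshold = (rate * u64::MAX as f64) as u64;
    trace_id_low < threshold
}
```

For example, with a rate of 0.01 a hypothetical `map.macs.vote` operation is always traced because it matches `map.macs.*`, whereas a successful read on another protocol is traced only when its trace id falls under the 1% threshold.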

Recording metrics inside a protocol module

use metrics::{counter, histogram};

async fn invoke(&self, op: &str, payload: Value, ctx: &InvokeContext)
    -> Result<Response, ProtocolError>
{
    // Time the protocol-specific work (elided here).
    let start = std::time::Instant::now();
    let result = /* ... */;

    // Label values must be 'static or owned, so the borrowed `op` is copied.
    // This assumes protocol_name() returns a &'static str.
    histogram!("map_protocol_internal_duration_seconds",
        "protocol" => self.protocol_name(),
        "operation" => op.to_owned()
    ).record(start.elapsed().as_secs_f64());

    // Count the call, labeled by outcome.
    counter!("map_protocol_internal_op_total",
        "protocol" => self.protocol_name(),
        "operation" => op.to_owned(),
        "outcome" => if result.is_ok() { "ok" } else { "err" }
    ).increment(1);
    result
}

Most protocols rely on the engine's automatic instrumentation at Stage 8. Custom metrics are useful for protocols with multi-step internal flows (e.g., MACE records per-delegate timing during deliberation).
