Metrics
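- Use Micrometer + Prometheus: add micrometer-registry-prometheus and set management.endpoints.web.exposure.include=prometheus,health,info.
- Add JVM, Tomcat, and DB pool meters; publish percentiles or histograms for latency timers (histograms can be aggregated across instances, client-side percentiles cannot).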
- Create SLIs: request latency, error rate, saturation (threads/connections), GC pauses.
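To make the SLI definitions above concrete, here is a minimal JDK-only sketch of how error rate, a latency percentile, and saturation are computed from raw numbers. In a real service Micrometer and Prometheus would derive these from counters, timers, and gauges; the class name `SliSketch` and the sample values are hypothetical and only illustrate the arithmetic.

```java
import java.util.Arrays;

// Illustrative only: shows the arithmetic behind the SLIs, not a Micrometer API.
final class SliSketch {

    // Error rate: failed requests over total requests (0 when there is no traffic).
    static double errorRate(long failed, long total) {
        if (total <= 0) return 0.0;
        return (double) failed / total;
    }

    // Nearest-rank percentile over raw latency samples in milliseconds.
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    // Saturation: resources in use over capacity, e.g. busy threads / max threads.
    static double saturation(int used, int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        return (double) used / capacity;
    }

    public static void main(String[] args) {
        long[] latenciesMs = {12, 15, 11, 240, 18, 14, 13, 16, 19, 17};
        System.out.println("p50 latency ms: " + percentile(latenciesMs, 50));
        System.out.println("p95 latency ms: " + percentile(latenciesMs, 95));
        System.out.println("error rate: " + errorRate(5, 1000));
        System.out.println("pool saturation: " + saturation(18, 20));
    }
}
```

The single 240 ms outlier barely moves the median but dominates p95, which is why latency SLIs are usually stated as percentiles rather than averages.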
Traces
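- Spring Boot 3 traces through Micrometer Tracing rather than a dedicated OTel starter: add spring-boot-starter-actuator + micrometer-tracing-bridge-otel + an exporter (OTLP or Zipkin; Jaeger accepts OTLP).
- Propagate headers (traceparent); ensure async executors propagate context, e.g. via Spring's ContextPropagatingTaskDecorator or Micrometer's ContextExecutorService.wrap.
- Sample smartly: lower rates on noisy paths; raise for errors (keeping error traces reliably requires tail-based sampling, typically in the collector).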
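For reference when checking propagation, the traceparent header follows the W3C Trace Context layout `version-traceid-parentid-flags`. The sketch below parses and re-emits it using only the JDK. The tracing bridge does this for you in practice; the `TraceParent` record and its method names are hypothetical, and the parser assumes the four-field version 00 layout.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative parser for the W3C traceparent header (four-field layout).
record TraceParent(String traceId, String parentId, boolean sampled) {

    // 2 hex version, 32 hex trace id, 16 hex parent (span) id, 2 hex flags.
    private static final Pattern FORMAT = Pattern.compile(
            "^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    static Optional<TraceParent> parse(String header) {
        if (header == null) return Optional.empty();
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) return Optional.empty();
        // Version ff and all-zero ids are invalid per the spec.
        if (m.group(1).equals("ff")) return Optional.empty();
        if (m.group(2).equals("0".repeat(32))) return Optional.empty();
        if (m.group(3).equals("0".repeat(16))) return Optional.empty();
        int flags = Integer.parseInt(m.group(4), 16);
        // Bit 0 of the flags byte is the "sampled" flag.
        return Optional.of(new TraceParent(m.group(2), m.group(3), (flags & 0x01) != 0));
    }

    // Header for an outgoing call: same trace id, the caller's span id as parent.
    String forChild(String callerSpanId) {
        return "00-" + traceId + "-" + callerSpanId + "-" + (sampled ? "01" : "00");
    }

    public static void main(String[] args) {
        String incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
        TraceParent tp = parse(incoming).orElseThrow();
        System.out.println("traceId=" + tp.traceId() + " sampled=" + tp.sampled());
        System.out.println("outgoing: " + tp.forChild("b7ad6b7169203331"));
    }
}
```

A quick propagation check in staging is to confirm that the trace id segment stays identical across every hop while the parent id changes.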
Logs
- Use JSON layout; include traceId/spanId for correlation.
- Avoid verbose INFO in hot paths; keep payload size bounded.
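As a rough illustration of the two log rules above, the following JDK-only sketch builds one JSON log line that carries traceId/spanId and caps the message length. A real setup would use the logging framework's JSON encoder and pull the ids from the MDC; the class `JsonLogSketch`, the field names, and the 256-character limit are assumptions chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: hand-rolled JSON line with correlation ids and a size cap.
final class JsonLogSketch {

    // Assumed limit for this sketch; tune to your log pipeline.
    static final int MAX_MESSAGE_CHARS = 256;

    // Cut overly long text and mark it, without splitting a surrogate pair.
    static String truncate(String s, int maxChars) {
        if (s.length() <= maxChars) return s;
        int end = maxChars;
        if (end > 0 && Character.isHighSurrogate(s.charAt(end - 1))) end--;
        return s.substring(0, end) + "...[truncated]";
    }

    // Minimal JSON string escaping for quotes, backslashes, and control chars.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }

    static String line(String level, String message, String traceId, String spanId) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("level", level);
        fields.put("message", truncate(message, MAX_MESSAGE_CHARS));
        fields.put("traceId", traceId);
        fields.put("spanId", spanId);
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(line("INFO", "order accepted",
                "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"));
    }
}
```

Keeping traceId and spanId as top-level fields (rather than embedded in the message text) is what lets a log backend index them and jump from a trace to its log lines.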
Dashboards & alerts
- Latency/error SLO dashboards per endpoint.
- Panels for DB pool saturation, thread pool queue depth, GC pause, heap used %, and 5xx rate.
- Alerts on SLO burn rates; include exemplars linking metrics → traces → logs.
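Burn rate is the observed error rate divided by the error budget (1 minus the SLO target), so a burn rate of 1 spends the budget exactly over the SLO period. The sketch below shows that arithmetic and a two-window paging check. In practice this logic lives in alerting rules rather than application code; the class name `BurnRateSketch`, the 99.9% target, and the 14.4 threshold (a commonly cited fast-burn value) are assumptions for illustration.

```java
// Illustrative only: burn-rate arithmetic behind an SLO alert.
final class BurnRateSketch {

    // Burn rate = observed error rate / error budget, where budget = 1 - SLO target.
    static double burnRate(double errorRate, double sloTarget) {
        double budget = 1.0 - sloTarget;
        if (budget <= 0) throw new IllegalArgumentException("SLO target must be below 1.0");
        return errorRate / budget;
    }

    // Page only when both a long and a short window exceed the threshold,
    // so the alert fires on sustained burn and clears soon after recovery.
    static boolean shouldPage(double longWindowErrorRate,
                              double shortWindowErrorRate,
                              double sloTarget,
                              double threshold) {
        return burnRate(longWindowErrorRate, sloTarget) >= threshold
                && burnRate(shortWindowErrorRate, sloTarget) >= threshold;
    }

    public static void main(String[] args) {
        double slo = 0.999;       // assumed target: 99.9% of requests succeed
        double threshold = 14.4;  // assumed fast-burn threshold
        System.out.println("burn rate at 2% errors: " + burnRate(0.02, slo));
        System.out.println("page (sustained): " + shouldPage(0.02, 0.03, slo, threshold));
        System.out.println("page (recovered): " + shouldPage(0.02, 0.001, slo, threshold));
    }
}
```

The short window is what stops the alert from paging long after an incident has already ended.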
Checklist
- Actuator endpoints secured and exposed only where needed.
- OTLP exporter configured; sampling tuned.
- Trace/log correlation verified in staging.
- Dashboards + alerts reviewed with the on-call team.