Deadlines & retries

  • Require client deadlines; on the server, honor the request context and stop work once it expires, surfacing DEADLINE_EXCEEDED (codes.DeadlineExceeded in grpc-go).
  • Configure retries with backoff for idempotent calls only; avoid retry storms with jitter and a cap on attempts.
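
  The retry guidance above can be sketched with the standard library alone. This is a minimal illustration, not a gRPC client: the constants baseDelay, maxDelay, and maxAttempts and the sentinel errTransient are assumptions made up for the example (a real client would inspect the gRPC status code, for instance UNAVAILABLE). It shows exponential backoff with full jitter, a cap on attempts, and a single context deadline that bounds the call including all retries.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// Illustrative values (assumptions); tune them per dependency.
const (
	baseDelay   = 50 * time.Millisecond
	maxDelay    = 2 * time.Second
	maxAttempts = 4
)

// errTransient marks failures that are safe to retry. It is a hypothetical
// stand-in for checking a retryable gRPC status code.
var errTransient = errors.New("transient failure")

// backoffCeiling returns the exponential ceiling for a 0-based retry number,
// capped at maxDelay.
func backoffCeiling(retry int) time.Duration {
	d := baseDelay
	for i := 0; i < retry && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// retry calls op up to maxAttempts times. Between attempts it sleeps a random
// duration in [0, ceiling) ("full jitter") so that many clients do not retry
// in lockstep. It stops early on success, on a non-transient error, or when
// ctx expires.
func retry(ctx context.Context, op func(context.Context) error) (attempts int, err error) {
	for attempts = 1; ; attempts++ {
		err = op(ctx)
		if err == nil || !errors.Is(err, errTransient) || attempts == maxAttempts {
			return attempts, err
		}
		sleep := time.Duration(rand.Int63n(int64(backoffCeiling(attempts - 1))))
		select {
		case <-time.After(sleep):
		case <-ctx.Done():
			return attempts, ctx.Err()
		}
	}
}

func main() {
	// One deadline bounds the whole operation, retries included.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	calls := 0
	n, err := retry(ctx, func(context.Context) error {
		calls++
		if calls < 3 {
			return errTransient // fail the first two attempts
		}
		return nil
	})
	fmt.Println(n, err) // prints "3 <nil>"
}
```

  Only wrap calls that are idempotent in a helper like this; retrying a non-idempotent call can apply its side effect twice.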

Interceptors

  • Unary/stream interceptors for auth, metrics (Prometheus), logging, and panic recovery.
  • Use per-RPC circuit breakers and rate limits for critical dependencies.
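
  As a rough sketch of how interceptors compose, the example below uses local stand-in types rather than grpc-go itself: handler, middleware, chain, recovery, timing, and boom are all hypothetical names for this illustration. The handler type only mirrors the general shape of a unary RPC handler. It shows panic recovery wrapped by a timing hook, so the hook still observes the error produced by a recovered panic.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// handler mirrors the shape of a unary RPC handler. It is a local stand-in,
// not a grpc-go type.
type handler func(ctx context.Context, req any) (any, error)

// middleware wraps a handler with extra behavior for a given method name.
type middleware func(method string, next handler) handler

var errInternal = errors.New("internal error")

// recovery converts a panic in the wrapped handler into an error so that one
// bad request cannot crash the whole process.
func recovery(method string, next handler) handler {
	return func(ctx context.Context, req any) (resp any, err error) {
		defer func() {
			if r := recover(); r != nil {
				resp, err = nil, fmt.Errorf("%w: panic in %s: %v", errInternal, method, r)
			}
		}()
		return next(ctx, req)
	}
}

// timing reports each call's duration and error to observe, which could feed
// a metrics histogram or a structured log line.
func timing(observe func(method string, d time.Duration, err error)) middleware {
	return func(method string, next handler) handler {
		return func(ctx context.Context, req any) (any, error) {
			start := time.Now()
			resp, err := next(ctx, req)
			observe(method, time.Since(start), err)
			return resp, err
		}
	}
}

// chain applies middlewares so that the first one listed is outermost.
func chain(method string, h handler, mws ...middleware) handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](method, h)
	}
	return h
}

// boom simulates a handler bug.
func boom(context.Context, any) (any, error) { panic("simulated handler bug") }

func main() {
	h := chain("/demo.Service/Boom", boom,
		timing(func(m string, d time.Duration, err error) {
			fmt.Printf("method=%s failed=%t\n", m, err != nil)
		}),
		recovery, // innermost: catches the panic before timing sees the result
	)
	_, err := h(context.Background(), nil)
	fmt.Println(errors.Is(err, errInternal))
}
```

  Ordering matters: with recovery innermost, the outer layers (metrics, logging) see a normal error instead of an unwinding panic.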

TLS & auth

  • Enable TLS everywhere; prefer mTLS for internal services.
  • Rotate certs automatically; watch expiry metrics.
  • Add authz checks in interceptors; propagate identity via metadata.
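
  One way to approach mTLS with hot certificate rotation is sketched below using only crypto/tls. The names certStore and mtlsServerConfig are hypothetical, and the empty tls.Certificate in main is a placeholder; real code would load a key pair (for example with tls.LoadX509KeyPair) whenever the files on disk change. The sketch relies on the tls.Config.GetCertificate callback, which is consulted per handshake, so a swapped certificate takes effect for new connections without a restart.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"sync/atomic"
)

// certStore holds the current server certificate and lets a background
// reloader swap it atomically.
type certStore struct {
	cur atomic.Pointer[tls.Certificate]
}

// Set installs a freshly loaded certificate.
func (s *certStore) Set(c *tls.Certificate) { s.cur.Store(c) }

// GetCertificate has the signature expected by tls.Config.GetCertificate.
func (s *certStore) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	c := s.cur.Load()
	if c == nil {
		return nil, errors.New("no certificate loaded")
	}
	return c, nil
}

// mtlsServerConfig requires every client to present a certificate that
// chains to clientCAs, and serves whatever certificate the store holds.
func mtlsServerConfig(clientCAs *x509.CertPool, s *certStore) *tls.Config {
	return &tls.Config{
		MinVersion:     tls.VersionTLS12,
		ClientAuth:     tls.RequireAndVerifyClientCert,
		ClientCAs:      clientCAs,
		GetCertificate: s.GetCertificate,
	}
}

func main() {
	store := &certStore{}
	cfg := mtlsServerConfig(x509.NewCertPool(), store)

	_, err := cfg.GetCertificate(nil)
	fmt.Println(err) // prints "no certificate loaded"

	store.Set(&tls.Certificate{}) // placeholder; load a real key pair here
	_, err = cfg.GetCertificate(nil)
	fmt.Println(err) // prints "<nil>"
}
```

  With grpc-go, a config like this would typically be wrapped with credentials.NewTLS and passed as a server option; the verified peer certificate can then be read in an interceptor to drive authorization checks.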

Resource protection

  • Limit concurrent streams and max message sizes.
  • Use bounded worker pools for handlers that perform heavy work.
  • Tune keepalive to detect dead peers without flapping.
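
  Bounding heavy handler work can be illustrated with a counting semaphore, which is a simpler stand-in for a full worker pool. The limiter type and its methods are hypothetical names for this sketch, and the limit of 2 and the timings are arbitrary. A handler acquires a slot before doing expensive work and gives up when its context deadline passes, which sheds load instead of queueing without bound.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// limiter is a counting semaphore: at most cap(l) holders at any time.
type limiter chan struct{}

func newLimiter(n int) limiter { return make(limiter, n) }

// tryAcquire takes a slot without blocking and reports whether it succeeded.
func (l limiter) tryAcquire() bool {
	select {
	case l <- struct{}{}:
		return true
	default:
		return false
	}
}

// acquire waits for a slot until ctx is done.
func (l limiter) acquire(ctx context.Context) error {
	select {
	case l <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// release frees a slot taken by acquire or a successful tryAcquire.
func (l limiter) release() { <-l }

func main() {
	lim := newLimiter(2)

	var (
		wg             sync.WaitGroup
		mu             sync.Mutex
		inFlight, peak int
	)
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ctx, cancel := context.WithTimeout(context.Background(), time.Second)
			defer cancel()
			if err := lim.acquire(ctx); err != nil {
				return // deadline hit while waiting: shed the request
			}
			defer lim.release()

			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()

			time.Sleep(10 * time.Millisecond) // simulated heavy work

			mu.Lock()
			inFlight--
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println("peak concurrency:", peak) // never exceeds the limit of 2
}
```

  A rejected acquire maps naturally to a RESOURCE_EXHAUSTED or DEADLINE_EXCEEDED response, so callers can back off.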

Observability

  • Metrics: latency, error codes, message sizes, active streams, retries.
  • Traces: annotate spans with method, peer info, and attempt count; sample selectively rather than tracing every call (for example, favor errors and slow calls).
  • Logs: structured fields for method, code, duration, peer.
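
  The structured log fields listed above might look like the following with Go's log/slog package (Go 1.21 or later). The helper rpcAttrs, the method name, and the peer address are illustrative assumptions; in a real server the values would come from the interceptor that wraps each call.

```go
package main

import (
	"context"
	"log/slog"
	"os"
	"time"
)

// rpcAttrs builds the per-call fields: method, status code, duration, peer.
func rpcAttrs(method, code, peer string, d time.Duration) []slog.Attr {
	return []slog.Attr{
		slog.String("method", method),
		slog.String("code", code),
		slog.Duration("duration", d),
		slog.String("peer", peer),
	}
}

func main() {
	// JSON output keeps the fields machine-parseable for log pipelines.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	logger.LogAttrs(context.Background(), slog.LevelInfo, "rpc finished",
		rpcAttrs("/demo.Service/Get", "OK", "10.0.0.7:52344", 12*time.Millisecond)...)
}
```

  Keeping the field names identical across logs, metrics labels, and trace attributes makes it easier to pivot between the three when debugging.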

Checklist

  • Deadlines required; retries only for idempotent calls with backoff.
  • Interceptors for auth/metrics/logging/recovery.
  • TLS/mTLS enabled; cert rotation automated.
  • Concurrency and message limits set; keepalive tuned.