Kafka Reliability Playbook

Producers: Set acks=all with min.insync.replicas >= 2 on the topic, and turn the idempotent producer on; enable transactions for exactly-once pipelines. Tune batch.size and linger.ms for throughput; cap max.in.flight.requests.per.connection to preserve ordering. Handle retries with backoff, and surface delivery errors.
Consumers: Set enable.auto.commit=false and commit only after processing; route poison messages to a dead-letter topic (DLT). Size max.poll.interval.ms to the actual work time and bound max.poll.records. Isolate heavy work in a worker pool so the poll loop stays fast.
Topics & brokers: Use a replication factor (RF) ≥ 3; pick the cleanup policy that fits the data (delete vs. compact); size segments and retention to the available storage. Monitor ISR, under-replicated partitions (URP), controller changes, request latency, and disk usage. Throttle large produce/fetch traffic; use per-client quotas if needed.
Checklist: Producers are idempotent with acks=all and min.insync.replicas set. Consumers use manual commits plus a DLT. RF and retention are sized; ISR and URP are monitored. Alerts cover broker latency, disk, and replication health.
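The producer settings above could be collected roughly as follows. This is a minimal sketch, not a recommended configuration: the class name ReliableProducerConfig is hypothetical, the numeric values are illustrative placeholders, and plain string keys are used so the snippet compiles without the Kafka client library. Serializer settings are omitted, and min.insync.replicas does not appear because it is set on the topic or broker rather than on the producer.

```java
import java.util.Properties;

// Hypothetical helper: gathers the producer-side settings from the playbook.
final class ReliableProducerConfig {

    static Properties build(String bootstrapServers) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);

        // Durability: wait for all in-sync replicas and de-duplicate retries.
        p.setProperty("acks", "all");
        p.setProperty("enable.idempotence", "true");

        // Ordering: the idempotent producer requires at most 5 in-flight requests.
        p.setProperty("max.in.flight.requests.per.connection", "5");

        // Retries: bounded by an overall delivery deadline, with backoff between attempts.
        // Values are illustrative assumptions, not tuned numbers.
        p.setProperty("delivery.timeout.ms", "120000");
        p.setProperty("retry.backoff.ms", "200");

        // Throughput: larger batches and a short linger (illustrative values).
        p.setProperty("batch.size", "65536");
        p.setProperty("linger.ms", "10");
        return p;
    }

    public static void main(String[] args) {
        // Print the resulting settings for inspection.
        build("localhost:9092").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With the Kafka client on the classpath, a Properties object like this (plus key and value serializers) would typically be handed to the producer constructor; delivery errors would then be surfaced through the send callback or the returned future.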

March 5, 2025 · 4162 views

Tuning Kafka Consumers (Java)

Core settings: Size max.poll.interval.ms to the processing time and max.poll.records to the batch size. Use fetch.min.bytes and fetch.max.wait.ms to trade latency against throughput. Set enable.auto.commit=false and commit (sync or async) after each batch is processed.
Concurrency: Prefer multiple consumer instances over a very large max.poll.records. For CPU-bound steps, hand work off to a bounded executor and avoid blocking the poll thread.
Ordering & retries: Keep partition affinity when ordering matters; use a dead-letter topic (DLT) for poison messages. Back off with jitter on retries and limit the attempts per message.
Observability: Track lag per partition, commit latency, rebalances, processing time, and error rates. Log the offset and partition with every error; trace batch sizes.
Checklist: The poll loop never blocks; work is delegated to a bounded pool. Commits happen after successful processing; a DLT is in place. Lag and rebalance metrics are monitored.
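The "backoff with jitter, limited attempts" rule can be sketched without any Kafka dependency. The class RetryWithJitter and its method names are hypothetical, and the strategy shown (capped exponential backoff with a uniformly random delay up to the ceiling) is one common choice rather than the only one. A false return signals that attempts are exhausted, at which point the caller would publish the record to the DLT.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Consumer;

// Hypothetical helper: bounded retries with capped exponential backoff and jitter.
final class RetryWithJitter {

    // Ceiling for the delay after the given 1-based attempt: baseMs doubled
    // (attempt - 1) times, never exceeding capMs.
    static long backoffCeilingMs(int attempt, long baseMs, long capMs) {
        long delay = baseMs;
        for (int i = 1; i < attempt && delay < capMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, capMs);
    }

    // Random delay in [0, ceiling], so retrying consumers do not wake in lockstep.
    static long jitteredDelayMs(int attempt, long baseMs, long capMs) {
        long ceiling = backoffCeilingMs(attempt, baseMs, capMs);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    // Returns true once the handler succeeds; false when maxAttempts is reached
    // or the thread is interrupted. On false, the caller routes the record to the DLT.
    static <T> boolean process(T record, Consumer<T> handler,
                               int maxAttempts, long baseMs, long capMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(record);
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break; // attempts exhausted, no point sleeping again
                }
                try {
                    Thread.sleep(jitteredDelayMs(attempt, baseMs, capMs));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false;
    }
}
```

Because the sleeps run on whichever thread calls process, a helper like this belongs on the worker pool rather than the poll thread, and the total worst-case delay should stay well below max.poll.interval.ms.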

November 22, 2024 · 4864 views