Core settings
- max.poll.interval.ms sized to processing time; max.poll.records to batch size.
- fetch.min.bytes / fetch.max.wait.ms to trade latency vs throughput.
- enable.auto.commit=false; commit sync/async after processing batch.
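A minimal sketch of how these settings might be sized from a measured per-record processing time. The function name, the safety factor, and the fetch values are illustrative assumptions, not recommendations; the keys are the property names listed above, and how they are passed depends on the client library.

```python
def consumer_settings(per_record_ms: float,
                      max_poll_records: int = 200,
                      safety_factor: float = 3.0) -> dict:
    """Derive poll-related settings from a measured per-record cost.

    All default values here are assumptions chosen for illustration.
    """
    # Worst case: every record in a full batch takes per_record_ms.
    worst_case_batch_ms = per_record_ms * max_poll_records
    return {
        "max.poll.records": max_poll_records,
        # Leave headroom so one slow batch does not trigger a rebalance.
        "max.poll.interval.ms": int(worst_case_batch_ms * safety_factor),
        # Larger min bytes / longer wait favors throughput over latency.
        "fetch.min.bytes": 64 * 1024,
        "fetch.max.wait.ms": 200,
        # Offsets are committed explicitly after the batch is processed.
        "enable.auto.commit": False,
    }


settings = consumer_settings(per_record_ms=50)
```

With 50 ms per record and 200 records per poll, the worst-case batch takes 10 s, so the sketch sets max.poll.interval.ms to 30 s.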
Concurrency
- Prefer multiple consumer instances over massive max.poll.records.
- For CPU-bound steps, hand off to bounded executor; avoid blocking poll thread.
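One way the bounded hand-off could look, sketched with the standard library only. BoundedExecutor and try_submit are hypothetical names; the idea is that a saturated pool returns None instead of blocking, so the poll thread can keep polling (for example after pausing the affected partitions). Threads keep the sketch short; in CPython, truly CPU-bound work may be better served by a process pool.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


class BoundedExecutor:
    """Thread pool that refuses work beyond max_in_flight tasks."""

    def __init__(self, workers: int, max_in_flight: int):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_submit(self, fn: Callable, *args) -> Optional[Future]:
        # Non-blocking acquire: the caller (poll thread) is never stalled.
        if not self._slots.acquire(blocking=False):
            return None
        try:
            future = self._pool.submit(fn, *args)
        except BaseException:
            self._slots.release()
            raise
        # Free the slot when the task finishes, whether it succeeded or not.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

A None result is the backpressure signal: hold the record and retry the submit on a later loop iteration rather than queueing without limit.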
Ordering & retries
- Keep partition affinity when ordering matters; use a dead-letter topic (DLT) for poison messages.
- Backoff with jitter on retries; limit attempts per message.
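The retry rules above can be sketched as follows, assuming exponential backoff with "full jitter" (a random delay between zero and the capped exponential bound). The function names, the base and cap values, and the dead_letter callback are hypothetical; in practice the callback would publish the record to the DLT. Retrying inline on the worker that owns the partition preserves ordering, but the total sleep time still has to fit within max.poll.interval.ms if it runs on the poll thread.

```python
import random
import time
from typing import Any, Callable


def retry_delay(attempt: int, base_s: float = 0.1, cap_s: float = 5.0,
                rng: Callable[[], float] = random.random) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2^(attempt-1)))."""
    return rng() * min(cap_s, base_s * 2 ** (attempt - 1))


def process_with_retries(record: Any,
                         handler: Callable[[Any], None],
                         dead_letter: Callable[[Any, Exception], None],
                         max_attempts: int = 4,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run handler up to max_attempts times, then dead-letter the record."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            handler(record)
            return True
        except Exception as exc:
            if attempt == max_attempts:
                # Attempts exhausted: treat as a poison message.
                dead_letter(record, exc)
                return False
            sleep(retry_delay(attempt))
    return False  # unreachable; keeps the return type explicit
```

Returning False after dead-lettering lets the caller still commit the offset, so one poison message does not stall the partition.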
Observability
- Metrics: lag per partition, commit latency, rebalances, processing time, error rates.
- Log offsets and partition for errors; trace batch sizes.
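A small sketch of the two items that are easiest to get wrong: per-partition lag and error logs that carry partition and offset. It assumes end offsets and committed offsets are already available as dicts keyed by (topic, partition); how they are fetched depends on the client. The function names and the log format are illustrative, and treating a missing committed offset as 0 is a simplifying assumption.

```python
import logging
from typing import Dict, Tuple

log = logging.getLogger("consumer")

TopicPartition = Tuple[str, int]


def partition_lag(end_offsets: Dict[TopicPartition, int],
                  committed: Dict[TopicPartition, int]) -> Dict[TopicPartition, int]:
    """Lag per partition = log end offset minus committed offset."""
    return {
        tp: max(0, end - committed.get(tp, 0))  # assumption: no commit means 0
        for tp, end in end_offsets.items()
    }


def log_processing_error(topic: str, partition: int, offset: int,
                         exc: Exception) -> None:
    """Include partition and offset so the failing record can be located."""
    log.error("processing failed topic=%s partition=%d offset=%d error=%r",
              topic, partition, offset, exc)


lag = partition_lag(
    {("orders", 0): 120, ("orders", 1): 50},
    {("orders", 0): 100},
)
```

Here partition 0 is 20 records behind, and partition 1, which has no committed offset yet, is reported as 50 behind.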
Checklist
- Poll loop never blocks on processing; work delegated to bounded pool.
- Commits after successful processing; DLT in place.
- Lag and rebalance metrics monitored.
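The "commit after successful processing" item can be sketched against a deliberately tiny, hypothetical consumer interface (BatchConsumer is not a real client API; real clients expose richer poll and commit calls). The process_batch callable is a stand-in that may itself hand work to a bounded pool and wait for the results.

```python
from typing import Any, Callable, List, Protocol


class BatchConsumer(Protocol):
    """Hypothetical minimal interface, for illustration only."""

    def poll(self, timeout_s: float) -> List[Any]: ...

    def commit(self) -> None: ...


def run_once(consumer: BatchConsumer,
             process_batch: Callable[[List[Any]], None],
             timeout_s: float = 1.0) -> int:
    """Poll one batch, process it, and commit only if processing succeeded."""
    batch = consumer.poll(timeout_s)
    if not batch:
        return 0
    # If this raises, commit() below is skipped and the batch is
    # redelivered after a restart or rebalance (at-least-once).
    process_batch(batch)
    consumer.commit()
    return len(batch)
```

Because a failed batch is never committed, handlers should be idempotent or otherwise tolerate redelivery.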