Workers & balancing
- Define queue priorities; dedicate workers per queue (emails, webhooks, default).
- Use `balance` strategies (`simple`, `auto`) and cap `maxProcesses` per supervisor.
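A minimal sketch of how the supervisor layout above might look in `config/horizon.php`. The supervisor names, the `high` queue, and every process count and timeout are hypothetical values chosen for illustration. Only the option keys (`queue`, `balance`, `minProcesses`, `maxProcesses`, `balanceMaxShift`, `balanceCooldown`, `tries`, `timeout`) are taken from Horizon's published config.

```php
<?php

// config/horizon.php (excerpt). Supervisor names, the "high" queue and all
// numbers are hypothetical; size them from your own throughput data.
return [
    'environments' => [
        'production' => [
            // Dedicated pool for webhook delivery, allowed to scale with load.
            'supervisor-webhooks' => [
                'connection' => 'redis',
                'queue' => ['webhooks'],
                'balance' => 'auto',     // grow/shrink processes with the backlog
                'minProcesses' => 1,
                'maxProcesses' => 10,    // hard cap for this supervisor
                'balanceMaxShift' => 1,  // change by at most one process per cycle
                'balanceCooldown' => 3,  // seconds between scaling decisions
                'tries' => 5,
                'timeout' => 30,         // seconds; keep below retry_after
            ],
            // Dedicated, fixed-size pool for outbound email.
            'supervisor-emails' => [
                'connection' => 'redis',
                'queue' => ['emails'],
                'balance' => false,
                'maxProcesses' => 4,
                'tries' => 3,
                'timeout' => 60,
            ],
            // Catch-all pool. With balancing disabled, queues are drained in the
            // listed order, so jobs on "high" are taken before "default".
            'supervisor-default' => [
                'connection' => 'redis',
                'queue' => ['high', 'default'],
                'balance' => false,
                'maxProcesses' => 3,
                'tries' => 3,
                'timeout' => 60,
            ],
        ],
    ],
];
```

Setting `balance` to `simple` instead spreads work evenly across the queues a supervisor lists. Disabling balancing trades that fairness for strict priority ordering.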
Reliability
- Set retries/backoff per job; make jobs idempotent where possible and treat non-idempotent tasks with care, since a retry can repeat their side effects.
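- Configure `timeout` and `retry_after`, keeping `retry_after` a few seconds longer than the longest job `timeout`.
- Use Redis with persistence (RDB snapshots or AOF) so queued jobs survive a Redis restart.
- Use `horizon:supervisors` to confirm every supervisor is running.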
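A sketch of per-job retry settings, assuming a Laravel 8+ app on PHP 8. The `DeliverWebhook` class, its constructor arguments, the numbers, and the `Idempotency-Key` header are hypothetical. The header only helps if the receiving service honours it. The `$tries`, `$backoff` and `$timeout` properties are standard Laravel job properties.

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Http;

// Hypothetical job used to illustrate retry, backoff and timeout settings.
class DeliverWebhook implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Give up after 5 attempts; the job is then recorded as failed.
    public $tries = 5;

    // Seconds to wait before each retry; the last value repeats if needed.
    public $backoff = [10, 30, 60, 120];

    // Abort an attempt after 30 seconds. Keep this below the connection's
    // retry_after (config/queue.php), otherwise a still-running job can be
    // released to a second worker and run twice.
    public $timeout = 30;

    public function __construct(
        public string $deliveryId,
        public string $url,
        public array $payload,
    ) {}

    public function handle(): void
    {
        // Non-idempotent side effect: send a stable key so a receiver that
        // supports it can drop duplicates caused by retries.
        Http::withHeaders(['Idempotency-Key' => $this->deliveryId])
            ->timeout(20)
            ->post($this->url, $this->payload)
            ->throw(); // 4xx/5xx throws, which fails this attempt and triggers a retry
    }
}
```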
Observability
- Horizon dashboard: throughput, runtime, failures, retries.
- Alert on rising failures and long runtimes; log payload/context for failed jobs.
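- Prune failed jobs under a retention policy (Horizon's `trim` settings, `queue:prune-failed`); route jobs that keep failing to a dead-letter queue (DLQ) when you need to inspect or replay them.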
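One way to log payload and context for failed jobs is a global `Queue::failing` listener. The sketch below assumes it lives in `AppServiceProvider::boot()`. The log message and the context keys are hypothetical choices. `JobFailed` and the job methods it calls (`getQueue`, `resolveName`, `attempts`, `payload`) are part of Laravel's queue API.

```php
<?php

namespace App\Providers;

use Illuminate\Queue\Events\JobFailed;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Queue;
use Illuminate\Support\ServiceProvider;

class AppServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        // Runs once per job that exhausts its retries or is failed explicitly.
        Queue::failing(function (JobFailed $event) {
            Log::error('Queue job failed', [
                'connection' => $event->connectionName,
                'queue' => $event->job->getQueue(),
                'job' => $event->job->resolveName(),
                'attempts' => $event->job->attempts(),
                'exception' => $event->exception->getMessage(),
                // The raw payload may contain personal data; redact it here
                // if your retention rules require that.
                'payload' => $event->job->payload(),
            ]);
        });
    }
}
```

Shipping these log entries to your alerting system lets a rising failure rate trigger an alert without polling the dashboard.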
Deployment
- Run `php artisan horizon:terminate` on deploy: Horizon finishes in-flight jobs and exits, and your process monitor restarts it with the new code.
- Ensure Supervisor or systemd restarts Horizon whenever it exits.
Checklist
- Queues prioritized; supervisors sized.
- Retries/backoff and timeouts set; DLQ plan.
- Monitoring/alerts configured; failed job retention in place.