Workers & balancing

  • Define queue priorities; dedicate workers per queue (emails, webhooks, default).
  • Pick a balance strategy per supervisor (simple, auto, or false) and cap maxProcesses so one busy queue cannot starve the host.
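
A minimal sketch of how the two points above might look in config/horizon.php. The supervisor names, process counts, and timeouts are illustrative assumptions, not recommended values; only the environments key is shown, and the rest of the file is assumed to stay as published by Horizon.

```php
<?php

// Excerpt of config/horizon.php (other keys omitted for brevity).
// Supervisor names and all numbers below are hypothetical examples.
return [
    'environments' => [
        'production' => [
            // Dedicated workers for time-sensitive email jobs.
            'supervisor-emails' => [
                'connection' => 'redis',
                'queue' => ['emails'],
                'balance' => 'auto',
                'minProcesses' => 1,
                'maxProcesses' => 5,   // hard cap for this supervisor
                'tries' => 3,
                'timeout' => 60,
            ],
            // Webhooks get their own pool so slow endpoints
            // cannot block emails or default work.
            'supervisor-webhooks' => [
                'connection' => 'redis',
                'queue' => ['webhooks'],
                'balance' => 'auto',
                'minProcesses' => 1,
                'maxProcesses' => 10,
                'tries' => 5,
                'timeout' => 30,
            ],
            // Everything else.
            'supervisor-default' => [
                'connection' => 'redis',
                'queue' => ['default'],
                'balance' => 'auto',
                'minProcesses' => 1,
                'maxProcesses' => 3,
                'tries' => 3,
                'timeout' => 60,
            ],
        ],
    ],
];
```

If a single supervisor listens on several queues instead, setting balance to false processes them in the order listed, which is one way to express priority.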

Reliability

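  • Set retries and backoff per job; make jobs idempotent where possible, and limit retries or guard against duplicate execution for tasks that are not.
  • Configure timeout and retry_after; keep retry_after several seconds longer than the longest job timeout, or a job may be picked up a second time while it is still running.
  • Use Redis with persistence (AOF or RDB); check supervisor state with horizon:status and horizon:supervisors.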
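
The retry, backoff, and timeout settings above can be sketched on a job class as follows. SendWebhook, its constructor arguments, and every number are hypothetical; the class assumes a recent Laravel version with PHP 8 constructor promotion.

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Http;

// Hypothetical job used only to illustrate the settings.
class SendWebhook implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Give up after five attempts in total.
    public $tries = 5;

    // Kill the worker's attempt after 30 seconds.
    // Must stay below retry_after on the queue connection.
    public $timeout = 30;

    public function __construct(
        public string $url,
        public array $payload,
    ) {
    }

    // Seconds to wait before each retry; the last value is
    // reused if there are more retries than entries.
    public function backoff(): array
    {
        return [10, 60, 300];
    }

    public function handle(): void
    {
        // throw() turns 4xx/5xx responses into exceptions so the
        // queue records a failed attempt and schedules a retry.
        Http::timeout(10)->post($this->url, $this->payload)->throw();
    }
}
```

With a 30-second job timeout, a retry_after of, say, 90 on the redis connection in config/queue.php would leave a comfortable margin. Because a webhook POST is not idempotent by itself, the receiving side (or a unique key in the payload) would need to tolerate duplicates.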

Observability

  • Horizon dashboard: throughput, runtime, failures, retries.
  • Alert on rising failures and long runtimes; log payload/context for failed jobs.
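  • Prune failed jobs on a retention policy (for example queue:prune-failed or Horizon's trim settings); send jobs that exhaust their retries to a dead-letter queue (DLQ) when needed.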
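
One possible way to log context for failed jobs is a global listener registered in a service provider. The provider name and the log fields are assumptions for illustration; alerting on failure rate would still be configured in whatever monitoring tool consumes these logs.

```php
<?php

namespace App\Providers;

use Illuminate\Queue\Events\JobFailed;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Queue;
use Illuminate\Support\ServiceProvider;

// Hypothetical provider; it must be registered like any other provider.
class QueueObservabilityServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        // Runs once a job has failed for good (retries exhausted
        // or the job was marked as failed).
        Queue::failing(function (JobFailed $event) {
            Log::error('Queued job failed', [
                'connection' => $event->connectionName,
                'queue' => $event->job->getQueue(),
                'job' => $event->job->resolveName(),
                'attempts' => $event->job->attempts(),
                // Payloads can contain personal or secret data;
                // consider redacting before logging in production.
                'payload' => $event->job->payload(),
                'exception' => $event->exception->getMessage(),
            ]);
        });
    }
}
```

Logging the queue and job name as structured fields makes it straightforward to group failures per queue and alert when the count rises.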

Deployment

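  • Restart Horizon on every deploy so workers pick up new code: run php artisan horizon:terminate, which lets in-flight jobs finish before the process exits.
  • Ensure a process monitor (Supervisor or systemd) restarts Horizon whenever it exits, including after horizon:terminate.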

Checklist

  • Queues prioritized; supervisors sized.
  • Retries/backoff and timeouts set (retry_after > timeout); DLQ plan.
  • Monitoring/alerts configured; failed job retention in place.