Hive Community

All posts

Deep, practical engineering stories — browse everything.

Laravel Queues with Horizon: Reliable Setup

Workers & balancing

- Define queue priorities; dedicate workers per queue (emails, webhooks, default).
- Use balance strategies (simple, auto) and cap max processes per supervisor.

Reliability

- Set retry/backoff per job; push non-idempotent tasks carefully.
- Configure timeout and retry_after (keep retry_after > max job time).
- Use Redis with persistence; monitor supervisors via horizon:supervisors.

Observability

- Horizon dashboard: throughput, runtime, failures, retries.
- Alert on rising failures and long runtimes; log payload/context for failed jobs.
- Prune failed jobs with a retention policy; send to a DLQ when needed.

Deployment

- Restart Horizon on deploy to pick up new code; use horizon:terminate.
- Ensure supervisor/systemd restarts Horizon if it dies.

Checklist

- Queues prioritized; supervisors sized.
- Retries/backoff and timeouts set; DLQ plan.
- Monitoring/alerts configured; failed job retention in place.


API Security Hardening Checklist

Identity & access

- Enforce strong auth (OIDC/JWT); short-lived tokens + refresh; audience/issuer checks.
- Fine-grained authz (RBAC/ABAC); deny-by-default; rate-limit per identity.

Input & data

- Validate/normalize input; reject oversized bodies; JSON Schema where possible.
- Output-encode; avoid reflecting raw user data; paginate results.
- Store secrets in a vault/KMS; rotate keys; never log secrets/tokens.

Transport & headers

- TLS everywhere; HSTS; modern ciphers.
- Security headers: Content-Security-Policy, X-Content-Type-Options: nosniff, X-Frame-Options: DENY, Referrer-Policy (see the filter sketch after the checklist).

Abuse protection

- Rate limit + burst control; CAPTCHA/step-up for sensitive actions.
- Bot detection where relevant; geo/IP allow/deny for admin surfaces.

Observability

- Structured audit logs with identity, action, resource, result; avoid PII spill.
- Alerts on auth failures, unusual rate spikes, and 5xx anomalies.

Checklist

- AuthZ enforced; least privilege.
- Input validated; size limits set.
- TLS + security headers applied.
- Rate limits + abuse controls configured.
- Secrets vaulted and rotated.
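To make the header list concrete, here is a minimal Java servlet filter sketch. The class name and the CSP value are illustrative only; a real Content-Security-Policy must be tailored per application.

```java
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;

// Sets baseline security headers on every response (names/values from the list above).
public class SecurityHeadersFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletResponse response = (HttpServletResponse) res;
        response.setHeader("Strict-Transport-Security", "max-age=31536000; includeSubDomains");
        response.setHeader("Content-Security-Policy", "default-src 'none'"); // placeholder policy
        response.setHeader("X-Content-Type-Options", "nosniff");
        response.setHeader("X-Frame-Options", "DENY");
        response.setHeader("Referrer-Policy", "no-referrer");
        chain.doFilter(req, res); // continue the chain with headers already set
    }
}
```

Register the filter for all routes so every response, including errors, carries the headers.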


Spring Boot Observability: Metrics, Traces, Logs

Metrics

- Use Micrometer + Prometheus: management.endpoints.web.exposure.include=prometheus,health,info.
- Add JVM, Tomcat, and DB-pool meters; set percentiles for latencies.
- Create SLIs: request latency, error rate, saturation (threads/connections), GC pauses.

Traces

- Spring Boot 3 ships OpenTelemetry support: add spring-boot-starter-actuator + micrometer-tracing-bridge-otel + an exporter (OTLP/Zipkin/Jaeger).
- Propagate headers (traceparent); make sure async executors propagate the trace context (see the executor sketch after the checklist).
- Sample smartly: lower rates on noisy paths; raise them for errors.

Logs

- Use a JSON layout; include traceId/spanId for correlation.
- Avoid verbose INFO in hot paths; keep payload size bounded.

Dashboards & alerts

- Latency/error SLO dashboards per endpoint.
- DB-pool saturation, thread-pool queue depth, GC pauses, heap used %, 5xx rate.
- Alerts on SLO burn rates; include exemplars linking metrics → traces → logs.

Checklist

- Actuator endpoints secured and exposed only where needed.
- OTLP exporter configured; sampling tuned.
- Trace/log correlation verified in staging.
- Dashboards + alerts reviewed with on-call.
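A minimal sketch of trace-context propagation for async work, assuming Spring Framework 6.1+ (where ContextPropagatingTaskDecorator is available); pool sizes and the bean name are illustrative.

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.support.ContextPropagatingTaskDecorator;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

// Executor that carries the caller's trace context onto worker threads,
// so spans started in @Async methods join the originating trace.
@Configuration
public class TracingExecutorConfig {

    @Bean(name = "tracedExecutor")
    public ThreadPoolTaskExecutor tracedExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(4);
        executor.setMaxPoolSize(16);
        executor.setQueueCapacity(100);
        // Wraps each submitted task in a context snapshot at submission time.
        executor.setTaskDecorator(new ContextPropagatingTaskDecorator());
        executor.initialize();
        return executor;
    }
}
```

Point @Async (or manual submissions) at this executor and verify in staging that traceId/spanId appear in logs emitted from the worker threads.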


DevOps Incident Response Playbook

During incident

- Roles: incident commander, comms lead, ops/feature SMEs, scribe.
- Declare severity quickly; open a shared channel/bridge; timestamp actions.
- Stabilize first: roll back, feature-flag off, scale up, or shed load.

Runbooks & tooling

- Prebuilt runbooks per service: restart/rollback steps, dashboards, logs, feature flags.
- One-click access to dashboards (metrics, traces, logs), recent deploys, and toggles.
- Paging rules with escalation; avoid noisy alerts.

Comms

- Single source of truth: the incident doc; an external status page if needed.
- Regular updates with impact, scope, mitigation, ETA.

After incident

- Blameless postmortem: timeline, root causes, contributing factors.
- Action items with owners and deadlines; track to completion.
- Add tests/alerts/runbook updates; reduce time-to-detect and time-to-recover.


Tuning Kafka Consumers (Java)

Core settings

- max.poll.interval.ms sized to processing time; max.poll.records to batch size.
- fetch.min.bytes / fetch.max.wait.ms to trade latency vs. throughput.
- enable.auto.commit=false; commit sync/async after processing the batch (see the poll-loop sketch after the checklist).

Concurrency

- Prefer multiple consumer instances over a massive max.poll.records.
- For CPU-bound steps, hand off to a bounded executor; avoid blocking the poll thread.

Ordering & retries

- Keep partition affinity when ordering matters; use a DLT for poison messages.
- Back off with jitter on retries; limit attempts per message.

Observability

- Metrics: lag per partition, commit latency, rebalances, processing time, error rates.
- Log offsets and partition for errors; trace batch sizes.

Checklist

- Poll loop never blocks; work delegated to a bounded pool.
- Commits after successful processing; DLT in place.
- Lag and rebalance metrics monitored.
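A minimal poll-loop sketch with manual commits using the plain kafka-clients API; the broker address, group id, and topic name are placeholders.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-workers");           // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");    // commit only after processing
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "200");        // bound the batch size
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000"); // > worst-case batch time

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // keep fast, or delegate to a bounded executor
                }
                if (!records.isEmpty()) {
                    consumer.commitSync(); // at-least-once: commit after the batch succeeds
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // application logic goes here
    }
}
```

Because commits follow processing, a crash mid-batch replays records rather than losing them; handlers therefore need to be idempotent or route duplicates/poison messages to a DLT.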


REST API Design Best Practices: Building Production-Ready APIs

Designing a REST API that is intuitive, maintainable, and scalable requires following established best practices. Here’s a comprehensive guide (a short Spring-style sketch of points 1–4 follows at the end).

1. Use Nouns, Not Verbs

Good:

```
GET    /users
GET    /users/123
POST   /users
PUT    /users/123
DELETE /users/123
```

Bad:

```
GET    /getUsers
GET    /getUserById
POST   /createUser
PUT    /updateUser
DELETE /deleteUser
```

2. Use Plural Nouns

Good: GET /users, GET /orders, GET /products
Bad: GET /user, GET /order, GET /product

3. Use HTTP Methods Correctly

```
// GET - Retrieve resources
GET /users          // Get all users
GET /users/123      // Get specific user

// POST - Create new resources
POST /users         // Create new user

// PUT - Update entire resource
PUT /users/123      // Replace user

// PATCH - Partial update
PATCH /users/123    // Update specific fields

// DELETE - Remove resources
DELETE /users/123   // Delete user
```

4. Use Proper HTTP Status Codes

```
// Success
200 OK                    // Successful GET, PUT, PATCH
201 Created               // Successful POST
204 No Content            // Successful DELETE

// Client Errors
400 Bad Request           // Invalid request
401 Unauthorized          // Authentication required
403 Forbidden             // Insufficient permissions
404 Not Found             // Resource doesn't exist
409 Conflict              // Resource conflict

// Server Errors
500 Internal Server Error
503 Service Unavailable
```

5. Consistent Response Format

```json
{
  "data": {
    "id": 123,
    "name": "John Doe",
    "email": "john@example.com"
  },
  "meta": {
    "timestamp": "2024-11-15T10:00:00Z"
  }
}
```

Error response format:

```json
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid input data",
    "details": [
      { "field": "email", "message": "Invalid email format" }
    ]
  }
}
```

6. Versioning

URL versioning:

```
/api/v1/users
/api/v2/users
```

Header versioning:

```
Accept: application/vnd.api+json;version=1
```

7. Filtering, Sorting, Pagination

```
GET /users?page=1&limit=20
GET /users?sort=name&order=asc
GET /users?status=active&role=admin
GET /users?search=john
```

8. Nested Resources

```
GET    /users/123/posts
POST   /users/123/posts
GET    /users/123/posts/456
PUT    /users/123/posts/456
DELETE /users/123/posts/456
```

9. Use HTTPS

Always use HTTPS in production to encrypt data in transit. ...
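As an illustration only (the guide itself is framework-agnostic), here is a minimal Spring-style sketch of points 1–4; the User record, UserService interface, and routes are invented for the example.

```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

import java.net.URI;
import java.util.Optional;

// Hypothetical domain types, defined here so the sketch is self-contained.
record User(long id, String name, String email) {}

interface UserService {
    Optional<User> find(long id);
    User create(User user);
    void delete(long id);
}

// Noun-based plural route, correct verbs, correct status codes.
@RestController
@RequestMapping("/api/v1/users")
class UserController {
    private final UserService users;

    UserController(UserService users) {
        this.users = users;
    }

    @GetMapping("/{id}")
    ResponseEntity<User> get(@PathVariable long id) {
        return users.find(id)
                .map(ResponseEntity::ok)                     // 200 OK
                .orElse(ResponseEntity.notFound().build());  // 404 Not Found
    }

    @PostMapping
    ResponseEntity<User> create(@RequestBody User user) {
        User created = users.create(user);
        return ResponseEntity
                .created(URI.create("/api/v1/users/" + created.id())) // 201 Created
                .body(created);
    }

    @DeleteMapping("/{id}")
    ResponseEntity<Void> delete(@PathVariable long id) {
        users.delete(id);
        return ResponseEntity.noContent().build();           // 204 No Content
    }
}
```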


Go Profiling in Production with pprof

Capturing safely

- Expose /debug/pprof behind auth/VPN, or fetch with curl -s -H "Authorization: Bearer ...".
- CPU profile: go tool pprof http://host/debug/pprof/profile?seconds=30.
- Heap profile: .../heap; goroutines: .../goroutine?debug=2.
- For containers: kubectl port-forward, then grab profiles; avoid prod CPU throttling while profiling.

Reading CPU profiles

- Look at flat vs. cumulative time; identify hot functions.
- Flamegraph: go tool pprof -http=:8081 cpu.pprof.
- Check GC activity and syscalls; watch for mutex contention.

Reading heap profiles

- Compare live allocations vs. in-use objects; watch for large []byte and map growth.
- Look for leaks via a heap that rises over time; diff profiles between runs.

Goroutine dumps

- Spot leaked goroutines (blocked on channel/lock/I/O).
- Common culprits: missing cancel, unbounded worker creation, stuck time.After.

Best practices

- Add pprof in prod only when needed; leave it on by default in staging.
- Sample under load close to real traffic.
- Keep artifacts: store profiles with build SHA + timestamp; compare after releases.
- Combine with metrics (alloc rate, GC pauses, goroutine counts) to validate fixes.


Circuit Breakers with Resilience4j

Core settings

- Sliding window (count- or time-based), failure rate threshold, slow-call threshold, minimum calls.
- Wait duration in open state; half-open permitted calls; automatic transition (see the config sketch after the checklist).

Patterns

- Wrap HTTP/DB/queue clients; combine with timeouts/retries/bulkheads.
- Tune per dependency; differentiate fast-fail vs. tolerant paths.
- Provide a fallback only when it is safe/idempotent.

Observability

- Export metrics: state changes, calls (success/failure/slow), not-permitted count.
- Log state transitions; add exemplars linking to traces.
- Alert on frequent open/half-open oscillation.

Checklist

- Per-downstream breaker with tailored thresholds.
- Timeouts and retries composed correctly (timeout → breaker → retry).
- Metrics/logs/traces wired; alerts on open rate.
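A minimal Resilience4j configuration sketch tying the core settings together; the thresholds and the breaker name are illustrative, not recommendations.

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

import java.time.Duration;
import java.util.function.Supplier;

public class PaymentsBreaker {
    public static void main(String[] args) {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.COUNT_BASED)
                .slidingWindowSize(50)                        // evaluate the last 50 calls
                .minimumNumberOfCalls(20)                     // don't trip on tiny samples
                .failureRateThreshold(50f)                    // open at 50% failures
                .slowCallDurationThreshold(Duration.ofSeconds(2))
                .slowCallRateThreshold(50f)                   // slow calls also count against it
                .waitDurationInOpenState(Duration.ofSeconds(30))
                .permittedNumberOfCallsInHalfOpenState(5)     // probe calls before closing
                .automaticTransitionFromOpenToHalfOpenEnabled(true)
                .build();

        CircuitBreaker breaker = CircuitBreakerRegistry.of(config)
                .circuitBreaker("paymentsApi");               // name is illustrative

        Supplier<String> guarded =
                CircuitBreaker.decorateSupplier(breaker, PaymentsBreaker::callDownstream);

        System.out.println(guarded.get());
    }

    private static String callDownstream() {
        return "ok"; // stand-in for the real HTTP/DB/queue call
    }
}
```

When composing with timeouts and retries, keep the order from the checklist: the timeout bounds each attempt, the breaker records its outcome, and the retry wraps the whole thing so retried failures still count toward opening the breaker.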


Database Optimization Techniques: Performance Tuning Guide

Database performance is critical for application scalability. Here are proven optimization techniques.

1. Indexing Strategy

When to index:

```sql
-- Index frequently queried columns
CREATE INDEX idx_user_email ON users(email);

-- Index foreign keys
CREATE INDEX idx_post_user_id ON posts(user_id);

-- Composite indexes for multi-column queries
CREATE INDEX idx_user_status_role ON users(status, role);
```

When NOT to index:

- Columns with low cardinality (few unique values)
- Frequently updated columns
- Small tables (< 1000 rows)

2. Query Optimization

Avoid SELECT *:

```sql
-- Bad
SELECT * FROM users WHERE id = 123;

-- Good
SELECT id, name, email FROM users WHERE id = 123;
```

Use LIMIT:

```sql
-- Always limit large result sets
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20;
```

Avoid N+1 queries:

```js
// Bad: N+1 queries
users.forEach(user => {
  const posts = db.query('SELECT * FROM posts WHERE user_id = ?', [user.id]);
});

// Good: Single query with JOIN
const usersWithPosts = db.query(`
  SELECT u.*, p.*
  FROM users u
  LEFT JOIN posts p ON u.id = p.user_id
`);
```

3. Connection Pooling

```js
// Configure connection pool
const pool = mysql.createPool({
  connectionLimit: 10,
  host: 'localhost',
  user: 'user',
  password: 'password',
  database: 'mydb',
  waitForConnections: true,
  queueLimit: 0
});
```

4. Caching

Application-level caching:

```js
// Cache frequently accessed data
const cache = new Map();

async function getUser(id) {
  if (cache.has(id)) {
    return cache.get(id);
  }
  const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
  cache.set(id, user);
  return user;
}
```

Query result caching:

```sql
-- Use the query cache (MySQL 5.7 and earlier; the query cache was removed in MySQL 8.0)
SET GLOBAL query_cache_size = 67108864;
SET GLOBAL query_cache_type = 1;
```

5. Database Schema Optimization

Normalize properly: avoid over-normalization and balance normalization against performance.

Use appropriate data types (the smallest type that fits):

- TINYINT instead of INT for small numbers
- VARCHAR(255) instead of TEXT when possible
- DATE instead of DATETIME when the time is not needed

6. Partitioning

```sql
-- Partition large tables by date
CREATE TABLE logs (
  id INT,
  created_at DATE,
  data TEXT
)
PARTITION BY RANGE (YEAR(created_at)) (
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION p2024 VALUES LESS THAN (2025),
  PARTITION p2025 VALUES LESS THAN (2026)
);
```

7. Query Analysis

EXPLAIN plan:

```sql
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';
```

Slow query log:

```sql
-- Enable the slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
```

8. Batch Operations (a JDBC sketch of the same idea follows after the conclusion)

```js
// Bad: Multiple individual inserts
users.forEach(user => {
  db.query('INSERT INTO users (name, email) VALUES (?, ?)', [user.name, user.email]);
});

// Good: Batch insert
const values = users.map(u => [u.name, u.email]);
db.query('INSERT INTO users (name, email) VALUES ?', [values]);
```

9. Database Maintenance

Regular vacuuming (PostgreSQL):

```sql
VACUUM ANALYZE;
```

Optimize tables (MySQL):

```sql
OPTIMIZE TABLE users;
```

10. Monitoring

- Monitor query performance
- Track slow queries
- Monitor connection pool usage
- Watch for table locks
- Monitor disk I/O

Best Practices

- Index strategically
- Optimize queries
- Use connection pooling
- Implement caching
- Normalize appropriately
- Use appropriate data types
- Partition large tables
- Analyze query performance
- Batch operations
- Regular maintenance

Conclusion

Database optimization requires: ...
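A JDBC sketch of the batch-insert idea from point 8; the connection URL and credentials are placeholders, and a JDBC driver is assumed on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchInsert {
    record NewUser(String name, String email) {}

    public static void main(String[] args) throws Exception {
        List<NewUser> users = List.of(
                new NewUser("Ada", "ada@example.com"),
                new NewUser("Linus", "linus@example.com"));

        // Placeholder URL/credentials; any JDBC driver works the same way.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/mydb", "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO users (name, email) VALUES (?, ?)")) {
            conn.setAutoCommit(false);      // one transaction for the whole batch
            for (NewUser u : users) {
                ps.setString(1, u.name());
                ps.setString(2, u.email());
                ps.addBatch();              // queue the row instead of executing it
            }
            ps.executeBatch();              // send queued rows together (driver permitting)
            conn.commit();
        }
    }
}
```

Batching plus a single transaction avoids a network round trip and a commit per row, which is where the per-insert version loses most of its time.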
