
Production Deployment

This guide covers everything you need to deploy OJS in a production environment.

| Use Case | Recommended Backend | Why |
|---|---|---|
| Speed-critical, low latency | Redis | Sub-millisecond enqueue, mature ecosystem |
| ACID guarantees, SQL queryability | PostgreSQL | Strong durability, transactional enqueue |
| Cloud-native microservices | NATS | Single binary, built-in clustering |
| Event replay, compliance | Kafka | Immutable log, very high throughput |
| AWS-native, zero ops | SQS | Fully managed, pay-per-use |
| Development / CI | Lite | Zero deps, sub-50ms startup |

See the Backend Selection Guide for detailed comparison.

```shell
# Add the OJS Helm repository
helm repo add openjobspec https://openjobspec.github.io/charts
helm repo update

# Install with Redis backend
helm install ojs openjobspec/ojs-server \
  --set backend=redis \
  --set redis.url=redis://redis:6379 \
  --set auth.apiKey=your-secret-key \
  --set replicas=3
```
values.yaml:

```yaml
backend: redis
replicas: 3
redis:
  url: redis://redis-cluster:6379
auth:
  apiKey: "${OJS_API_KEY}" # Required in production
  enabled: true
resources:
  requests:
    memory: "128Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "1000m"
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilization: 70
monitoring:
  prometheus:
    enabled: true
  grafana:
    dashboards: true
```

For simpler deployments, use Docker Compose:

```shell
cd ojs-cloud/deploy
cp .env.example .env  # Edit with your secrets
docker compose -f docker-compose.production.yml up -d
```
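The shipped `docker-compose.production.yml` is the source of truth; as a rough sketch of what such a file typically contains (the service names, images, and variables here are assumptions, not the actual file):

```yaml
services:
  ojs-server:
    image: openjobspec/ojs-server:latest
    environment:
      OJS_AUTH_REQUIRED: "true"
      OJS_API_KEY: ${OJS_API_KEY}   # injected from .env
      OJS_BACKEND: redis
      OJS_REDIS_URL: redis://redis:6379
    depends_on:
      - redis
  redis:
    image: redis:7
    # Enable AOF persistence so jobs survive a Redis restart
    command: ["redis-server", "--appendonly", "yes"]
```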

All production deployments MUST enable API key authentication:

```shell
# Environment variables
OJS_AUTH_REQUIRED=true
OJS_API_KEY=your-strong-random-key-32-chars-minimum
```

Enable job payload encryption for sensitive data:

```shell
OJS_ENCRYPTION_ENABLED=true
OJS_ENCRYPTION_KEY=your-32-byte-aes-key-base64-encoded
```
Network hardening recommendations:

  • Place OJS servers on a private network (not internet-facing)
  • Use a reverse proxy (Nginx, Caddy, ALB) for TLS termination
  • Enable rate limiting to prevent abuse
  • Set CORS headers if the Admin UI is on a different domain

Define governance rules for job processing:

```json
[
  {
    "id": "block-pii-queue",
    "name": "Block PII on public queues",
    "action": "deny",
    "enabled": true,
    "conditions": {
      "queues": ["public-*"],
      "tags": ["contains-pii"]
    }
  }
]
```
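The `"public-*"` pattern above is glob-style. As an illustration of how such queue patterns can be matched (a sketch using Go's standard `path.Match`; the actual OJS matching semantics may differ):

```go
package main

import (
	"fmt"
	"path"
)

// matches reports whether a queue name matches a glob-style
// pattern such as "public-*".
func matches(pattern, queue string) bool {
	ok, _ := path.Match(pattern, queue)
	return ok
}

func main() {
	fmt.Println(matches("public-*", "public-uploads")) // true
	fmt.Println(matches("public-*", "internal-jobs"))  // false
}
```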

Every OJS backend exposes metrics at /metrics:

```
# Key metrics to monitor
ojs_jobs_enqueued_total   # Total jobs enqueued
ojs_jobs_completed_total  # Total jobs completed
ojs_jobs_failed_total     # Total jobs failed
ojs_queue_depth           # Current queue depth
ojs_job_duration_seconds  # Job processing time histogram
ojs_worker_active_jobs    # Currently active jobs per worker
```
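For example, a failure-rate expression over these counters might look like the following PromQL (a sketch assuming the metric names above; adjust labels to your deployment):

```promql
sum(rate(ojs_jobs_failed_total[5m]))
  /
sum(rate(ojs_jobs_enqueued_total[5m]))
```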

Import the pre-built dashboards from deploy/grafana/:

  1. Overview — System-wide throughput, latency, error rate
  2. Queues — Per-queue depth, throughput, and age
  3. Workers — Worker count, utilization, and heartbeat status
  4. Jobs — Job lifecycle timing and state distribution
  5. Errors — Error rate by type, retry patterns, dead letter growth
  6. Performance — p50/p95/p99 latency, memory, CPU

Recommended alerts:

| Alert | Condition | Severity |
|---|---|---|
| Queue backlog growing | Depth > 1000 for > 5 min | Warning |
| High failure rate | > 10% for > 2 min | Critical |
| Worker stall | No heartbeat for > 60s | Critical |
| Dead letter growth | > 100 jobs in 1 hour | Warning |
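The "High failure rate" alert could be expressed as a Prometheus alerting rule roughly like this (a sketch assuming the metric names above; thresholds mirror the table):

```yaml
groups:
  - name: ojs
    rules:
      - alert: OJSHighFailureRate
        expr: |
          sum(rate(ojs_jobs_failed_total[2m]))
            / sum(rate(ojs_jobs_completed_total[2m])) > 0.10
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "OJS job failure rate above 10% for 2 minutes"
```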

Enable distributed tracing across producers and workers:

```go
// Go SDK
worker.Use(ojs.OpenTelemetryMiddleware(ojs.OTelConfig{
	ServiceName: "payment-worker",
	Endpoint:    "otel-collector:4317",
}))
```

Run 3+ OJS server replicas behind a load balancer. All backends support concurrent access from multiple server instances.

| Backend | HA Strategy |
|---|---|
| Redis | Redis Sentinel or Redis Cluster |
| PostgreSQL | Streaming replication + pgbouncer |
| NATS | NATS Cluster (built-in) |
| Kafka | Multi-broker cluster |
| SQS | AWS-managed (multi-AZ by default) |

OJS backends support graceful shutdown:

In Kubernetes, give pods time to drain:

```yaml
terminationGracePeriodSeconds: 30
```

Or trigger shutdown manually:

```shell
kill -SIGTERM <pid>  # Starts graceful shutdown: active jobs complete, no new jobs are fetched
```
Tune worker concurrency for your workload:

```shell
# Start with: concurrency = 2 × CPU cores
OJS_WORKER_CONCURRENCY=16
```
Adjust the poll interval to trade latency against idle CPU:

```shell
# High throughput: shorter interval
OJS_WORKER_POLL_INTERVAL=200ms

# Low throughput: longer interval to save CPU
OJS_WORKER_POLL_INTERVAL=2s
```

Enable the auto-tuning engine for automatic optimization:

```shell
OJS_AUTOTUNE=true
OJS_AUTOTUNE_INTERVAL=30s
```

The engine analyzes throughput, latency, and queue depth to recommend optimal concurrency, poll intervals, and connection pool sizes.
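The engine's actual algorithm is not documented here, but a simplified heuristic of this kind might look as follows (purely illustrative; `recommendConcurrency` and its thresholds are assumptions, not the real engine):

```go
package main

import "fmt"

// recommendConcurrency is an illustrative heuristic only: nudge
// concurrency up when the queue is backing up and latency has
// headroom, and down when latency exceeds its target.
func recommendConcurrency(current, queueDepth int, p95, target float64) int {
	switch {
	case queueDepth > 1000 && p95 < target:
		return current + 2 // backlog growing, latency fine: add workers
	case p95 > target && current > 1:
		return current - 1 // latency too high: shed load
	default:
		return current
	}
}

func main() {
	fmt.Println(recommendConcurrency(16, 5000, 0.2, 0.5)) // 18
	fmt.Println(recommendConcurrency(16, 100, 0.9, 0.5))  // 15
}
```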

Before going live, verify:

  • API key authentication enabled
  • TLS termination configured
  • Backend persistence configured (Redis AOF, Postgres WAL)
  • 3+ server replicas running
  • Health check endpoint monitored
  • Prometheus scraping enabled
  • Grafana dashboards imported
  • Alerting rules configured
  • Graceful shutdown tested
  • Backup strategy for backend data
  • Log aggregation configured
  • Rate limiting enabled
  • Network security reviewed