Production hardening

z4j ships with backwards-compatible defaults that work out of the box but lean permissive. For production deployments, opt into the fail-closed mode of each subsystem below. None of these settings is required for z4j to function; they are defense-in-depth for operators who want hardened behavior instead of the trust-the-CA / trust-the-operator-config defaults.

Scheduler gRPC — require explicit CN allow-list

When Z4J_SCHEDULER_GRPC_ENABLED=true, brain accepts mTLS-authenticated gRPC connections from any client cert that the configured CA bundle validates. This is the “trust the CA” deployment model. For production, populate Z4J_SCHEDULER_GRPC_ALLOWED_CNS with the explicit list of CNs you minted via z4j mint-scheduler-cert, and also set the require flag so that a misconfigured boot fails closed instead of falling back to trust-the-CA:

Z4J_SCHEDULER_GRPC_ENABLED=true
Z4J_SCHEDULER_GRPC_ALLOWED_CNS='["scheduler-prod-1","scheduler-prod-2"]'
Z4J_SCHEDULER_GRPC_REQUIRE_ALLOWLIST=true
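
To make the fail-closed behavior concrete, here is a minimal Python sketch of how the allow-list and require flag could interact at boot. It is an illustration only, not z4j's implementation: the function names resolve_cn_policy and cn_is_accepted are hypothetical, and the parsing rules (JSON list, "true" as the only truthy value) are assumptions.

```python
import json
from typing import FrozenSet, Mapping, Optional


def resolve_cn_policy(env: Mapping[str, str]) -> Optional[FrozenSet[str]]:
    """Return the allowed CNs, or None for trust-the-CA mode (hypothetical helper).

    Raises ValueError when the require flag is set but the allow-list is
    empty, so a misconfigured boot fails closed.
    """
    cns = json.loads(env.get("Z4J_SCHEDULER_GRPC_ALLOWED_CNS", "[]"))
    if not isinstance(cns, list) or not all(isinstance(c, str) for c in cns):
        raise ValueError("Z4J_SCHEDULER_GRPC_ALLOWED_CNS must be a JSON list of strings")

    # Assumption: only the literal string "true" (any case) enables the flag.
    require = env.get("Z4J_SCHEDULER_GRPC_REQUIRE_ALLOWLIST", "false").lower() == "true"

    if cns:
        return frozenset(cns)
    if require:
        raise ValueError("CN allow-list is required but empty; refusing to start")
    return None  # permissive default: any CA-validated cert is accepted


def cn_is_accepted(cn: str, policy: Optional[FrozenSet[str]]) -> bool:
    """Accept every CN in trust-the-CA mode, otherwise only listed CNs."""
    return policy is None or cn in policy
```

With the settings shown above, only scheduler-prod-1 and scheduler-prod-2 would be accepted in this sketch, and an empty list combined with the require flag would abort startup rather than silently accept any CA-validated cert.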

The scheduler side has a symmetric setting for the brain -> scheduler TriggerSchedule channel:

Z4J_SCHEDULER_TRIGGER_GRPC_ENABLED=true
Z4J_SCHEDULER_TRIGGER_GRPC_ALLOWED_CNS='["brain-prod"]'
Z4J_SCHEDULER_TRIGGER_GRPC_REQUIRE_ALLOWLIST=true

CN project bindings (multi-project deployments)

If you run schedulers per-project (one scheduler instance per tenant, each with its own CN), bind each CN to its project list so a leaked cert can only act on the projects it was minted for:

Z4J_SCHEDULER_GRPC_CN_PROJECT_BINDINGS='{"scheduler-acme":["acme"],"scheduler-globex":["globex"]}'

Without bindings (the default), every allow-listed CN can drive RPCs for any project. With bindings, requests outside the bound project list return PERMISSION_DENIED.
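
The binding rule can be sketched as a small authorization check. This is a hedged illustration, not z4j's code: the authorize function is hypothetical, and the handling of a CN that is absent from a non-empty bindings map (denied here, as the fail-closed choice) is an assumption the source does not confirm.

```python
import json
from typing import List, Mapping


def authorize(cn: str, project: str, bindings: Mapping[str, List[str]]) -> str:
    """Return a gRPC-style status name for a request (hypothetical helper)."""
    if not bindings:
        # Default: no bindings configured, any allow-listed CN may act on any project.
        return "OK"
    bound_projects = bindings.get(cn)
    if bound_projects is None:
        # Assumption: a CN with no binding entry is denied once bindings exist.
        return "PERMISSION_DENIED"
    return "OK" if project in bound_projects else "PERMISSION_DENIED"


# Same JSON shape as Z4J_SCHEDULER_GRPC_CN_PROJECT_BINDINGS above.
bindings = json.loads('{"scheduler-acme":["acme"],"scheduler-globex":["globex"]}')
print(authorize("scheduler-acme", "acme", bindings))
print(authorize("scheduler-acme", "globex", bindings))
```

In this sketch a leaked scheduler-acme cert can still drive RPCs for acme, but a request against globex is refused.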

Notification webhooks — HTTPS-only by default

z4j 1.4.0 defaults to HTTPS-only for generic webhook channels. Operator-configured http:// URLs are rejected both at config-time and at dispatch-time, so that payloads and custom headers cannot leak in transit.

If you have a legitimate internal-network HTTP endpoint (intranet receiver, dev rig), opt back in:

Z4J_NOTIFICATIONS_WEBHOOK_ALLOW_HTTP=true

Slack / PagerDuty / Discord / Telegram channels always use the provider’s HTTPS endpoint and are unaffected by this setting.
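
As an illustration of the scheme check described above, the following sketch validates a generic webhook URL with Python's standard urllib.parse. The function name validate_webhook_url and its allow_http parameter are hypothetical stand-ins for whatever z4j does internally when Z4J_NOTIFICATIONS_WEBHOOK_ALLOW_HTTP is set.

```python
from urllib.parse import urlsplit


def validate_webhook_url(url: str, allow_http: bool = False) -> str:
    """Return the URL if its scheme is permitted, else raise ValueError.

    Hypothetical helper: allow_http mirrors the opt-in flag; the same check
    could run once when the channel is saved and again before each dispatch.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported webhook scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("webhook URL has no host")
    if parts.scheme == "http" and not allow_http:
        raise ValueError("plaintext http:// webhooks are rejected unless explicitly allowed")
    return url
```

Running the same check at both config-time and dispatch-time means a URL that entered storage before the stricter default, or by another path, is still caught before any payload is sent.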

The audit log is HMAC-chained and append-only. By default the brain keeps every row forever. To bound storage without breaking chain integrity, enable the audit retention sweeper:

Z4J_AUDIT_RETENTION_DAYS=365 # rolls daily; trims rows older than 365d

The sweeper preserves chain continuity by recording a “summary” row for each batch it deletes; verifying the chain across a sweep is documented in hmac-audit-chain.
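
To show why a summary row can keep an HMAC chain verifiable after old rows are deleted, here is a generic, self-contained sketch. It is not z4j's actual row format or algorithm (see hmac-audit-chain for that). The key, the genesis value, the row fields, and the append / sweep / verify functions are all hypothetical.

```python
import hashlib
import hmac

KEY = b"demo-only-key"   # hypothetical; a real key would come from secret config
GENESIS = b"\x00" * 32   # hypothetical link target for the very first row


def _mac(kind: str, prev: bytes, payload: bytes) -> bytes:
    """HMAC over the row kind, the previous link, and the payload."""
    return hmac.new(KEY, kind.encode() + b"\x00" + prev + payload, hashlib.sha256).digest()


def _link_target(row: dict) -> bytes:
    """MAC that the next row must reference.

    A summary row stands in for deleted rows, so it passes through the MAC
    of the last deleted row (kept in its "prev") instead of its own MAC.
    """
    return row["prev"] if row["kind"] == "summary" else row["mac"]


def append(chain: list, payload: bytes) -> None:
    prev = _link_target(chain[-1]) if chain else GENESIS
    chain.append({"kind": "event", "prev": prev, "payload": payload,
                  "mac": _mac("event", prev, payload)})


def sweep(chain: list, n: int) -> list:
    """Replace the first n rows with one authenticated summary row."""
    if n <= 0:
        return list(chain)
    deleted, kept = chain[:n], chain[n:]
    anchor = _link_target(deleted[-1])
    payload = b"swept %d rows" % len(deleted)
    summary = {"kind": "summary", "prev": anchor, "payload": payload,
               "mac": _mac("summary", anchor, payload)}
    return [summary] + kept


def verify(chain: list) -> bool:
    """Check every row's MAC and that each row links to its predecessor."""
    if not chain:
        return True
    first = chain[0]
    expected = first["prev"] if first["kind"] == "summary" else GENESIS
    for row in chain:
        if row["prev"] != expected:
            return False
        if not hmac.compare_digest(row["mac"], _mac(row["kind"], row["prev"], row["payload"])):
            return False
        expected = _link_target(row)
    return True
```

In this sketch the surviving rows are never rewritten: the first kept row still points at the MAC of the last deleted row, and the summary row carries that MAC under its own HMAC so a verifier has an authenticated starting point.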

z4j treats a deployment as “production” only when two independent signals are both present: a non-dev Z4J_ENVIRONMENT value and a non-empty Z4J_ALLOWED_HOSTS config. Together they gate the strict-error / no-debug-endpoint behavior. See allowed-hosts for the four-layer host header allow-list.
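
The two-signal rule can be expressed as a short predicate. This is a sketch under stated assumptions, not z4j's detection code: the is_production function is hypothetical, and so is the set of values treated as "dev" (including an unset Z4J_ENVIRONMENT), since the source does not list them.

```python
import json
from typing import Mapping

# Assumption: which Z4J_ENVIRONMENT values count as "dev" is not stated in
# the docs; this set, and treating an unset value as dev, is a guess.
DEV_ENVIRONMENTS = {"", "dev", "development", "local", "test"}


def is_production(env: Mapping[str, str]) -> bool:
    """True only when both signals are present (hypothetical helper)."""
    environment = env.get("Z4J_ENVIRONMENT", "").strip().lower()
    try:
        hosts = json.loads(env.get("Z4J_ALLOWED_HOSTS", "[]"))
    except json.JSONDecodeError:
        hosts = []  # unparsable value counts as "no hosts configured"
    non_dev = environment not in DEV_ENVIRONMENTS
    has_hosts = isinstance(hosts, list) and len(hosts) > 0
    return non_dev and has_hosts
```

Requiring both signals means that setting Z4J_ENVIRONMENT=production alone, without an allowed-hosts list, would not switch on the strict behavior in this sketch.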

For a production deployment, set:

# Environment
Z4J_ENVIRONMENT=production
Z4J_ALLOWED_HOSTS='["z4j.example.com"]'
# Scheduler gRPC (if you're running z4j-scheduler)
Z4J_SCHEDULER_GRPC_REQUIRE_ALLOWLIST=true
Z4J_SCHEDULER_GRPC_ALLOWED_CNS='["..."]'
Z4J_SCHEDULER_TRIGGER_GRPC_REQUIRE_ALLOWLIST=true
Z4J_SCHEDULER_TRIGGER_GRPC_ALLOWED_CNS='["..."]'
# Webhooks: HTTPS-only is the default; only set this if you need
# plaintext for an internal endpoint.
# Z4J_NOTIFICATIONS_WEBHOOK_ALLOW_HTTP=true
# Audit retention
Z4J_AUDIT_RETENTION_DAYS=365

Each setting is documented individually in the env vars reference.