Production hardening
z4j ships with backwards-compatible defaults that work out of the box but lean permissive. For production deployments, opt into the fail-closed mode of each subsystem below. None of these are required for z4j to function — they’re defense-in-depth for operators who want hardened defaults instead of trust-the-CA / trust-the-operator-config.
Scheduler gRPC — require explicit CN allow-list
When Z4J_SCHEDULER_GRPC_ENABLED=true, brain accepts
mTLS-authenticated gRPC connections from any client cert that the
configured CA bundle validates — the “trust the CA” deployment
model. For production, populate
Z4J_SCHEDULER_GRPC_ALLOWED_CNS with the explicit list of CNs
you’ve minted via z4j mint-scheduler-cert, AND set the require
flag so a misconfigured boot fails closed instead of falling back
to trust-the-CA:
Z4J_SCHEDULER_GRPC_ENABLED=true
Z4J_SCHEDULER_GRPC_ALLOWED_CNS='["scheduler-prod-1","scheduler-prod-2"]'
Z4J_SCHEDULER_GRPC_REQUIRE_ALLOWLIST=true
Symmetric setting on the scheduler side for the brain -> scheduler TriggerSchedule channel:
Z4J_SCHEDULER_TRIGGER_GRPC_ENABLED=true
Z4J_SCHEDULER_TRIGGER_GRPC_ALLOWED_CNS='["brain-prod"]'
Z4J_SCHEDULER_TRIGGER_GRPC_REQUIRE_ALLOWLIST=true
CN project bindings (multi-project deployments)
If you run schedulers per-project (one scheduler instance per tenant, each with its own CN), bind each CN to its project list so a leaked cert can only act on the projects it was minted for:
Z4J_SCHEDULER_GRPC_CN_PROJECT_BINDINGS='{"scheduler-acme":["acme"],"scheduler-globex":["globex"]}'
Without bindings (the default), every allow-listed CN can drive RPCs for any project. With bindings, requests outside the bound project list return PERMISSION_DENIED.
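The interaction between the allow-list, the require flag, and the project bindings can be sketched as below. This is an illustrative model only, not z4j's implementation: the function names `load_policy` and `is_allowed` are hypothetical, the boolean parsing (`"true"`) is an assumption, and the treatment of an allow-listed CN that has no binding entry (allowed for every project here) is a guess the source does not confirm.

```python
import json


def _json_env(env, name, default):
    """Parse a JSON-encoded env var; fall back to `default` when unset or blank."""
    raw = env.get(name, "")
    return json.loads(raw) if raw.strip() else default


def load_policy(env):
    """Build the policy at boot; raise if the allow-list is required but empty."""
    allowed = _json_env(env, "Z4J_SCHEDULER_GRPC_ALLOWED_CNS", [])
    bindings = _json_env(env, "Z4J_SCHEDULER_GRPC_CN_PROJECT_BINDINGS", {})
    # Assumption: the flag is enabled by the literal string "true".
    require = env.get("Z4J_SCHEDULER_GRPC_REQUIRE_ALLOWLIST", "").lower() == "true"
    if require and not allowed:
        # Fail closed instead of silently falling back to trust-the-CA.
        raise RuntimeError("allow-list required but Z4J_SCHEDULER_GRPC_ALLOWED_CNS is empty")
    return {"allowed": set(allowed), "bindings": bindings}


def is_allowed(policy, peer_cn, project):
    """Decide whether a CA-validated peer CN may act on `project`."""
    if not policy["allowed"]:
        return True  # trust-the-CA: any cert the CA bundle validates is accepted
    if peer_cn not in policy["allowed"]:
        return False
    bound = policy["bindings"].get(peer_cn)
    # Assumption: a CN without a binding entry is not project-restricted.
    return bound is None or project in bound
```

With the bindings from the example above, `scheduler-acme` would be accepted for `acme` and refused for `globex`; in z4j the refusal surfaces as PERMISSION_DENIED.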
Notification webhooks — HTTPS-only by default
z4j 1.4.0 defaults to HTTPS-only for generic webhook channels.
Operator-configured http:// URLs are rejected at config-time and
at dispatch-time to prevent payload + custom-header leakage in
transit.
If you have a legitimate internal-network http:// endpoint (intranet receiver, dev rig), opt back in:
Z4J_NOTIFICATIONS_WEBHOOK_ALLOW_HTTP=true
Slack / PagerDuty / Discord / Telegram channels always use the provider’s HTTPS endpoint and are unaffected by this setting.
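If you want to screen webhook URLs before handing them to z4j, a scheme check along these lines mirrors the rule described above. It is a sketch under assumptions: `webhook_url_allowed` is a hypothetical helper, not a z4j function, and z4j's own validation may apply further checks that are not modeled here.

```python
from urllib.parse import urlsplit


def webhook_url_allowed(url, allow_http=False):
    """Return True if `url` passes the HTTPS-only rule.

    `allow_http` stands in for Z4J_NOTIFICATIONS_WEBHOOK_ALLOW_HTTP=true.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme == "https":
        return True
    # Plaintext is accepted only when the operator has explicitly opted in.
    return scheme == "http" and allow_http
```

Running the same check when a channel is saved and again just before dispatch matches the two enforcement points the section names.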
Audit log retention
The audit log is HMAC-chained and append-only. By default the brain keeps every row forever. For storage management without losing chain integrity, use the audit retention sweeper:
Z4J_AUDIT_RETENTION_DAYS=365 # rolls daily; trims rows older than 365d
The sweeper preserves chain continuity by recording a “summary” row for each batch it deletes; verifying the chain across a sweep is documented in hmac-audit-chain.
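For readers unfamiliar with the term, the following is a minimal sketch of a generic HMAC chain, in which each row's MAC covers the previous row's MAC. It is not z4j's row format, key handling, or summary-row mechanism (see hmac-audit-chain for those); the names `GENESIS`, `row_mac`, `append_row`, and `verify_chain` are illustrative.

```python
import hashlib
import hmac

GENESIS = b"\x00" * 32  # illustrative starting value for the first row


def row_mac(key, prev_mac, payload):
    """MAC over the previous row's MAC plus this row's payload."""
    return hmac.new(key, prev_mac + payload, hashlib.sha256).digest()


def append_row(chain, key, payload):
    """Append (payload, mac) to the chain, linking it to the last row."""
    prev = chain[-1][1] if chain else GENESIS
    chain.append((payload, row_mac(key, prev, payload)))


def verify_chain(chain, key, start=GENESIS):
    """Recompute every MAC in order; any edited or dropped row breaks the link."""
    prev = start
    for payload, mac in chain:
        if not hmac.compare_digest(mac, row_mac(key, prev, payload)):
            return False
        prev = mac
    return True
```

Because each MAC depends on its predecessor, simply deleting old rows would break verification, which is why a retention sweep has to leave something behind that keeps the chain continuous.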
Production environment flag
z4j detects “production” via two independent signals: a non-dev
Z4J_ENVIRONMENT value AND a non-empty Z4J_ALLOWED_HOSTS
config. Both together gate the strict-error / no-debug-endpoint
behavior. See
allowed-hosts for the four-layer
host header allow-list.
Recommended summary
For a production deployment, set:
# Environment
Z4J_ENVIRONMENT=production
Z4J_ALLOWED_HOSTS='["z4j.example.com"]'

# Scheduler gRPC (if you're running z4j-scheduler)
Z4J_SCHEDULER_GRPC_REQUIRE_ALLOWLIST=true
Z4J_SCHEDULER_GRPC_ALLOWED_CNS='["..."]'
Z4J_SCHEDULER_TRIGGER_GRPC_REQUIRE_ALLOWLIST=true
Z4J_SCHEDULER_TRIGGER_GRPC_ALLOWED_CNS='["..."]'

# Webhooks: HTTPS-only is the default; only set this if you need
# plaintext for an internal endpoint.
# Z4J_NOTIFICATIONS_WEBHOOK_ALLOW_HTTP=true

# Audit retention
Z4J_AUDIT_RETENTION_DAYS=365

Each setting is documented individually in the env vars reference.
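Since several of these values are JSON lists wrapped in shell quotes, a small operator-side preflight can catch quoting mistakes and leftover `["..."]` placeholders before deployment. The sketch below is not part of z4j: `preflight` and `parse_list` are hypothetical helpers, and the rules it applies (the literal string "true" enables a flag, retention must be a positive integer) are assumptions about how the settings are parsed.

```python
import json

# Each require flag paired with the allow-list it depends on.
PAIRS = {
    "Z4J_SCHEDULER_GRPC_REQUIRE_ALLOWLIST": "Z4J_SCHEDULER_GRPC_ALLOWED_CNS",
    "Z4J_SCHEDULER_TRIGGER_GRPC_REQUIRE_ALLOWLIST": "Z4J_SCHEDULER_TRIGGER_GRPC_ALLOWED_CNS",
}


def parse_list(raw):
    """Return the decoded list, or None if `raw` is not a JSON list of non-empty strings."""
    try:
        value = json.loads(raw)
    except (TypeError, ValueError):
        return None
    ok = isinstance(value, list) and all(isinstance(v, str) and v for v in value)
    return value if ok else None


def preflight(env):
    """Return a list of human-readable problems; an empty list means nothing was flagged."""
    problems = []
    if not parse_list(env.get("Z4J_ALLOWED_HOSTS", "[]")):
        problems.append("Z4J_ALLOWED_HOSTS must be a non-empty JSON list")
    for flag, list_var in PAIRS.items():
        if env.get(flag, "").lower() != "true":
            continue  # flag not enabled, nothing to cross-check
        cns = parse_list(env.get(list_var, "[]"))
        if not cns or "..." in cns:
            problems.append(f"{flag}=true but {list_var} is empty, invalid, or a placeholder")
    days = env.get("Z4J_AUDIT_RETENTION_DAYS")
    if days is not None and not (days.isdecimal() and int(days) > 0):
        problems.append("Z4J_AUDIT_RETENTION_DAYS must be a positive integer")
    return problems
```

Calling `preflight(dict(os.environ))` in a deploy script (after `import os`) and aborting when the returned list is non-empty is one way to use it.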