z4j-scheduler - the dynamic scheduler service

z4j-scheduler is a separate companion process that fires schedules against any z4j-supported engine (Celery, RQ, Dramatiq, Huey, arq, taskiq). Unlike the per-engine scheduler adapters in this section (which surface celery-beat / rq-scheduler / APScheduler / etc. in the dashboard), z4j-scheduler IS the scheduler - operators who choose it delete celery-beat from their stack.

What it is

A single-binary, leader-elected, gRPC-connected scheduler service. Three responsibilities:

Tick - once per second, scan z4j’s schedules table for rows due to fire.
Dispatch - for each due fire, send a FireSchedule gRPC frame to z4j (the brain), which dispatches a schedule.fire command to the project’s online agent.
Reconcile - every 15 minutes (configurable), do a full sync from the brain so that a delete event missed on the watch stream cannot leave a stale schedule firing forever.
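The three responsibilities can be pictured as a small loop. The Python sketch below is an illustrative in-memory model, not z4j-scheduler’s code: `Schedule`, `tick`, `full_sync`, and the `dispatch` callback are hypothetical names, every schedule is reduced to a fixed interval, and `dispatch` stands in for sending a FireSchedule frame.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Schedule:
    # Hypothetical in-memory row; real schedules also carry cron, solar, etc.
    name: str
    interval_seconds: int  # must be > 0
    next_fire: float  # epoch seconds of the next due slot
    enabled: bool = True


def tick(schedules: List[Schedule], now: float,
         dispatch: Callable[[Schedule, float], None]) -> List[str]:
    """One pass of the 1-second tick: fire every enabled schedule that is due."""
    fired = []
    for s in schedules:
        if s.enabled and s.next_fire <= now:
            # Stand-in for sending a FireSchedule frame to the brain.
            dispatch(s, s.next_fire)
            fired.append(s.name)
            # Advance past `now`; extra missed slots are dropped here, whereas
            # the real engine applies the schedule's catch-up policy.
            while s.next_fire <= now:
                s.next_fire += s.interval_seconds
    return fired


def full_sync(cache: Dict[str, Schedule],
              snapshot: List[Schedule]) -> Dict[str, Schedule]:
    """Reconcile: the brain's snapshot wins, so a delete missed on the
    watch stream cannot keep a stale schedule firing."""
    fresh = {}
    for s in snapshot:
        old = cache.get(s.name)
        if old is not None:
            s.next_fire = old.next_fire  # keep local cadence for survivors
        fresh[s.name] = s
    return fresh
```

Because `full_sync` rebuilds the cache from the brain’s snapshot, a schedule deleted while the watch stream was disconnected drops out at the next reconcile instead of firing forever.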

Idle CPU is essentially zero. Memory is bounded by the number of schedules.

How it differs from the per-engine scheduler adapters

The packages on the schedulers overview page (z4j-celerybeat, z4j-rqscheduler, etc.) are adapters - they surface an EXISTING native scheduler (celery-beat, rq-scheduler) in the z4j dashboard so operators can see and edit those schedules in one UI.

z4j-scheduler is a replacement - it owns the schedule storage itself, in z4j’s database, and dispatches across any engine. Operators who run z4j-scheduler don’t need celery-beat or rq-scheduler at all.

Comparison matrix

The marketing version of this matrix lives at z4j.com/schedulers/z4j-scheduler, broken into six categorized tables. The terse engineering version follows.

Engine + framework reach

Capability	celery-beat	django-celery-beat	rq-scheduler	APScheduler 4	system cron	z4j-scheduler
Engines supported	Celery only	Celery only	RQ only	APScheduler in-process	Shell exec	All 6: Celery, RQ, Dramatiq, Huey, arq, taskiq
Framework agnostic	Yes	Django only	Yes	Yes	Yes (host OS)	Yes - Django/Flask/FastAPI/bare
Multiple engines, one process	No	No	No	No	No	Yes

Operations & observability

Capability	celery-beat	django-celery-beat	rq-scheduler	APScheduler 4	system cron	z4j-scheduler
Edit live (no restart)	No	Django admin only	No	Persistent jobstore only	No	Dashboard / declarative / REST
Built-in dashboard	No	Django admin (basic)	No	No	No	Yes - fire history, run-now, edit
Fire history per schedule	No	No	No	No	syslog only	Buffered + acked + searchable
Audit log of edits	No	Django auditlog (3rd party)	No	No	No	HMAC-chained, tamper-evident
Manual fire-now button	No	No	No	API only	No	Yes - dashboard + REST
RBAC / project scoping	No	Django auth only	No	No	UNIX permissions	Project + role-scoped

Reliability & catch-up

Capability	celery-beat	django-celery-beat	rq-scheduler	APScheduler 4	system cron	z4j-scheduler
HA / leader election	Single only	Single only	Single only	Pluggable (manual)	Single host	Postgres advisory-lock leader, rolling-restart safe
Catch-up on outage	All-or-nothing	All-or-nothing	Default fire-all	Coalescing only	No	Per-schedule: skip / fire-one-missed / fire-all-missed
DST / IANA tz correctness	Yes	Yes	Partial	Yes	Yes	Yes (validated at API)
Solar (sunrise/sunset)	Yes	Yes	No	No	No	Yes
Replay past fires	No	No	No	No	No	Yes

Migration & lock-in

Capability	celery-beat	django-celery-beat	rq-scheduler	APScheduler 4	system cron	z4j-scheduler
Importer FROM other schedulers	N/A	N/A	N/A	N/A	N/A	All 6 native schedulers + crontab
Exporter TO other schedulers	N/A	N/A	N/A	N/A	N/A	All 6 - no lock-in (round-trip pinned by tests)
Coexist with native scheduler	-	-	-	-	-	Yes - z4j-celerybeat coexistence adapter
Declarative-in-source	Yes (beat_schedule)	No (DB-only)	Manual	Yes	crontab file	Yes - z4j_scheduler.declarative reconciler

Security

Capability	celery-beat	django-celery-beat	rq-scheduler	APScheduler 4	system cron	z4j-scheduler
Wire HMAC + replay protection	Broker-dependent	Broker-dependent	No	No	N/A	HMAC-SHA256 + per-session seq+nonce binding
Tamper-evident audit chain	No	No	No	No	No	Per-row HMAC, prev-hmac chain, DB UNIQUE
Zero-downtime secret rotation	N/A	N/A	N/A	N/A	N/A	Yes - Z4J_PREVIOUS_SECRETS multi-key window
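The “prev-hmac chain” in the security table is a standard hash-chain construction: each audit row’s HMAC covers the row’s content plus the previous row’s HMAC, so editing or deleting any earlier row breaks every later link. A generic sketch using Python’s standard hmac module follows; the row fields, the JSON canonicalization, and the genesis value are assumptions, not z4j’s actual audit format, and the DB UNIQUE constraint is not modelled.

```python
import hashlib
import hmac
import json
from typing import Dict, List

GENESIS = "0" * 64  # hypothetical prev-hmac for the first row


def row_hmac(secret: bytes, prev_hmac: str, payload: Dict) -> str:
    # Canonical JSON so the same payload always produces the same bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    msg = (prev_hmac + "|" + body).encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def append(chain: List[Dict], secret: bytes, payload: Dict) -> None:
    # Each new row links to the HMAC of the row before it.
    prev = chain[-1]["hmac"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hmac": prev,
                  "hmac": row_hmac(secret, prev, payload)})


def verify(chain: List[Dict], secret: bytes) -> bool:
    # Recompute every link; any edit, deletion, or reorder breaks the chain.
    prev = GENESIS
    for row in chain:
        expected = row_hmac(secret, prev, row["payload"])
        if row["prev_hmac"] != prev or not hmac.compare_digest(row["hmac"], expected):
            return False
        prev = row["hmac"]
    return True
```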

License + project shape

Capability	celery-beat	django-celery-beat	rq-scheduler	APScheduler 4	system cron	z4j-scheduler
License	BSD-3	BSD-3	MIT	MIT	BSD	Apache-2.0
Process model	Long-running daemon	Daemon + DB	Long-running daemon	In-process or daemon	init / systemd	Standalone OR brain-embedded subprocess
Idle CPU footprint	Low	Low	Low	Low	Negligible	Negligible (1s tick, semaphore-bounded)

Architecture

┌─────────────────────────────────────────────┐
│  Your app (Django/Flask/FastAPI/bare)       │
│  + Celery (or RQ / arq / Huey / etc.)       │
│  + z4j-X engine adapter                     │
│  + z4j-bare agent runtime                   │
└──────────────────┬──────────────────────────┘
                   │ WebSocket / longpoll
                   ▼
┌─────────────────────────────────────────────┐
│  z4j (brain)    ← server + dashboard        │
│  z4j-scheduler  ← fires schedules at the    │
│                   right time (the new bit)  │
└─────────────────────────────────────────────┘

z4j-scheduler does not run tasks. It only decides WHEN they should run and tells the brain to dispatch them. Your existing engine worker (celery, RQ worker, etc.) runs the tasks on its own broker. This means:

No new broker. Your existing Redis / RabbitMQ / Postgres / etc. stays as the message bus.
No new worker process. Your existing celery worker / rq worker / arq worker continues running tasks unchanged.
The only new thing is one scheduler process (or a few, for HA).

Deploy modes

Embedded mode - single container

For homelab / single-instance deploys, z4j can spawn z4j-scheduler as a supervised subprocess within its own lifespan. In this mode z4j auto-mints the loopback mTLS certificates at boot.

Z4J_EMBEDDED_SCHEDULER=true
Z4J_SCHEDULER_GRPC_ENABLED=true

That’s it - z4j spawns z4j-scheduler at startup and supervises it (bounded auto-restart, graceful SIGTERM). One container, no extra ops surface.

Standalone mode - production / HA

For production, run z4j-scheduler as a separate process or container. Multiple instances elect a leader via Postgres advisory lock; only the leader ticks. Followers stay warm.
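Advisory-lock leader election can be sketched as follows, assuming a DB-API connection such as psycopg’s. pg_try_advisory_lock is a standard Postgres function; the lock key, make_try_lock, run_instance, and the retry interval are hypothetical, and z4j-scheduler’s real election code is not shown here.

```python
import time
from typing import Callable

# Session-level advisory lock: Postgres releases it automatically when the
# holding connection closes, so a crashed leader frees it for a follower.
TRY_LOCK_SQL = "SELECT pg_try_advisory_lock(%s::bigint)"
LEADER_LOCK_KEY = 731001  # hypothetical key; z4j's real key is not documented here


def make_try_lock(conn, key: int = LEADER_LOCK_KEY) -> Callable[[], bool]:
    """Wrap a DB-API connection (e.g. psycopg) in a non-blocking lock attempt."""
    def try_lock() -> bool:
        with conn.cursor() as cur:
            cur.execute(TRY_LOCK_SQL, (key,))
            return bool(cur.fetchone()[0])
    return try_lock


def run_instance(try_lock: Callable[[], bool], tick: Callable[[], None], *,
                 iterations: int, sleep: Callable[[float], None] = time.sleep,
                 tick_seconds: float = 1.0, retry_seconds: float = 5.0) -> int:
    """Followers retry the lock and stay warm; only the holder ticks.
    Bounded by `iterations` for the demo; returns how many ticks ran here."""
    is_leader = False
    ticks = 0
    for _ in range(iterations):
        if not is_leader:
            is_leader = try_lock()  # returns immediately, unlike pg_advisory_lock
        if is_leader:
            tick()
            ticks += 1
            sleep(tick_seconds)
        else:
            sleep(retry_seconds)
    return ticks
```

One caveat the sketch skips: if the leader’s database connection drops, Postgres releases the lock, so a real leader must detect that and stop ticking before a follower takes over. The same release-on-disconnect behaviour is what lets a follower take over when the old leader exits.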

pip install z4j-scheduler

z4j-scheduler serve \
  --brain-grpc-url brain.internal:7701 \
  --brain-rest-url https://brain.internal \
  --tls-cert /etc/z4j/scheduler.crt \
  --tls-key /etc/z4j/scheduler.key \
  --tls-ca /etc/z4j/ca.crt

On the z4j side, enable the gRPC server with:

Z4J_SCHEDULER_GRPC_ENABLED=true
Z4J_SCHEDULER_GRPC_ALLOWED_CNS=["scheduler-prod","scheduler-staging"]

Mutual TLS is required: z4j’s gRPC server presents its server cert; the scheduler presents a client cert whose CN must be in the allow-list. There is no plaintext fallback.
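The CN allow-list check amounts to: parse the JSON list, read the client certificate’s subject CN, and reject anything not listed. The sketch below uses the certificate dict format returned by Python’s ssl.SSLSocket.getpeercert(); the function names are hypothetical, and z4j’s own gRPC server may read the peer identity differently.

```python
import json
from typing import Optional


def parse_allowed_cns(raw: str) -> frozenset:
    """Parse a value like Z4J_SCHEDULER_GRPC_ALLOWED_CNS: a JSON list of strings."""
    value = json.loads(raw)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("expected a JSON list of CN strings")
    return frozenset(value)


def peer_common_name(peercert: dict) -> Optional[str]:
    """Extract the subject CN from a dict shaped like getpeercert() output."""
    # 'subject' is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in peercert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def client_allowed(peercert: dict, allowed: frozenset) -> bool:
    # No certificate or no CN means no access; there is no fallback.
    cn = peer_common_name(peercert)
    return cn is not None and cn in allowed
```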

Schedule sources

You can put schedules into z4j’s schedules table in three ways:

1. Dashboard

The Schedules page in the dashboard (per-project) has a full CRUD UI: name, engine, kind (cron, interval, one_shot, solar), expression, task name, args, kwargs, queue, catch-up policy. An audit row is written on every change. The REST surface backing the UI is the schedules API.

2. Declarative (in your app’s startup hook)

Commit your schedules to source control. On app startup, the reconciler posts the schedule list to the brain; the call is the same across Django, Flask, and FastAPI:

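from z4j_scheduler.declarative import ScheduleSpec, reconcile


async def sync_schedules(api_token: str) -> None:
    # reconcile() is a coroutine: await this from your framework's async
    # startup hook (e.g. a FastAPI lifespan), or wrap it in asyncio.run().
    await reconcile(
        schedules=[
            ScheduleSpec(
                name="hourly-cleanup",
                engine="celery",
                kind="cron",
                expression="0 * * * *",  # minute 0 of every hour
                task_name="myapp.tasks.cleanup",
            ),
        ],
        project="my-app",
        source="declarative",
        brain_url="http://brain:7700",
        api_token=api_token,  # e.g. settings.Z4J_BRAIN_API_TOKEN
    )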

Re-running reconcile with the same schedule list is a no-op; only diffs land in the audit log.
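The no-op property follows from diffing desired state against current state before writing anything. A minimal sketch, assuming a stand-in Spec with a subset of ScheduleSpec’s fields, and assuming `current` holds only rows previously written with source="declarative" (so dashboard-created schedules are never deleted); this is not the reconciler’s actual code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Spec:
    # Hypothetical stand-in for ScheduleSpec; frozen so == compares all fields.
    name: str
    engine: str
    kind: str
    expression: str
    task_name: str


def plan(desired: List[Spec],
         current: Dict[str, Spec]) -> Tuple[List[str], List[str], List[str]]:
    """Return (create, update, delete) names; unchanged rows produce nothing."""
    wanted = {s.name: s for s in desired}
    create = sorted(n for n in wanted if n not in current)
    update = sorted(n for n in wanted if n in current and wanted[n] != current[n])
    delete = sorted(n for n in current if n not in wanted)
    return create, update, delete
```

Only names that appear in one of the three lists would produce writes, and therefore audit rows; an identical re-run yields three empty lists.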

3. One-shot importers (migration)

Import from any existing scheduler:

# Celery beat schedule -> z4j
z4j-scheduler import --from celery --celery-app myapp:app \
  --project myproject --brain-url https://brain.internal

# rq-scheduler -> z4j
z4j-scheduler import --from rq --redis-url redis://... \
  --project myproject --brain-url https://brain.internal

# APScheduler 3.x SQLAlchemy / Redis / Mongo jobstore -> z4j
z4j-scheduler import --from apscheduler --jobstore-url postgresql://... \
  --project myproject --brain-url https://brain.internal

# system crontab -> z4j
z4j-scheduler import --from cron --crontab /etc/crontab \
  --project myproject --brain-url https://brain.internal

Round-trip: every importer pairs with an exporter. z4j-scheduler export --to celery --celery-app myapp:app renders a Python beat_schedule file you can drop back into Celery if you ever decide to leave z4j.
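The round-trip idea can be illustrated with interval-style beat_schedule entries. task, schedule, args, kwargs, and options are standard Celery beat_schedule keys; the spec dict shape, the interval-as-seconds expression, and both function names are assumptions, and crontab()/solar() entries (which need Celery itself) are deliberately left out.

```python
from datetime import timedelta
from typing import Any, Dict


def beat_entry_to_spec(name: str, entry: Dict[str, Any]) -> Dict[str, Any]:
    """Map one interval-style beat_schedule entry to a spec-like dict."""
    sched = entry["schedule"]
    if isinstance(sched, timedelta):
        seconds = sched.total_seconds()
    elif isinstance(sched, (int, float)):
        seconds = float(sched)
    else:
        # crontab()/solar() objects need Celery itself; out of scope here.
        raise NotImplementedError(f"{name}: unsupported schedule {sched!r}")
    return {
        "name": name,
        "engine": "celery",
        "kind": "interval",
        "expression": str(int(seconds)),  # hypothetical format; sub-second part dropped
        "task_name": entry["task"],
        "args": list(entry.get("args", ())),
        "kwargs": dict(entry.get("kwargs", {})),
        "queue": entry.get("options", {}).get("queue"),
    }


def spec_to_beat_entry(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Inverse mapping, as an exporter might render it."""
    entry: Dict[str, Any] = {
        "task": spec["task_name"],
        "schedule": timedelta(seconds=int(spec["expression"])),
    }
    if spec["args"]:
        entry["args"] = tuple(spec["args"])
    if spec["kwargs"]:
        entry["kwargs"] = spec["kwargs"]
    if spec["queue"]:
        entry["options"] = {"queue": spec["queue"]}
    return entry
```

A round-trip test then only has to assert that exporting an imported entry gives back the original entry.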

Catch-up policy

A per-schedule field that decides what happens when the scheduler has been down for longer than the schedule’s interval:

skip - fire nothing for the missed slots on recovery; the schedule resumes at its next regular slot.
fire_one_missed - fire once on recovery, then resume normal cadence. Right for “nightly report” semantics.
fire_all_missed - fire once for every missed slot. Right for “every-5-minute metric backfill” semantics. Capped at 1000 fires per recovery to prevent runaways.

Default: skip (the safest choice for most schedules).
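The three policies reduce to a small decision over the slots missed during an outage. A sketch for fixed-interval schedules, assuming slots fall at last_slot + k * interval; the function name, which slot fire_one_missed reports, and which 1000 slots survive the cap (the oldest, here) are assumptions, not documented behaviour.

```python
import math
from typing import List

MAX_CATCH_UP_FIRES = 1000  # cap stated for fire_all_missed


def catch_up_fires(policy: str, last_slot: float, interval: float,
                   now: float) -> List[float]:
    """Return the slot times to fire on recovery for a fixed-interval schedule."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    # Count slots in (last_slot, now] arithmetically, so a long outage with a
    # short interval never builds a huge list before the cap applies.
    missed = max(0, math.floor((now - last_slot) / interval))
    if missed == 0 or policy == "skip":
        return []
    if policy == "fire_one_missed":
        return [last_slot + missed * interval]  # assumed: most recent missed slot
    if policy == "fire_all_missed":
        n = min(missed, MAX_CATCH_UP_FIRES)  # oldest slots kept (assumption)
        return [last_slot + i * interval for i in range(1, n + 1)]
    raise ValueError(f"unknown catch-up policy: {policy!r}")
```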

Misfire detection

A misfire is an enabled interval or cron schedule whose expected next fire has come and gone without the scheduler firing it. The usual cause is the scheduler process being down or partitioned - which is exactly why detection runs brain-side, not scheduler-side: a scheduler-side check cannot report its own death. The scheduler’s tick engine already covers the alive-but-behind case via each schedule’s catch-up policy; the misfire detector covers the case the tick engine structurally cannot.

A periodic brain worker computes each enabled interval/cron schedule’s expected next fire from its cadence (anchored on its last run, or on its creation time if it has never fired) and flags any schedule that is more than a grace window past its expected fire. On each detection, at most once per misfire episode, the detector:

writes a scheduler.misfire_detected row to the HMAC-chained audit log, naming the schedule, its expected fire, and how late it is;
fires any schedule.misfired automation rule on the project;
fans out to any schedule.misfired notification subscription (delivery channels + in-app bell);
bumps the z4j_scheduler_misfires_detected_total{project} Prometheus counter - a sustained non-zero rate means a scheduler is dead, partitioned, or badly behind.
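The detection rule itself is small. A sketch for interval schedules only (a cron schedule would compute the next slot after the anchor with a cron parser); the function names and the episode key (schedule id plus the missed expected fire) are assumptions, chosen so that a later fire, which moves the anchor, naturally starts a fresh episode.

```python
from datetime import datetime, timedelta
from typing import Optional, Set, Tuple


def expected_next_fire(interval: timedelta, created_at: datetime,
                       last_run_at: Optional[datetime]) -> datetime:
    """Anchor on the last run, or on creation time if it has never fired."""
    anchor = last_run_at if last_run_at is not None else created_at
    return anchor + interval


def misfire_lateness(interval: timedelta, created_at: datetime,
                     last_run_at: Optional[datetime], now: datetime,
                     grace_seconds: int = 60) -> Optional[timedelta]:
    """Return how late the schedule is once past the grace window, else None."""
    late = now - expected_next_fire(interval, created_at, last_run_at)
    return late if late > timedelta(seconds=grace_seconds) else None


def should_alert(schedule_id: str, expected: datetime,
                 alerted: Set[Tuple[str, datetime]]) -> bool:
    """Alert once per episode; a new expected fire means a new episode."""
    key = (schedule_id, expected)
    if key in alerted:
        return False
    alerted.add(key)
    return True
```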

When the scheduler recovers and the schedule fires, the episode ends; the next gap is a fresh episode and alerts again. Solar and one-shot (clocked) schedules are not misfire-detected: solar cadence depends on a location the brain does not hold, and a one-shot has no recurring slot to miss.

Recent misfires for one schedule are readable at GET /api/v1/projects/{slug}/schedules/{schedule_id}/misfires (viewer role); the project-wide history across every schedule is at GET /api/v1/projects/{slug}/schedules/misfires - see the schedules API. From a shell, the z4j misfires --project <slug> CLI command (with --json) prints the same project-wide history without the dashboard.
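For scripts that need the same data without the CLI, a minimal standard-library client might look like this. The two paths come from the text above; the function names, the bearer-token Authorization header, and the JSON response body are assumptions, so check the schedules API reference for the real auth scheme.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional


def misfires_url(base_url: str, slug: str, schedule_id: Optional[str] = None) -> str:
    """Build the per-schedule or project-wide misfire history URL."""
    project = urllib.parse.quote(slug, safe="")
    if schedule_id is None:
        path = f"/api/v1/projects/{project}/schedules/misfires"
    else:
        sid = urllib.parse.quote(str(schedule_id), safe="")
        path = f"/api/v1/projects/{project}/schedules/{sid}/misfires"
    return base_url.rstrip("/") + path


def fetch_misfires(base_url: str, slug: str, token: str,
                   schedule_id: Optional[str] = None) -> Any:
    # Assumption: bearer-token auth (viewer role) and a JSON response body.
    req = urllib.request.Request(
        misfires_url(base_url, slug, schedule_id),
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```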

Knobs (detection only runs when Z4J_SCHEDULER_GRPC_ENABLED is on):

Variable	Default	Description
`Z4J_SCHEDULER_MISFIRE_GRACE_SECONDS`	`60`	Lateness past the expected fire before a schedule counts as misfired. The grace absorbs normal fire latency, gRPC jitter, and small clock skew. Range 5..3600.
`Z4J_SCHEDULER_MISFIRE_SWEEP_SECONDS`	`60`	Detector cadence. `0` disables misfire detection entirely.
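A sketch of how those knobs combine, assuming truthy strings such as “true” or “1” for Z4J_SCHEDULER_GRPC_ENABLED and assuming out-of-range values are rejected rather than clamped; MisfireConfig and load_misfire_config are hypothetical, and z4j’s actual settings loader may behave differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _truthy(value: str) -> bool:
    # Assumed spelling of "on"; the real parser may accept other forms.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class MisfireConfig:
    grpc_enabled: bool
    grace_seconds: int
    sweep_seconds: int  # 0 disables detection

    @property
    def enabled(self) -> bool:
        # Detection needs the gRPC scheduler path on AND a non-zero sweep.
        return self.grpc_enabled and self.sweep_seconds > 0


def load_misfire_config(env: Mapping[str, str] = os.environ) -> MisfireConfig:
    grpc = _truthy(env.get("Z4J_SCHEDULER_GRPC_ENABLED", "false"))
    grace = int(env.get("Z4J_SCHEDULER_MISFIRE_GRACE_SECONDS", "60"))
    sweep = int(env.get("Z4J_SCHEDULER_MISFIRE_SWEEP_SECONDS", "60"))
    if not 5 <= grace <= 3600:
        raise ValueError("Z4J_SCHEDULER_MISFIRE_GRACE_SECONDS must be in 5..3600")
    if sweep < 0:
        raise ValueError("Z4J_SCHEDULER_MISFIRE_SWEEP_SECONDS must be >= 0")
    return MisfireConfig(grpc, grace, sweep)
```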

Fire history

Every fire the scheduler dispatches is recorded per-schedule: status (delivered / buffered / acked / failed), scheduled-for vs fired-at, ack latency, and error detail, with manual “run now” fires attributed to the operator who triggered them. See Schedule fire history for the lifecycle, the API, and retention mechanics.

Replacing celery-beat - concrete steps

# 1. Install (no removal yet - both run side-by-side)
pip install z4j-scheduler

# 2. Import existing schedules into z4j
z4j-scheduler import --from celery --celery-app myapp:app \
  --project myproject --brain-url https://brain.internal

# 3. Verify in dashboard. Open the Schedules page; confirm everything
#    landed and is firing as expected.

# 4. Disable celery-beat (stop the daemon, remove from supervisor)
systemctl stop celery-beat
# or: docker compose stop celery-beat

# 5. Run z4j-scheduler instead (embedded or standalone - your choice)
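z4j-scheduler serve --brain-grpc-url ... --brain-rest-url ...
#   (plus the --tls-cert / --tls-key / --tls-ca flags shown under Standalone mode)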
# OR set Z4J_EMBEDDED_SCHEDULER=true on brain

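# 6. (Optional) Uninstall django-celery-beat if you used it. Beat itself
#    ships inside the celery package, so there is no separate package to remove.
pip uninstall django-celery-beat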

Coexistence with celery-beat - gradual migration

If you can’t stop celery-beat in one shot (e.g., a Postgres schedule table shared with another team’s app), keep both running and use z4j-celerybeat as the coexistence adapter:

celery-beat continues firing its schedules.
z4j-celerybeat surfaces those celery-beat schedules in the z4j dashboard - read AND write (when django-celery-beat is installed).
z4j-scheduler can fire its OWN schedules alongside (different rows in z4j’s database).
Your dashboard shows both: celery-beat-managed schedules carry a source tag; z4j-managed schedules do not.

When you’re ready to fully migrate, run the importer and disable celery-beat.

When NOT to use it

You only have one engine and one schedule. A single crontab line is simpler.
You explicitly want celery-beat’s exact behavior (e.g., a custom celery-beat scheduler class your team wrote). z4j-celerybeat surfaces that scheduler’s existing schedules; z4j-scheduler replaces it.
You’ve committed to APScheduler 4’s persistence model. That model is in-process; z4j-scheduler is out-of-process. Different shape, different trade-offs.