Service-user deployments
Production deployments often run the host process under a dedicated service user - www-data for nginx + gunicorn, app for systemd-managed uvicorn, or an ephemeral uid via DynamicUser=yes. This page covers what changes about the agent in those environments and how to debug when it doesn’t come online.
What we handle for you
Service users typically don’t have a writable $HOME: www-data resolves to /var/www, nobody to /nonexistent, and DynamicUser to a transient mount. The agent’s on-disk SQLite buffer would normally land in ~/.z4j/, which the process can’t create.
The agent detects this at startup and relocates the buffer to a per-uid tmp directory:

    $TMPDIR/z4j-{uid}/buffer-{pid}.sqlite   # mode 0700

A WARNING is logged once, naming the original path and the chosen fallback. The buffer is still crash-safe (SQLite WAL mode), still per-process (no counter drift), and still bounded - only the location moved.
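The relocation described above can be pictured with a short Python sketch. This is an illustration under stated assumptions, not the agent's actual code: the function names, the `buffer-{pid}.sqlite` file name under `~/.z4j`, and the choice to apply mode 0700 to the directory are all hypothetical.

```python
import logging
import os
import tempfile
from pathlib import Path
from typing import Optional

log = logging.getLogger("z4j.sketch")  # hypothetical logger name


def _can_use_dir(path: Path) -> bool:
    """Try to create the directory; report whether it is usable."""
    try:
        path.mkdir(mode=0o700, parents=True, exist_ok=True)
    except OSError:
        return False
    return os.access(path, os.W_OK | os.X_OK)


def resolve_buffer_path(home: Optional[Path] = None,
                        tmp_root: Optional[Path] = None) -> Path:
    """Prefer ~/.z4j; fall back to $TMPDIR/z4j-{uid} when HOME is unusable."""
    home = Path.home() if home is None else Path(home)
    filename = f"buffer-{os.getpid()}.sqlite"  # assumed per-process file name

    preferred = home / ".z4j"
    if _can_use_dir(preferred):
        return preferred / filename

    # tempfile.gettempdir() honours $TMPDIR when it is set.
    tmp_root = Path(tempfile.gettempdir()) if tmp_root is None else Path(tmp_root)
    fallback = tmp_root / f"z4j-{os.getuid()}"
    fallback.mkdir(mode=0o700, parents=True, exist_ok=True)
    os.chmod(fallback, 0o700)  # enforce 0700 regardless of the umask
    log.warning("buffer dir %s is not writable; using %s", preferred, fallback)
    return fallback / filename
```

Passing `home` and `tmp_root` explicitly is only a convenience for experimenting; called with no arguments, the sketch uses the current user's home and temp directory.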
When you want a persistent buffer location
The tmp fallback works, but its contents are lost whenever the tmpfs is cleared. If your host has a tmpfs /tmp, set an explicit path so events buffered during a brain outage survive a host reboot:

    # 1. Create a writable dir owned by the service user
    sudo mkdir -p /var/lib/picker/.z4j
    sudo chown www-data:www-data /var/lib/picker/.z4j

    # 2. Point z4j at it via systemd unit override
    sudo systemctl edit gunicorn

Add to the override:

    [Service]
    Environment=Z4J_BUFFER_PATH=/var/lib/picker/.z4j/buffer.sqlite

Then reload and restart:

    sudo systemctl daemon-reload
    sudo systemctl restart gunicorn

The path must live under one of the two allowed roots (~/.z4j or the per-uid tmp fallback). The security clamp rejects anything else, so a mistyped value such as Z4J_BUFFER_PATH=/etc/passwd is refused instead of being written to. For the example above, that means /var/lib/picker has to be the service user’s home directory.
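A rough sketch of what such a clamp could look like, assuming it resolves the candidate path and compares it against the two roots named above. The function name `clamp_buffer_path` and its parameters are hypothetical; the real check may differ in detail.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def clamp_buffer_path(candidate: str,
                      home: Optional[Path] = None,
                      tmp_root: Optional[Path] = None) -> Path:
    """Return the resolved path if it sits under an allowed root, else raise."""
    home = Path.home() if home is None else Path(home)
    tmp_root = Path(tempfile.gettempdir()) if tmp_root is None else Path(tmp_root)
    roots = (home / ".z4j", tmp_root / f"z4j-{os.getuid()}")

    # resolve() collapses ".." segments and symlinks before the comparison,
    # so "~/.z4j/../../etc/passwd" cannot sneak past the check.
    resolved = Path(candidate).resolve()
    for root in roots:
        try:
            resolved.relative_to(root.resolve())
        except ValueError:
            continue  # not under this root; try the next one
        return resolved
    raise ValueError(f"{candidate!r} is outside the allowed buffer roots")
```

Under these assumptions, `clamp_buffer_path("/etc/passwd")` raises `ValueError`, while a path below the current user's `~/.z4j` is returned in resolved form.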
The doctor command
When an agent doesn’t come online, run the framework-side doctor first. It probes the same things the agent runtime would, but synchronously and without starting the persistent connection, so you get specific failure reasons instead of “the dashboard shows unknown”.
Django:

    sudo -u www-data /srv/picker/venv/bin/python /srv/picker/picker/manage.py z4j_doctor

Flask:

    sudo -u www-data /srv/picker/venv/bin/python -m z4j_flask doctor

FastAPI:

    sudo -u www-data /srv/picker/venv/bin/python -m z4j_fastapi doctor

Standalone Celery (no framework):

    sudo -u www-data /srv/picker/venv/bin/python -m z4j_bare doctor

Always run the doctor as the same user the service runs under. Otherwise Path.home() resolves differently, the buffer-path probe reports a different writability status, and the result doesn’t reflect what the actual service sees.
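To see why the invoking user matters, the following small Python sketch prints what the current process sees for its uid, home directory, and home writability. It is not part of z4j; the function name is hypothetical, and it only uses standard-library calls.

```python
import os
from pathlib import Path


def describe_runtime_user() -> dict:
    """Report the uid, home directory, and whether that home is writable."""
    try:
        home = Path.home()
    except RuntimeError:
        home = None  # the home directory could not be determined at all
    writable = home is not None and os.access(home, os.W_OK | os.X_OK)
    return {
        "uid": os.getuid(),
        "home": None if home is None else str(home),
        "home_writable": writable,
    }


if __name__ == "__main__":
    print(describe_runtime_user())
```

Running it once from your own shell and once via `sudo -u www-data` with the same interpreter should typically show different `home` and `home_writable` values, which is the difference the doctor's buffer-path probe would pick up.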
Sample output

    z4j-doctor (django)
    ===================
      brain_url: https://tasks.example.com/
      project_id: picker
      agent_name: picker_django
      buffer_path: /tmp/z4j-33/buffer-7281.sqlite
      transport: auto

      [OK] buffer_path OK: buffer dir /tmp/z4j-33 is writable
      [OK] dns OK: tasks.example.com -> 198.51.100.42
      [OK] tcp OK: TCP connect to tasks.example.com:443
      [OK] tls OK: TLS TLSv1.3 to tasks.example.com (cert CN='tasks.example.com')
      [OK] websocket OK: ws upgrade to https://tasks.example.com/ succeeded

      engines auto-detected: celery

Add --no-websocket to skip the WS round-trip when z4j is intentionally offline. Add --json for scripting.
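The schema of the `--json` output is not shown on this page, so the sketch below works from the plain-text layout in the sample instead. It assumes each probe line has the shape `[STATUS] probe detail`; only the `[OK]` status appears in the sample, so the `[FAIL]` label used in the usage note is a hypothetical stand-in, as are the function names.

```python
import re
from typing import Dict, List, Tuple

# Matches lines such as "[OK] dns OK: tasks.example.com -> 198.51.100.42".
PROBE_LINE = re.compile(
    r"^\s*\[(?P<status>[A-Z]+)\]\s+(?P<probe>\S+)\s+(?P<detail>.*)$"
)


def parse_doctor_output(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each probe name to its (status, detail) pair."""
    results: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        match = PROBE_LINE.match(line)
        if match:
            results[match["probe"]] = (match["status"], match["detail"].strip())
    return results


def failed_probes(results: Dict[str, Tuple[str, str]]) -> List[str]:
    """List probes whose status is anything other than OK."""
    return [name for name, (status, _) in results.items() if status != "OK"]
```

For example, feeding it a line like `[FAIL] tcp connection refused` (a made-up failure line) would make `failed_probes` return `["tcp"]`. For real automation, prefer `--json` once you have confirmed its structure.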
What each probe catches
| Probe | Catches |
|---|---|
| buffer_path | Service user can’t write to its $HOME; tmp fallback also unwritable; explicit Z4J_BUFFER_PATH typo |
| dns | Bad brain hostname, split-DNS, stale /etc/hosts |
| tcp | Egress firewall blocks z4j port, NAT timeout, no route |
| tls | Cert hostname mismatch, expired cert, untrusted CA, intermediate cert missing |
| websocket | Reverse proxy strips Upgrade: header, wrong token, wrong project_id, missing HMAC, brain refuses protocol version |
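The network layers in the table can be reproduced by hand with the Python standard library. The sketch below is a hypothetical stand-in for the dns, tcp, and tls probes, not the doctor's implementation; the function names and the stop-at-first-failure behaviour are assumptions made for illustration.

```python
import socket
import ssl
from typing import List, Tuple

Result = Tuple[bool, str]


def probe_dns(host: str) -> Result:
    """Resolve the host name; fails on bad names or broken resolvers."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return False, f"DNS lookup failed: {exc}"
    return True, f"{host} -> {infos[0][4][0]}"


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> Result:
    """Open and close a TCP connection; fails on firewalls or missing routes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"TCP connect to {host}:{port}"
    except OSError as exc:
        return False, f"TCP connect failed: {exc}"


def probe_tls(host: str, port: int, timeout: float = 5.0) -> Result:
    """Complete a verified TLS handshake; fails on certificate problems."""
    context = ssl.create_default_context()  # verifies hostname and CA chain
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, f"TLS {tls.version()} to {host}"
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return False, f"TLS handshake failed: {exc}"


def run_probes(host: str, port: int) -> List[Tuple[str, bool, str]]:
    """Run dns, tcp, tls in order and stop at the first failing layer."""
    steps = (
        ("dns", lambda: probe_dns(host)),
        ("tcp", lambda: probe_tcp(host, port)),
        ("tls", lambda: probe_tls(host, port)),
    )
    results = []
    for name, step in steps:
        ok, detail = step()
        results.append((name, ok, detail))
        if not ok:
            break
    return results
```

Stopping at the first failure keeps the output focused on the lowest broken layer: a TLS error is not meaningful if the TCP connection never opened. The websocket probe is left out because its handshake depends on z4j's token and protocol details, which this page does not specify.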
systemd unit checklist
Common problems that the doctor will surface:
- Environment= vs EnvironmentFile= - env vars set in your shell or a project .env file don’t propagate to the service. Either declare them in the unit override (sudo systemctl edit <service>) or set EnvironmentFile=/path/to/.env so systemd loads it.
- ProtectHome=true / PrivateTmp=true - these restrict where the process can write. The agent’s tmp fallback still works under PrivateTmp=true (the service gets its own private /tmp mount), but a custom Z4J_BUFFER_PATH outside the allowed view will fail at boot. Either drop the protection or move the buffer back to ~/.z4j (with a writable HOME) or $TMPDIR/z4j-{uid}.
- User= vs running as root - token files and buffer paths created while testing as root won’t be readable when the service drops to its real user. Always test the doctor as the service user.
Related
- Allowed hosts - adding the public DNS name to z4j’s host allow-list
- Backup and restore - preserving the buffer across host moves
- Troubleshooting - general agent symptom guide