Joblet is a micro-container runtime for running Linux jobs with:

- Process and filesystem isolation (PID namespace, chroot)
- Fine-grained CPU, memory, and IO throttling (cgroups v2)
- Secure job execution with mTLS and RBAC
- Built-in scheduler, SSE log streaming, and multi-core pinning

Ideal for agentic AI workloads (untrusted code).
This document provides comprehensive information about Joblet’s job scheduling system, including how to schedule jobs for future execution, architecture details, multi-node behavior, and persistence across restarts.
Joblet supports scheduling jobs for future execution using RFC3339 timestamps. Scheduled jobs behave as follows:
| Feature | Behavior |
|---|---|
| Time Format | RFC3339 (e.g., 2025-01-15T10:30:00Z) |
| Timezone | UTC recommended, local time supported |
| Persistence | DynamoDB (survives restarts) |
| Node Affinity | Jobs execute on creating node only |
| Overdue Jobs | Execute immediately on recovery |
Use the --schedule flag to schedule a job for future execution:
```bash
# Schedule for a specific time (UTC)
rnx job run --schedule="2025-01-15T10:30:00Z" python3 process_data.py

# Schedule for a specific time with timezone offset
rnx job run --schedule="2025-01-15T10:30:00-05:00" ./backup.sh

# Schedule with resource limits
rnx job run --schedule="2025-01-15T14:00:00Z" \
  --max-cpu=50 \
  --max-memory=512 \
  python3 heavy_computation.py
```
The schedule parameter must be in RFC3339 format:
```text
YYYY-MM-DDTHH:MM:SSZ        # UTC time
YYYY-MM-DDTHH:MM:SS+HH:MM   # With positive offset
YYYY-MM-DDTHH:MM:SS-HH:MM   # With negative offset
```
Examples:
```bash
# UTC time
--schedule="2025-06-15T09:00:00Z"

# US Eastern (EST, -5 hours)
--schedule="2025-06-15T09:00:00-05:00"

# Central European (CET, +1 hour)
--schedule="2025-06-15T09:00:00+01:00"
```
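Any of these offset forms names the same kind of instant. A minimal Python sketch (not part of Joblet; `parse_schedule` is a hypothetical helper) that validates a `--schedule` value and normalizes it to UTC:

```python
from datetime import datetime, timezone

def parse_schedule(ts: str) -> datetime:
    """Parse an RFC3339 timestamp like those passed to --schedule into aware UTC."""
    # Older Pythons' fromisoformat() rejects a trailing 'Z', so map it to the
    # equivalent +00:00 offset first.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        raise ValueError(f"timestamp must carry a timezone: {ts!r}")
    return dt.astimezone(timezone.utc)

# 09:00 US Eastern and 14:00 UTC are the same instant.
print(parse_schedule("2025-06-15T09:00:00-05:00"))  # 2025-06-15 14:00:00+00:00
```

Normalizing to UTC before comparing or storing timestamps avoids ambiguity when jobs are created from machines in different timezones.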
Scheduled jobs support file uploads. Files are staged immediately and stored until execution:
```bash
# Schedule job with file upload
rnx job run --schedule="2025-01-15T10:00:00Z" \
  --upload=./data.csv:/workspace/data.csv \
  python3 /workspace/process.py

# Multiple files
rnx job run --schedule="2025-01-15T10:00:00Z" \
  --upload=./config.yaml:/app/config.yaml \
  --upload=./script.py:/app/script.py \
  python3 /app/script.py
```
When a job is scheduled, you receive confirmation with the job ID:
```json
{
  "uuid": "job-abc123-def456",
  "status": "SCHEDULED",
  "scheduledTime": "2025-01-15T10:30:00Z",
  "nodeId": "node-prod-1"
}
```
```text
┌─────────────────────────────────────────────────────────────────┐
│                          Joblet Node                            │
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐   │
│  │   gRPC API   │───▶│  Scheduler   │───▶│ Execution Engine │   │
│  │  (StartJob)  │    │              │    │                  │   │
│  └──────────────┘    └──────┬───────┘    └──────────────────┘   │
│                             │                                   │
│                      ┌──────▼───────┐                           │
│                      │   Priority   │                           │
│                      │    Queue     │                           │
│                      │  (Min-Heap)  │                           │
│                      └──────────────┘                           │
│                                                                 │
└───────────────────────────────┬─────────────────────────────────┘
                                │
                                ▼
                      ┌──────────────────┐
                      │     DynamoDB     │
                      │  (Persistence)   │
                      └──────────────────┘
```
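The min-heap plus sleep-until-next combination can be sketched in a few lines of Python. This is an illustrative model only, not Joblet's actual Go implementation; the `SchedulerQueue` name and its API are assumptions:

```python
import heapq
import threading
from datetime import datetime, timezone

class SchedulerQueue:
    """Illustrative min-heap scheduler with a sleep-until-next loop."""

    def __init__(self):
        self._heap = []                    # (scheduled_time, job_id), soonest on top
        self._cv = threading.Condition()

    def add(self, job_id, when):
        with self._cv:
            heapq.heappush(self._heap, (when, job_id))
            self._cv.notify()              # wake the loop: the next-due job may have changed

    def run(self, execute):
        """Pop jobs as they come due, sleeping until the next deadline."""
        while True:
            with self._cv:
                while not self._heap:
                    self._cv.wait()        # nothing queued: sleep indefinitely
                when, job_id = self._heap[0]
                delay = (when - datetime.now(timezone.utc)).total_seconds()
                if delay > 0:
                    self._cv.wait(timeout=delay)  # sleep until due, or until add()
                    continue               # re-check: an earlier job may have arrived
                heapq.heappop(self._heap)
            execute(job_id)
```

Because the heap keeps the soonest job on top, the loop only ever sleeps for one interval at a time; adding a job that is due sooner wakes it early via the condition variable.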
The scheduler uses a sleep-until-next strategy for efficient CPU usage: instead of polling, it sleeps until the next job's scheduled time. Jobs move through the following lifecycle:
```text
SCHEDULED ──▶ INITIALIZING ──▶ RUNNING ──▶ COMPLETED
    │              │              │
    │              ▼              ▼
    ▼           STOPPED        FAILED
CANCELED
```
| Status | Description |
|---|---|
| `SCHEDULED` | Job queued for future execution |
| `INITIALIZING` | Job starting (setting up resources) |
| `RUNNING` | Job actively executing |
| `COMPLETED` | Job finished successfully |
| `FAILED` | Job exited with error |
| `STOPPED` | Job manually stopped |
| `CANCELED` | Scheduled job removed before execution |
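The legal moves between these states can be captured as a small lookup table. A Python sketch read off the lifecycle diagram and status table above (the exact transition set is an assumption, not taken from Joblet's source):

```python
# Legal transitions as drawn in the lifecycle diagram above.
TRANSITIONS = {
    "SCHEDULED":    {"INITIALIZING", "CANCELED"},
    "INITIALIZING": {"RUNNING", "STOPPED"},
    # STOPPED included here for a manual `rnx job stop` of a running job.
    "RUNNING":      {"COMPLETED", "FAILED", "STOPPED"},
}

def can_transition(src: str, dst: str) -> bool:
    """True if the diagram permits moving from src to dst."""
    return dst in TRANSITIONS.get(src, set())

print(can_transition("SCHEDULED", "CANCELED"))  # True
```

Terminal states (`COMPLETED`, `FAILED`, `STOPPED`, `CANCELED`) have no outgoing edges, so any transition out of them is rejected.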
In multi-node deployments sharing a DynamoDB table, scheduled jobs are node-specific:
```text
┌─────────────────┐         ┌─────────────────┐
│     Node A      │         │     Node B      │
│  nodeId: "a"    │         │  nodeId: "b"    │
│                 │         │                 │
│  Scheduler:     │         │  Scheduler:     │
│  - job-1 (a) ✓  │         │  - job-3 (b) ✓  │
│  - job-2 (a) ✓  │         │  - job-4 (b) ✓  │
└────────┬────────┘         └────────┬────────┘
         │                           │
         └───────────┬───────────────┘
                     ▼
          ┌─────────────────┐
          │    DynamoDB     │
          │                 │
          │  job-1 (node:a) │
          │  job-2 (node:a) │
          │  job-3 (node:b) │
          │  job-4 (node:b) │
          └─────────────────┘
```
Behavior:
- A job with `nodeId: "node-a"` is executed only by Node A
- Node B ignores Node A's jobs (and vice versa)

Each node must have a unique `nodeId` in its configuration:
```yaml
# /opt/joblet/config/joblet-config.yml
server:
  address: "0.0.0.0"
  port: 50051
  nodeId: "node-prod-1"   # Must be unique per node
```
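Node affinity then reduces to a filter over the shared table. A minimal Python sketch using the job records from the diagram above (field names such as `nodeId` and `uuid` follow the confirmation JSON shown earlier; `jobs_for_node` is hypothetical):

```python
def jobs_for_node(jobs, node_id):
    """A node's scheduler only ever loads jobs whose nodeId matches its own."""
    return [j["uuid"] for j in jobs if j["nodeId"] == node_id]

# The shared DynamoDB table from the diagram above.
shared_table = [
    {"uuid": "job-1", "nodeId": "a"},
    {"uuid": "job-2", "nodeId": "a"},
    {"uuid": "job-3", "nodeId": "b"},
    {"uuid": "job-4", "nodeId": "b"},
]
print(jobs_for_node(shared_table, "a"))  # ['job-1', 'job-2']
```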
If a node goes down permanently, its scheduled jobs remain in `SCHEDULED` status and are not picked up by another node. This is a deliberate design choice; future orchestration layers may add cross-node handling of jobs by status (`SCHEDULED`, `RUNNING`, `COMPLETED`/`FAILED`).

When a Joblet node restarts:
```text
1. Start Joblet
2. SyncFromPersistentState() → Load ALL jobs from DynamoDB
3. RecoverScheduledJobs()    → Filter for:
   - status == SCHEDULED
   - nodeId == current node's ID
4. Add matching jobs to scheduler queue
5. Execute overdue jobs immediately
```
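The recovery filter in steps 3–5 can be modeled directly. A Python sketch under the same assumptions as earlier examples (field names mirror the confirmation JSON; `recover_scheduled_jobs` is a hypothetical helper, not Joblet's API):

```python
from datetime import datetime, timezone

def recover_scheduled_jobs(all_jobs, node_id, now=None):
    """Split this node's SCHEDULED jobs into future and overdue buckets."""
    now = now or datetime.now(timezone.utc)
    future, overdue = [], []
    for job in all_jobs:
        if job["status"] != "SCHEDULED" or job["nodeId"] != node_id:
            continue  # other nodes' jobs and finished jobs are skipped
        when = datetime.fromisoformat(job["scheduledTime"].replace("Z", "+00:00"))
        (overdue if when <= now else future).append(job["uuid"])
    return future, overdue

jobs = [
    {"uuid": "job-abc", "status": "SCHEDULED", "nodeId": "node-prod-1",
     "scheduledTime": "2025-01-15T10:30:00Z"},
    {"uuid": "job-old", "status": "SCHEDULED", "nodeId": "node-prod-1",
     "scheduledTime": "2025-01-15T09:00:00Z"},
    {"uuid": "job-oth", "status": "SCHEDULED", "nodeId": "node-prod-2",
     "scheduledTime": "2025-01-15T10:30:00Z"},
]
now = datetime(2025, 1, 15, 10, 0, tzinfo=timezone.utc)
print(recover_scheduled_jobs(jobs, "node-prod-1", now))  # (['job-abc'], ['job-old'])
```

Jobs in the overdue bucket correspond to the ones Joblet executes immediately on recovery.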
Log output during recovery:
```text
[INFO]  recovering scheduled jobs from persistent storage totalJobs=15 nodeId=node-prod-1
[INFO]  scheduled job is overdue, will execute immediately job_uuid=job-abc123 overdueBy=5m30s
[DEBUG] recovered scheduled job job_uuid=job-def456 scheduledTime=2025-01-15T10:30:00Z
[INFO]  scheduled job recovery completed recovered=3 skipped=0 nodeId=node-prod-1
```
Jobs whose scheduled time passed during downtime execute immediately:
| Scheduled Time | Node Down | Node Up | Behavior |
|---|---|---|---|
| 10:00 | 09:55 | 10:15 | Executes at 10:15 (15 min overdue) |
| 10:00 | 09:55 | 09:58 | Executes at 10:00 (on time) |
```bash
# List all jobs (includes scheduled)
rnx job list

# Filter by status
rnx job list --status=SCHEDULED

# JSON output for scripting
rnx job list --status=SCHEDULED --json

# Get job details
rnx job status <job-id>
```

Example output:

```text
Job ID:    job-abc123-def456
Status:    SCHEDULED
Command:   python3 process_data.py
Scheduled: 2025-01-15T10:30:00Z (in 2h 15m)
Node:      node-prod-1
```
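The `--json` output is convenient for scripting. A sketch that filters scheduled jobs from a captured response (the sample payload and its field names are assumptions based on the confirmation JSON shown earlier, not the guaranteed output schema):

```python
import json

# Hypothetical sample of `rnx job list --json` output.
raw = """[
  {"uuid": "job-abc123-def456", "status": "SCHEDULED",
   "scheduledTime": "2025-01-15T10:30:00Z"},
  {"uuid": "job-zzz999", "status": "COMPLETED"}
]"""

# Keep only jobs still waiting to run.
scheduled = [j["uuid"] for j in json.loads(raw) if j["status"] == "SCHEDULED"]
print(scheduled)  # ['job-abc123-def456']
```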
```protobuf
// List scheduled jobs
rpc ListJobs(ListJobsRequest) returns (ListJobsResponse);

message ListJobsRequest {
  string status_filter = 1; // "SCHEDULED"
}
```
```bash
# Cancel a scheduled job
rnx job stop <job-id>

# Force cancel (if stuck)
rnx job stop --force <job-id>
```

Result: the job's status becomes `CANCELED`.

```bash
# Delete job and its data
rnx job delete <job-id>
```
| Limitation | Description | Workaround |
|---|---|---|
| No recurring schedules | Single execution only | External cron + API calls |
| Node-specific | Jobs tied to creating node | Design for node affinity |
| No distributed locking | No cross-node coordination | Future orchestration layer |
| In-memory queue | Queue rebuilt on restart | Persistent storage ensures durability |
Recurring schedules (e.g., cron expressions such as `0 */2 * * *`) are among the planned features for future releases.
```bash
# GOOD: Use UTC for consistency
rnx job run --schedule="2025-01-15T10:00:00Z" ./script.sh

# GOOD: Explicit timezone if needed
rnx job run --schedule="2025-01-15T10:00:00-05:00" ./script.sh

# AVOID: Ambiguous local time (depends on server timezone)
```
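When generating `--schedule` values programmatically, emit UTC directly rather than formatting local time. A small Python sketch:

```python
from datetime import datetime, timedelta, timezone

# Build an RFC3339 UTC timestamp two hours from now, ready for --schedule.
when = datetime.now(timezone.utc) + timedelta(hours=2)
schedule = when.strftime("%Y-%m-%dT%H:%M:%SZ")
print(f'rnx job run --schedule="{schedule}" ./script.sh')
```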
```bash
# Set appropriate resource limits for scheduled jobs
rnx job run --schedule="2025-01-15T02:00:00Z" \
  --max-cpu=80 \
  --max-memory=2048 \
  --max-io=50 \
  ./nightly_backup.sh
```
Monitor the `SCHEDULED` job count via metrics, alert on jobs stuck in the `SCHEDULED` state, and verify the configured `nodeId` for each node.

Symptoms: Job remains in `SCHEDULED` status past its scheduled time.
Checks:
- Verify the service is running: `systemctl status joblet`
- Inspect scheduler logs: `journalctl -u joblet | grep scheduler`
- Confirm the configured `nodeId` matches the current node

Symptoms: Scheduled jobs missing after node restart.
Checks:
- Look for `SyncFromPersistentState` entries in the logs
- Confirm the node's `nodeId` hasn't changed

```bash
# Check recovery logs
journalctl -u joblet | grep -E "(recovering|recovered)"
```
Symptoms: Same job running on multiple nodes.
Cause: Likely a duplicate `nodeId` configuration.

Fix:

- Ensure every node has a unique `nodeId`
- Check the `nodeId` field in each node's `joblet-config.yml`

Behavior: Jobs scheduled for past times execute immediately.
```bash
# This will execute immediately
rnx job run --schedule="2020-01-01T00:00:00Z" echo "runs now"
```
| Error | Cause | Solution |
|---|---|---|
| `invalid schedule format` | Non-RFC3339 timestamp | Use `YYYY-MM-DDTHH:MM:SSZ` format |
| `job not found` | Job deleted or wrong ID | Verify job ID with `rnx job list` |
| `state client not available` | DynamoDB connection issue | Check state service logs |