This guide covers designing, implementing, and managing complex multi-job workflows with Joblet’s YAML-based workflow definition language, including dependency management, resource allocation, and monitoring strategies for enterprise workflow orchestration.
Joblet’s workflow orchestration system enables organizations to define sophisticated multi-job execution pipelines through declarative YAML configurations. The system provides deterministic dependency resolution, comprehensive resource management, and network isolation capabilities with enterprise-grade validation and monitoring.
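As a minimal end-to-end sketch before the full reference (the file name and job are illustrative), a workflow is a YAML file with a `jobs` map, submitted with `rnx job run`:

```yaml
# hello-workflow.yaml (illustrative)
jobs:
  say-hello:
    command: "echo"
    args: ["Hello from Joblet"]
```

```bash
rnx job run --workflow=hello-workflow.yaml
```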
Dependencies between jobs are declared through `requires` clauses with state validation. Each job definition supports the following structure:

```yaml
jobs:
  job-name:                        # Job name (used for dependencies and monitoring)
    command: "python3"
    args: ["script.py", "--option", "value"]
    runtime: "python-3.11-ml"
    network: "bridge"
    uploads:
      files: ["script.py", "config.json"]
    volumes: ["data-volume"]
    requires:
      - previous-job: "COMPLETED"
    resources:
      max_cpu: 50
      max_memory: 1024
      max_io_bps: 10485760
      cpu_cores: "0-3"
```
**Job Names:** Jobs are named by their keys under the `jobs` section (e.g., `job-name`, `previous-job`). These names identify jobs in `requires` clauses and in the `rnx job status --workflow` and `rnx job list --workflow` commands.

| Field | Description | Required | Example |
|---|---|---|---|
| `command` | Executable to run | Yes | `"python3"`, `"java"`, `"node"` |
| `args` | Command arguments | No | `["script.py", "--verbose"]` |
| `runtime` | Runtime environment | No | `"python-3.11-ml"`, `"openjdk:21"` |
| `network` | Network configuration | No | `"bridge"`, `"isolated"`, `"none"`, `"custom-net"` |
| `uploads` | Files to upload | No | See File Uploads |
| `volumes` | Persistent volumes | No | `["data-volume", "logs"]` |
| `requires` | Job dependencies | No | See Job Dependencies |
| `resources` | Resource limits | No | See Resource Management |
A sequential pipeline chains jobs so each stage starts only after the previous one completes:

```yaml
jobs:
  extract-data:
    command: "python3"
    args: ["extract.py"]
    runtime: "python-3.11-ml"
  process-data:
    command: "python3"
    args: ["process.py"]
    runtime: "python:3.11-ml"
    requires:
      - extract-data: "COMPLETED"
  generate-report:
    command: "python3"
    args: ["report.py"]
    runtime: "python:3.11-ml"
    requires:
      - process-data: "COMPLETED"
```
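Assuming the pipeline above is saved as `sequential-pipeline.yaml` (name illustrative), it runs with the standard commands:

```bash
# Execute the pipeline; jobs start as their dependencies are satisfied
rnx job run --workflow=sequential-pipeline.yaml

# Follow overall progress
rnx job list --workflow
```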
Jobs with no `requires` clause start in parallel, and a job can fan in on multiple dependencies:

```yaml
jobs:
  job-a:
    command: "echo"
    args: ["Job A completed"]
  job-b:
    command: "echo"
    args: ["Job B completed"]
  job-c:
    command: "echo"
    args: ["Job C needs both A and B"]
    requires:
      - job-a: "COMPLETED"
      - job-b: "COMPLETED"
```
"COMPLETED"
- Wait for successful completion (exit code 0)"FAILED"
- Wait for job failure (non-zero exit code)"FINISHED"
- Wait for any completion (success or failure)jobs:
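The `"FAILED"` and `"FINISHED"` states enable error-handling and cleanup stages. A minimal sketch, with job and script names as illustrative assumptions:

```yaml
jobs:
  risky-step:
    command: "python3"
    args: ["might_fail.py"]
  alert-on-failure:
    command: "python3"
    args: ["send_alert.py"]      # Runs only if risky-step exits non-zero
    requires:
      - risky-step: "FAILED"
  cleanup:
    command: "python3"
    args: ["cleanup.py"]         # Runs whether risky-step succeeds or fails
    requires:
      - risky-step: "FINISHED"
```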
The `network` field controls each job's connectivity:

```yaml
jobs:
  no-network-job:
    command: "echo"
    args: ["No network access"]
    network: "none"
  isolated-job:
    command: "curl"
    args: ["https://api.example.com"]
    network: "isolated"
  bridge-job:
    command: "python3"
    args: ["api_server.py"]
    network: "bridge"
```
First create a custom network:

```bash
rnx network create backend --cidr=10.1.0.0/24
```

Then use it in workflows:

```yaml
jobs:
  backend-service:
    command: "python3"
    args: ["backend.py"]
    network: "backend"
  frontend-service:
    command: "node"
    args: ["frontend.js"]
    network: "backend"    # Same network for communication
```
Jobs in different networks are completely isolated from each other:

```yaml
jobs:
  service-a:
    command: "python3"
    args: ["service_a.py"]
    network: "network-1"
  service-b:
    command: "python3"
    args: ["service_b.py"]
    network: "network-2"    # Cannot communicate with service-a
```
The `uploads` field transfers local files to a job before it runs:

```yaml
jobs:
  process-files:
    command: "python3"
    args: ["processor.py"]
    uploads:
      files: ["processor.py", "config.json", "data.csv"]
```
Each job can upload only the files it needs:

```yaml
jobs:
  extract:
    command: "python3"
    args: ["extract.py"]
    uploads:
      files: ["extract.py"]
  transform:
    command: "python3"
    args: ["transform.py"]
    uploads:
      files: ["transform.py", "transformations.json"]
    requires:
      - extract: "COMPLETED"
```
Resource limits are declared per job under `resources`:

```yaml
jobs:
  memory-intensive:
    command: "python3"
    args: ["ml_training.py"]
    resources:
      max_cpu: 80              # 80% CPU limit
      max_memory: 4096         # 4GB memory limit
      cpu_cores: "0-3"         # Bind to specific cores
  io-intensive:
    command: "python3"
    args: ["data_processing.py"]
    resources:
      max_io_bps: 52428800     # 50MB/s I/O limit
```
| Field | Description | Example |
|---|---|---|
| `max_cpu` | CPU percentage limit (0-100) | `50` |
| `max_memory` | Memory limit in MB | `2048` |
| `max_io_bps` | I/O bandwidth limit in bytes/sec | `10485760` |
| `cpu_cores` | CPU core binding | `"0-3"` or `"0,2,4"` |
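Because `max_io_bps` is expressed in bytes per second, it helps to derive values explicitly; the limits used in the examples above follow this arithmetic:

```yaml
# 10 MB/s = 10 * 1024 * 1024 = 10485760 bytes/sec
# 50 MB/s = 50 * 1024 * 1024 = 52428800 bytes/sec
resources:
  max_io_bps: 52428800    # 50 MB/s
```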
Joblet performs comprehensive validation before executing workflows:

```
$ rnx job run --workflow=my-workflow.yaml
🔍 Validating workflow prerequisites...
✅ No circular dependencies found
✅ All required volumes exist
✅ All required networks exist
✅ All required runtimes exist
✅ All job dependencies are valid
🎉 Workflow validation completed successfully!
```

If validation fails, the error names the missing resource:

```
$ rnx job run --workflow=broken-workflow.yaml
Error: workflow validation failed: network validation failed: missing networks: [non-existent-network]. Available networks: [bridge isolated none custom-net]
```
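In automation it is common to gate on the run command's exit status. This sketch assumes `rnx` returns a non-zero exit code when validation or submission fails (typical CLI behavior, but verify for your version):

```bash
# Abort a deployment script if workflow validation or submission fails
if ! rnx job run --workflow=my-workflow.yaml; then
    echo "workflow submission failed" >&2
    exit 1
fi
```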
```bash
# Execute workflow
rnx job run --workflow=data-pipeline.yaml

# Execute with file uploads
rnx job run --workflow=ml-workflow.yaml    # Automatically uploads files specified in YAML

# List all workflows
rnx job list --workflow

# Check specific workflow status (enhanced with job names and dependencies)
rnx job status --workflow <workflow-uuid>

# View workflow status with original YAML content
rnx job status --workflow --detail <workflow-uuid>

# Get workflow status with YAML content in JSON format (for scripting)
rnx job status --workflow --json --detail <workflow-uuid>

# Monitor job logs
rnx job log <job-uuid>
```
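For continuous monitoring, the status command can simply be polled; a sketch using the standard `watch` utility (the 5-second interval is arbitrary):

```bash
# Refresh the workflow status every 5 seconds
watch -n 5 "rnx job status --workflow <workflow-uuid>"
```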
List View:

```
ID    NAME                  STATUS      PROGRESS
----  --------------------  ----------  ---------
20    client-workflow-1...  COMPLETED   6/6
21    client-workflow-1...  RUNNING     3/5
22    client-workflow-1...  PENDING     0/4
```
Detailed Workflow Status:

```
# rnx job status --workflow a1b2c3d4-e5f6-7890-1234-567890abcdef
Workflow UUID: a1b2c3d4-e5f6-7890-1234-567890abcdef
Workflow: data-pipeline.yaml
Status: RUNNING
Progress: 2/4 jobs completed

Jobs in Workflow:
-----------------------------------------------------------------------------------------
JOB UUID                              JOB NAME          STATUS     EXIT CODE  DEPENDENCIES
-----------------------------------------------------------------------------------------
f47ac10b-58cc-4372-a567-0e02b2c3d479  setup-data        COMPLETED  0          -
a1b2c3d4-e5f6-7890-abcd-ef1234567890  process-data      RUNNING    -          setup-data
00000000-0000-0000-0000-000000000000  validate-results  PENDING    -          process-data
00000000-0000-0000-0000-000000000000  generate-report   PENDING    -          validate-results
```
Use the `--detail` flag with workflow status to view the original YAML content:

```bash
# Display workflow status with original YAML content
rnx job status --workflow --detail a1b2c3d4-e5f6-7890-1234-567890abcdef
```
Example Output:

```
Workflow UUID: a1b2c3d4-e5f6-7890-1234-567890abcdef
Workflow: data-pipeline.yaml
Status: RUNNING
Progress: 2/4 jobs completed

YAML Content:
=============
jobs:
  setup-data:
    command: "python3"
    args: ["extract.py"]
    runtime: "python:3.11-ml"
    uploads:
      files: ["extract.py"]
  process-data:
    command: "python3"
    args: ["transform.py"]
    runtime: "python:3.11-ml"
    requires:
      - setup-data: "COMPLETED"
    uploads:
      files: ["transform.py"]
=============

Jobs in Workflow:
...
```
A complete ETL pipeline, from extraction through reporting and cleanup:

```yaml
# data-pipeline.yaml
jobs:
  extract-data:
    command: "python3"
    args: ["extract.py"]
    runtime: "python:3.11-ml"
    uploads:
      files: ["extract.py"]
    volumes: ["data-pipeline"]
    resources:
      max_memory: 1024
  validate-data:
    command: "python3"
    args: ["validate.py"]
    runtime: "python:3.11-ml"
    uploads:
      files: ["validate.py"]
    volumes: ["data-pipeline"]
    requires:
      - extract-data: "COMPLETED"
  transform-data:
    command: "python3"
    args: ["transform.py"]
    runtime: "python:3.11-ml"
    uploads:
      files: ["transform.py"]
    volumes: ["data-pipeline"]
    requires:
      - validate-data: "COMPLETED"
    resources:
      max_cpu: 50
      max_memory: 2048
  load-to-warehouse:
    command: "python3"
    args: ["load.py"]
    runtime: "python:3.11-ml"
    uploads:
      files: ["load.py"]
    volumes: ["data-pipeline"]
    requires:
      - transform-data: "COMPLETED"
  generate-report:
    command: "python3"
    args: ["report.py"]
    runtime: "python:3.11-ml"
    uploads:
      files: ["report.py"]
    volumes: ["data-pipeline"]
    requires:
      - load-to-warehouse: "COMPLETED"
  cleanup:
    command: "rm"
    args: ["-rf", "data/", "*.pyc"]
    volumes: ["data-pipeline"]
    requires:
      - generate-report: "COMPLETED"
```
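This pipeline depends on the `data-pipeline` volume, so the volume must exist before the run passes validation. A typical invocation (the volume name is taken from the YAML above):

```bash
# Create the shared volume once, then launch the pipeline
rnx volume create data-pipeline
rnx job run --workflow=data-pipeline.yaml
```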
A multi-service deployment spanning two networks:

```yaml
# microservices.yaml
jobs:
  database:
    command: "postgres"
    args: ["--config=/config/postgresql.conf"]
    network: "backend"
    volumes: ["db-data"]
  api-service:
    command: "python3"
    args: ["api.py"]
    runtime: "python:3.11-ml"
    network: "backend"
    uploads:
      files: ["api.py", "requirements.txt"]
    requires:
      - database: "COMPLETED"
  web-service:
    command: "java"
    args: ["-jar", "web-service.jar"]
    runtime: "openjdk:21"
    network: "frontend"
    uploads:
      files: ["web-service.jar", "application.properties"]
    requires:
      - api-service: "COMPLETED"
```
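Both networks and the `db-data` volume must exist before this workflow passes validation. A one-time setup sketch (the CIDRs are illustrative; the `--cidr` flag follows the earlier network example):

```bash
# One-time prerequisites for microservices.yaml
rnx network create backend --cidr=10.1.0.0/24
rnx network create frontend --cidr=10.2.0.0/24
rnx volume create db-data
```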
Common validation errors and their fixes:

```
# Missing network
Error: missing networks: [custom-network]
Solution: Create the network or use an existing one

# Circular dependencies
Error: circular dependency detected: job 'a' depends on itself
Solution: Review and fix dependency chain

# Missing volumes
Error: missing volumes: [data-volume]
Solution: Create the volume with: rnx volume create data-volume
```
Runtime issues to check when jobs misbehave:

```
# Job fails to start
Check: Runtime exists and is properly configured
Check: Command and arguments are correct
Check: Required files are uploaded

# Network connectivity issues
Check: Jobs are in the same network if communication is needed
Check: Network exists and is properly configured
Check: Firewall rules allow required traffic

# Slow job execution
Check: Resource limits are appropriate
Check: CPU binding configuration
Check: I/O bandwidth limits

# Jobs not starting
Check: Dependencies are satisfied
Check: Required resources are available
Check: Workflow validation passed
```
Useful debugging commands:

```bash
# Check workflow validation
rnx job run --workflow=my-workflow.yaml    # Shows validation details

# Check available resources
rnx runtime list
rnx volume list
rnx network list

# Monitor system resources
rnx monitor status
rnx job log <job-uuid>
```
Additional example workflows are available in the `/examples/workflows/` directory.