Native Linux Microcontainers
Joblet is a micro-container runtime for running Linux jobs with:
- Process and filesystem isolation (PID namespace, chroot)
- Fine-grained CPU, memory, and I/O throttling (cgroups v2)
- Secure job execution with mTLS and RBAC
- Built-in scheduler, SSE log streaming, and multi-core pinning
Ideal for agentic AI workloads running untrusted code.
Project maintained by ehsaniara
Joblet is a comprehensive Linux-native job execution platform designed for enterprise workloads. It leverages Linux
namespaces and cgroups v2 to provide robust process isolation, resource management, and secure multi-tenant execution
environments without the overhead of containerization.
Executive Summary
Joblet delivers enterprise-grade job execution capabilities by combining native Linux kernel features with modern
orchestration patterns. The platform provides deterministic resource allocation, comprehensive security isolation, and
seamless integration with existing infrastructure through a unified gRPC API and intuitive command-line interface.
Core Capabilities
- Process Isolation: Complete namespace separation (PID, network, mount, IPC, UTS) ensures zero cross-contamination
between workloads
- Resource Management: Granular control over CPU, memory, I/O, and GPU resources through cgroups v2 integration
- GPU Acceleration: Native NVIDIA GPU support with automatic device allocation, CUDA environment provisioning, and
memory isolation
- Network Virtualization: Software-defined networking with customizable CIDR blocks, traffic shaping, and inter-job
communication policies
- Storage Abstraction: Flexible volume management supporting persistent and ephemeral storage with quota enforcement
- Node Identification: Unique node identification for distributed deployments with automatic UUID generation and job
tracking
- Observability: Real-time metrics collection, structured logging, and comprehensive audit trails for compliance
requirements
- Log/Metric Persistence: Dedicated persistence service (persist) with multiple storage backends, including local
filesystem and AWS CloudWatch Logs for cloud-native deployments; supports multi-node operation, high-performance log
and metric storage with gzip compression, Unix socket IPC, and historical queries
- Job State Persistence: Separate state service with pluggable backends (Memory, DynamoDB) ensuring job
metadata survives restarts, supporting auto-scaling deployments with async fire-and-forget operations for maximum
performance
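These controls compose on a single command line. A minimal sketch combining the flags documented in the Command Reference below (the network and volume names are placeholders):

```shell
# Illustrative composite job; analytics-net and data-vol are placeholder names
rnx network create analytics-net --cidr=10.20.0.0/24
rnx volume create data-vol --size=10GB

# One job with CPU, memory, and GPU limits, an isolated network, and a volume
rnx job run --max-cpu=200 --max-memory=4096 \
    --gpu=1 --gpu-memory=8GB \
    --network=analytics-net --volume=data-vol \
    --runtime=python-3.11-ml \
    python train.py
```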
Security Architecture
- Mutual TLS (mTLS): Certificate-based authentication ensures end-to-end encryption and identity verification
- Role-Based Access Control (RBAC): Fine-grained permission model with administrative, operational, and read-only
access tiers
- Privilege Containment: Kernel-enforced process isolation eliminates privilege escalation vectors
- Network Segmentation: Default-deny networking with explicit policy-based connectivity between workloads
- Audit Compliance: Comprehensive activity logging with tamper-resistant audit trails for regulatory requirements
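Joblet's own certificate provisioning is handled by its installation tooling (see the Installation Guide); purely as a generic illustration of the material an mTLS handshake requires, a self-signed CA and a CA-signed client certificate can be produced with openssl. File names and subject names below are placeholders:

```shell
# Generic mTLS material (illustration only; not Joblet's actual cert layout)

# 1. Self-signed CA
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout ca-key.pem -out ca-cert.pem -subj "/CN=demo-ca"

# 2. Client key and certificate signing request
openssl req -newkey rsa:2048 -nodes \
    -keyout client-key.pem -out client.csr -subj "/CN=rnx-client"

# 3. CA-signed client certificate (presented during the mTLS handshake)
openssl x509 -req -in client.csr -CA ca-cert.pem -CAkey ca-key.pem \
    -CAcreateserial -days 365 -out client-cert.pem

# 4. Verify the chain
openssl verify -CAfile ca-cert.pem client-cert.pem
```

Both sides of an mTLS connection hold a certificate signed by a CA the peer trusts, which is what lets the server authenticate clients as well as the reverse.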
Management Interfaces
- RNX Command-Line Interface: Cross-platform client supporting Linux, macOS, and Windows environments with full
feature parity
- Joblet Admin UI: Standalone React-based management dashboard
providing real-time system monitoring and job
orchestration capabilities via direct gRPC connectivity
- Log Aggregation: Streaming log infrastructure with advanced filtering, pattern matching, and retention policies
- Runtime Catalog: Curated collection of production-ready runtime environments including Python, Python ML with CUDA
libraries, and JVM-based platforms
Enterprise Use Cases
Continuous Integration and Deployment
# Run jobs with pre-built runtime environments
rnx job run --runtime=python-3.11-ml pytest tests/
rnx job run --runtime=openjdk-21 --upload=pom.xml --upload=src/ mvn clean install
Data Engineering and Analytics Workloads
# Isolated data processing with resource limits
rnx job run --max-memory=8192 --max-cpu=400 \
--volume=data-lake \
--runtime=python-3.11-ml \
python process_big_data.py
# GPU-accelerated data processing
rnx job run --gpu=2 --gpu-memory=16GB \
--max-memory=16384 \
--runtime=python-3.11-ml \
python gpu_analysis.py
Microservices Testing and Validation
# Network-isolated service testing
rnx network create test-env --cidr=10.10.0.0/24
rnx job run --network=test-env --runtime=openjdk-21 ./service-a
rnx job run --network=test-env --runtime=python-3.11-ml ./service-b
Site Reliability Engineering Operations
# Resource-bounded health checks with timeout
rnx job run --max-cpu=10 --max-memory=64 \
--runtime=python-3.11 \
python health_check.py
# Isolated incident response tooling
rnx job run --network=isolated \
--volume=incident-logs \
./debug-analyzer.sh
Artificial Intelligence and Machine Learning Workloads
# Multi-agent system with isolation
rnx job run --max-memory=4096 --runtime=python-3.11-ml \
python agent_coordinator.py
# GPU-powered ML agents
rnx job run --gpu=1 --gpu-memory=8GB \
--max-memory=2048 --runtime=python-3.11-ml \
--network=agent-net \
python inference_agent.py
rnx job run --max-memory=1024 --runtime=python-3.11-ml \
--network=agent-net \
python monitoring_agent.py
Technical Architecture
Linux Kernel Integration
- Control Groups v2: Hierarchical resource management with unified accounting and deterministic allocation
- Namespace Isolation: Complete process separation across all kernel subsystems (PID, network, mount, IPC, UTS)
- Native Process Execution: Direct process spawning eliminates virtualization overhead while maintaining security
boundaries
- Kernel API Integration: Leverages standard Linux system calls and interfaces for maximum compatibility and
performance
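Under the hood these are standard kernel interfaces. A rough sketch of the primitives involved (not Joblet's actual code path; requires root on a cgroup v2 host):

```shell
# Create a cgroup v2 group and set hard resource limits (illustrative)
mkdir /sys/fs/cgroup/demo
echo "200000 100000" > /sys/fs/cgroup/demo/cpu.max      # 200ms quota per 100ms period = 2 CPUs
echo $((2 * 1024 * 1024 * 1024)) > /sys/fs/cgroup/demo/memory.max   # 2 GiB hard cap

# Move the current shell into the group, then spawn a process in fresh
# PID, mount, network, IPC, and UTS namespaces
echo $$ > /sys/fs/cgroup/demo/cgroup.procs
unshare --pid --fork --mount-proc --net --ipc --uts ps aux
```

Because the new process inherits its parent's cgroup, the limits written above apply to everything it spawns, while the namespaces hide the rest of the system from it.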
Security Framework
- Transport Security: Mutual TLS encryption with certificate pinning for all inter-component communication
- Access Control Model: Hierarchical RBAC implementation with principle of least privilege enforcement
- Isolation Boundaries: Kernel-enforced namespace separation prevents lateral movement and data leakage
- Resource Quotas: Hard limits on compute, memory, and I/O resources prevent denial-of-service conditions
Scalability and Integration
- Stateless Architecture: Horizontally scalable design supports elastic capacity expansion
- Event-Driven Processing: Asynchronous job state management with sub-second latency
- API-First Design: Comprehensive gRPC API enables seamless integration with existing toolchains
- Modern Management Console: React-based interface optimized for operational efficiency and real-time monitoring
Quick Start Example
# Install Joblet Server on Linux (see Installation Guide for details)
# Download from GitHub releases and run installation script
# Run your first job
rnx job run echo "Hello, Joblet!"
# Run a job with a runtime
rnx job run --runtime=python-3.11-ml python3 script.py
# Run a job with resource limits
rnx job run --max-memory=2048 --max-cpu=200 python3 intensive_task.py
Command Reference
Job Execution
# Run basic commands
rnx job run echo "Hello World"
rnx job run --runtime=python-3.11-ml python script.py
rnx job run --runtime=openjdk-21 java MyApp
# Resource limits
rnx job run --max-memory=2048 --max-cpu=200 intensive-task
# GPU-accelerated jobs
rnx job run --gpu=1 --gpu-memory=4GB python ml_training.py
rnx job run --gpu=2 --runtime=python-3.11-ml python distributed_inference.py
# Multi-process jobs (see PROCESS_ISOLATION.md for details)
rnx job run --runtime=python-3.11-ml bash -c "sleep 30 & sleep 40 & ps aux"
rnx job run --runtime=python-3.11-ml bash -c "task1 & task2 & wait"
Node Identification
# View jobs with node identification for distributed tracking
rnx job list
# Example output showing node IDs:
# UUID NAME NODE ID STATUS
# ------------------------------------ ------------ ------------------------------------ ----------
# f47ac10b-58cc-4372-a567-0e02b2c3d479 setup-data 8f94c5b2-1234-5678-9abc-def012345678 COMPLETED
# a1b2c3d4-e5f6-7890-abcd-ef1234567890 process-data 8f94c5b2-1234-5678-9abc-def012345678 RUNNING
# View detailed job status including node information
rnx job status f47ac10b-58cc-4372-a567-0e02b2c3d479
# Node ID information helps identify which Joblet instance executed each job
# Useful for debugging and tracking in multi-node distributed deployments
Runtime Management
# List available runtimes (Python, Python ML, Java)
rnx runtime list
# Get runtime information
rnx runtime info python-3.11-ml
# Build runtimes from YAML specifications
rnx runtime build ./examples/python-3.11-ml/runtime.yaml
rnx runtime build ./examples/python/runtime.yaml
rnx runtime build ./examples/java-21/runtime.yaml
# Remove runtimes
rnx runtime remove python-3.11-ml
# Test runtime functionality
rnx runtime test openjdk-21
Network & Storage
# Create isolated networks
rnx network create my-network --cidr=10.0.0.0/24
# Create persistent volumes
rnx volume create data-vol --size=10GB
# Use in jobs
rnx job run --network=my-network --volume=data-vol app
Business Value Proposition
- Infrastructure Simplification: Eliminates container registry management and image versioning complexity
- Enhanced Security Posture: Kernel-level isolation without container runtime vulnerabilities
- Operational Cost Reduction: Minimal resource overhead compared to container orchestration platforms
- Seamless Integration: Native compatibility with existing Linux infrastructure and tooling
Development Teams
- Rapid Iteration: Immediate job execution without container build cycles
- Enhanced Debugging: Direct process telemetry and filesystem access for troubleshooting
- Curated Runtimes: Production-ready environments for Python, Java, and machine learning workloads
- Developer-Friendly Tooling: Intuitive CLI and web interfaces designed for productivity
Operations Teams
- Comprehensive Observability: Built-in metrics, monitoring, and alerting capabilities
- Enterprise Security: mTLS authentication with fine-grained RBAC policies
- Centralized Management: Web-based console for job orchestration and system administration
- Resource Governance: Enforced quotas and limits ensure fair resource allocation
Site Reliability Engineering
- Fault Isolation: Process boundaries prevent cascading failures across workloads
- Resource Predictability: Deterministic resource allocation ensures consistent performance
- Monitoring Integration: Native support for Prometheus, Grafana, and enterprise monitoring solutions
- Diagnostic Access: Direct process introspection capabilities for incident response
AI and Machine Learning Teams
- GPU Acceleration: Native NVIDIA CUDA support with automatic driver management
- Multi-Agent Isolation: Secure execution environments for distributed AI systems
- Resource Optimization: Fine-grained control over CPU, memory, and GPU allocation
- ML-Ready Environments: Pre-configured runtimes with TensorFlow, PyTorch, and CUDA libraries
Getting Started
For detailed installation instructions and initial configuration, please refer to
the Quick Start Guide. For production deployment considerations, consult
the Deployment Guide.