This comprehensive guide details Joblet’s GPU acceleration capabilities for high-performance computing, machine learning, and data-intensive workloads. Joblet provides enterprise-grade GPU resource management with deterministic allocation, memory isolation, and CUDA environment integration.
Joblet’s GPU orchestration framework enables organizations to:
# Run a simple GPU-accelerated Python script
rnx job run --gpu=1 python gpu_hello.py
# Specify minimum GPU memory (8GB in this example)
rnx job run --gpu=1 --gpu-memory=8GB python training_script.py
# Use 2 GPUs for distributed training
rnx job run --gpu=2 --runtime=python-3.11-ml python distributed_training.py
# Combine with other resource limits
rnx job run --gpu=2 --gpu-memory=16GB --max-memory=32768 --max-cpu=800 \
python intensive_gpu_workload.py
--gpu=N: Specifies the number of GPU devices to allocate for job execution.
--gpu-memory=SIZE: Defines the minimum GPU memory requirement per device. Accepts values such as 8GB, 4096MB, or numeric values in megabytes, for example --gpu-memory=8GB or --gpu-memory=4096.
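The accepted size formats can be illustrated with a small parser. This is a minimal sketch, not Joblet's own parsing code: the function name `parse_gpu_memory_mb` is hypothetical, and it assumes 1 GB is treated as 1024 MB, which is consistent with the workflow example in this guide where 8GB corresponds to `gpu_memory_mb: 8192`.

```python
import re

# Hypothetical helper, not part of Joblet: converts a --gpu-memory value to megabytes.
_SIZE_RE = re.compile(r"^(\d+)\s*(GB|MB)?$", re.IGNORECASE)


def parse_gpu_memory_mb(value: str) -> int:
    """Return the size in MB for inputs such as '8GB', '4096MB', or '4096'."""
    match = _SIZE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized GPU memory size: {value!r}")
    amount = int(match.group(1))
    unit = (match.group(2) or "MB").upper()  # bare numbers are read as megabytes
    # Assumption: 1 GB = 1024 MB, matching 8GB <-> 8192 in the workflow example.
    return amount * 1024 if unit == "GB" else amount


if __name__ == "__main__":
    for text in ("8GB", "4096MB", "4096"):
        print(text, "->", parse_gpu_memory_mb(text), "MB")
```

A client-side check like this could catch malformed values before a job is submitted; the server remains the authority on what it accepts.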
# Single GPU with 4GB minimum memory
rnx job run --gpu=1 --gpu-memory=4GB python inference.py
# Two GPUs with 8GB each
rnx job run --gpu=2 --gpu-memory=8GB python multi_gpu_training.py
# GPU job with CPU and memory limits
rnx job run --gpu=1 --max-cpu=400 --max-memory=16384 python hybrid_workload.py
GPU resource requirements can be declaratively specified in workflow definitions:
# ml-training-pipeline.yaml
jobs:
  data-preprocessing:
    command: "python3"
    args: ["preprocess.py"]
    runtime: "python-3.11-ml"
    resources:
      max_memory: 4096
      max_cpu: 200
  model-training:
    command: "python3"
    args: ["train.py", "--epochs", "100"]
    runtime: "python-3.11-ml"
    requires:
      - data-preprocessing: "COMPLETED"
    resources:
      max_memory: 16384
      max_cpu: 800
      gpu_count: 2 # Request 2 GPUs
      gpu_memory_mb: 8192 # 8GB minimum per GPU
  model-evaluation:
    command: "python3"
    args: ["evaluate.py"]
    runtime: "python-3.11-ml"
    requires:
      - model-training: "COMPLETED"
    resources:
      max_memory: 8192
      gpu_count: 1
      gpu_memory_mb: 4096
# Run the GPU-enabled workflow
rnx job run --workflow=ml-training-pipeline.yaml
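To make the dependency structure of this workflow concrete, the sketch below mirrors its `requires` and GPU fields in a plain Python dictionary and derives a valid execution order. It illustrates the ordering implied by the definition and is not Joblet's scheduler; the helper names `execution_order` and `min_gpu_memory_mb` are hypothetical, and `graphlib` requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter

# Plain-dict mirror of the dependency and GPU fields in ml-training-pipeline.yaml.
jobs = {
    "data-preprocessing": {"requires": [], "gpu_count": 0, "gpu_memory_mb": 0},
    "model-training": {
        "requires": ["data-preprocessing"],
        "gpu_count": 2,
        "gpu_memory_mb": 8192,
    },
    "model-evaluation": {
        "requires": ["model-training"],
        "gpu_count": 1,
        "gpu_memory_mb": 4096,
    },
}


def execution_order(job_specs: dict) -> list:
    """Order jobs so each one comes after every job listed in its 'requires'."""
    graph = {name: set(spec["requires"]) for name, spec in job_specs.items()}
    return list(TopologicalSorter(graph).static_order())


def min_gpu_memory_mb(spec: dict) -> int:
    """Total minimum GPU memory requested: per-device minimum times device count."""
    return spec["gpu_count"] * spec["gpu_memory_mb"]


if __name__ == "__main__":
    for name in execution_order(jobs):
        spec = jobs[name]
        print(name, spec["gpu_count"], "GPU(s),", min_gpu_memory_mb(spec), "MB GPU memory")
```

Only `model-training` and `model-evaluation` request GPUs, so the preprocessing step does not hold a device while it runs.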
Upon GPU allocation request, Joblet performs the following automated operations:
Detects GPUs through the /proc/driver/nvidia/gpus/ and nvidia-smi interfaces
Sets CUDA_VISIBLE_DEVICES for framework compatibility

Joblet automatically discovers and mounts CUDA installations from standard locations:
/usr/local/cuda
/opt/cuda
/usr/lib/cuda
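To see which of these locations exist on a given host, a check along the following lines can help. This is a sketch under stated assumptions: `find_cuda_installations` is a hypothetical helper, and it only reports which directories are present. How Joblet chooses among several installations is not described here.

```python
import os

# The standard locations listed above.
CUDA_CANDIDATES = ["/usr/local/cuda", "/opt/cuda", "/usr/lib/cuda"]


def find_cuda_installations(candidates):
    """Return the candidate paths that exist as directories, in the order given."""
    return [path for path in candidates if os.path.isdir(path)]


if __name__ == "__main__":
    found = find_cuda_installations(CUDA_CANDIDATES)
    print("CUDA installations found:", found or "none")
```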
GPU-enabled jobs receive the following environment configuration:
CUDA_VISIBLE_DEVICES=0,1 # Specific GPU indices allocated to your job
NVIDIA_VISIBLE_DEVICES=0,1 # Alternative naming for some frameworks
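A job can inspect this configuration at startup. The sketch below reads `CUDA_VISIBLE_DEVICES` and returns the listed device identifiers; the function name `allocated_gpus` is hypothetical, and identifiers are kept as strings because the variable may hold GPU UUIDs as well as numeric indices.

```python
import os


def allocated_gpus(env=None):
    """Return the device identifiers in CUDA_VISIBLE_DEVICES (empty list if unset)."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [token.strip() for token in raw.split(",") if token.strip()]


if __name__ == "__main__":
    gpus = allocated_gpus()
    print(f"{len(gpus)} GPU(s) visible to this job: {gpus}")
```

Logging this at the top of a training script makes it easy to confirm that the job received the number of devices it requested.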
# View job status including GPU allocation
rnx job status abc123de-f456-7890-1234-567890abcdef
# Sample output:
# UUID: abc123de-f456-7890-1234-567890abcdef
# Status: RUNNING
# GPUs: [0, 1] (2 GPUs allocated)
# GPU Memory: 8192 MB required per GPU
The Joblet management interface provides:
# PyTorch distributed training
rnx job run --gpu=4 --gpu-memory=16GB --max-memory=65536 \
--runtime=python-3.11-ml \
--upload=model.py --upload=dataset/ \
python -m torch.distributed.launch --nproc_per_node=4 model.py
# TensorFlow model training
rnx job run --gpu=2 --gpu-memory=8GB \
--runtime=python-3.11-ml \
python tensorflow_training.py --batch-size=64
# RAPIDS GPU-accelerated data processing
rnx job run --gpu=1 --gpu-memory=12GB \
--volume=data-lake \
--runtime=python-3.11-ml \
python rapids_etl.py
# GPU-accelerated analytics
rnx job run --gpu=1 --max-memory=16384 \
--runtime=python-3.11-ml \
python cupy_analytics.py
# Large language model inference
rnx job run --gpu=1 --gpu-memory=24GB \
--runtime=python-3.11-ml \
python llm_inference.py --model=llama2-70b
# Computer vision inference pipeline
rnx job run --gpu=1 --gpu-memory=8GB \
--volume=images \
--runtime=python-3.11-ml \
python vision_pipeline.py
# Good: Balanced resource allocation
rnx job run --gpu=2 --gpu-memory=8GB --max-memory=16384 --max-cpu=800 \
python training.py
# Avoid: Over-requesting resources
# rnx job run --gpu=8 --gpu-memory=32GB --max-memory=128000 \
# python simple_inference.py
# Separate preprocessing (CPU-only) from training (GPU)
rnx job run --max-cpu=400 --max-memory=8192 python preprocess.py
rnx job run --gpu=1 --gpu-memory=8GB python train.py
# Use workflows for complex pipelines
rnx job run --workflow=ml-pipeline.yaml
# Development: Single GPU for testing
rnx job run --gpu=1 --gpu-memory=4GB python test_model.py
# Production: Multiple GPUs for performance
rnx job run --gpu=4 --gpu-memory=16GB python production_training.py
Issue: “No GPUs available”
Check nvidia-smi on the host system
Issue: “Insufficient GPU memory”
Review the --gpu-memory parameter value
Check available GPU memory with nvidia-smi
Issue: “CUDA libraries not accessible”
# Check GPU availability on the system
nvidia-smi
# View job details including GPU allocation
rnx job status --detail <job-uuid>
# Check server logs for GPU-related errors
# (Server-side debugging - contact your administrator)
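When diagnosing an "Insufficient GPU memory" error on the host, it can help to compare free memory per device against the requested minimum. The sketch below uses the `--query-gpu` and `--format=csv,noheader,nounits` options of `nvidia-smi`, which report memory in MiB; the helper names are hypothetical, and the parser assumes GPU names contain no commas.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total,memory.free",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse 'index, name, memory.total, memory.free' rows (memory in MiB)."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, total, free = [field.strip() for field in line.split(",")]
        gpus.append(
            {"index": int(index), "name": name, "total_mib": int(total), "free_mib": int(free)}
        )
    return gpus


def gpus_with_free_memory(gpus, required_mib):
    """Return the indices of GPUs with at least required_mib of free memory."""
    return [gpu["index"] for gpu in gpus if gpu["free_mib"] >= required_mib]


if __name__ == "__main__":
    try:
        output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print("nvidia-smi is not usable on this host:", exc)
    else:
        print("GPUs with at least 8192 MiB free:", gpus_with_free_memory(parse_gpu_csv(output), 8192))
```

If the returned list is shorter than the value passed to `--gpu`, the request cannot be satisfied with the current `--gpu-memory` setting.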
System administrators configure GPU support through the Joblet server configuration file:
# joblet-config.yml
gpu:
enabled: true
cuda_paths:
- "/usr/local/cuda"
- "/opt/cuda"
- "/usr/lib/cuda"
The python-3.11-ml runtime environment provides:
For teams transitioning from Docker or Kubernetes GPU deployments:
# Old Docker approach
docker run --gpus=2 --shm-size=16g nvidia/cuda:11.8-devel python train.py
# New Joblet approach
rnx job run --gpu=2 --max-memory=16384 --runtime=python-3.11-ml python train.py