# Configuration

gpu.jsonc configuration reference.

GPU CLI uses a `gpu.jsonc` file in your project root for configuration. This file is created when you run `gpu init`.
## JSON Schema

For IDE autocomplete, add the schema URL at the top of your `gpu.jsonc`:

```jsonc
{
  "$schema": "https://gpu-cli.sh/schema/v1/gpu.json"
}
```

## Core Settings
### gpu_types

Specify GPU types with optional counts. The system tries GPUs in array order.

```jsonc
{
  "gpu_types": [
    { "type": "RTX 4090" }
  ]
}
```

Common GPU types:

- Consumer: `RTX 4090`, `RTX 4080`, `RTX 3090`
- Professional: `RTX A6000`, `RTX A5000`, `A40`
- Datacenter: `A100 PCIe 80GB`, `H100 PCIe`, `H100 SXM`

If omitted, GPU CLI auto-selects the best available GPU.
For multi-GPU workloads, specify a count per GPU type:

```jsonc
{
  "gpu_types": [
    { "type": "A100", "count": 4 }
  ]
}
```

Specify multiple fallback GPUs in priority order:

```jsonc
{
  "gpu_types": [
    { "type": "H100", "count": 8 },
    { "type": "A100", "count": 8 }
  ]
}
```

### min_vram
Minimum VRAM in GB. Used for GPU fallback when your preferred GPU isn't available.

```jsonc
{
  "gpu_types": [{ "type": "RTX 4090" }],
  "min_vram": 24
}
```

If an RTX 4090 isn't available, GPU CLI falls back to any GPU with 24 GB+ VRAM.
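As a sketch, an explicit fallback list can be combined with a VRAM floor so that even a last-resort auto-selected GPU meets the memory requirement (assuming `min_vram` applies once the listed types are exhausted, as described above; the types and value are illustrative):

```jsonc
{
  "gpu_types": [
    { "type": "RTX 4090" },
    { "type": "RTX A6000" }
  ],
  "min_vram": 24
}
```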
### max_price

Maximum hourly price you're willing to pay.

```jsonc
{
  "max_price": 1.50
}
```

## Output Sync
### outputs

Patterns for files to sync back from the pod. Uses glob patterns.

```jsonc
{
  "outputs": [
    "outputs/",
    "checkpoints/",
    "*.pt",
    "*.safetensors"
  ]
}
```

### exclude_outputs
Patterns to exclude from output sync.

```jsonc
{
  "exclude_outputs": [
    "*.tmp",
    "*.log"
  ]
}
```

### outputs_enabled
Enable or disable output syncing.

```jsonc
{
  "outputs_enabled": true
}
```

## Environment
### environment

Declarative environment specification for pod setup.

```jsonc
{
  "environment": {
    "python": {
      "requirements": "requirements.txt"
    },
    "system": {
      "apt": [
        { "name": "ffmpeg" }
      ]
    }
  }
}
```

#### Python packages
```jsonc
{
  "environment": {
    "python": {
      "requirements": "requirements.txt",
      "pip_global": [
        { "name": "torch", "version": "2.1.0" },
        { "name": "transformers" }
      ]
    }
  }
}
```

#### System packages
```jsonc
{
  "environment": {
    "system": {
      "apt": [
        { "name": "ffmpeg" },
        { "name": "git-lfs" }
      ]
    }
  }
}
```

#### Shell commands
```jsonc
{
  "environment": {
    "shell": {
      "steps": [
        { "run": "pip install -e ." },
        { "run": "chmod +x scripts/setup.sh && ./scripts/setup.sh" }
      ]
    }
  }
}
```

## Downloads
Pre-download models and assets to the pod.
### HuggingFace models

```jsonc
{
  "download": [
    {
      "strategy": "hf",
      "source": "black-forest-labs/FLUX.1-dev"
    }
  ]
}
```

### HTTP downloads
```jsonc
{
  "download": [
    {
      "strategy": "http",
      "source": "https://example.com/model.bin",
      "target": "models/model.bin"
    }
  ]
}
```

### Git repositories
Clone tool repositories like ComfyUI with auto-update support:

```jsonc
{
  "download": [
    {
      "strategy": "git",
      "source": "https://github.com/comfyanonymous/ComfyUI",
      "target": "ComfyUI"
    }
  ]
}
```

Pin to a specific version with a branch, tag, or commit:
```jsonc
{
  "download": [
    {
      "strategy": "git",
      "source": "https://github.com/comfyanonymous/ComfyUI",
      "target": "ComfyUI",
      "tag": "v0.3.7"
    }
  ]
}
```

Options:

- `branch`: Clone a specific branch
- `tag`: Check out a specific tag (detached HEAD)
- `commit`: Check out a specific commit hash (detached HEAD)
- `depth`: Clone depth (default: 1 for a shallow clone; 0 for full history)
The git strategy auto-pulls on subsequent runs if the working tree is clean. If you've made local modifications (like installing custom nodes), it will warn and preserve your changes.
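As a sketch, the same download pinned to a branch with full history instead of a tag, using the `branch` and `depth` options listed above (the branch name is illustrative):

```jsonc
{
  "download": [
    {
      "strategy": "git",
      "source": "https://github.com/comfyanonymous/ComfyUI",
      "target": "ComfyUI",
      "branch": "master",
      "depth": 0
    }
  ]
}
```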
### Civitai models

Download models from Civitai using model IDs or AIR URNs:

```jsonc
{
  "download": [
    {
      "strategy": "civitai",
      "source": "4384"
    }
  ]
}
```

Supported source formats:

- `"4384"`: Model ID (gets the latest version)
- `"4384:128713"`: Model ID with a specific version
- `"urn:air:flux1:checkpoint:civitai:618692@691639"`: Full AIR URN
- `"air:sdxl:lora:civitai:328553@368189"`: AIR without the `urn:` prefix
AIR (AI Resource Name) identifiers provide a standardized way to reference AI models across platforms.
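For example, pinning an exact model version via a full AIR URN (the URN below is the one shown in the format list above):

```jsonc
{
  "download": [
    {
      "strategy": "civitai",
      "source": "urn:air:flux1:checkpoint:civitai:618692@691639"
    }
  ]
}
```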
## Pod Settings

### keep_alive_minutes

Auto-stop timeout in minutes. Default is 5.

```jsonc
{
  "keep_alive_minutes": 10
}
```

### docker_image
Override the base Docker image.

```jsonc
{
  "docker_image": "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04"
}
```

### dockerfile
Path to a Dockerfile for custom builds.

```jsonc
{
  "dockerfile": "Dockerfile"
}
```

### workspace_size_gb
Workspace volume size in GB.

```jsonc
{
  "workspace_size_gb": 50
}
```

## Network Volumes
Network volumes provide persistent storage that survives pod restarts. Essential for large models to avoid re-downloading.
### network_volume_id

Attach a specific network volume by reference.

Accepts either:

- A volume ID (for example, `vol_abc123xyz`)
- A unique friendly volume name (for example, `shared-models`)

If a friendly name matches multiple volumes, the run fails and asks you to use an ID.

```jsonc
{
  "network_volume_id": "vol_abc123xyz"
}
```

### volume_mode
Strategy for network volume usage. Default is `"global"`.

```jsonc
{
  "volume_mode": "global"
}
```

Options:

- `"global"`: Use the shared global volume (set via `gpu volume set-global`)
- `"dedicated"`: Use a project-specific volume (resolved or created via `dedicated_volume_id` or `dedicated_volume_name`)
- `"none"`: No network volume (ephemeral storage only)
### dedicated_volume_id

When using `volume_mode: "dedicated"`, optionally specify a dedicated volume reference.

Accepts either:

- A volume ID (recommended for deterministic behavior)
- A unique friendly name

If both `dedicated_volume_id` and `dedicated_volume_name` are set, `dedicated_volume_id` takes precedence.

```jsonc
{
  "volume_mode": "dedicated",
  "dedicated_volume_id": "vol_project_xyz"
}
```

### dedicated_volume_name
Optional friendly name for dedicated-mode resolution and creation.

```jsonc
{
  "volume_mode": "dedicated",
  "dedicated_volume_name": "my-project-models"
}
```

If the name exists (unique match), that volume is used. If it doesn't exist, GPU CLI auto-creates it.
### Volume Resolution Precedence

When provisioning a new pod, volume resolution follows this order:

1. `network_volume_id` (highest precedence)
2. `volume_mode: "none"` → no network volume
3. `volume_mode: "global"` → use the configured global volume
4. `volume_mode: "dedicated"`:
   1. `dedicated_volume_id`
   2. `dedicated_volume_name`
   3. Auto-create a dedicated volume

Dedicated auto-create requires a datacenter, chosen from:

1. The first entry in the project's `regions`
2. The global network-volume datacenter

If neither is configured, auto-create fails with guidance.
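To illustrate the precedence: with the fields below set together, `network_volume_id` wins and the dedicated-mode settings are never consulted (a sketch; the IDs and names are illustrative):

```jsonc
{
  "network_volume_id": "vol_abc123xyz",  // highest precedence: this volume is attached
  "volume_mode": "dedicated",            // ignored while network_volume_id is set
  "dedicated_volume_name": "my-project-models"
}
```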
### Volume Management

Manage volumes via the CLI:

```shell
# List all volumes
gpu volume list --detailed

# Create a volume and set it as the global volume
gpu volume create --name shared-models --size 500 --set-global

# Check volume usage
gpu volume status
```

See the Commands Reference for all volume commands.
## Path Guidance for Model Downloads

For persistent model storage, prefer `${workspace_base}` in download targets so paths stay aligned with provider and runtime mount settings.

```jsonc
{
  "download": [
    {
      "strategy": "hf",
      "source": "black-forest-labs/FLUX.1-schnell",
      "target": "${workspace_base}/models/FLUX.1-schnell"
    }
  ]
}
```

## Serverless
The serverless block configures serverless GPU endpoints. For the full guide, see Serverless Endpoints.
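As a starting point, a minimal `serverless` block might combine a template with a GPU type and scaling limits (a sketch using fields documented in this section; the values are illustrative):

```jsonc
{
  "serverless": {
    "template": "vllm",
    "gpu_type": "NVIDIA GeForce RTX 4090",
    "scaling": {
      "min_workers": 0,
      "max_workers": 3
    }
  }
}
```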
### serverless.template

Template kind for the serverless worker. Default is `"auto"`.

```jsonc
{
  "serverless": {
    "template": "comfyui"
  }
}
```

Options:

- `"auto"`: Auto-detect from project files (default)
- `"comfyui"`: Official ComfyUI serverless worker
- `"vllm"`: Official vLLM serverless worker (OpenAI-compatible)
- `"whisper"`: Official Whisper serverless worker
- `"custom-image"`: Use a custom Docker image
### serverless.gpu_type

Primary GPU type for serverless workers. Must match provider naming.

```jsonc
{
  "serverless": {
    "gpu_type": "NVIDIA GeForce RTX 4090"
  }
}
```

### serverless.gpu_types
Fallback GPU types in priority order. Used when the primary `gpu_type` is unavailable.

```jsonc
{
  "serverless": {
    "gpu_type": "NVIDIA A100 80GB PCIe",
    "gpu_types": ["NVIDIA L4", "NVIDIA GeForce RTX 4090"]
  }
}
```

### serverless.scaling
Controls worker scaling behavior.

```jsonc
{
  "serverless": {
    "scaling": {
      "min_workers": 0,
      "max_workers": 3,
      "idle_timeout": 5
    }
  }
}
```

| Field | Default | Range | Description |
|---|---|---|---|
| `min_workers` | 0 | 0-100 | Minimum active workers. Set to 0 for scale-to-zero. |
| `max_workers` | 3 | 1-100 | Maximum concurrent workers. |
| `idle_timeout` | 5 | 1-3600 | Seconds before an idle worker shuts down. |
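For latency-sensitive endpoints, one option is to keep a worker always warm while allowing headroom for bursts (a sketch within the documented ranges; the values are illustrative, and `min_workers: 1` forgoes scale-to-zero savings):

```jsonc
{
  "serverless": {
    "scaling": {
      "min_workers": 1,    // one worker stays warm; no cold start for the first request
      "max_workers": 10,   // burst capacity
      "idle_timeout": 60   // keep extra workers a full minute before shutdown
    }
  }
}
```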
### serverless.volume

Network volume for persistent storage across worker instances. Used for model caches, datasets, and checkpoints.

```jsonc
{
  "serverless": {
    "volume": {
      "name": "my-project-vol",
      "size_gb": 200,
      "mount_path": "/runpod-volume"
    }
  }
}
```

| Field | Default | Description |
|---|---|---|
| `name` | — | Volume name (used for creation/lookup) |
| `size_gb` | — | Volume size in GB (1-4000). 100-200 GB is recommended for most ML workloads. |
| `mount_path` | `"/runpod-volume"` | Mount path inside workers |
### serverless.prewarm

Pre-warm configuration for reducing cold starts. Downloads models and data before workers need them.

```jsonc
{
  "serverless": {
    "prewarm": {
      "enabled": true,
      "mode": "cpu",
      "script": "bash /workspace/scripts/prewarm.sh",
      "models": ["meta-llama/Llama-3.1-8B-Instruct"]
    }
  }
}
```

| Field | Default | Description |
|---|---|---|
| `enabled` | `false` | Enable pre-warming |
| `mode` | `"cpu"` | Warmup mode: `"cpu"` or `"gpu"` |
| `script` | — | Shell command to run during CPU warmup |
| `models` | `[]` | Model identifiers to pre-download (informational) |
### serverless.runpod

RunPod-specific configuration. Only applies when deploying to RunPod Serverless (the default provider).

```jsonc
{
  "serverless": {
    "runpod": {
      "flashboot": true,
      "scaler_type": "queue_delay",
      "scaler_value": 4,
      "execution_timeout_ms": 600000,
      "container_disk_gb": 50,
      "data_center_ids": ["US-TX-3"],
      "cached_model": "meta-llama/Llama-3.1-8B-Instruct",
      "env": {
        "MODEL_NAME": "meta-llama/Llama-3.1-8B-Instruct"
      },
      "ports": ["8188/http"],
      "ids": {
        "template_id": "abc123",
        "endpoint_id": "ep_456",
        "network_volume_id": "vol_789"
      }
    }
  }
}
```

| Field | Default | Description |
|---|---|---|
| `template` | — | Explicit RunPod template ID or name (overrides auto-detection) |
| `flashboot` | `true` | Enable FlashBoot for faster cold starts |
| `scaler_type` | `"queue_delay"` | Scaling algorithm: `"queue_delay"` or `"request_count"` |
| `scaler_value` | 4 | Scaler parameter (seconds for `queue_delay`, requests for `request_count`) |
| `execution_timeout_ms` | 600000 | Max job execution time in milliseconds (default is 10 minutes) |
| `container_disk_gb` | 50 | Container disk size in GB (1-500) |
| `data_center_ids` | `[]` | Preferred data center IDs (e.g., `["US-TX-3", "CA-MTL-1"]`) |
| `data_centers_required` | `false` | If true, workers are restricted to the listed data centers only |
| `cached_model` | — | HuggingFace model ID for RunPod's cached model feature |
| `image_name` | — | Docker image for the custom-image template |
| `env` | `{}` | Additional environment variables |
| `ports` | `[]` | Port exposures (e.g., `["8188/http", "22/tcp"]`) |
| `cpu_flavor` | `"cpu5g"` | CPU flavor for pre-warm pods |
| `allowed_cuda_versions` | `[]` | Allowed CUDA versions (e.g., `["12.4", "12.5"]`) |
| `ids` | — | Cached resource IDs for reproducible deployments (committable) |
### serverless.runpod.ids

Non-secret IDs that can be committed to version control for reproducible deployments and team handoff.

```jsonc
{
  "serverless": {
    "runpod": {
      "ids": {
        "template_id": "abc123",
        "endpoint_id": "ep_456",
        "network_volume_id": "vol_789"
      }
    }
  }
}
```

Use `gpu serverless deploy --write-ids project` to auto-populate these after deployment.
## Example Configurations

### ML Training

```jsonc
{
  "$schema": "https://gpu-cli.sh/schema/v1/gpu.json",
  "gpu_types": [{ "type": "RTX 4090" }],
  "min_vram": 24,
  "outputs": [
    "checkpoints/",
    "logs/",
    "*.pt"
  ],
  "environment": {
    "python": {
      "requirements": "requirements.txt"
    }
  }
}
```

### Inference Server
```jsonc
{
  "$schema": "https://gpu-cli.sh/schema/v1/gpu.json",
  "gpu_types": [{ "type": "A100 PCIe 80GB" }],
  "keep_alive_minutes": 30,
  "download": [
    {
      "strategy": "hf",
      "source": "meta-llama/Llama-2-7b-chat-hf"
    }
  ]
}
```

### ComfyUI
```jsonc
{
  "$schema": "https://gpu-cli.sh/schema/v1/gpu.json",
  "gpu_types": [{ "type": "RTX 4090" }],
  "outputs": ["outputs/"],
  "download": [
    {
      "strategy": "hf",
      "source": "black-forest-labs/FLUX.1-dev"
    }
  ]
}
```

## Full Schema
See the complete JSON Schema at gpu-cli.sh/schema/v1/gpu.json.