Configuration

gpu.jsonc configuration reference

GPU CLI uses a gpu.jsonc file in your project root for configuration. This file is created when you run gpu init.
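
For example, from your project root:

# Generate a starter gpu.jsonc
gpu init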

JSON Schema

For IDE autocomplete, add the schema URL at the top of your gpu.jsonc:

{
  "$schema": "https://gpu-cli.sh/schema/v1/gpu.json"
}

Core Settings

gpu_types

Specify GPU types with optional counts. The system tries GPUs in array order.

{
  "gpu_types": [
    { "type": "RTX 4090" }
  ]
}

Common GPU types:

  • Consumer: RTX 4090, RTX 4080, RTX 3090
  • Professional: RTX A6000, RTX A5000, A40
  • Datacenter: A100 PCIe 80GB, H100 PCIe, H100 SXM

If omitted, GPU CLI auto-selects the best available GPU.

For multi-GPU workloads, specify count per GPU type:

{
  "gpu_types": [
    { "type": "A100", "count": 4 }
  ]
}

Specify multiple fallback GPUs in priority order:

{
  "gpu_types": [
    { "type": "H100", "count": 8 },
    { "type": "A100", "count": 8 }
  ]
}

min_vram

Minimum VRAM in GB. Used for GPU fallback when your preferred GPU isn't available.

{
  "gpu_types": [{ "type": "RTX 4090" }],
  "min_vram": 24
}

If an RTX 4090 isn't available, GPU CLI falls back to any GPU with 24 GB+ VRAM.

max_price

Maximum hourly price you're willing to pay.

{
  "max_price": 1.50
}

Output Sync

outputs

Patterns for files to sync back from the pod. Uses glob patterns.

{
  "outputs": [
    "outputs/",
    "checkpoints/",
    "*.pt",
    "*.safetensors"
  ]
}

exclude_outputs

Patterns to exclude from output sync.

{
  "exclude_outputs": [
    "*.tmp",
    "*.log"
  ]
}

outputs_enabled

Enable/disable output syncing.

{
  "outputs_enabled": true
}

Environment

environment

Declarative environment specification for pod setup.

{
  "environment": {
    "python": {
      "requirements": "requirements.txt"
    },
    "system": {
      "apt": [
        { "name": "ffmpeg" }
      ]
    }
  }
}

Python packages

{
  "environment": {
    "python": {
      "requirements": "requirements.txt",
      "pip_global": [
        { "name": "torch", "version": "2.1.0" },
        { "name": "transformers" }
      ]
    }
  }
}

System packages

{
  "environment": {
    "system": {
      "apt": [
        { "name": "ffmpeg" },
        { "name": "git-lfs" }
      ]
    }
  }
}

Shell commands

{
  "environment": {
    "shell": {
      "steps": [
        { "run": "pip install -e ." },
        { "run": "chmod +x scripts/setup.sh && ./scripts/setup.sh" }
      ]
    }
  }
}

Downloads

Pre-download models and assets to the pod.

HuggingFace models

{
  "download": [
    {
      "strategy": "hf",
      "source": "black-forest-labs/FLUX.1-dev"
    }
  ]
}

HTTP downloads

{
  "download": [
    {
      "strategy": "http",
      "source": "https://example.com/model.bin",
      "target": "models/model.bin"
    }
  ]
}

Git repositories

Clone tool repositories like ComfyUI with auto-update support:

{
  "download": [
    {
      "strategy": "git",
      "source": "https://github.com/comfyanonymous/ComfyUI",
      "target": "ComfyUI"
    }
  ]
}

Pin to a specific version with branch, tag, or commit:

{
  "download": [
    {
      "strategy": "git",
      "source": "https://github.com/comfyanonymous/ComfyUI",
      "target": "ComfyUI",
      "tag": "v0.3.7"
    }
  ]
}

Options:

  • branch: Clone specific branch
  • tag: Checkout specific tag (detached HEAD)
  • commit: Checkout specific commit hash (detached HEAD)
  • depth: Clone depth (default: 1 for a shallow clone; use 0 for full history)
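
For example, a sketch that tracks a branch with full history (the branch name here is illustrative):

{
  "download": [
    {
      "strategy": "git",
      "source": "https://github.com/comfyanonymous/ComfyUI",
      "target": "ComfyUI",
      "branch": "master",
      "depth": 0
    }
  ]
}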

The git strategy auto-pulls on subsequent runs if the working tree is clean. If you've made local modifications (like installing custom nodes), it will warn and preserve your changes.

Civitai models

Download models from Civitai using model IDs or AIR URNs:

{
  "download": [
    {
      "strategy": "civitai",
      "source": "4384"
    }
  ]
}

Supported source formats:

  • "4384" - Model ID (gets latest version)
  • "4384:128713" - Model ID with specific version
  • "urn:air:flux1:checkpoint:civitai:618692@691639" - Full AIR URN
  • "air:sdxl:lora:civitai:328553@368189" - AIR without urn: prefix

AIR (AI Resource Names) provides a standardized way to reference AI models across platforms.
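
For example, pinning a specific version with the model-ID:version form listed above:

{
  "download": [
    {
      "strategy": "civitai",
      "source": "4384:128713"
    }
  ]
}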

Pod Settings

keep_alive_minutes

Auto-stop timeout in minutes. Default is 5.

{
  "keep_alive_minutes": 10
}

docker_image

Override the base Docker image.

{
  "docker_image": "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04"
}

dockerfile

Path to a Dockerfile for custom builds.

{
  "dockerfile": "Dockerfile"
}

workspace_size_gb

Workspace volume size in GB.

{
  "workspace_size_gb": 50
}

Network Volumes

Network volumes provide persistent storage that survives pod restarts. They are essential for large models you don't want to re-download on every run.

network_volume_id

Attach a specific network volume by reference.

Accepts either:

  • A volume ID (for example: vol_abc123xyz)
  • A unique friendly volume name (for example: shared-models)

If a friendly name matches multiple volumes, the run fails and asks you to use an ID.

{
  "network_volume_id": "vol_abc123xyz"
}
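
Or, equivalently, by friendly name (assuming "shared-models" is unique in your account):

{
  "network_volume_id": "shared-models"
}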

volume_mode

Strategy for network volume usage. Default is "global".

{
  "volume_mode": "global"
}

Options:

  • "global" - Use the shared global volume (set via gpu volume set-global)
  • "dedicated" - Use a project-specific volume (resolve/create via dedicated_volume_id or dedicated_volume_name)
  • "none" - No network volume (ephemeral storage only)

dedicated_volume_id

When using volume_mode: "dedicated", optionally specify a dedicated volume reference.

Accepts either:

  • A volume ID (recommended for deterministic behavior)
  • A unique friendly name

If both dedicated_volume_id and dedicated_volume_name are set, dedicated_volume_id takes precedence.

{
  "volume_mode": "dedicated",
  "dedicated_volume_id": "vol_project_xyz"
}

dedicated_volume_name

Optional friendly name used to resolve or create the volume in dedicated mode.

{
  "volume_mode": "dedicated",
  "dedicated_volume_name": "my-project-models"
}

If the name exists (unique match), that volume is used. If it doesn't exist, GPU CLI auto-creates it.

Volume Resolution Precedence

When provisioning a new pod, volume resolution follows this order:

  1. network_volume_id (highest precedence)
  2. volume_mode = "none" → no network volume
  3. volume_mode = "global" → use configured global volume
  4. volume_mode = "dedicated":
    • dedicated_volume_id
    • dedicated_volume_name
    • auto-create dedicated volume

Dedicated auto-create requires a datacenter. It is chosen from:

  1. First entry in project regions
  2. Global network-volume datacenter

If neither is configured, auto-create fails with guidance.
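
For example, in the following config the explicit network_volume_id wins and the dedicated-mode settings are ignored for volume selection, per the precedence above (values are illustrative):

{
  "network_volume_id": "vol_abc123xyz",
  "volume_mode": "dedicated",
  "dedicated_volume_name": "my-project-models"
}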

Volume Management

Manage volumes via CLI:

# List all volumes
gpu volume list --detailed

# Create a volume and set as global
gpu volume create --name shared-models --size 500 --set-global

# Check volume usage
gpu volume status

See Commands Reference for all volume commands.

Path Guidance for Model Downloads

For persistent model storage, prefer ${workspace_base} in download targets so paths stay aligned with provider/runtime mount settings.

{
  "download": [
    {
      "strategy": "hf",
      "source": "black-forest-labs/FLUX.1-schnell",
      "target": "${workspace_base}/models/FLUX.1-schnell"
    }
  ]
}

Serverless

The serverless block configures serverless GPU endpoints. For the full guide, see Serverless Endpoints.

serverless.template

Template kind for the serverless worker. Default is "auto".

{
  "serverless": {
    "template": "comfyui"
  }
}

Options:

  • "auto" - Auto-detect from project files (default)
  • "comfyui" - Official ComfyUI serverless worker
  • "vllm" - Official vLLM serverless worker (OpenAI-compatible)
  • "whisper" - Official Whisper serverless worker
  • "custom-image" - Use a custom Docker image

serverless.gpu_type

Primary GPU type for serverless workers. Must match provider naming.

{
  "serverless": {
    "gpu_type": "NVIDIA GeForce RTX 4090"
  }
}

serverless.gpu_types

Fallback GPU types in priority order. Used when the primary gpu_type is unavailable.

{
  "serverless": {
    "gpu_type": "NVIDIA A100 80GB PCIe",
    "gpu_types": ["NVIDIA L4", "NVIDIA GeForce RTX 4090"]
  }
}

serverless.scaling

Controls worker scaling behavior.

{
  "serverless": {
    "scaling": {
      "min_workers": 0,
      "max_workers": 3,
      "idle_timeout": 5
    }
  }
}

Field          Default   Range    Description
min_workers    0         0-100    Minimum active workers. Set to 0 for scale-to-zero.
max_workers    3         1-100    Maximum concurrent workers.
idle_timeout   5         1-3600   Seconds before an idle worker shuts down.

serverless.volume

Network volume for persistent storage across worker instances. Used for model caches, datasets, and checkpoints.

{
  "serverless": {
    "volume": {
      "name": "my-project-vol",
      "size_gb": 200,
      "mount_path": "/runpod-volume"
    }
  }
}

Field        Default            Description
name         -                  Volume name (used for creation/lookup)
size_gb      -                  Volume size in GB (1-4000). Recommend 100-200 GB for most ML workloads.
mount_path   "/runpod-volume"   Mount path inside workers

serverless.prewarm

Pre-warm configuration for reducing cold starts. Downloads models/data before workers need them.

{
  "serverless": {
    "prewarm": {
      "enabled": true,
      "mode": "cpu",
      "script": "bash /workspace/scripts/prewarm.sh",
      "models": ["meta-llama/Llama-3.1-8B-Instruct"]
    }
  }
}

Field     Default   Description
enabled   false     Enable pre-warming
mode      "cpu"     "cpu" ($0.06/hr, for downloads) or "gpu" ($0.40/hr, for GPU-required warmup)
script    -         Shell command to run during CPU warmup
models    []        Model identifiers to pre-download (informational)

serverless.runpod

RunPod-specific configuration. Only applies when deploying to RunPod Serverless (the default provider).

{
  "serverless": {
    "runpod": {
      "flashboot": true,
      "scaler_type": "queue_delay",
      "scaler_value": 4,
      "execution_timeout_ms": 600000,
      "container_disk_gb": 50,
      "data_center_ids": ["US-TX-3"],
      "cached_model": "meta-llama/Llama-3.1-8B-Instruct",
      "env": {
        "MODEL_NAME": "meta-llama/Llama-3.1-8B-Instruct"
      },
      "ports": ["8188/http"],
      "ids": {
        "template_id": "abc123",
        "endpoint_id": "ep_456",
        "network_volume_id": "vol_789"
      }
    }
  }
}

Field                   Default          Description
template                -                Explicit RunPod template ID or name (overrides auto-detection)
flashboot               true             Enable FlashBoot for faster cold starts
scaler_type             "queue_delay"    Scaling algorithm: "queue_delay" or "request_count"
scaler_value            4                Scaler parameter (seconds for queue_delay, requests for request_count)
execution_timeout_ms    600000           Max job execution time in milliseconds (10 min default)
container_disk_gb       50               Container disk size in GB (1-500)
data_center_ids         []               Preferred data center IDs (e.g., ["US-TX-3", "CA-MTL-1"])
data_centers_required   false            If true, workers are restricted to listed DCs only
cached_model            -                HuggingFace model ID for RunPod's cached model feature
image_name              -                Docker image for custom-image template
env                     {}               Additional environment variables
ports                   []               Port exposures (e.g., ["8188/http", "22/tcp"])
cpu_flavor              "cpu5g"          CPU flavor for pre-warm pods
allowed_cuda_versions   []               Allowed CUDA versions (e.g., ["12.4", "12.5"])
ids                     -                Cached resource IDs for reproducible deployments (committable)

serverless.runpod.ids

Non-secret IDs that can be committed to version control for reproducible deployments and team handoff.

{
  "serverless": {
    "runpod": {
      "ids": {
        "template_id": "abc123",
        "endpoint_id": "ep_456",
        "network_volume_id": "vol_789"
      }
    }
  }
}

Use the following command to auto-populate these IDs after deployment:
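
# Deploy, then write the resulting IDs back into gpu.jsonc
gpu serverless deploy --write-ids project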

Example Configurations

ML Training

{
  "$schema": "https://gpu-cli.sh/schema/v1/gpu.json",
  "gpu_types": [{ "type": "RTX 4090" }],
  "min_vram": 24,
  "outputs": [
    "checkpoints/",
    "logs/",
    "*.pt"
  ],
  "environment": {
    "python": {
      "requirements": "requirements.txt"
    }
  }
}

Inference Server

{
  "$schema": "https://gpu-cli.sh/schema/v1/gpu.json",
  "gpu_types": [{ "type": "A100 PCIe 80GB" }],
  "keep_alive_minutes": 30,
  "download": [
    {
      "strategy": "hf",
      "source": "meta-llama/Llama-2-7b-chat-hf"
    }
  ]
}

ComfyUI

{
  "$schema": "https://gpu-cli.sh/schema/v1/gpu.json",
  "gpu_types": [{ "type": "RTX 4090" }],
  "outputs": ["outputs/"],
  "download": [
    {
      "strategy": "hf",
      "source": "black-forest-labs/FLUX.1-dev"
    }
  ]
}

Full Schema

See the complete JSON Schema at gpu-cli.sh/schema/v1/gpu.json.
