Check if GPU acceleration is already active
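Ollama detects your GPU automatically on install. Before changing anything, check whether it is already working: run a model with the --verbose flag (for example, ollama run llama3 --verbose) and look at the gpu layers value in the output.
If gpu layers shows 0, the GPU is not active. Follow the relevant section below.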
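If you want to script this check, the sketch below scans captured output for the layer count. It assumes the output contains a line of the form "gpu layers: N", as described above; the exact wording may differ between Ollama versions, and the function names are illustrative only.

```python
import re
from typing import Optional

# Assumption: the captured output contains a line such as "gpu layers: 32".
_GPU_LAYERS = re.compile(r"gpu layers\s*:\s*(\d+)", re.IGNORECASE)


def gpu_layers(verbose_output: str) -> Optional[int]:
    """Return the reported GPU layer count, or None if no such line is found."""
    match = _GPU_LAYERS.search(verbose_output)
    return int(match.group(1)) if match else None


def gpu_is_active(verbose_output: str) -> bool:
    """True only when a non-zero layer count is reported."""
    layers = gpu_layers(verbose_output)
    return layers is not None and layers > 0
```

Treating a missing line the same as 0 is a deliberately cautious choice: if the output format changes, the check fails closed instead of reporting a GPU that may not be in use.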
NVIDIA GPU — CUDA setup
Ollama uses CUDA for NVIDIA GPUs. Requirements: GTX 1000 series or newer, driver version 527+.
- 1
Update NVIDIA drivers
Download and install the latest Game Ready or Studio driver from nvidia.com/drivers. Minimum version 527.
- 2
Verify CUDA is available
C:\> nvidia-smi
NVIDIA-SMI 546.01 | Driver Version: 546.01 | CUDA Version: 12.3
| GeForce RTX 3080 |
- 3
Restart Ollama and verify
Right-click the Ollama tray icon → Quit, then restart. Run ollama run llama3 --verbose and confirm that gpu layers is non-zero (for example, gpu layers: 32).
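The driver requirement from step 2 can also be checked programmatically. This is a minimal sketch that reads the "Driver Version" field from nvidia-smi header text and compares its major number with the 527 minimum stated above. The function names and the sample string are illustrative; in practice you would pass in the text printed by nvidia-smi.

```python
import re
from typing import Optional

MIN_DRIVER_MAJOR = 527  # minimum driver version stated in this guide


def driver_major(smi_output: str) -> Optional[int]:
    """Extract the major driver version from nvidia-smi header text."""
    match = re.search(r"Driver Version:\s*(\d+)", smi_output)
    return int(match.group(1)) if match else None


def driver_ok(smi_output: str, minimum: int = MIN_DRIVER_MAJOR) -> bool:
    """True if a driver version was found and meets the minimum."""
    major = driver_major(smi_output)
    return major is not None and major >= minimum


sample = "NVIDIA-SMI 546.01 | Driver Version: 546.01 | CUDA Version: 12.3"
print(driver_ok(sample))  # True, since 546 >= 527
```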
AMD GPU — DirectML on Windows
Ollama uses Microsoft DirectML for AMD GPUs on Windows. DirectML is built into Windows 10 v1903+ and Windows 11 — no separate install needed. Just update your AMD Adrenalin driver from amd.com/support.
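To confirm that a machine meets the Windows version requirement, you can compare its build number with build 18362, which corresponds to Windows 10 version 1903. The sketch below assumes a version string of the form "10.0.19045", which is what Python's platform.version() returns on Windows; the function name is illustrative.

```python
MIN_BUILD = 18362  # Windows 10 version 1903


def meets_directml_minimum(version: str) -> bool:
    """Check a Windows version string such as '10.0.19045' against build 18362."""
    parts = version.split(".")
    try:
        major, build = int(parts[0]), int(parts[2])
    except (IndexError, ValueError):
        # Unrecognized format: report "not met" rather than guess.
        return False
    return major > 10 or (major == 10 and build >= MIN_BUILD)
```

On a Windows machine, call it as meets_directml_minimum(platform.version()) after importing platform. Windows 11 also reports a major version of 10, with build numbers starting at 22000, so it passes the same check.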
For a detailed AMD DirectML walkthrough, see the DirectML guide.
Get the most from your GPU
Keep models on an SSD
Models are stored in the .ollama/models folder in your user profile by default.
Use quantized models for lower VRAM usage
For example: ollama pull llama3:8b-instruct-q4_K_M. A lower Q number (fewer bits per weight) means less VRAM at slightly lower quality.
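As a rough illustration of why a lower Q number saves VRAM, the weights of a model take about (parameter count × bits per weight ÷ 8) bytes. The sketch below applies that rule of thumb to an 8-billion-parameter model. It is an approximation only: real quantized formats such as q4_K_M average somewhat more than their nominal bit width, and the context cache and runtime overhead need additional memory.

```python
def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


# Compare nominal precisions for an 8B-parameter model.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_size_gb(8, bits):.0f} GB")
# 16-bit: ~16 GB
# 8-bit: ~8 GB
# 4-bit: ~4 GB
```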