GPU Acceleration (CUDA/DirectML)

Speed up local LLMs on Windows with NVIDIA CUDA or Microsoft DirectML. Follow the quick steps below, then run a model and verify that the GPU is in use.

NVIDIA (CUDA)

Offers the best performance on Windows when paired with a recent NVIDIA driver.

1) Update your NVIDIA GPU driver (Game Ready/Studio).

2) Verify the driver is active:

nvidia-smi
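To run the same check from a script, nvidia-smi can print selected fields as CSV. The sketch below assumes `nvidia-smi` is on your PATH and uses its `--query-gpu` and `--format=csv,noheader,nounits` options. It reports the GPU name, driver version and free VRAM. The helper names `parse_gpu_csv` and `query_gpus` are hypothetical.

```python
import subprocess

# Fields requested from nvidia-smi; with "nounits", memory values are plain MiB numbers.
QUERY_FIELDS = ["name", "driver_version", "memory.total", "memory.used"]


def parse_gpu_csv(text: str) -> list[dict]:
    """Parse `nvidia-smi --format=csv,noheader,nounits` output, one GPU per line."""
    gpus = []
    for line in text.strip().splitlines():
        name, driver, total, used = [field.strip() for field in line.split(",")]
        gpus.append({
            "name": name,
            "driver_version": driver,
            "memory_total_mib": int(total),
            "memory_used_mib": int(used),
        })
    return gpus


def query_gpus() -> list[dict]:
    """Run nvidia-smi (assumed to be on PATH) and return the parsed GPU list."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        free = gpu["memory_total_mib"] - gpu["memory_used_mib"]
        print(f"{gpu['name']} (driver {gpu['driver_version']}): "
              f"{free} MiB of {gpu['memory_total_mib']} MiB VRAM free")
```

If the command fails with a "not found" error, the driver is likely missing or not on PATH. In that case, revisit step 1.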

3) Pull and run a model. Ollama uses a supported GPU automatically when one is available.

ollama pull llama3
ollama run llama3

Check Task Manager → Performance → GPU to see utilization.
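Besides Task Manager, `ollama ps` lists loaded models with a PROCESSOR column that shows how much of each model runs on the GPU. The sketch below reads the same information from Ollama's local REST API. It assumes the server is on its default address, http://localhost:11434. It also assumes that each entry returned by `/api/ps` carries `size` and `size_vram` in bytes. `gpu_share` and `describe` are hypothetical helpers.

```python
import json
import urllib.request

# Assumption: the Ollama server is running locally on its default port.
OLLAMA_PS_URL = "http://localhost:11434/api/ps"


def gpu_share(size: int, size_vram: int) -> float:
    """Fraction of a loaded model's memory that sits in VRAM (0.0 to 1.0)."""
    if size <= 0:
        return 0.0
    return min(size_vram / size, 1.0)


def describe(models: list[dict]) -> list[str]:
    """Format one line per loaded model, e.g. 'llama3:latest: 100% GPU'."""
    lines = []
    for m in models:
        share = gpu_share(m.get("size", 0), m.get("size_vram", 0))
        lines.append(f"{m.get('name', '?')}: {share:.0%} GPU")
    return lines


if __name__ == "__main__":
    # Query the running server; a model must be loaded (e.g. via `ollama run`) to appear.
    with urllib.request.urlopen(OLLAMA_PS_URL, timeout=5) as resp:
        payload = json.load(resp)
    for line in describe(payload.get("models", [])):
        print(line)
```

A value well below 100% means part of the model was placed in system RAM, which usually slows generation noticeably.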

DirectML

Works across vendors (AMD, Intel, NVIDIA). Performance varies by model and driver.

1) Use Windows 11 for best DirectML support; update your GPU drivers.

2) Run a model and monitor the GPU engine in Task Manager.

ollama pull mistral
ollama run mistral

If performance is low, try a smaller quantization, or switch to CUDA if you have an NVIDIA GPU.
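Smaller quantizations help because a rough rule of thumb gives weight memory ≈ parameters × bits per weight ÷ 8. The sketch below uses the effective bits per weight of two common GGUF formats, Q8_0 ≈ 8.5 and Q4_0 ≈ 4.5, because each block of 32 weights also stores a 16-bit scale. It uses 16 for F16. It ignores the KV cache and runtime overhead, so treat the results as lower bounds. The 8B model and 8 GiB card are only example inputs.

```python
# Effective bits per weight, including the per-block fp16 scale
# (GGUF Q4_0 / Q8_0 blocks hold 32 weights plus one 16-bit scale).
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": (32 * 8 + 16) / 32,  # 8.5
    "Q4_0": (32 * 4 + 16) / 32,  # 4.5
}


def weight_gib(params_billion: float, quant: str) -> float:
    """Approximate size of the model weights alone, in GiB.

    Ignores the KV cache, activations and runtime overhead, so real VRAM use is higher.
    """
    total_bytes = params_billion * 1e9 * BITS_PER_WEIGHT[quant] / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    vram_gib = 8.0  # example card size; replace with your GPU's VRAM
    for quant in ("F16", "Q8_0", "Q4_0"):
        size = weight_gib(8.0, quant)  # example: an 8B-parameter model
        headroom = vram_gib - size     # what is left for the KV cache and overhead
        print(f"8B @ {quant}: ~{size:.1f} GiB of weights, {headroom:+.1f} GiB headroom")
```

With these inputs, the 4-bit variant leaves several GiB free while the 8-bit one barely fits the weights. This gap is why dropping a quantization level often moves a model from partial CPU offload to fully on the GPU.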

Tips

• 8 GB+ VRAM recommended for smooth GPU runs.

• Keep models on an SSD; close apps that heavily use the GPU.

• Try smaller quantizations for faster throughput.

• For issues, see Troubleshooting and Benchmarks.
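To confirm your models are on an SSD, you first need Ollama's model directory. According to the Ollama FAQ, it defaults to `.ollama\models` in your user profile on Windows. You can move it by setting the `OLLAMA_MODELS` environment variable, for example to a folder on an SSD, and then restarting Ollama. The sketch below resolves that directory, shows its drive and totals its size. `models_dir` and `dir_size_gib` are hypothetical helpers.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def models_dir(env: Optional[Mapping[str, str]] = None,
               home: Optional[Path] = None) -> Path:
    """Return Ollama's model directory: OLLAMA_MODELS if set, else ~/.ollama/models."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    override = env.get("OLLAMA_MODELS")
    return Path(override) if override else home / ".ollama" / "models"


def dir_size_gib(path: Path) -> float:
    """Total size of all files under `path`, in GiB (0.0 if it does not exist)."""
    if not path.exists():
        return 0.0
    total = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
    return total / 2**30


if __name__ == "__main__":
    path = models_dir()
    # On Windows, path.drive (e.g. "C:") tells you which volume holds the models.
    print(f"Models in {path} (drive {path.drive or 'n/a'}): {dir_size_gib(path):.1f} GiB")
```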

Community‑driven guide. Not affiliated with the official Ollama project.