GPU Acceleration (CUDA/DirectML)
Speed up local LLMs on Windows with NVIDIA CUDA or Microsoft DirectML: follow the quick steps below, then run a model and verify that the GPU is actually being used.
NVIDIA (CUDA)
NVIDIA GPUs with up-to-date drivers give the best performance on Windows.
1) Update your NVIDIA GPU driver (Game Ready/Studio).
2) Verify the driver is active (your GPU should be listed in the output):
nvidia-smi
3) Pull and run a model; Ollama uses the GPU automatically when one is available.
ollama pull llama3
ollama run llama3
Check Task Manager → Performance → GPU to see utilization.
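Beyond Task Manager, `nvidia-smi` can report utilization directly from the command line. A minimal sketch using its standard query flags, with a fallback message for machines where no NVIDIA driver is installed:

```shell
# Show GPU name, utilization, and memory in use (run this while a model is answering).
# Falls back gracefully on machines without an NVIDIA driver.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,utilization.gpu,memory.used --format=csv
else
  echo "nvidia-smi not found: install or update the NVIDIA driver first"
fi
```

During generation you should see utilization.gpu climb well above idle; if it stays near 0%, the model is likely running on the CPU.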
DirectML (Windows)
Works across vendors (AMD/Intel/NVIDIA). Performance varies by model and driver.
1) Use Windows 11 for the best DirectML support, and keep your GPU drivers up to date.
2) Run a model and monitor the GPU engine in Task Manager.
ollama pull mistral
ollama run mistral
If performance is poor, try a smaller quantization or, on NVIDIA hardware, switch to CUDA.
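To confirm whether Ollama offloaded the model to the GPU at all, recent Ollama versions include `ollama ps`, which lists loaded models with a PROCESSOR column (for example "100% GPU"). A sketch with a fallback for machines where Ollama is not installed:

```shell
# Check where the currently loaded model is running (GPU vs CPU).
# The PROCESSOR column in `ollama ps` shows the split, e.g. "100% GPU".
if command -v ollama >/dev/null 2>&1; then
  ollama ps
else
  echo "ollama not found: install it from ollama.com first"
fi
```

If the PROCESSOR column shows a large CPU share, the model did not fit in VRAM; a smaller quantization usually fixes this.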
Tips for speed
• 8 GB+ VRAM recommended for smooth GPU runs.
• Keep models on an SSD; close apps that heavily use the GPU.
• Try smaller quantizations for faster throughput.
• For issues, see Troubleshooting and Benchmarks.
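As a rough rule of thumb behind the VRAM tip above, a quantized model's weights take about (parameters in billions × bits per weight ÷ 8) GB, plus roughly a gigabyte of overhead for the context cache and buffers. A back-of-the-envelope sketch (all numbers here are illustrative assumptions, not exact figures):

```shell
# Very rough VRAM estimate: params_b * bits / 8 gives weight size in GB,
# plus ~1 GB of overhead for KV cache and buffers (illustrative, not exact).
params_b=8   # e.g. an 8B-parameter model
bits=4       # e.g. a 4-bit (q4) quantization
est_gb=$(( params_b * bits / 8 + 1 ))
echo "~${est_gb} GB VRAM for an ${params_b}B model at ${bits}-bit"
```

By this estimate an 8B model at 4-bit needs on the order of 5 GB, which is why 8 GB+ of VRAM gives comfortable headroom.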
Community‑driven guide. Not affiliated with the official Ollama project.