
Ollama AMD GPU — DirectML setup & performance guide

Ollama uses Microsoft DirectML to accelerate inference on AMD Radeon GPUs on Windows. No ROCm needed — DirectML is built into Windows. This guide covers driver setup, verification, performance expectations and optimisation tips.

DirectML — AMD GPU acceleration for Ollama on Windows

Microsoft DirectML is a hardware-accelerated machine learning API built into Windows 10 (version 1903+) and Windows 11. Ollama uses DirectML to run inference on AMD Radeon GPUs on Windows, providing GPU acceleration without needing the ROCm stack that Linux AMD users require.

DirectML works automatically when you install Ollama on a Windows system with a compatible AMD GPU — no separate install required. The key requirements are an up-to-date AMD Adrenalin driver and Windows 10 version 1903 or later.

Check DirectML is active

cmd.exe
C:\> ollama run mistral --verbose
# After typing a prompt, check stats:
gpu layers: 32
backend: directml
eval rate: 28.4 tokens/s

If gpu layers shows 0 and backend shows cpu, DirectML is not active. Follow the steps below.
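As a complementary check, a running Ollama server can be asked which models are loaded and how much of each sits in GPU memory. The sketch below is a minimal example, assuming Ollama is listening on its default local address and that the `/api/ps` response carries `size` and `size_vram` fields for each model. The helper names `gpu_share`, `loaded_models` and `report` are illustrative, not part of Ollama. Note that this shows whether the model is on the GPU or the CPU; it does not identify which backend is in use.

```python
import json
from urllib.request import urlopen

# Assumption: Ollama is listening on its default local address.
OLLAMA_PS_URL = "http://localhost:11434/api/ps"


def gpu_share(model: dict) -> float:
    """Return the fraction (0.0 to 1.0) of a loaded model held in GPU memory."""
    total = model.get("size", 0)
    in_vram = model.get("size_vram", 0)
    return in_vram / total if total else 0.0


def loaded_models(url: str = OLLAMA_PS_URL) -> list:
    """Fetch the list of currently loaded models from a running Ollama server."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp).get("models", [])


def report() -> None:
    """Print one line per loaded model with its GPU memory share."""
    for m in loaded_models():
        print(f"{m.get('name', '?')}: {gpu_share(m):.0%} in GPU memory")
```

Calling `report()` while a model is loaded prints one line per model. A share of 0% corresponds to the CPU-only case described above, and a value between 0% and 100% suggests the model is split between GPU and CPU.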

Enable DirectML for AMD GPU

  1. Update AMD Adrenalin drivers

    Download and install the latest AMD Adrenalin Edition driver from amd.com/support. Older drivers tend to give lower DirectML performance. For RX 6000 series and newer, use the latest stable release.

  2. Verify Windows version

    DirectML requires Windows 10 version 1903 or later. Press Win+R, type winver and press Enter to see your version. Any Windows 11 release meets this requirement.

  3. Reinstall or restart Ollama

    If you just updated your AMD driver, quit Ollama from the system tray and start it again. New drivers sometimes require a fresh service start to be detected.

  4. Run with --verbose to confirm

    cmd.exe
    C:\> ollama run llama3 --verbose
    # Type a prompt and check:
    gpu layers: 32
    backend: directml
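The Windows version check in the steps above can also be scripted. This is a small sketch, assuming Python is installed on the machine. It relies on the fact that Windows 10 version 1903 corresponds to OS build 18362. The function names are illustrative.

```python
import sys

# Windows 10 version 1903 corresponds to OS build 18362.
MIN_BUILD = 18362


def meets_min_build(build: int, minimum: int = MIN_BUILD) -> bool:
    """Return True if the given Windows build number is 1903 or later."""
    return build >= minimum


def current_windows_build():
    """Return the running Windows build number, or None on other platforms."""
    if sys.platform != "win32":
        return None
    return sys.getwindowsversion().build
```

On a Windows machine, `meets_min_build(current_windows_build())` tells you whether the requirement is met. Windows 11 builds start at 22000, so they always pass.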

Expected DirectML performance on AMD GPUs

GPU           VRAM    Model            Approx. tokens/s
RX 7900 XTX   24 GB   Llama 3 8B Q4    40–55 t/s
RX 7800 XT    16 GB   Llama 3 8B Q4    30–42 t/s
RX 6700 XT    12 GB   Mistral 7B Q4    20–32 t/s
RX 6600       8 GB    Mistral 7B Q4    15–24 t/s
AMD DirectML performance is generally 30–50% lower than equivalent NVIDIA CUDA hardware. This is a limitation of the DirectML backend vs native CUDA optimisations, not a hardware quality issue.
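To compare your own hardware against figures like these, tokens per second can be derived from the statistics Ollama returns with a completed generation. The sketch below assumes a local Ollama server on its default port and uses the `eval_count` and `eval_duration` (nanoseconds) fields of a non-streaming `/api/generate` response. The helper names and the choice of model are illustrative.

```python
import json
from urllib.request import Request, urlopen

# Assumption: Ollama is listening on its default local address.
GENERATE_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Compute the generation rate from eval_count and eval_duration (ns)."""
    duration_s = response.get("eval_duration", 0) / 1e9
    return response.get("eval_count", 0) / duration_s if duration_s else 0.0


def run_prompt(model: str, prompt: str, url: str = GENERATE_URL) -> dict:
    """Send one non-streaming prompt and return the parsed JSON response."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = Request(url, data=body, headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=300) as resp:
        return json.load(resp)
```

For example, `tokens_per_second(run_prompt("mistral", "Explain VRAM in one paragraph."))` returns a single rate. Averaging a few runs gives a steadier number, since the first run also includes model load time in its total duration (though not in `eval_duration`).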

Improve DirectML performance

Use quantized models to fit in VRAM
RX 6000 and 7000 series cards ship with relatively generous VRAM (8 to 24 GB on the cards listed above). Use Q4_K_M quantization to maximise how much of the model fits in VRAM: ollama pull llama3:8b-instruct-q4_K_M. If the model does not fit in VRAM, the remaining layers run on the CPU, which is much slower.
Keep AMD drivers up to date
AMD regularly improves DirectML performance in driver updates. Check for new drivers monthly at amd.com/support. Major performance gains are sometimes delivered via driver updates alone.
Close other GPU workloads
Gaming, video encoding and other GPU-accelerated tasks share VRAM with Ollama. Close them before running large models to give Ollama the full VRAM budget.
AMD GPU not detected at all
If Ollama still shows gpu layers: 0 after you have updated drivers and restarted: (1) confirm your Windows version is 1903 or later, (2) try a clean Ollama reinstall, (3) check that your GPU appears in Device Manager under Display adapters with no yellow warning icon.
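The advice about fitting a quantized model in VRAM can be sanity-checked with rough arithmetic: weight size is approximately parameter count times bits per weight, divided by 8. The sketch below is an estimate only. The default of 4.85 bits per weight for Q4_K_M and the 1.5 GB allowance for context cache and buffers are assumptions chosen for illustration, not measured values, and real usage varies with context length.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(
    vram_gb: float,
    params_billion: float,
    bits_per_weight: float = 4.85,  # assumed average for Q4_K_M
    overhead_gb: float = 1.5,       # assumed allowance for cache and buffers
) -> bool:
    """Return True if the weights plus the assumed overhead fit in vram_gb."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb
```

Under these assumptions an 8B model needs a little under 5 GB for weights, so `fits_in_vram(8, 8)` is true for an 8 GB card, while the same model at 16 bits per weight (`fits_in_vram(8, 8, bits_per_weight=16)`) is not.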
