What is DirectML?

DirectML — AMD GPU acceleration for Ollama on Windows

Microsoft DirectML is a hardware-accelerated machine learning API built into Windows 10 (version 1903+) and Windows 11. Ollama uses DirectML to run inference on AMD Radeon GPUs on Windows, providing GPU acceleration without needing the ROCm stack that Linux AMD users require.

DirectML works automatically when you install Ollama on a Windows system with a compatible AMD GPU — no separate install required. The key requirements are an up-to-date AMD Adrenalin driver and Windows 10 version 1903 or later.

Verify it is working

Check DirectML is active

cmd.exe
C:\> ollama run mistral --verbose
# After typing a prompt, check stats:
gpu layers: 32
backend: directml
eval rate: 28.4 tokens/s

If gpu layers shows 0 and backend shows cpu, DirectML is not active. Follow the steps below.
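The check described above can be scripted once the stats text has been copied out of the terminal. The following is a minimal sketch that assumes the stats appear as `key: value` lines exactly as in the sample above; the output of your Ollama version may be laid out differently. The function names `parse_stats` and `directml_active` are hypothetical helpers, not part of Ollama.

```python
import re


def parse_stats(text):
    """Collect 'key: value' pairs from stats text shaped like the sample above."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            stats[match.group(1).lower()] = match.group(2)
    return stats


def directml_active(stats):
    """Apply the rule from the text: GPU layers above 0 and backend 'directml'."""
    layers = int(stats.get("gpu layers", "0"))
    backend = stats.get("backend", "cpu").lower()
    return layers > 0 and backend == "directml"


# Sample copied from the stats shown in this guide (assumed format).
sample = """gpu layers: 32
backend: directml
eval rate: 28.4 tokens/s"""

if __name__ == "__main__":
    print(directml_active(parse_stats(sample)))
```

If the function reports that DirectML is not active, continue with the setup steps that follow.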

Setup

Enable DirectML for AMD GPU

1. Update AMD Adrenalin drivers
Download and install the latest AMD Adrenalin Edition driver from amd.com/support. Older drivers have worse DirectML performance. For RX 6000 series and newer, use the latest stable release.
2. Verify Windows version
DirectML requires Windows 10 version 1903 or later. Press Win+R → type winver. On Windows 11 you are already fine.
3. Reinstall or restart Ollama
If you just updated your AMD driver, quit Ollama from the system tray and restart it. New drivers sometimes require a fresh service start to be detected.
4. Run with --verbose to confirm

cmd.exe
C:\> ollama run llama3 --verbose
# Type a prompt and check:
gpu layers: 32
backend: directml
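The Windows version check in step 2 can also be done from a script instead of winver. The sketch below assumes that Python's `platform.version()` returns a string such as `10.0.19045` on Windows, and it relies on Windows 10 version 1903 corresponding to build 18362. The helper names are hypothetical.

```python
import platform

MIN_BUILD = 18362  # build number of Windows 10 version 1903


def build_number(version):
    """Return the build part of a version string such as '10.0.19045', or None."""
    parts = version.split(".")
    if len(parts) >= 3 and parts[2].isdigit():
        return int(parts[2])
    return None


def meets_directml_minimum(version):
    """True when the build is at least that of Windows 10 version 1903."""
    build = build_number(version)
    return build is not None and build >= MIN_BUILD


if __name__ == "__main__":
    if platform.system() == "Windows":
        print(meets_directml_minimum(platform.version()))
    else:
        print("Not running on Windows")
```

Windows 11 reports builds of 22000 and higher, so it passes this check, which matches the note in step 2.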

Performance

Expected DirectML performance on AMD GPUs

GPU	VRAM	Model	Approx. tokens/s
RX 7900 XTX	24 GB	Llama 3 8B Q4	40–55 t/s
RX 7800 XT	16 GB	Llama 3 8B Q4	30–42 t/s
RX 6700 XT	12 GB	Mistral 7B Q4	20–32 t/s
RX 6600	8 GB	Mistral 7B Q4	15–24 t/s

AMD DirectML performance is generally 30–50% lower than equivalent NVIDIA CUDA hardware. This is a limitation of the DirectML backend vs native CUDA optimisations, not a hardware quality issue.
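To compare your own eval rate with the table, a small lookup is enough. This sketch copies the approximate ranges from the table above; the dictionary layout and the `classify_rate` function are illustrative assumptions, and the ranges are rough guides, not guarantees.

```python
# Approximate tokens/s ranges copied from the table above.
EXPECTED_RANGES = {
    ("RX 7900 XTX", "Llama 3 8B Q4"): (40, 55),
    ("RX 7800 XT", "Llama 3 8B Q4"): (30, 42),
    ("RX 6700 XT", "Mistral 7B Q4"): (20, 32),
    ("RX 6600", "Mistral 7B Q4"): (15, 24),
}


def classify_rate(gpu, model, measured_tps):
    """Return 'below', 'within' or 'above' relative to the table's range."""
    low, high = EXPECTED_RANGES[(gpu, model)]
    if measured_tps < low:
        return "below"
    if measured_tps > high:
        return "above"
    return "within"


if __name__ == "__main__":
    print(classify_rate("RX 6700 XT", "Mistral 7B Q4", 28.4))
```

A result of "below" is a hint to revisit the driver and VRAM tips in the next section.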

Tips

Improve DirectML performance

Use quantized models to fit in VRAM

The RX 6000 and RX 7000 series cards listed above ship with 8 to 24 GB of VRAM. Use Q4_K_M quantization so that as much of the model as possible fits in VRAM: ollama pull llama3:8b-instruct-q4_K_M. If the model overflows VRAM, the remaining layers run on the CPU, which is much slower.
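Whether a model is likely to fit can be estimated with simple arithmetic: weight size is roughly parameter count times bits per weight, divided by 8. The sketch below is a back-of-the-envelope estimate only. The 4.5 bits per weight for a 4-bit quantization and the 1.5 GB allowance for context and runtime buffers are assumptions chosen for illustration, not measured values.

```python
def estimated_weights_gb(params_billions, bits_per_weight):
    """Rough weight size in GB: billions of parameters times bits per weight over 8."""
    return params_billions * bits_per_weight / 8


def fits_in_vram(params_billions, vram_gb, bits_per_weight=4.5, overhead_gb=1.5):
    """True if the estimated weights plus an assumed overhead fit in VRAM."""
    needed = estimated_weights_gb(params_billions, bits_per_weight) + overhead_gb
    return needed <= vram_gb


if __name__ == "__main__":
    # 8B model at an assumed 4.5 bits per weight on an 8 GB card
    print(estimated_weights_gb(8, 4.5), fits_in_vram(8, 8))
    # The same model at 16 bits per weight on the same card
    print(estimated_weights_gb(8, 16), fits_in_vram(8, 8, bits_per_weight=16))
```

Under these assumptions an 8B model needs about 4.5 GB at 4-bit precision but about 16 GB at 16-bit precision, which is why quantization decides whether the model stays on the GPU.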

Keep AMD drivers up to date

AMD regularly improves DirectML performance in driver updates. Check for new drivers monthly at amd.com/support. Major performance gains are sometimes delivered via driver updates alone.

Close other GPU workloads

Gaming, video encoding and other GPU-accelerated tasks share VRAM with Ollama. Close them before running large models to give Ollama the full VRAM budget.

AMD GPU not detected at all

If Ollama still shows gpu layers: 0 after you have updated drivers and restarted: (1) confirm that your Windows version is 1903 or later, (2) try a clean Ollama reinstall, (3) check that your GPU appears in Device Manager under Display adapters with no yellow warning icon.
