FAQ (Extended)
Clear, concise answers to the most common questions about Ollama on Windows.
How do I install Ollama on Windows?
Download the official installer, run it, then open a new terminal and verify the CLI:
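For example, in a new PowerShell or Command Prompt window after the installer finishes:

```shell
# Confirm the CLI is on PATH and print the installed version
ollama --version
```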
See the Install Guide for the full step‑by‑step.
How do I download and run a model?
Use the CLI:
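A typical first session (using `llama3` as an example model name):

```shell
# Download the model weights (one-time, requires internet)
ollama pull llama3

# Start an interactive chat session with it
ollama run llama3
```

Type your prompt at the `>>>` prompt; use `/bye` to exit the session.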
Browse more in the Models Hub.
Can I run Ollama completely offline?
Yes. After models are pulled, generation works offline. Only downloads require the internet. See Privacy & Offline.
How do I enable GPU acceleration (CUDA/DirectML)?
Install the latest GPU drivers. On NVIDIA, CUDA is used automatically. On Windows 11, DirectML is available across vendors. See GPU Acceleration.
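To confirm whether a loaded model is actually using the GPU, `ollama ps` reports the processor in use (a sketch, assuming `llama3` is already pulled):

```shell
# Run a single prompt so the model loads, then inspect the running model
ollama run llama3 "Say hello"
ollama ps
# The PROCESSOR column shows e.g. "100% GPU" when acceleration is active
```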
The 'ollama' command is not recognized
Close and reopen your terminal to refresh PATH. If needed, sign out and back in or reinstall. Verify with:
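In a fresh terminal:

```shell
# Show where the executable resolves from, then run it
where.exe ollama
ollama --version
```

If `where.exe` finds nothing, PATH was not updated; signing out and back in (or reinstalling) usually fixes it.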
http://localhost:11434 does not respond
Check port usage and firewall settings:
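For example, from PowerShell:

```shell
# Is anything listening on Ollama's default port?
netstat -ano | findstr :11434

# Does the server answer locally? A healthy server replies "Ollama is running"
curl.exe http://localhost:11434
```

`curl.exe` is used explicitly because plain `curl` is a PowerShell alias for `Invoke-WebRequest`.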
Allow local connections when prompted by Windows Firewall. See Troubleshooting.
Model downloads are slow or stuck
Check your connection and free disk space, pause antivirus scans, and retry later. To refresh a model, remove and re‑pull it:
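Using `llama3` as an example:

```shell
# Remove the local copy, then download it again
ollama rm llama3
ollama pull llama3
```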
Which model should I start with?
Try Llama 3 for a balanced start, Mistral for speed, Qwen 2.5 for multilingual, or Phi‑4 / Gemma 2 for lightweight use. See Models Hub.
How much RAM/VRAM do I need?
8 GB of RAM runs smaller models; 16 GB or more is recommended for 8B models. For GPU acceleration, 8 GB+ of VRAM helps. Use lower quantizations (e.g., Q4) on limited hardware.
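Quantized variants are published as tags on each model's page; the exact tag names below are illustrative and vary by model, so check the Models Hub listing first:

```shell
# Pull a 4-bit quantized variant instead of the default tag
# (tag name is an example; see the model's page for available tags)
ollama pull llama3:8b-instruct-q4_K_M
```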
How do I use Ollama from Python?
Install the client and call the API:
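A minimal sketch using the official `ollama` Python client (assumes the Ollama server is running locally and `llama3` has already been pulled):

```python
# pip install ollama
import ollama

# Send a single chat message to the local server (http://localhost:11434)
response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```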
See the Python page for REST and best practices.
How do I update Ollama and models?
Re‑run the latest installer to update the app. Re‑pull models to get the latest defaults:
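For example:

```shell
# Re-pull to fetch the latest version of a model's default tag
ollama pull llama3
```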
More on Update.
How do I uninstall and remove models?
Uninstall via Windows Settings, stop the service if needed, then delete the models folder in your user profile to free disk space. See Uninstall.
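To remove individual models before (or instead of) uninstalling:

```shell
# List installed models, then remove the ones you no longer need
ollama list
ollama rm llama3

# Remaining model data is stored under your profile; delete it to reclaim disk:
# %USERPROFILE%\.ollama\models
```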
Is my data sent to the cloud?
No. Prompts and outputs are processed locally. Only model downloads require internet access. See Privacy & Offline.
How do I benchmark performance fairly?
Warm up once, use a fixed prompt, set temperature=0, and average several runs. Track tokens/s and latency. See Benchmarks.
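One way to capture tokens/s without extra tooling, assuming `llama3` is pulled:

```shell
# --verbose prints timing stats after the response, including eval rate (tokens/s)
ollama run llama3 --verbose "Summarize the plot of Hamlet in one paragraph."
```

Run it once to warm the model, then average the eval rate over several subsequent runs; in an interactive session, `/set parameter temperature 0` makes runs deterministic.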
How do I fix out‑of‑memory or crashes?
Use smaller models or lower quantizations, close GPU‑heavy apps, ensure drivers are current, and keep models on SSD. See Troubleshooting.
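Lowering the context window also reduces memory use; inside an interactive session:

```shell
ollama run llama3
# At the >>> prompt, shrink the context window before sending long prompts:
# >>> /set parameter num_ctx 2048
```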
Community‑driven guide. Not affiliated with the official Ollama project.