Model guide

Run Llama 3 on Windows with Ollama — setup guide

Llama 3 is the most popular model in the Ollama ecosystem. The 8B variant runs on any GPU with 6+ GB VRAM at 40–80 tokens per second. This guide covers pull command, variants, hardware requirements and use cases.

Llama 3 — Meta's open-source flagship model

Llama 3 is an open-source large language model developed by Meta. Released in 2024, it quickly became the most popular model in the Ollama ecosystem due to its strong general-purpose performance and reasonable hardware requirements. The 8B parameter variant delivers quality comparable to much larger models on conversational and reasoning tasks.

On Windows, Llama 3 runs via Ollama with NVIDIA CUDA or AMD DirectML GPU acceleration. The 8B model fits comfortably in 6–8 GB VRAM and runs at 40–80 tokens per second on a mid-range GPU.

Run Llama 3 on Windows

cmd.exe
# Pull the default 8B model:
C:\> ollama pull llama3
pulling manifest...
pulling 8934d96d3f08... 100% ████████ 4.7 GB
success
# Start an interactive session:
C:\> ollama run llama3
>>> Hello! Can you help me write a Python script?
Of course! What would you like the script to do?

Llama 3 variants and hardware requirements

VariantParametersSizeMin VRAMRecommended for
llama38B4.7 GB6 GBMost users — best starting point
llama3:8b-instruct-q4_K_M8B Q44.4 GB5 GBLower VRAM, slightly faster
llama3:8b-instruct-q8_08B Q88.5 GB10 GBHigher quality, needs more VRAM
llama3:70b70B40 GB48 GBHigh-end workstations only
For most Windows users, ollama pull llama3 (the default 8B Q4 model) is the right choice. It runs on any GPU with 6+ GB VRAM and on CPU-only systems.

What Llama 3 is good at

Llama 3 8B performs well across a broad range of tasks:

  • Writing and editing — drafting emails, documents, summaries
  • Coding — writing, explaining and debugging code in most languages
  • Q&A and reasoning — answering questions based on provided context
  • Summarisation — condensing long documents into key points
  • General conversation — the model is instruction-tuned for interactive use

Llama 3 questions

What is the difference between Llama 3 and Llama 3.1?
Llama 3.1 (released July 2024) extends context length to 128K tokens and adds improved multilingual support. Pull it with ollama pull llama3.1. For most tasks the difference is small; Llama 3 8B is faster and uses less VRAM.
Is Llama 3 censored?
The default Llama 3 instruct model has safety guardrails trained by Meta. There are uncensored variants in the Ollama library if you need them for specific research use cases. Ollama itself imposes no additional restrictions.
Can I run Llama 3 on CPU only?
Yes. CPU-only speed is 5–12 tokens/s on a modern multi-core CPU, which is usable for short tasks. For longer conversations a GPU is strongly recommended.
How is Llama 3 licensed?
Llama 3 uses Meta's custom Llama 3 Community License. It is free for research and commercial use for products with fewer than 700 million monthly active users. Check the full license at llama.meta.com.

Ready to run Llama 3?

Install Ollama on Windows and pull the model in under 5 minutes.

Install guide