Llama 3 — Meta's open-source flagship model
Llama 3 is an open-source large language model developed by Meta. Released in 2024, it quickly became the most popular model in the Ollama ecosystem due to its strong general-purpose performance and reasonable hardware requirements. The 8B parameter variant delivers quality comparable to much larger models on conversational and reasoning tasks.
On Windows, Llama 3 runs via Ollama with GPU acceleration on NVIDIA cards (CUDA) or supported AMD cards (ROCm). The 8B model fits comfortably in 6–8 GB of VRAM and runs at roughly 40–80 tokens per second on a mid-range GPU.
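As a rough sanity check on the VRAM figure, the size of the weights can be estimated from the parameter count and the bits stored per weight. This is a minimal sketch: the helper name `estimate_weight_gb` is hypothetical, and the values of about 4.5 and 8.5 bits per weight are assumed typical figures for 4-bit and 8-bit quantization. It covers the weights only, so real usage is higher once the context cache and runtime overhead are added.

```python
def estimate_weight_gb(params: float, bits_per_weight: float) -> float:
    """Rough size of the model weights in GB (10^9 bytes), weights only."""
    return params * bits_per_weight / 8 / 1e9


# Assumed figures: ~8e9 parameters; ~4.5 bits/weight for a typical 4-bit
# quantization and ~8.5 bits/weight for a typical 8-bit quantization.
q4_gb = estimate_weight_gb(8e9, 4.5)
q8_gb = estimate_weight_gb(8e9, 8.5)

print(f"8B at 4-bit: ~{q4_gb:.1f} GB, at 8-bit: ~{q8_gb:.1f} GB")
```

The gap between the estimated weight size and the 6–8 GB guideline is the headroom left for context and overhead.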
Run Llama 3 on Windows
Llama 3 variants and hardware requirements
| Variant | Parameters | Size | Min VRAM | Recommended for |
|---|---|---|---|---|
| llama3 | 8B | 4.7 GB | 6 GB | Most users, best starting point |
| llama3:8b-instruct-q4_K_M | 8B Q4 | 4.4 GB | 5 GB | Lower VRAM, slightly faster |
| llama3:8b-instruct-q8_0 | 8B Q8 | 8.5 GB | 10 GB | Higher quality, needs more VRAM |
| llama3:70b | 70B | 40 GB | 48 GB | High-end workstations only |
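The table can be read as a simple lookup: take the most demanding variant whose minimum VRAM your GPU meets. The sketch below encodes that rule. The `pick_variant` helper and the `VARIANTS` list are hypothetical names introduced for illustration, and the thresholds are copied from the table rather than measured.

```python
from typing import Optional

# (Ollama tag, minimum VRAM in GB), copied from the table above and
# ordered from most to least demanding.
VARIANTS = [
    ("llama3:70b", 48),
    ("llama3:8b-instruct-q8_0", 10),
    ("llama3", 6),
    ("llama3:8b-instruct-q4_K_M", 5),
]


def pick_variant(vram_gb: float) -> Optional[str]:
    """Return the most demanding variant that fits, or None if none fits in VRAM."""
    for tag, min_vram in VARIANTS:
        if vram_gb >= min_vram:
            return tag
    return None  # below every GPU threshold; CPU-only is the remaining option


print(pick_variant(8))
```

With 8 GB of VRAM this selects `llama3`, which matches the "best starting point" row.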
For most users, ollama pull llama3 (the default 8B Q4 model) is the right choice. It runs on any GPU with 6+ GB VRAM and on CPU-only systems.

What Llama 3 is good at
Llama 3 8B performs well across a broad range of tasks:
- Writing and editing — drafting emails, documents, summaries
- Coding — writing, explaining and debugging code in most languages
- Q&A and reasoning — answering questions based on provided context
- Summarisation — condensing long documents into key points
- General conversation — the model is instruction-tuned for interactive use
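Any of the tasks above can be scripted against a locally running Ollama server. The following is a minimal sketch, assuming Ollama is listening on its default local port (11434) and that `llama3` has already been pulled. The helper names `build_request`, `generate` and `summarise`, and the prompt wording, are illustrative choices rather than part of Ollama.

```python
import json
import urllib.request

# Default local endpoint for Ollama's generate API (assumes a default install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send the prompt and return the model's full reply as one string."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


def summarise(text: str) -> str:
    """Hypothetical helper: ask the model for a short bullet-point summary."""
    return generate(f"Summarise the following text in three bullet points:\n\n{text}")
```

Calling `summarise(open("notes.txt").read())` would return the summary as plain text; the file name here is only a placeholder.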
Llama 3 questions
What is the difference between Llama 3 and Llama 3.1?
Llama 3.1 is a later release in the same family; pull it with ollama pull llama3.1. For most tasks the difference is small; Llama 3 8B is faster and uses less VRAM.