Phi-4 — small model, big performance
Phi-4 is Microsoft's small language model, optimised specifically for instruction following, STEM reasoning and coding. At 14B parameters it delivers quality that rivals much larger models on many benchmarks, while fitting in 9 GB of VRAM. The Mini variant (3.8B) needs only about 4 GB of VRAM, so it runs on most consumer GPUs.
| Variant | Size | Min VRAM | Pull command |
|---|---|---|---|
| Phi-4 14B | 8.2 GB | 9 GB | ollama pull phi4 |
| Phi-4 Mini 3.8B | 2.5 GB | 4 GB | ollama pull phi4-mini |
What Phi-4 is best at
- STEM and mathematics — specifically trained on high-quality STEM data
- Code generation — strong performance on coding benchmarks for its size
- Instruction following — precise, concise responses with minimal hallucination
- Low-VRAM machines — Phi-4 Mini at 2.5 GB is one of the best quality-to-size models available
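Once a model has been pulled, one way to try it on a STEM or coding prompt is through Ollama's local REST API. The following is a minimal sketch, assuming an Ollama server is running on its default address `http://localhost:11434` and using the `/api/generate` endpoint with streaming disabled. The helper names `build_request` and `generate` are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the model's text reply."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # With "stream": False the server returns one JSON object
        # whose "response" field holds the generated text.
        return json.loads(resp.read())["response"]
```

With the server running, a call such as `generate("phi4-mini", "Differentiate x**3 * sin(x).")` should return the model's answer as a string. Swapping the first argument for `"phi4"` targets the 14B variant.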
Gemma 2 — compact and capable
Gemma 2 is Google's family of open-weight models, available in 2B, 9B and 27B sizes. The 2B variant is remarkably capable for its size and runs on almost any hardware, including CPU-only systems with 8 GB of RAM. It is a strong choice for embedded applications, edge devices or machines with minimal VRAM.
| Variant | Size | Min VRAM | Pull command |
|---|---|---|---|
| Gemma 2 2B | 1.6 GB | 3 GB | ollama pull gemma2:2b |
| Gemma 2 9B (default) | 5.4 GB | 7 GB | ollama pull gemma2 |
| Gemma 2 27B | 16 GB | 20 GB | ollama pull gemma2:27b |
What Gemma 2 is best at
- Low-VRAM / CPU-only — 2B runs on 3 GB VRAM or 8 GB RAM
- Fast responses — very high tokens/s for its quality level
- General tasks — Q&A, summarisation, writing
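Speed claims are easier to compare on your own hardware with a number. A non-streaming reply from Ollama's `/api/generate` endpoint includes `eval_count` (tokens generated) and `eval_duration` (generation time in nanoseconds). The sketch below turns those two fields into tokens per second; the function name and the sample values are made up for illustration and are not measurements of any model.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed computed from an Ollama /api/generate response dict."""
    tokens = response["eval_count"]          # number of tokens generated
    duration_ns = response["eval_duration"]  # generation time in nanoseconds
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    seconds = duration_ns / 1e9
    return tokens / seconds


# Made-up sample values, only to show the arithmetic.
sample = {"eval_count": 120, "eval_duration": 2_000_000_000}
print(tokens_per_second(sample))
```

Running the same prompt against `gemma2:2b` and `gemma2` and comparing the two results gives a like-for-like speed figure for a given machine.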
Phi-4 vs Gemma 2 — which should you choose?
| | Phi-4 Mini (3.8B) | Gemma 2 2B | Phi-4 (14B) | Gemma 2 9B |
|---|---|---|---|---|
| Best for | STEM, coding | Ultra low VRAM | Quality coding/reasoning | Balanced quality |
| VRAM | 4 GB | 3 GB | 9 GB | 7 GB |
| Speed | Fast | Very fast | Moderate | Fast |
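The VRAM row of the comparison can be turned into a small selection helper. This is only a sketch: the tags and minimum-VRAM figures are copied from the tables in this article, while the rule "pick the largest model that fits" and the function name `largest_fitting` are assumptions made for illustration.

```python
from typing import Optional

# (Ollama tag, minimum VRAM in GB), as listed in the tables above.
MODELS = [
    ("phi4", 9),
    ("gemma2", 7),
    ("phi4-mini", 4),
    ("gemma2:2b", 3),
]


def largest_fitting(vram_gb: float, family: Optional[str] = None) -> Optional[str]:
    """Return the tag of the most demanding model that still fits in vram_gb.

    family optionally restricts the choice by tag prefix, e.g. "phi4" or "gemma2".
    Returns None when nothing fits.
    """
    fits = [
        (tag, need)
        for tag, need in MODELS
        if need <= vram_gb and (family is None or tag.startswith(family))
    ]
    if not fits:
        return None
    # Assumed heuristic: a higher VRAM requirement stands in for higher quality.
    return max(fits, key=lambda item: item[1])[0]


print(largest_fitting(8))                  # gemma2
print(largest_fitting(8, family="phi4"))   # phi4-mini
```

On an 8 GB card the helper lands on Gemma 2 9B for general use, or Phi-4 Mini when the `phi4` family is requested for STEM and coding work.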