Run Phi-4 & Gemma 2 on Windows with Ollama — small model guide

Phi-4 (Microsoft) and Gemma 2 (Google) deliver quality well above what their parameter counts suggest, which makes both strong choices for Windows users with limited VRAM. Most variants run on GPUs with 3–9 GB of VRAM, and Gemma 2 2B also runs on CPU-only machines.

Phi-4 — small model, big performance

Phi-4 is Microsoft's small language model, optimised specifically for instruction following, STEM reasoning and coding. At 14B parameters it delivers quality that rivals much larger models on many benchmarks, while fitting in 9 GB VRAM. The Mini variant (3.8B) runs on almost any GPU.

cmd.exe
REM Pull Phi-4 14B:
C:\> ollama pull phi4
REM Or the smaller Mini (3.8B):
C:\> ollama pull phi4-mini
REM Start an interactive chat with the 14B model:
C:\> ollama run phi4

Variant          Size    Min VRAM  Pull command
Phi-4 14B        8.2 GB  9 GB      ollama pull phi4
Phi-4 Mini 3.8B  2.5 GB  4 GB      ollama pull phi4-mini
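
Once a model is pulled, it can also be called from a script instead of the interactive prompt. The sketch below shows one way to do that from Python using only the standard library. It assumes Ollama is running on its default local port (11434) and that phi4 has already been pulled; build_payload and ask are illustrative helper names, not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "phi4") -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def ask(prompt: str, model: str = "phi4", timeout: float = 300.0) -> str:
    """Send the prompt to a running Ollama server and return the reply text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        # With "stream": False the server returns one JSON object
        # whose "response" field holds the generated text.
        return json.loads(reply.read())["response"]
```

With the server running, a call such as ask("Write a Python function that reverses a string.") should return the model's answer as a string. Passing model="phi4-mini" targets the smaller variant instead.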

What Phi-4 is best at

  • STEM and mathematics — specifically trained on high-quality STEM data
  • Code generation — strong performance on coding benchmarks for its size
  • Instruction following — precise, concise responses with minimal hallucination
  • Low-VRAM machines — Phi-4 Mini at 2.5 GB is one of the best quality-to-size models available

Gemma 2 — compact and capable

Gemma 2 is Google's open-weights model, available in 2B, 9B and 27B sizes. The 2B variant is remarkably capable for its size and runs on almost any hardware, including CPU-only systems with 8 GB RAM. It is a strong choice for embedded applications, edge devices or machines with minimal VRAM.

cmd.exe
REM Pull Gemma 2 2B (runs on almost anything):
C:\> ollama pull gemma2:2b
REM Or the 9B for better quality:
C:\> ollama pull gemma2
REM Start an interactive chat with the 2B model:
C:\> ollama run gemma2:2b

Variant               Size    Min VRAM  Pull command
Gemma 2 2B            1.6 GB  3 GB      ollama pull gemma2:2b
Gemma 2 9B (default)  5.4 GB  7 GB      ollama pull gemma2
Gemma 2 27B           16 GB   20 GB     ollama pull gemma2:27b

What Gemma 2 is best at

  • Low-VRAM / CPU-only — 2B runs on 3 GB VRAM or 8 GB RAM
  • Fast responses — very high tokens/s for its quality level
  • General tasks — Q&A, summarisation, writing
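
Speed claims are easy to check on your own hardware. The final JSON object returned by Ollama's /api/generate endpoint reports eval_count (tokens generated) and eval_duration (time spent generating, in nanoseconds). The sketch below shows the arithmetic only; the sample values are made up for illustration, and the helper names are hypothetical.

```python
def tokens_per_second(result: dict) -> float:
    """Generation speed: tokens produced divided by seconds spent generating."""
    seconds = result["eval_duration"] / 1_000_000_000  # nanoseconds -> seconds
    return result["eval_count"] / seconds


def seconds_for(tokens: int, rate: float) -> float:
    """Rough wait time for a reply of `tokens` tokens at `rate` tokens/s."""
    return tokens / rate


# Illustrative values only, not a measurement.
sample = {"eval_count": 120, "eval_duration": 10_000_000_000}
print(tokens_per_second(sample))  # 12.0
print(seconds_for(200, 8))        # 25.0
```

At 8 tokens/s a 200-token answer takes about 25 seconds, which gives a feel for what "usable for short tasks" means on a slow machine.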

Phi-4 vs Gemma 2 — which should you choose?

Model     Phi-4 Mini (3.8B)  Gemma 2 2B      Phi-4 (14B)               Gemma 2 9B
Best for  STEM, coding       Ultra low VRAM  Quality coding/reasoning  Balanced quality
VRAM      4 GB               3 GB            9 GB                      7 GB
Speed     Fast               Very fast       Moderate                  Fast
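
The comparison can be turned into a simple selection rule. The sketch below is one possible heuristic, not an official recommendation: it uses the VRAM figures from the table, reduces each model's "best for" entry to a single "coding" or "general" label (a simplification made for this example), and falls back to Gemma 2 2B when nothing fits, since that variant also runs on CPU. The function name pick_model is hypothetical.

```python
# (Ollama tag, minimum VRAM in GB, simplified strength label)
MODELS = [
    ("phi4", 9, "coding"),
    ("gemma2", 7, "general"),
    ("phi4-mini", 4, "coding"),
    ("gemma2:2b", 3, "general"),
]

CPU_FALLBACK = "gemma2:2b"  # the guide notes this variant runs CPU-only


def pick_model(vram_gb: float, task: str = "general") -> str:
    """Pick the largest model that fits, preferring one that matches the task."""
    fitting = [m for m in MODELS if m[1] <= vram_gb]
    if not fitting:
        return CPU_FALLBACK
    matching = [m for m in fitting if m[2] == task]
    # If no fitting model matches the task, choose among all that fit.
    best = max(matching or fitting, key=lambda m: m[1])
    return best[0]


print(pick_model(12, "coding"))  # phi4
print(pick_model(8, "coding"))   # phi4-mini
print(pick_model(0))             # gemma2:2b
```

The returned tag can be passed directly to ollama pull. Treat the result as a starting point: with 8 GB of VRAM, for example, Gemma 2 9B also fits and may be worth trying alongside Phi-4 Mini.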

Phi-4 & Gemma 2 questions

Can Gemma 2 2B run on CPU only?
Yes. Gemma 2 2B needs only 1.6 GB of disk space and about 4 GB of free RAM, so a system with 8 GB of total RAM is enough. On a modern CPU it generates roughly 8–15 tokens/s, which is usable for short tasks. It is one of the best-quality models for very low-resource Windows machines.
Is Phi-4 better than GPT-4 for coding?
Phi-4 14B approaches GPT-4 quality on specific coding benchmarks while running locally. For complex multi-file projects it lags behind frontier models, but for single-function generation, debugging and explanation tasks it is highly competitive.
What are the licenses for Phi-4 and Gemma 2?
Phi-4 is released under the MIT license, which permits commercial and research use. Gemma 2 is covered by Google's Gemma Terms of Use, which allow broad commercial use subject to Google's prohibited-use policy. Both can be used at no cost for personal and business purposes.
