Small, efficient models for edge and everyday tasks. Run locally with Ollama on Windows.

Phi‑4 — quick start
Pull and run the compact, instruction‑tuned Phi‑4:
Great for quick replies, low‑power setups, and simple assistants.
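As a sketch, the pull-and-run step looks like this, assuming the model is published under the `phi4` tag in the Ollama library (check the library page for the exact tag):

```shell
# Download the Phi-4 weights (tag "phi4" assumed)
ollama pull phi4

# Open an interactive chat session in the terminal
ollama run phi4
```

Type /bye to leave the interactive session.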
Gemma 2 — quick start
Pull and run the efficient Gemma 2:
Good balance of quality and speed; works well on smaller GPUs with the right quantization.
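A minimal sketch of the same steps for Gemma 2, assuming the compact variant is published under the `gemma2:2b` tag (a larger 9B tag also exists in the library):

```shell
# Pull the compact 2B variant (tag assumed)
ollama pull gemma2:2b

# Chat with it interactively
ollama run gemma2:2b
```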
Hardware tips
• Use lower‑bit quantizations (e.g., Q4) for speed and minimal VRAM/RAM use.
• 8–12 GB of system RAM is typically enough for compact variants; 4–8 GB of GPU VRAM is recommended for smooth acceleration.
• Keep models on an SSD to reduce load times and stalls.
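To see what a pulled model actually costs on disk and how it is quantized, Ollama's built-in inspection commands help (this sketch assumes `phi4` has already been pulled):

```shell
# List installed models with their on-disk sizes
ollama list

# Show a model's parameters, quantization, and prompt template
ollama show phi4
```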
Good use cases
• On‑device assistants and fast chat
• Drafting, rewriting, and summarization
• Lightweight coding help and Q&A
• Edge devices and battery‑sensitive workflows
Tips
• Start with smaller quantizations; step up only if you need more quality.
• Use clear system prompts to steer tone and format.
• Close GPU‑heavy apps and update drivers for stability.
• Benchmark to find the sweet spot for your hardware.
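One way to apply the system-prompt tip is a Modelfile; this is a hedged sketch, and the model name `brief-assistant` and the prompt text are purely illustrative:

```shell
# Create a Modelfile that bakes in a system prompt (names are illustrative)
cat > Modelfile <<'EOF'
FROM phi4
SYSTEM "You are a concise assistant. Answer in short bullet points."
PARAMETER temperature 0.3
EOF

# Build a named model from the Modelfile, then run it
ollama create brief-assistant -f Modelfile
ollama run brief-assistant
```

For the benchmarking tip, running with `--verbose` (e.g., `ollama run brief-assistant --verbose`) prints load and evaluation timings, including tokens per second, after each reply.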
Community‑driven guide. Not affiliated with the official Ollama project.