AI Ollama Helper
Local LLM Hub

Phi‑4 / Gemma 2

Small, efficient models for edge and everyday tasks. Run locally with Ollama on Windows.

Phi‑4 — quick start

Pull and run Phi‑4, Microsoft's 14B‑parameter instruction‑tuned model:

ollama pull phi4
ollama run phi4

Great for quick replies, low‑power setups, and simple assistants.
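
For quick one‑shot replies, you can also pass the prompt directly on the command line instead of opening an interactive session (the prompt text here is just an example):

ollama run phi4 "Summarize what model quantization does in two sentences."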

Gemma 2 — quick start

Pull and run Gemma 2, Google's efficient open model:

ollama pull gemma2
ollama run gemma2

Good balance of quality and speed; runs well on smaller GPUs with the right quantization.
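
Gemma 2 ships in several sizes; the default tag is the 9B model, and the Ollama library also lists 2B and 27B tags (check ollama.com/library/gemma2 for the current list):

ollama pull gemma2:2b
ollama pull gemma2:27b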

Variants & hardware

• Use lower‑bit quantizations (e.g., Q4) for faster inference and lower VRAM/RAM use (see the example after this list).

• 8–12 GB of system RAM is typically enough for compact variants such as Gemma 2 2B; larger models like the 14B Phi‑4 need more. A GPU with 4–8 GB of VRAM is recommended for smooth acceleration.

• Keep models on an SSD to reduce load times and stalls.
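
To pick a quantization explicitly, pull a tagged variant instead of the default. Exact tag names differ per model, so treat this one as an assumption and check the model's Tags page on ollama.com:

ollama pull gemma2:9b-instruct-q4_0
ollama run gemma2:9b-instruct-q4_0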

Great for

• On‑device assistants and fast chat (see the local API sketch after this list)

• Drafting, rewriting, and summarization

• Lightweight coding help and Q&A

• Edge devices and battery‑sensitive workflows
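
For the assistant use cases above, Ollama also serves a local HTTP API on port 11434 whenever it is running, so scripts and lightweight tools can call a model without the interactive CLI. A minimal sketch from a Windows Command Prompt (the prompt text is just an example):

curl http://localhost:11434/api/generate -d "{\"model\": \"phi4\", \"prompt\": \"Say hello in one sentence.\", \"stream\": false}"

Setting "stream": false returns one complete JSON response instead of a token stream, which is easier to parse in simple scripts.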

Tips

• Start with smaller quantizations; move up to higher‑bit variants only if you need more quality.

• Use clear system prompts to steer tone and format (see the Modelfile sketch after this list).

• Close GPU‑heavy apps and update drivers for stability.

• Benchmark to find the sweet spot for your hardware (the --verbose example below shows per‑response timings).
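
To bake a system prompt into a reusable model, use a Modelfile. A minimal sketch (the model name concise-phi and the prompt wording are just examples) — create a file named Modelfile containing:

FROM phi4
SYSTEM You are a concise assistant. Answer in short, plain sentences.

Then build and run it:

ollama create concise-phi -f Modelfile
ollama run concise-phi

For a quick benchmark, add --verbose to any run; Ollama then prints token counts and timings (including tokens per second) after each response:

ollama run phi4 --verbose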

Download Ollama for Windows: https://ollama.com/download/windows

Community‑driven guide. Not affiliated with the official Ollama project.