Small, efficient models for edge and everyday tasks. Run locally with Ollama on Windows.

Phi‑4 — quick start
Pull and run the compact, instruction‑tuned Phi‑4:
Great for quick replies, low‑power setups, and simple assistants.
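As a sketch, the pull-and-run step looks like this, assuming the model is published under the `phi4` tag in the Ollama library (check the library page for the exact tag):

```shell
# Download the Phi-4 weights (tag "phi4" assumed)
ollama pull phi4

# Open an interactive chat session in the terminal
ollama run phi4
```

Type /bye to leave the interactive session.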
Gemma 2 — quick start
Pull and run the efficient Gemma 2:
Good balance of quality and speed; works well on smaller GPUs with the right quantization.
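A minimal sketch of the same steps for Gemma 2, assuming the compact variant is published under the `gemma2:2b` tag (a larger 9B tag also exists in the library):

```shell
# Pull the compact 2B variant (tag assumed)
ollama pull gemma2:2b

# Chat with it interactively
ollama run gemma2:2b
```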
Hardware tips
• Use lower‑bit quantizations (e.g., Q4) for speed and minimal VRAM/RAM use.
• 8–12 GB of system RAM is typically enough for compact variants; 4–8 GB of GPU VRAM is recommended for smooth acceleration.
• Keep models on an SSD to reduce load times and stalls.
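To see what a pulled model actually costs on disk and how it is quantized, Ollama's built-in inspection commands help (this sketch assumes `phi4` has already been pulled):

```shell
# List installed models with their on-disk sizes
ollama list

# Show a model's parameters, quantization, and prompt template
ollama show phi4
```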
Good use cases
• On‑device assistants and fast chat
• Drafting, rewriting, and summarization
• Lightweight coding help and Q&A
• Edge devices and battery‑sensitive workflows
Tips
• Start with smaller quantizations; step up only if you need more quality.
• Use clear system prompts to steer tone and format.
• Close GPU‑heavy apps and update drivers for stability.
• Benchmark to find the sweet spot for your hardware.
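One way to apply the system-prompt tip is a Modelfile; this is a hedged sketch, and the model name `brief-assistant` and the prompt text are purely illustrative:

```shell
# Create a Modelfile that bakes in a system prompt (names are illustrative)
cat > Modelfile <<'EOF'
FROM phi4
SYSTEM "You are a concise assistant. Answer in short bullet points."
PARAMETER temperature 0.3
EOF

# Build a named model from the Modelfile, then run it
ollama create brief-assistant -f Modelfile
ollama run brief-assistant
```

For the benchmarking tip, running with `--verbose` (e.g., `ollama run brief-assistant --verbose`) prints load and evaluation timings, including tokens per second, after each reply.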
Community‑driven guide. Not affiliated with the official Ollama project.