Strong reasoning and multilingual capabilities. Run locally with Ollama on Windows.

Quick start
Pull and run the default Qwen 2.5:
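A minimal sketch, assuming a standard Ollama install and that the default library tag is `qwen2.5`:

```shell
# Download the default Qwen 2.5 model from the Ollama library
ollama pull qwen2.5

# Start an interactive chat session in the terminal
ollama run qwen2.5
```

Type a prompt at the `>>>` prompt to chat; use `/bye` to exit the session.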
For best latency, keep the model on an SSD and try smaller quantizations.
Smaller Qwen 2.5 variants run well on most PCs (8–16 GB RAM). Larger variants benefit from more RAM/VRAM and GPU acceleration.
See GPU Acceleration for CUDA/DirectML setup and Benchmarks for speed guidance.
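As an illustration, variants and quantizations are selected by tag when pulling. The exact tag names below are assumptions about what the library publishes; check `ollama list`-able tags for your model:

```shell
# A 7B variant with 4-bit quantization: smaller download, lower RAM use
ollama pull qwen2.5:7b-instruct-q4_K_M

# A compact 1.5B variant for low-RAM machines
ollama pull qwen2.5:1.5b
```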
Use cases
• Multilingual chat and translation scenarios
• Reasoning and instruction following
• Summarization, Q&A, and drafting
• Private, on‑device assistants
Prompting tips
• Set the system prompt to establish role and tone (“You are a concise assistant…”).
• Use few‑shot examples for structured tasks (e.g., format conversions).
• Prefer smaller quantizations on limited hardware; upscale if you need more quality.
• Benchmark to find the best trade‑off for your machine.
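The first two tips above can be combined programmatically through Ollama's local REST API (`POST /api/chat`): a sketch that builds a request with a system prompt plus few-shot example pairs. The helper names are hypothetical; the endpoint and payload fields are Ollama's documented chat API.

```python
import json
import urllib.request

# Default local Ollama endpoint
OLLAMA_URL = "http://localhost:11434/api/chat"

def build_chat_payload(model, system_prompt, few_shot, user_message):
    """Assemble a chat request: system prompt first, then few-shot
    (user, assistant) example pairs, then the real user message."""
    messages = [{"role": "system", "content": system_prompt}]
    for question, answer in few_shot:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_message})
    return {"model": model, "messages": messages, "stream": False}

def chat(payload):
    """Send the payload to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Non-streaming responses carry the reply in message.content
        return json.load(resp)["message"]["content"]
```

Example usage, with a running `ollama serve` and the model pulled: `chat(build_chat_payload("qwen2.5", "You are a concise assistant.", [("Convert 2024-01-05 to DD/MM/YYYY.", "05/01/2024")], "Convert 2023-12-31 to DD/MM/YYYY."))`.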
Community‑driven guide. Not affiliated with the official Ollama project.