Mistral 7B — fast and efficient open-source LLM
Mistral 7B is an open-source model developed by Mistral AI (France). Despite having only 7 billion parameters, it outperforms Llama 2 13B on most benchmarks and runs faster on consumer hardware. It excels at coding, summarisation, instruction following and quick Q&A.
On Windows with Ollama, Mistral 7B downloads in about 10 minutes over a typical connection and runs at 40–70 tokens/s on a GPU with 6+ GB VRAM. It is one of the best models for users with mid-range or older GPUs.
Run Mistral on Windows
Mistral variants and requirements
| Variant | Size | Min VRAM | Pull command |
|---|---|---|---|
| Mistral 7B (default) | 4.1 GB | 5 GB | ollama pull mistral |
| Mistral Nemo 12B | 7.1 GB | 8 GB | ollama pull mistral-nemo |
| Mistral Small 22B | 13 GB | 16 GB | ollama pull mistral-small |
| Mixtral 8x7B (MoE) | 26 GB | 32 GB | ollama pull mixtral |
What Mistral is best at
Mistral 7B is particularly strong at:
- Code generation and debugging — very strong for a 7B model, often matching larger models
- Summarisation — condenses long texts efficiently with good factual retention
- Instruction following — reliably does what you ask without unnecessary verbosity
- Low-latency chat — fast token generation makes conversations feel responsive