Model guide

Run Mistral on Windows with Ollama — setup guide

Mistral 7B is one of the fastest and most efficient open-source models for Windows. Runs at 40–70 tokens/s on a GPU with 5+ GB VRAM. This guide covers pull command, variants and use cases.

Mistral 7B — fast and efficient open-source LLM

Mistral 7B is an open-source model developed by Mistral AI (France). Despite having only 7 billion parameters, it outperforms Llama 2 13B on most benchmarks and runs faster on consumer hardware. It excels at coding, summarisation, instruction following and quick Q&A.

On Windows with Ollama, Mistral 7B downloads in about 10 minutes over a typical connection and runs at 40–70 tokens/s on a GPU with 6+ GB VRAM. It is one of the best models for users with mid-range or older GPUs.

Run Mistral on Windows

cmd.exe
# Pull Mistral 7B (~4.1 GB):
C:\> ollama pull mistral
pulling manifest...
pulling 61e88e884507... 100% ████████ 4.1 GB
success
# Start interactive chat:
C:\> ollama run mistral
>>> Summarise this article for me: ...

Mistral variants and requirements

VariantSizeMin VRAMPull command
Mistral 7B (default)4.1 GB5 GBollama pull mistral
Mistral Nemo 12B7.1 GB8 GBollama pull mistral-nemo
Mistral Small 22B13 GB16 GBollama pull mistral-small
Mixtral 8x7B (MoE)26 GB32 GBollama pull mixtral
Mistral 7B is the best choice if you have a GPU with 5–8 GB VRAM or are running CPU-only. It is faster than Llama 3 8B while being comparable in quality for most tasks.

What Mistral is best at

Mistral 7B is particularly strong at:

  • Code generation and debugging — very strong for a 7B model, often matching larger models
  • Summarisation — condenses long texts efficiently with good factual retention
  • Instruction following — reliably does what you ask without unnecessary verbosity
  • Low-latency chat — fast token generation makes conversations feel responsive

Mistral questions

Mistral vs Llama 3 — which should I use?
For general tasks they are comparable. Mistral 7B is faster and needs less VRAM (5 GB vs 6 GB for Llama 3 8B), making it better for older or mid-range GPUs. Llama 3 8B has a larger context window (8K vs 32K tokens in some variants) and slightly better reasoning. Try both and pick whichever fits your workflow.
What is Mixtral 8x7B?
Mixtral is a Mixture-of-Experts (MoE) model that uses 7B-sized expert networks. It achieves quality close to a 70B dense model while only activating 13B parameters per token, making it more efficient. It requires ~32 GB VRAM or 64 GB RAM for CPU mode.
How is Mistral licensed?
Mistral 7B uses the Apache 2.0 license, which allows free commercial and research use without restrictions. Newer Mistral models (Mistral Small, Medium, Large) have different commercial terms. The Apache 2.0 7B model via Ollama is fully free.

Ready to try Mistral?

Install Ollama and run Mistral 7B in under 15 minutes.

Install guide