Run Powerful AI Locally with Ollama (No Cloud, No Meter)
You don't need a subscription—or to ship your prompts to someone else's servers—to get great coding help or a chat assistant. Ollama runs modern open-weight models entirely on your own machine, exposes a local API, and keeps your data on disks you control.
Why Local Models?
- Privacy & sovereignty: prompts and files never leave your machine; you talk to a local HTTP endpoint at `127.0.0.1:11434` (a quick check follows this list)
- No usage meter: once weights are downloaded, inference is "free" (you pay in GPU/CPU, not tokens)
- Choice: pick models tuned for coding, chat, multilingual, etc., from a community library (Qwen, Llama, Mistral, DeepSeek)
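To make the first point concrete: once Ollama is installed (next section), that endpoint is just plain HTTP on your loopback interface. A minimal check, assuming the default port 11434 and a running server:

```
# Should respond with "Ollama is running"
curl http://127.0.0.1:11434/

# Lists the models you've downloaded locally, as JSON
curl http://127.0.0.1:11434/api/tags
```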
Install Ollama
Windows 10+, macOS, or Linux. Use the official downloads.
- Windows: download the installer from the official page (or use `winget`)
- macOS/Linux: follow the platform instructions; afterwards you can run the local API and pull models the same way
```
# Windows (winget)
winget install --id Ollama.Ollama -e

# macOS (Homebrew)
brew install ollama

# Linux (official script)
curl -fsSL https://ollama.com/install.sh | sh
```
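On Windows and macOS the installer typically sets Ollama up as a background app, and the Linux script registers a service; either way, a quick sanity check from a terminal looks like this (standard install assumed):

```
# Confirm the CLI is on your PATH and see which version you got
ollama --version

# If the background server isn't already running, start it yourself
ollama serve
```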
Pick a Model (Coding & Chat)
For everyday coding + conversation, start with Qwen2.5-Coder and choose a size that fits your hardware. You can always install multiple sizes; only the one you run occupies VRAM (a quick way to check is shown after this list).
- 7B (light): great for HTML/CSS/JS/Python help on almost any GPU/CPU
  `ollama pull qwen2.5-coder:7b-instruct`
- 14B (balanced): better reasoning, still multitask-friendly on mainstream GPUs
  `ollama pull qwen2.5-coder:14b-instruct`
- 32B (max quality): strong multi-file coding help; needs more VRAM
  `ollama pull qwen2.5-coder:32b-instruct`
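Here is the quick check mentioned above for seeing what's on disk versus what's actually loaded into memory; all three subcommands are part of the standard Ollama CLI:

```
# Models downloaded to disk (every size you've pulled)
ollama list

# Models currently loaded, with their memory footprint
ollama ps

# Remove a size you no longer need to reclaim disk space
ollama rm qwen2.5-coder:7b-instruct
```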
Run It (Two Ways)
- Chat in the terminal: run `ollama run qwen2.5-coder:14b-instruct`. You'll see a `>>>` prompt; type requests and it answers locally.
- Use the local API: start the server with `ollama serve`, then POST to `http://127.0.0.1:11434/api/generate` from your tools or scripts (a curl sketch follows the examples below).
```
# Example: one-off prompt
ollama run qwen2.5-coder:14b-instruct "Write a Python function to validate an email address."

# Example: bigger context window (nice for code)
set OLLAMA_NUM_CTX=16384    # Windows (PowerShell: $env:OLLAMA_NUM_CTX=16384)
ollama run qwen2.5-coder:14b-instruct
```
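And here is the curl sketch for the local API mentioned above. It assumes the 14B model is already pulled and the server is listening on the default port; `num_ctx` inside `options` raises the context window for this single request, following the parameter names in Ollama's API docs:

```
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "qwen2.5-coder:14b-instruct",
  "prompt": "Write a Python function to validate an email address.",
  "stream": false,
  "options": { "num_ctx": 16384 }
}'
```

With `"stream": false` you get a single JSON object whose `response` field holds the completion; leave streaming on (the default) if you want tokens as they're generated.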
Add a Friendly GUI (Optional)
Prefer tabs/history and a browser UI? Point Open WebUI at your local Ollama—download/manage models and chat in a clean interface, still fully local.
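If you want to try it, Open WebUI's quick start is a single Docker container. The flags below mirror their documentation at the time of writing; treat them as a sketch and check the Open WebUI docs for the current command:

```
# Run Open WebUI; host.docker.internal lets the container reach Ollama on the host
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

# Then browse to http://localhost:3000 and connect it to your Ollama endpoint
```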
Licenses & "Can I Recommend This?"
You can absolutely write tutorials and link to official model pages. Model licenses vary (e.g., Qwen typically uses Apache-2.0; Meta's Llama uses the Llama license; DeepSeek-Coder models permit commercial use under their model license). Link to the official repos and avoid redistributing weights yourself unless the license permits it.
Why This Matters for Privacy
Local inference keeps prompts and documents on devices you control, which aligns with modern privacy guidance and ongoing standardization work (e.g., NIST's push to make privacy claims—like "differential privacy"—verifiable). Even if you're not using DP-trained models yet, keeping your workflow local reduces data exposure versus cloud APIs.
Quick Reference
- Download Ollama: official Windows/macOS/Linux installers
- Ollama API docs: endpoints, streaming, examples
- Qwen2.5-Coder (sizes & tags): library page
- Llama 3.1 overview: Meta's announcement
- DeepSeek-Coder: repo & license notes
- Open WebUI + Ollama: quick start
- NIST privacy guidance: DP guidelines & new registry draft