All articles
TUTORIALS·January 2, 2026·15 MIN READ
Running Local LLMs for Coding: A Complete Setup Guide
By Daniel Nakamura
Why Go Local?
- Complete privacy—code never leaves your machine
- No API costs or rate limits
- Works offline (planes, remote locations)
- Faster for small queries (no network latency)
Hardware Requirements
- Minimum: 16GB RAM, M1/M2 Mac or RTX 3060
- Recommended: 32GB RAM, M2 Pro or RTX 4070
- Ideal: 64GB RAM, M3 Max or RTX 4090
Best Models for Coding (Jan 2026)
- DeepSeek Coder V2 33B: Best overall
- CodeLlama 34B: Great for Python/JS
- Qwen2.5-Coder 32B: Excellent context window
- Mistral Large: Good general purpose
Setup with Ollama
# Install
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull deepseek-coder-v2:33b
# Run
ollama run deepseek-coder-v2:33b
VS Code Integration
Install Continue extension, point to localhost:11434. Done.
Performance Tips
- Use quantized models (Q4_K_M) for speed
- Keep context under 8K tokens
- Use GPU offloading when available