TUTORIALS·January 2, 2026·15 MIN READ

Running Local LLMs for Coding: A Complete Setup Guide

By Daniel Nakamura

Why Go Local?

  • Complete privacy—code never leaves your machine
  • No API costs or rate limits
  • Works offline (planes, remote locations)
  • Often lower latency for small queries (no network round trip)

Hardware Requirements

  • Minimum: 16GB RAM, M1/M2 Mac or RTX 3060
  • Recommended: 32GB RAM, M2 Pro or RTX 4070
  • Ideal: 64GB RAM, M3 Max or RTX 4090
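
A quick way to sanity-check a model against these tiers is to estimate the size of its weights: parameters times bits per weight, divided by 8. The sketch below is a rough estimate only. The function name `estimate_weights_gb` is made up for this example, and the 4.5 bits per weight used for 4-bit quantization is an assumption; real quantized files tend to be somewhat larger, and the KV cache and runtime overhead come on top.

```shell
# Rough size of the model weights in GB:
#   parameters (in billions) x bits per weight / 8
# Weights only; the KV cache and runtime overhead are not included.
estimate_weights_gb() {
  LC_ALL=C awk -v params_b="$1" -v bits="$2" \
    'BEGIN { printf "%.1f\n", params_b * bits / 8 }'
}

estimate_weights_gb 32 16    # 32B model at 16-bit precision: prints 64.0
estimate_weights_gb 32 4.5   # same model at an assumed ~4.5 bits per weight: prints 18.0
```

If the estimate is close to your total RAM or VRAM, pick a smaller model or a more aggressive quantization.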

Best Models for Coding (Jan 2026)

  1. DeepSeek Coder V2 (16B Lite; the full model is 236B): Best overall
  2. CodeLlama 34B: Great for Python/JS
  3. Qwen2.5-Coder 32B: Long context window (up to 128K tokens)
  4. Mistral Large: Good general purpose, but far larger than the other three, so check its memory needs first

Setup with Ollama

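# Install (this script is Linux-only; on macOS use the app from ollama.com or `brew install ollama`)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model (DeepSeek Coder V2 is published with 16b and 236b tags; there is no 33b tag)
ollama pull deepseek-coder-v2:16b

# Run
ollama run deepseek-coder-v2:16b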
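
Besides the interactive prompt, Ollama serves an HTTP API on port 11434, which is what editor integrations talk to. Here is a minimal sketch of a one-off request to the `/api/generate` endpoint. The helper names (`json_escape`, `build_generate_payload`, `ask_ollama`) and the `OLLAMA_URL` variable are made up for this example, and the escaping only handles backslashes and double quotes in single-line prompts; for arbitrary text, a tool such as `jq` is the safer choice.

```shell
# Escape backslashes and double quotes so a single-line string can sit inside JSON.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body for /api/generate ($1 = model tag, $2 = prompt).
# "stream": false asks for one JSON response instead of a token stream.
build_generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send the request. Needs a running Ollama server and an already pulled model.
# OLLAMA_URL is a hypothetical override used only in this sketch.
ask_ollama() {
  curl -fsS "${OLLAMA_URL:-http://localhost:11434}/api/generate" \
    -d "$(build_generate_payload "$1" "$2")"
}

# Example (not run automatically):
#   ask_ollama qwen2.5-coder:32b "Write a Python function that reverses a string."
```

The generated text comes back in the `response` field of the returned JSON.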

VS Code Integration

Install the Continue extension and point it at Ollama's local server, http://localhost:11434 (the default port). Done.
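
If the extension cannot see any models, it helps to confirm that the server is answering before you dig into editor settings. The sketch below calls the `/api/tags` endpoint, which lists locally pulled models. The function name `check_ollama` is made up for this example, and the 2-second timeout is an arbitrary choice.

```shell
# Report whether an Ollama server answers at the given base URL.
# $1 is optional and defaults to Ollama's standard local address.
check_ollama() {
  if curl -fsS --max-time 2 "${1:-http://localhost:11434}/api/tags" >/dev/null 2>&1; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

check_ollama   # prints "reachable" or "unreachable"
```

If it prints "unreachable", start the server (for example with `ollama serve`) and try again.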

Performance Tips

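  • Use quantized models (e.g. Q4_K_M) to cut memory use and speed up inference
  • Keep context under 8K tokens; longer contexts use more memory and slow down prompt processing
  • Use GPU offloading when available, putting as many layers on the GPU as fit in VRAM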
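
One way to apply the context tip is to create a model variant with a fixed context length through a Modelfile, using the `num_ctx` parameter. This is a minimal sketch: the function name `make_modelfile` and the variant name `qwen-coder-8k` are made up for this example, and 8192 simply mirrors the 8K guideline above.

```shell
# Print a Modelfile that derives a variant of a base model with a capped context.
# $1 = base model tag, $2 = context length in tokens (Ollama's num_ctx parameter).
make_modelfile() {
  printf 'FROM %s\nPARAMETER num_ctx %s\n' "$1" "$2"
}

# Example (needs Ollama and the base model already pulled; not run automatically):
#   make_modelfile qwen2.5-coder:32b 8192 > Modelfile
#   ollama create qwen-coder-8k -f Modelfile   # "qwen-coder-8k" is a hypothetical name
#   ollama run qwen-coder-8k
```

Point your editor at the new variant's name and every request uses the capped context without per-request settings.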