🧠

LLM Models: 2026 Guide

๐Ÿง‘โ€๐ŸŽ“ Apprentice

Two types of models, two purposes

When developing with AI, you use models in two very different contexts:

| Context | What it does | Example |
|---|---|---|
| Coding agent | Writes YOUR code | Claude Code, Cursor, Copilot |
| Your application | Responds to YOUR users | The chatbot you're building |

โš ๏ธ Common mistake: Using the same model for both. An expensive model that writes excellent code may be unnecessary (and costly) for answering simple user questions.


Models for coding agents (January 2026)

These models power the tools YOU use to program:

| Model | Agent using it | SWE-bench | Context | Price Input/Output |
|---|---|---|---|---|
| Claude Opus 4.5 | Claude Code | 72.0% | 200K | $5 / $25 per 1M tokens |
| GPT-5.2-Codex | Codex CLI, Copilot | 69.5% | 128K | $6 / $30 per 1M tokens |
| Claude Sonnet 4 | Cursor, Cody | 72.7% | 200K | $3 / $15 per 1M tokens |
| Gemini 2.5 Pro | Google Antigravity | 63.8% | 1M | $1.25 / $5 per 1M tokens |

💡 SWE-bench measures how well a model solves real GitHub bugs. Higher % = better for code.


Models for production (January 2026)

These models go inside your app to serve users:

| Model | Provider | Strength | Context | Price Input/Output |
|---|---|---|---|---|
| Gemini 2.0 Flash | Google | Very fast, free up to 1500/day | 1M | $0.10 / $0.40 per 1M |
| GLM-4.7 | Zhipu AI | Open source, very capable | 200K | Free (local) / ~$0.50 via API |
| DeepSeek-V3.2 | DeepSeek | Excellent quality/price | 128K | $0.14 / $0.28 per 1M |
| Claude 3.5 Haiku | Anthropic | Fast, economical | 200K | $0.80 / $4 per 1M |
| Llama 3.3 70B | Meta | Open source, runs locally | 128K | Free (local) |

Why does context (context window) matter?

The context window is the amount of information the model can "see" in a single conversation: your prompt, the history, any attached files, and the generated output all share it.

┌─────────────────────────────────────────────────────┐
│  Context = Prompt + History + Files + Output        │
└─────────────────────────────────────────────────────┘

Model with 8K context:   [████____] Only 8,000 tokens
Model with 128K context: [████████████████████████████████] 128,000 tokens
Model with 1M context:   [███████████████████████████████████████...] 1,000,000 tokens

| Use case | Minimum recommended context |
|---|---|
| Simple chat (FAQ) | 8K |
| Document analysis | 32K-128K |
| Code agent (reads your repo) | 128K-200K |
| Analyze complete codebase | 1M+ |
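To check the fit in practice, a minimal JavaScript sketch, assuming the common rule of thumb of ~4 characters per token for English text (real tokenizers differ, so treat this as an estimate only):

```javascript
// Rough token estimate: ~4 characters per token (heuristic, not a real tokenizer).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Does prompt + expected output fit in a model's context window?
function fitsInContext(promptText, maxOutputTokens, contextWindow) {
  return estimateTokens(promptText) + maxOutputTokens <= contextWindow;
}

const doc = 'x'.repeat(400_000); // ~100K tokens of document text
console.log(fitsInContext(doc, 4096, 128_000)); // fits in a 128K model
console.log(fitsInContext(doc, 4096, 8_000));   // far too big for an 8K model
```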

Open Source Models: The free alternative

You don't need to pay for APIs. You can run models locally or use services like OpenRouter.

Option 1: Run locally with Ollama

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download and run GLM-4
ollama run glm4

# Or DeepSeek
ollama run deepseek-v3
```

Requirements: GPU with 8GB+ VRAM for small models, 24GB+ for large ones.
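Once Ollama is running, it serves an HTTP API on `localhost:11434`. A sketch of building a request for its `/api/chat` endpoint from Node 18+ (the helper `buildOllamaRequest` is illustrative, not part of any SDK):

```javascript
// Build a request for Ollama's local /api/chat endpoint (default port 11434).
function buildOllamaRequest(model, prompt) {
  return {
    url: 'http://localhost:11434/api/chat',
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: prompt }],
        stream: false, // one complete JSON response instead of a stream
      }),
    },
  };
}

// Usage (requires `ollama run glm4` to be active):
// const { url, options } = buildOllamaRequest('glm4', 'Hello!');
// const data = await (await fetch(url, options)).json();
// console.log(data.message.content);
```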

Option 2: OpenRouter (unified API)

OpenRouter gives you access to all these models with a single API key:

```javascript
// Use any model with OpenRouter
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_OPENROUTER_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'deepseek/deepseek-chat-v3',  // Or any other
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});
```
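OpenRouter returns responses in the OpenAI chat-completions shape, so reading the reply is a small extraction. A sketch (`extractReply` is an illustrative helper, not part of any SDK):

```javascript
// Extract the assistant's reply from an OpenAI-style chat completion payload.
function extractReply(payload) {
  return payload.choices?.[0]?.message?.content ?? null;
}

// Usage with the fetch call above:
// const data = await response.json();
// console.log(extractReply(data));
```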

Popular models on OpenRouter (January 2026):

| Model | OpenRouter ID | Price/1M tokens |
|---|---|---|
| DeepSeek V3.2 | `deepseek/deepseek-chat-v3` | $0.14 input / $0.28 output |
| GLM-4.7 | `zhipu/glm-4` | $0.50 input / $0.50 output |
| Llama 3.3 70B | `meta-llama/llama-3.3-70b` | $0.40 input / $0.40 output |
| Mistral Large 2 | `mistralai/mistral-large-2` | $2 input / $6 output |

Comparison: Which model for what?

| I need... | Recommendation | Why |
|---|---|---|
| Fast code writing | Claude Sonnet 4 (via Cursor) | Balance quality/speed |
| Complex coding tasks | Claude Opus 4.5 (via Claude Code) | Best reasoning |
| Chatbot for my app (free) | Gemini 2.0 Flash | 1500 req/day free |
| High-quality chatbot | Claude 3.5 Haiku | Fast and capable |
| Maximum privacy | Llama 3.3 local | Runs on your machine |
| Very low budget | DeepSeek V3.2 | Excellent quality/price |
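The table above can become a tiny routing layer in your app. A sketch, assuming OpenRouter-style model IDs (the exact IDs and task categories here are illustrative, not official):

```javascript
// Map task types to models: cheap by default, premium only where it pays off.
// Model IDs below are assumed/illustrative — check your provider's catalog.
const MODEL_FOR_TASK = {
  faq: 'google/gemini-2.0-flash',      // free tier, very fast
  chat: 'anthropic/claude-3.5-haiku',  // higher-quality conversation
  budget: 'deepseek/deepseek-chat-v3', // lowest cost per token
};

function pickModel(task) {
  return MODEL_FOR_TASK[task] ?? MODEL_FOR_TASK.budget; // cheap safe default
}

console.log(pickModel('faq'));
console.log(pickModel('unknown-task')); // falls back to the budget model
```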

Real workflow

1. DEVELOPING (your computer)
   └── You use Claude Code or Cursor
       └── Model: Claude Opus 4.5 / Sonnet 4
       └── Cost: ~$0.50-2 per work session

2. IN PRODUCTION (your app)
   └── Your chatbot responds to users
       └── Model: Gemini Flash or DeepSeek
       └── Cost: ~$0.01-0.10 per 1000 users/day
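A quick back-of-envelope check on the production figure, using the DeepSeek-V3.2 prices from the table above ($0.14 input / $0.28 output per 1M tokens); the per-user token counts are an assumption:

```javascript
// Cost estimate: tokens consumed times per-million-token price.
function costUSD(inputTokens, outputTokens, inPricePerM, outPricePerM) {
  return (inputTokens / 1e6) * inPricePerM + (outputTokens / 1e6) * outPricePerM;
}

// 1,000 users/day, assuming ~200 input + ~150 output tokens per user:
const daily = costUSD(1000 * 200, 1000 * 150, 0.14, 0.28);
console.log(daily.toFixed(4)); // → "0.0700" — about 7 cents per day
```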

💡 The key: Use premium models to CREATE code, economical models to SERVE users.


Common mistakes

| Mistake | Consequence | Solution |
|---|---|---|
| Using Opus 4.5 in production | Very high costs | Use Haiku or Flash |
| Using small model for coding | Poor quality code | Invest in good dev model |
| Ignoring context | "Forgets" conversation | Choose model with adequate context |
| Not using OpenRouter | Locked to one provider | Centralize with OpenRouter |

Practice

→ Chatbot with Gemini — Use Gemini Flash for free
→ API with Node — Backend for your model


Useful links