## Two types of models, two purposes
When developing with AI, you use models in two very different contexts:
| Context | What it does | Example |
|---|---|---|
| Coding agent | Writes YOUR code | Claude Code, Cursor, Copilot |
| Your application | Responds to YOUR users | The chatbot you're building |
⚠️ Common mistake: using the same model for both. An expensive model that writes excellent code may be unnecessary (and costly) for answering simple user questions.
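One way to keep the two roles from blurring is to make the production model an explicit piece of application config. A minimal sketch; the config shape and model ID are illustrative, not a real API:

```javascript
// The coding agent's model (e.g. Claude Opus 4.5) lives in your editor/CLI
// settings and never appears in application code. Your app only needs to
// know which model serves its users.
const appConfig = {
  productionModel: 'gemini-2.0-flash', // cheap, fast model for user traffic (illustrative ID)
  maxOutputTokens: 512,
};

// Every user-facing request goes through the cheap production model.
function modelForUserRequests(config) {
  return config.productionModel;
}
```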
## Models for coding agents (January 2026)
These models power the tools YOU use to program:
| Model | Agent using it | SWE-bench | Context | Price Input/Output |
|---|---|---|---|---|
| Claude Opus 4.5 | Claude Code | 72.0% | 200K | $5 / $25 per 1M tokens |
| GPT-5.2-Codex | Codex CLI, Copilot | 69.5% | 128K | $6 / $30 per 1M tokens |
| Claude Sonnet 4 | Cursor, Cody | 72.7% | 200K | $3 / $15 per 1M tokens |
| Gemini 2.5 Pro | Google Antigravity | 63.8% | 1M | $1.25 / $5 per 1M tokens |
💡 SWE-bench measures how well a model solves real GitHub bugs. Higher % = better for code.
## Models for production (January 2026)
These models go inside your app to serve users:
| Model | Provider | Strength | Context | Price Input/Output |
|---|---|---|---|---|
| Gemini 2.0 Flash | Google | Very fast, free up to 1,500 req/day | 1M | $0.10 / $0.40 per 1M |
| GLM-4.7 | Zhipu AI | Open source, very capable | 200K | Free (local) / ~$0.50 via API |
| DeepSeek-V3.2 | DeepSeek | Excellent quality/price | 128K | $0.14 / $0.28 per 1M |
| Claude 3.5 Haiku | Anthropic | Fast, economical | 200K | $0.80 / $4 per 1M |
| Llama 3.3 70B | Meta | Open source, runs locally | 128K | Free (local) |
## Why does the context window matter?

The context window is how much information the model can "see" in a single conversation, measured in tokens:

```
┌─────────────────────────────────────────────────┐
│    Context = Prompt + History + Files + Output   │
└─────────────────────────────────────────────────┘

Model with 8K context:   [████____]                            Only 8,000 tokens
Model with 128K context: [████████████████████████████████]    128,000 tokens
Model with 1M context:   [███████████████████████████████████████...] 1,000,000 tokens
```
| Use case | Minimum recommended context |
|---|---|
| Simple chat (FAQ) | 8K |
| Document analysis | 32K-128K |
| Code agent (reads your repo) | 128K-200K |
| Analyze complete codebase | 1M+ |
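To make those budgets concrete, here's a minimal sketch of keeping a conversation inside a context window. It assumes a rough heuristic of ~4 characters per token (real tokenizers are more accurate), and the helper names are my own:

```javascript
// Rough token estimate: ~4 characters per token for English text.
// An approximation only; use the provider's tokenizer for exact counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Drop the oldest messages until the conversation fits the context window,
// reserving room for the model's output.
function fitToContext(messages, contextLimit, reservedForOutput = 1024) {
  const budget = contextLimit - reservedForOutput;
  const kept = [...messages];
  while (
    kept.length > 1 &&
    kept.reduce((sum, m) => sum + estimateTokens(m.content), 0) > budget
  ) {
    kept.shift(); // remove the oldest message first
  }
  return kept;
}
```

This is the "forgets the conversation" failure mode from the mistakes below: if the history exceeds the window, something has to be dropped or summarized.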
## Open Source Models: The free alternative
You don't need to pay for APIs. You can run models locally or use services like OpenRouter.
### Option 1: Run locally with Ollama
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download and run GLM-4
ollama run glm4

# Or DeepSeek
ollama run deepseek-v3
```
Requirements: GPU with 8GB+ VRAM for small models, 24GB+ for large ones.
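Once a model is running, Ollama exposes a local HTTP API (default port 11434) that your app can call much like a cloud provider. A sketch; the request-building helper is my own, not part of Ollama:

```javascript
// Build a request for Ollama's local chat endpoint.
// stream: false asks for a single JSON response instead of a token stream.
function buildOllamaChatRequest(model, userMessage) {
  return {
    url: 'http://localhost:11434/api/chat',
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: userMessage }],
        stream: false,
      }),
    },
  };
}

// Usage (requires a running Ollama server):
// const { url, options } = buildOllamaChatRequest('glm4', 'Hello!');
// const data = await (await fetch(url, options)).json();
// console.log(data.message.content);
```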
### Option 2: OpenRouter (unified API)
OpenRouter gives you access to many models with a single API key:
```javascript
// Use any model with OpenRouter
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_OPENROUTER_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'deepseek/deepseek-chat-v3', // Or any other model ID
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});

// OpenRouter follows the OpenAI chat-completions response format
const data = await response.json();
console.log(data.choices[0].message.content);
```
Popular models on OpenRouter (January 2026):
| Model | OpenRouter ID | Price/1M tokens |
|---|---|---|
| DeepSeek V3.2 | deepseek/deepseek-chat-v3 | $0.14 input / $0.28 output |
| GLM-4.7 | zhipu/glm-4 | $0.50 input / $0.50 output |
| Llama 3.3 70B | meta-llama/llama-3.3-70b | $0.40 input / $0.40 output |
| Mistral Large 2 | mistralai/mistral-large-2 | $2 input / $6 output |
## Comparison: Which model for what?
| I need... | Recommendation | Why |
|---|---|---|
| Fast code writing | Claude Sonnet 4 (via Cursor) | Balance quality/speed |
| Complex coding tasks | Claude Opus 4.5 (via Claude Code) | Best reasoning |
| Chatbot for my app (free) | Gemini 2.0 Flash | 1500 req/day free |
| High-quality chatbot | Claude 3.5 Haiku | Fast and capable |
| Maximum privacy | Llama 3.3 local | Runs on your machine |
| Very low budget | DeepSeek V3.2 | Excellent quality/price |
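The table above can be sketched as a simple routing function. The decision rules and model names are illustrative (taken from the tables in this section), not an official API:

```javascript
// Pick a model from a task profile.
function chooseModel({ task, budget, privacy }) {
  if (privacy === 'strict') return 'llama-3.3-70b'; // runs on your machine
  if (task === 'coding') {
    return budget === 'low' ? 'deepseek-v3.2' : 'claude-opus-4.5';
  }
  // Chatbot / general user-facing traffic
  return budget === 'low' ? 'gemini-2.0-flash' : 'claude-3.5-haiku';
}
```

In a real app this routing usually lives behind a single call site, so changing the recommendation later is a one-line edit.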
Real workflow
1. DEVELOPING (your computer)
โโโ You use Claude Code or Cursor
โโโ Model: Claude Opus 4.5 / Sonnet 4
โโโ Cost: ~$0.50-2 per work session
2. IN PRODUCTION (your app)
โโโ Your chatbot responds to users
โโโ Model: Gemini Flash or DeepSeek
โโโ Cost: ~$0.01-0.10 per 1000 users/day
๐ก The key: Use premium models to CREATE code, economical models to SERVE users.
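The cost gap is easy to check against the listed prices. A back-of-the-envelope calculator; the token counts per request are assumptions:

```javascript
// Cost of one request: tokens × price per million tokens.
function costPerRequest(inputTokens, outputTokens, inPricePerM, outPricePerM) {
  return (inputTokens * inPricePerM + outputTokens * outPricePerM) / 1e6;
}

// 1,000 requests of ~500 input / 200 output tokens each:
const flashCost = 1000 * costPerRequest(500, 200, 0.10, 0.40); // Gemini 2.0 Flash, ≈ $0.13
const opusCost = 1000 * costPerRequest(500, 200, 5, 25);       // Claude Opus 4.5, ≈ $7.50
```

Under these assumptions the premium model is roughly 50× more expensive for the same traffic, which is exactly why it belongs in development, not production.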
## Common mistakes
| Mistake | Consequence | Solution |
|---|---|---|
| Using Opus 4.5 in production | Very high costs | Use Haiku or Flash |
| Using small model for coding | Poor quality code | Invest in good dev model |
| Ignoring context | "Forgets" conversation | Choose model with adequate context |
| Not using OpenRouter | Locked to one provider | Centralize with OpenRouter |
## Practice

- Chatbot with Gemini → use Gemini Flash for free
- API with Node → a backend for your model