🧠

LLM Models: 2026 Guide

๐Ÿง‘โ€๐ŸŽ“ Apprentice

Two types of models, two purposes

When developing with AI, you use models in two very different contexts:

| Context | What it does | Example |
|---|---|---|
| Coding agent | Writes YOUR code | Claude Code, Cursor, Copilot |
| Your application | Responds to YOUR users | The chatbot you're building |

โš ๏ธ Common mistake: Using the same model for both. An expensive model that writes excellent code may be unnecessary (and costly) for answering simple user questions.


Models for coding agents (January 2026)

These models power the tools YOU use to program:

| Model | Agent using it | SWE-bench | Context | Price Input/Output |
|---|---|---|---|---|
| Claude Opus 4.5 | Claude Code | 72.0% | 200K | $5 / $25 per 1M tokens |
| GPT-5.2-Codex | Codex CLI, Copilot | 69.5% | 128K | $6 / $30 per 1M tokens |
| Claude Sonnet 4 | Cursor, Cody | 72.7% | 200K | $3 / $15 per 1M tokens |
| Gemini 2.5 Pro | Google Antigravity | 63.8% | 1M | $1.25 / $5 per 1M tokens |

💡 SWE-bench measures how well a model solves real GitHub bugs. Higher % = better for code.


Models for production (January 2026)

These models go inside your app to serve users:

| Model | Provider | Strength | Context | Price Input/Output |
|---|---|---|---|---|
| Gemini 2.0 Flash | Google | Very fast, free up to 1500/day | 1M | $0.10 / $0.40 per 1M |
| GLM-4.7 | Zhipu AI | Open source, very capable | 200K | Free (local) / ~$0.50 via API |
| DeepSeek-V3.2 | DeepSeek | Excellent quality/price | 128K | $0.14 / $0.28 per 1M |
| Claude 3.5 Haiku | Anthropic | Fast, economical | 200K | $0.80 / $4 per 1M |
| Llama 3.3 70B | Meta | Open source, runs locally | 128K | Free (local) |

Why does context (context window) matter?

The context window is the amount of information the model can "see" in a single conversation: your prompt, the history, any attached files, and the generated output all share it.

┌─────────────────────────────────────────────────────┐
│  Context = Prompt + History + Files + Output        │
└─────────────────────────────────────────────────────┘

Model with 8K context:   [████____] Only 8,000 tokens
Model with 128K context: [████████████████████████████████] 128,000 tokens
Model with 1M context:   [███████████████████████████████████████...] 1,000,000 tokens

| Use case | Minimum recommended context |
|---|---|
| Simple chat (FAQ) | 8K |
| Document analysis | 32K-128K |
| Code agent (reads your repo) | 128K-200K |
| Analyze complete codebase | 1M+ |
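To check the fit in practice, a minimal JavaScript sketch, assuming the common rule of thumb of ~4 characters per token for English text (real tokenizers differ, so treat this as an estimate only):

```javascript
// Rough token estimate: ~4 characters per token (heuristic, not a real tokenizer).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Does prompt + expected output fit in a model's context window?
function fitsInContext(promptText, maxOutputTokens, contextWindow) {
  return estimateTokens(promptText) + maxOutputTokens <= contextWindow;
}

const doc = 'x'.repeat(400_000); // ~100K tokens of document text
console.log(fitsInContext(doc, 4096, 128_000)); // fits in a 128K model
console.log(fitsInContext(doc, 4096, 8_000));   // far too big for an 8K model
```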

Open Source Models: The free alternative

You don't need to pay for APIs. You can run models locally or use services like OpenRouter.

Option 1: Run locally with Ollama

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download and run GLM-4
ollama run glm4

# Or DeepSeek
ollama run deepseek-v3
```

Requirements: GPU with 8GB+ VRAM for small models, 24GB+ for large ones.
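Once Ollama is running, it serves an HTTP API on `localhost:11434`. A sketch of building a request for its `/api/chat` endpoint from Node 18+ (the helper `buildOllamaRequest` is illustrative, not part of any SDK):

```javascript
// Build a request for Ollama's local /api/chat endpoint (default port 11434).
function buildOllamaRequest(model, prompt) {
  return {
    url: 'http://localhost:11434/api/chat',
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: prompt }],
        stream: false, // one complete JSON response instead of a stream
      }),
    },
  };
}

// Usage (requires `ollama run glm4` to be active):
// const { url, options } = buildOllamaRequest('glm4', 'Hello!');
// const data = await (await fetch(url, options)).json();
// console.log(data.message.content);
```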

Option 2: OpenRouter (unified API)

OpenRouter gives you access to all these models with a single API key:

```javascript
// Use any model with OpenRouter
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_OPENROUTER_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'deepseek/deepseek-chat-v3',  // Or any other
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});
```
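OpenRouter returns responses in the OpenAI chat-completions shape, so reading the reply is a small extraction. A sketch (`extractReply` is an illustrative helper, not part of any SDK):

```javascript
// Extract the assistant's reply from an OpenAI-style chat completion payload.
function extractReply(payload) {
  return payload.choices?.[0]?.message?.content ?? null;
}

// Usage with the fetch call above:
// const data = await response.json();
// console.log(extractReply(data));
```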

Popular models on OpenRouter (January 2026):

| Model | OpenRouter ID | Price/1M tokens |
|---|---|---|
| DeepSeek V3.2 | `deepseek/deepseek-chat-v3` | $0.14 input / $0.28 output |
| GLM-4.7 | `zhipu/glm-4` | $0.50 input / $0.50 output |
| Llama 3.3 70B | `meta-llama/llama-3.3-70b` | $0.40 input / $0.40 output |
| Mistral Large 2 | `mistralai/mistral-large-2` | $2 input / $6 output |

Comparison: Which model for what?

| I need... | Recommendation | Why |
|---|---|---|
| Fast code writing | Claude Sonnet 4 (via Cursor) | Balance quality/speed |
| Complex coding tasks | Claude Opus 4.5 (via Claude Code) | Best reasoning |
| Chatbot for my app (free) | Gemini 2.0 Flash | 1500 req/day free |
| High-quality chatbot | Claude 3.5 Haiku | Fast and capable |
| Maximum privacy | Llama 3.3 local | Runs on your machine |
| Very low budget | DeepSeek V3.2 | Excellent quality/price |
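The table above can become a tiny routing layer in your app. A sketch, assuming OpenRouter-style model IDs (the exact IDs and task categories here are illustrative, not official):

```javascript
// Map task types to models: cheap by default, premium only where it pays off.
// Model IDs below are assumed/illustrative — check your provider's catalog.
const MODEL_FOR_TASK = {
  faq: 'google/gemini-2.0-flash',      // free tier, very fast
  chat: 'anthropic/claude-3.5-haiku',  // higher-quality conversation
  budget: 'deepseek/deepseek-chat-v3', // lowest cost per token
};

function pickModel(task) {
  return MODEL_FOR_TASK[task] ?? MODEL_FOR_TASK.budget; // cheap safe default
}

console.log(pickModel('faq'));
console.log(pickModel('unknown-task')); // falls back to the budget model
```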

Real workflow

1. DEVELOPING (your computer)
   └── You use Claude Code or Cursor
       └── Model: Claude Opus 4.5 / Sonnet 4
       └── Cost: ~$0.50-2 per work session

2. IN PRODUCTION (your app)
   └── Your chatbot responds to users
       └── Model: Gemini Flash or DeepSeek
       └── Cost: ~$0.01-0.10 per 1000 users/day
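A quick back-of-envelope check on the production figure, using the DeepSeek-V3.2 prices from the table above ($0.14 input / $0.28 output per 1M tokens); the per-user token counts are an assumption:

```javascript
// Cost estimate: tokens consumed times per-million-token price.
function costUSD(inputTokens, outputTokens, inPricePerM, outPricePerM) {
  return (inputTokens / 1e6) * inPricePerM + (outputTokens / 1e6) * outPricePerM;
}

// 1,000 users/day, assuming ~200 input + ~150 output tokens per user:
const daily = costUSD(1000 * 200, 1000 * 150, 0.14, 0.28);
console.log(daily.toFixed(4)); // → "0.0700" — about 7 cents per day
```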

💡 The key: Use premium models to CREATE code, economical models to SERVE users.


Common mistakes

| Mistake | Consequence | Solution |
|---|---|---|
| Using Opus 4.5 in production | Very high costs | Use Haiku or Flash |
| Using small model for coding | Poor quality code | Invest in good dev model |
| Ignoring context | "Forgets" conversation | Choose model with adequate context |
| Not using OpenRouter | Locked to one provider | Centralize with OpenRouter |

Practice

→ Chatbot with Gemini — Use Gemini Flash for free
→ API with Node — Backend for your model


Useful links