Architecture Workshop: Design a URL Shortener
Welcome to this hands-on workshop where you'll design the architecture of a real system.
We're not just reading theory. We're going to think, decide, and build.
The Problem
Your client wants a service like bit.ly:
- Users input a long URL
- They receive a short URL (e.g., luxia.us/abc123)
- Visiting the short URL redirects to the original
Sounds simple, right? Let's see how deep the rabbit hole goes.
Step 1: Requirements
Before writing code, we need to understand what we're building.
Functional Requirements
Think first: What should the user be able to do?
<details> <summary>See answer</summary>
- Create a short URL from a long one
- Automatically redirect when visiting the short URL
- (Optional) View click statistics
- (Optional) Custom URLs (e.g., luxia.us/my-link)
- (Optional) URL expiration
</details>
Non-Functional Requirements
Think first: What performance characteristics do we need?
<details> <summary>See answer</summary>
- High availability: The service must always be up
- Low latency: Redirects in < 100ms
- Scalable: Millions of URLs without degradation
- Durability: URLs must not be lost
</details>
Step 2: Capacity Estimation
This is the part that separates junior from senior engineers.
Assumptions:
- 100 million URLs created per month
- Read:write ratio of 100:1 (100 reads per URL created)
Calculations
URLs per second (writes):
100M / 30 days / 24 hours / 3600 seconds ≈ 40 URLs/second
Reads per second:
40 × 100 = 4,000 reads/second
Storage (5 years)
Think first: How much space do we need?
<details> <summary>See calculation</summary>
- URLs per month: 100M
- URLs in 5 years: 100M × 12 × 5 = 6 billion
- Average size per record: ~500 bytes (URL + metadata)
- Total: 6B × 500 bytes = 3 TB
Not that much. A modern disk handles it.
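If you prefer to double-check these back-of-the-envelope numbers in code, here is a quick Node sketch using the same assumptions listed above:
// Sanity check for the estimates above
const urlsPerMonth = 100_000_000
const readWriteRatio = 100
const bytesPerRecord = 500
const years = 5
const writesPerSecond = urlsPerMonth / (30 * 24 * 3600)   // ≈ 38.6, rounded to ~40
const readsPerSecond = writesPerSecond * readWriteRatio   // ≈ 3,860, rounded to ~4,000
const totalRecords = urlsPerMonth * 12 * years            // 6 billion
const storageTB = (totalRecords * bytesPerRecord) / 1e12  // 3 TB
console.log({ writesPerSecond, readsPerSecond, totalRecords, storageTB })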
</details>
Step 3: API Design
Let's keep it simple with REST.
Endpoints
POST /api/shorten
Body: { "url": "https://example.com/very-long-page" }
Response: { "shortUrl": "https://luxia.us/abc123", "code": "abc123" }
GET /:code
Response: 301 Redirect to original URL
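As a quick smoke test of this contract, here is a sketch using Node 18+'s built-in fetch; the localhost base URL is an assumption, and the server must already be running:
const BASE = 'http://localhost:3000'
async function demo() {
  // Create a short URL
  const res = await fetch(`${BASE}/api/shorten`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: 'https://example.com/very-long-page' })
  })
  const { shortUrl, code } = await res.json()
  console.log(shortUrl, code)
  // Inspect the redirect without following it
  const redirect = await fetch(`${BASE}/${code}`, { redirect: 'manual' })
  console.log(redirect.status, redirect.headers.get('location'))
}
demo().catch(console.error)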
Why 301 and not 302?
Think first: What's the difference?
<details> <summary>See answer</summary>
- 301 (Moved Permanently): Browser caches the redirect
- 302 (Found): Every visit goes through our server
For a URL shortener, 302 is better if we want to:
- Count every visit
- Be able to change the destination URL
- Add expiration
For maximum performance (less load): 301
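One way to keep this decision flexible is to make the status a configuration knob instead of a hard-coded literal. A minimal sketch, where the TRACK_CLICKS flag and the lookupUrl helper are illustrative:
// 302 keeps every hit on our servers (stats, mutable targets); 301 offloads to browser caches
const REDIRECT_STATUS = process.env.TRACK_CLICKS === 'true' ? 302 : 301
app.get('/:code', async (req, res) => {
  const originalUrl = await lookupUrl(req.params.code) // hypothetical cache/DB lookup
  if (!originalUrl) return res.status(404).json({ error: 'URL not found' })
  res.redirect(REDIRECT_STATUS, originalUrl)
})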
</details>
Step 4: Database Schema
Option 1: SQL (PostgreSQL)
CREATE TABLE urls (
id BIGSERIAL PRIMARY KEY,
code VARCHAR(10) UNIQUE NOT NULL,
original_url TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
expires_at TIMESTAMP,
click_count BIGINT DEFAULT 0
);
CREATE INDEX idx_urls_code ON urls(code);
Option 2: NoSQL (MongoDB)
{
_id: ObjectId,
code: "abc123",
originalUrl: "https://...",
createdAt: ISODate,
expiresAt: ISODate,
clicks: 0
}
Which would you choose?
Think first: What factors would you consider?
<details> <summary>See analysis</summary>
For this case, SQL is probably better:
- Simple and fixed schema
- We need transactions to avoid collisions
- Queries are simple (only by code)
- PostgreSQL scales very well up to millions of records
NoSQL would be better if:
- Variable schema per URL
- You need horizontal sharding from the start
- You have experience with MongoDB
</details>
Step 5: Encoding Algorithm
The heart of the system: how do we generate the short code?
Option A: Base62 with ID
const CHARS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
function toBase62(num) {
if (num === 0) return CHARS[0]
let result = ''
while (num > 0) {
result = CHARS[num % 62] + result
num = Math.floor(num / 62)
}
return result
}
// Example:
// toBase62(1) = "1"
// toBase62(62) = "10"
// toBase62(1000000) = "4c92"
With 6 characters: 62^6 ≈ 56.8 billion possible URLs
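A nice property of this scheme is that it is reversible: you can recover the numeric ID from a code, which helps with debugging or with sharding by ID range later. A minimal sketch of the inverse, reusing the CHARS alphabet above:
// Inverse of toBase62: recover the numeric ID from a short code
function fromBase62(code) {
  let num = 0
  for (const ch of code) {
    const value = CHARS.indexOf(ch)
    if (value === -1) throw new Error(`Invalid character: ${ch}`)
    num = num * 62 + value
  }
  return num
}
// fromBase62('4c92') === 1000000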
Option B: Hash + Truncation
const crypto = require('crypto')
function generateCode(url) {
const hash = crypto.createHash('md5').update(url).digest('base64')
return hash.substring(0, 7).replace(/[+/=]/g, 'x')
}
Which has fewer collisions?
Think first: Which would you choose and why?
<details> <summary>See analysis</summary>
Base62 with auto-incremental ID:
- ✅ Zero collisions (each ID is unique)
- ✅ Predictable and efficient
- ❌ Reveals how many URLs exist (ID 1000 = a thousand URLs)
- ❌ Easy to enumerate URLs
Truncated hash:
- ✅ Doesn't reveal system information
- ✅ Same input = same output (useful for deduplication)
- ❌ Possible collisions (must verify)
- ❌ More processing
Hybrid solution: Base62 with ID + random offset
const OFFSET = 100000000 // Large random number
const code = toBase62(id + OFFSET)
</details>
Step 6: Handling Collisions
If we use hashing, we need to handle collisions.
async function createShortUrl(originalUrl) {
let code = generateCode(originalUrl)
let attempts = 0
while (attempts < 5) {
const existing = await db.findByCode(code)
if (!existing) {
// Code available
await db.insert({ code, originalUrl })
return code
}
if (existing.originalUrl === originalUrl) {
// Same URL, reuse code
return code
}
// Collision: add suffix and retry
code = generateCode(originalUrl + Date.now() + attempts)
attempts++
}
throw new Error('Could not generate unique code')
}
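One caveat: there is a race window between findByCode and insert, so two concurrent requests can both grab the same code. A common hardening, sketched here against the PostgreSQL schema from Step 4 and assuming db is a pg Pool as in the starter code below, is to let the UNIQUE constraint on code arbitrate and retry on a duplicate-key error:
// Sketch: rely on UNIQUE(code) instead of check-then-insert
async function createShortUrlSafe(originalUrl) {
  for (let attempt = 0; attempt < 5; attempt++) {
    // Vary the input on retries so the hash changes
    const code = generateCode(attempt === 0 ? originalUrl : `${originalUrl}#${attempt}`)
    try {
      await db.query(
        'INSERT INTO urls (code, original_url) VALUES ($1, $2)',
        [code, originalUrl]
      )
      return code
    } catch (err) {
      if (err.code !== '23505') throw err // 23505 = unique_violation in PostgreSQL
      // otherwise the code is taken: loop and retry with a different input
    }
  }
  throw new Error('Could not generate unique code')
}
This version drops the "same URL reuses the same code" shortcut; if deduplication matters to you, add a SELECT by original_url before the loop.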
Step 7: Caching Strategy
With 4,000 reads/second, we need caching.
Redis to the Rescue
const Redis = require('ioredis')
const redis = new Redis()
async function getOriginalUrl(code) {
// First look in cache
const cached = await redis.get(`url:${code}`)
if (cached) {
return cached
}
// If not there, look in DB
const url = await db.findByCode(code)
if (url) {
// Cache for 1 hour
await redis.setex(`url:${code}`, 3600, url.originalUrl)
return url.originalUrl
}
return null
}
Which URLs to cache?
Think first: Do we cache everything?
<details> <summary>See strategy</summary>
Not everything deserves caching:
- Popular URLs (> 10 clicks/hour): Long cache (1 hour)
- New URLs: Short cache (5 minutes)
- Expired URLs: Don't cache
Implementation with LRU:
// Redis with memory limit
// maxmemory 1gb
// maxmemory-policy allkeys-lru
This automatically keeps the most accessed URLs in cache.
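If you want those differentiated TTLs in application code rather than relying only on LRU eviction, a minimal sketch could look like this; the clicksPerHour and expiresAt fields and the thresholds are illustrative:
// Choose a cache TTL per URL based on popularity and expiration
function cacheTtlSeconds({ clicksPerHour = 0, expiresAt = null }) {
  if (expiresAt && expiresAt <= new Date()) return 0 // expired: don't cache
  if (clicksPerHour > 10) return 3600                // popular: 1 hour
  return 300                                         // new or quiet: 5 minutes
}
async function cacheUrl(code, urlRecord) {
  const ttl = cacheTtlSeconds(urlRecord)
  if (ttl > 0) {
    await redis.setex(`url:${code}`, ttl, urlRecord.originalUrl)
  }
}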
</details>
Step 8: Rate Limiting
Protect yourself from abuse.
const rateLimit = require('express-rate-limit')
// Limit URL creation
const createLimiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100, // 100 URLs per window
message: { error: 'Too many URLs created. Try again later.' }
})
// Limit redirects (anti-bot)
const redirectLimiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 1000, // 1000 redirects
message: 'Rate limit exceeded'
})
app.post('/api/shorten', createLimiter, shortenHandler)
app.get('/:code', redirectLimiter, redirectHandler)
Step 9: Scalability Discussion
Scenario: 1,000 requests/day
What do you need?
- A basic server (1 CPU, 1GB RAM)
- Local PostgreSQL
- No cache necessary
Scenario: 100,000 requests/day
What changes?
<details> <summary>See answer</summary>
- Add Redis for caching
- Server with more resources (2 CPU, 4GB RAM)
- Consider read replica for DB
</details>
Scenario: 1,000,000 requests/day
What changes?
<details> <summary>See answer</summary>
- Load Balancer (Nginx or AWS ALB)
- Multiple application servers
- Redis Cluster for distributed cache
- Read replicas for PostgreSQL
- Consider CDN for static redirects
</details>
Scenario: 100,000,000 requests/day
Enterprise architecture:
<details> <summary>See answer</summary>
- Database sharding by code range
- Multiple geographic regions
- CDN edge workers for redirects
- Kafka/RabbitMQ for async analytics
- Kubernetes for orchestration
</details>
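To make "async analytics" concrete without committing to Kafka yet, one lightweight option is to push click events onto a queue from the redirect handler and let a separate worker drain it. A sketch using a Redis list as a stand-in for Kafka/RabbitMQ (the queue name and event shape are illustrative):
// Record clicks off the hot path; the redirect never waits for analytics
function recordClick(code, req) {
  const event = {
    code,
    at: Date.now(),
    referrer: req.get('referer') || null,
    userAgent: req.get('user-agent') || null
  }
  redis.lpush('clicks', JSON.stringify(event)).catch(console.error)
}
// A separate worker would consume with BRPOP and batch-write to the analytics store.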
Architecture Diagram
                         ┌─────────────────┐
                         │       CDN       │
                         │  (CloudFlare)   │
                         └────────┬────────┘
                                  │
                         ┌────────▼────────┐
                         │  Load Balancer  │
                         │     (Nginx)     │
                         └────────┬────────┘
                                  │
         ┌────────────────────────┼────────────────────────┐
         │                        │                        │
┌────────▼────────┐      ┌────────▼────────┐      ┌────────▼────────┐
│   App Server    │      │   App Server    │      │   App Server    │
│    (Node.js)    │      │    (Node.js)    │      │    (Node.js)    │
└────────┬────────┘      └────────┬────────┘      └────────┬────────┘
         │                        │                        │
         └────────────────────────┼────────────────────────┘
                                  │
         ┌────────────────────────┼────────────────────────┐
         │                        │                        │
┌────────▼────────┐      ┌────────▼────────┐      ┌────────▼────────┐
│      Redis      │      │   PostgreSQL    │      │   PostgreSQL    │
│     (Cache)     │      │    (Primary)    │      │    (Replica)    │
└─────────────────┘      └─────────────────┘      └─────────────────┘
Implementation: Starter Code
Project Structure
url-shortener/
├── docker-compose.yml
├── package.json
├── src/
│   ├── index.js
│   ├── routes/
│   │   └── urls.js
│   ├── services/
│   │   ├── urlService.js
│   │   └── cacheService.js
│   └── db/
│       └── postgres.js
└── .env.example
docker-compose.yml
version: '3.8'
services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgres://user:pass@db:5432/urlshortener
      - REDIS_URL=redis://cache:6379
    depends_on:
      - db
      - cache
  db:
    image: postgres:15
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: urlshortener
    volumes:
      - postgres_data:/var/lib/postgresql/data
      # Runs the init.sql below on first start so the urls table exists
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
  cache:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
volumes:
  postgres_data:
  redis_data:
src/index.js
const express = require('express')
const { Pool } = require('pg')
const Redis = require('ioredis')
const app = express()
app.use(express.json())
// Connections
const db = new Pool({ connectionString: process.env.DATABASE_URL })
const redis = new Redis(process.env.REDIS_URL)
// Base62 encoding
const CHARS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
const toBase62 = (num) => {
if (num === 0) return CHARS[0]
let result = ''
while (num > 0) {
result = CHARS[num % 62] + result
num = Math.floor(num / 62)
}
return result
}
// POST /api/shorten - Create short URL
app.post('/api/shorten', async (req, res) => {
try {
const { url } = req.body
if (!url || !url.startsWith('http')) {
return res.status(400).json({ error: 'Invalid URL' })
}
// Insert and get ID
const result = await db.query(
'INSERT INTO urls (original_url) VALUES ($1) RETURNING id',
[url]
)
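// Offset the ID (as in the Step 5 hybrid) so early codes don't reveal how few URLs exist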
const code = toBase62(result.rows[0].id + 10000000)
// Update with the code
await db.query('UPDATE urls SET code = $1 WHERE id = $2', [code, result.rows[0].id])
// Cache
await redis.setex(`url:${code}`, 3600, url)
res.json({
shortUrl: `${process.env.BASE_URL || 'http://localhost:3000'}/${code}`,
code
})
} catch (error) {
console.error(error)
res.status(500).json({ error: 'Internal error' })
}
})
// GET /:code - Redirect
app.get('/:code', async (req, res) => {
try {
const { code } = req.params
// Look in cache
let originalUrl = await redis.get(`url:${code}`)
if (!originalUrl) {
// Look in DB
const result = await db.query(
'SELECT original_url FROM urls WHERE code = $1',
[code]
)
if (result.rows.length === 0) {
return res.status(404).json({ error: 'URL not found' })
}
originalUrl = result.rows[0].original_url
// Cache for next visits
await redis.setex(`url:${code}`, 3600, originalUrl)
}
// Increment counter (async, non-blocking)
db.query('UPDATE urls SET click_count = click_count + 1 WHERE code = $1', [code]).catch(console.error)
res.redirect(302, originalUrl)
} catch (error) {
console.error(error)
res.status(500).json({ error: 'Internal error' })
}
})
// Start server
const PORT = process.env.PORT || 3000
app.listen(PORT, () => {
console.log(`Server running on port ${PORT}`)
})
Initialize Database
-- init.sql
CREATE TABLE IF NOT EXISTS urls (
id BIGSERIAL PRIMARY KEY,
code VARCHAR(10) UNIQUE,
original_url TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
click_count BIGINT DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_urls_code ON urls(code);
Exercises for You
- Add expiration: Modify the schema and code so URLs expire
- Analytics: Store visitor information (IP, User-Agent, referrer)
- Custom URLs: Allow users to choose their own code
- API Key: Implement authentication for creating URLs
- Dashboard: Create an interface to view statistics
Final Reflection
Designing architecture is about trade-offs:
- Simplicity vs Scalability: Start simple, scale when you need to
- Consistency vs Availability: For URLs, availability matters more
- Cost vs Performance: Redis costs, but saves on servers
The best architecture is one that solves today's problem without blocking tomorrow's solutions.
Ready to build yours?
Clone this code, modify it, and deploy it. The best way to learn architecture is by doing it.