๐Ÿ—๏ธ

Architecture Workshop

๐Ÿ‘จโ€๐Ÿณ Chefโฑ๏ธ 90 minutes

๐Ÿ“‹ Suggested prerequisites

  • โ€ขBasic backend knowledge
  • โ€ขUnderstanding of REST APIs
  • โ€ขFamiliarity with databases

Architecture Workshop: Design a URL Shortener

Welcome to this hands-on workshop where you'll design the architecture of a real system.

We're not just reading theory. We're going to think, decide, and build.


The Problem

Your client wants a service like bit.ly:

  • Users input a long URL
  • They receive a short URL (e.g., luxia.us/abc123)
  • Visiting the short URL redirects to the original

Sounds simple, right? Let's see how deep the rabbit hole goes.


Step 1: Requirements

Before writing code, we need to understand what we're building.

Functional Requirements

Think first: What should the user be able to do?

<details> <summary>See answer</summary>
  1. Create a short URL from a long one
  2. Automatically redirect when visiting the short URL
  3. (Optional) View click statistics
  4. (Optional) Custom URLs (e.g., luxia.us/my-link)
  5. (Optional) URL expiration
</details>

Non-Functional Requirements

Think first: What performance characteristics do we need?

<details> <summary>See answer</summary>
  1. High availability: The service must always be up
  2. Low latency: Redirects in < 100ms
  3. Scalable: Millions of URLs without degradation
  4. Durability: URLs must not be lost
</details>

Step 2: Capacity Estimation

This is the part that separates junior from senior engineers.

Assumptions:

  • 100 million URLs created per month
  • Read:write ratio of 100:1 (100 reads per URL created)

Calculations

URLs per second (writes):

100M / 30 days / 24 hours / 3600 seconds ≈ 40 URLs/second

Reads per second:

40 × 100 = 4,000 reads/second
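These back-of-the-envelope numbers are easy to sanity-check in a few lines of Node (using the assumptions above: 100M URLs/month and a 100:1 read:write ratio):

```javascript
// Sanity-check the capacity estimates
const urlsPerMonth = 100_000_000
const secondsPerMonth = 30 * 24 * 3600 // 2,592,000

const writesPerSecond = urlsPerMonth / secondsPerMonth
const readsPerSecond = writesPerSecond * 100

console.log(writesPerSecond.toFixed(1)) // 38.6 — rounded up to ~40 in the text
console.log(Math.round(readsPerSecond)) // 3858 — ~4,000 in the text
```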

Storage (5 years)

Think first: How much space do we need?

<details> <summary>See calculation</summary>
  • URLs per month: 100M
  • URLs in 5 years: 100M × 12 × 5 = 6 billion
  • Average size per record: ~500 bytes (URL + metadata)
  • Total: 6B records × 500 bytes = 3 TB

Not that much. A modern disk handles it.

</details>
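The storage estimate checks out the same way:

```javascript
// Verify the 5-year storage estimate
// (assumptions from the text: 100M URLs/month, ~500 bytes per record)
const urlsPerMonth = 100_000_000
const totalUrls = urlsPerMonth * 12 * 5       // 6 billion records
const bytesPerRecord = 500
const totalBytes = totalUrls * bytesPerRecord

console.log(totalBytes / 1e12) // 3 (TB)
```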

Step 3: API Design

Let's keep it simple with REST.

Endpoints

POST /api/shorten
Body: { "url": "https://example.com/very-long-page" }
Response: { "shortUrl": "https://luxia.us/abc123", "code": "abc123" }

GET /:code
Response: 301 Redirect to original URL

Why 301 and not 302?

Think first: What's the difference?

<details> <summary>See answer</summary>
  • 301 (Moved Permanently): Browser caches the redirect
  • 302 (Found): Every visit goes through our server

For a URL shortener, 302 is better if we want to:

  • Count every visit
  • Be able to change the destination URL
  • Add expiration

For maximum performance (less load): 301

</details>
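The decision rule above can be captured in a tiny (hypothetical) helper — if any feature requires seeing every visit, you need 302:

```javascript
// Illustrative only: pick the redirect status from the features you need.
// If the browser may cache the redirect (301), we never see repeat visits.
function redirectStatus({ countClicks = false, mutableDestination = false, expiring = false } = {}) {
    return (countClicks || mutableDestination || expiring) ? 302 : 301
}

console.log(redirectStatus({ countClicks: true })) // 302
console.log(redirectStatus())                      // 301
```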

Step 4: Database Schema

Option 1: SQL (PostgreSQL)

CREATE TABLE urls (
    id BIGSERIAL PRIMARY KEY,
    code VARCHAR(10) UNIQUE NOT NULL,
    original_url TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP,
    click_count BIGINT DEFAULT 0
);

-- Note: the UNIQUE constraint on code already creates an index in
-- PostgreSQL, so an explicit CREATE INDEX on code would be redundant.

Option 2: NoSQL (MongoDB)

{
    _id: ObjectId,
    code: "abc123",
    originalUrl: "https://...",
    createdAt: ISODate,
    expiresAt: ISODate,
    clicks: 0
}

Which would you choose?

Think first: What factors would you consider?

<details> <summary>See analysis</summary>

For this case, SQL is probably better:

  1. Simple and fixed schema
  2. We need transactions to avoid collisions
  3. Queries are simple (only by code)
  4. PostgreSQL scales well into billions of rows with proper indexing

NoSQL would be better if:

  • Variable schema per URL
  • You need horizontal sharding from the start
  • You have experience with MongoDB
</details>

Step 5: Encoding Algorithm

The heart of the system: how do we generate the short code?

Option A: Base62 with ID

const CHARS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

function toBase62(num) {
    if (num === 0) return CHARS[0]
    let result = ''
    while (num > 0) {
        result = CHARS[num % 62] + result
        num = Math.floor(num / 62)
    }
    return result
}

// Example:
// toBase62(1) = "1"
// toBase62(62) = "10"
// toBase62(1000000) = "4c92"

With 6 characters: 62^6 ≈ 56.8 billion possible URLs
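A quick check on that number, and on how long six characters would last at the ~40 writes/second we estimated in Step 2:

```javascript
// Keyspace: 62 characters, 6 positions
const keyspace = 62 ** 6
console.log(keyspace) // 56800235584 (~57 billion)

// At ~40 new URLs/second (the Step 2 estimate), when do we run out?
const writesPerSecond = 40
const years = keyspace / writesPerSecond / (3600 * 24 * 365)
console.log(Math.round(years)) // ~45 years
```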

Option B: Hash + Truncation

const crypto = require('crypto')

function generateCode(url) {
    const hash = crypto.createHash('md5').update(url).digest('base64')
    return hash.substring(0, 7).replace(/[+/=]/g, 'x')
}

Which has fewer collisions?

Think first: Which would you choose and why?

<details> <summary>See analysis</summary>

Base62 with auto-incremental ID:

  • ✅ Zero collisions (each ID is unique)
  • ✅ Predictable and efficient
  • ❌ Reveals how many URLs exist (ID 1000 = a thousand URLs)
  • ❌ Easy to enumerate URLs

Truncated hash:

  • ✅ Doesn't reveal system information
  • ✅ Same input = same output (useful for deduplication)
  • ❌ Possible collisions (must verify)
  • ❌ More processing

Hybrid solution: Base62 with ID + random offset

const OFFSET = 100000000 // Large arbitrary constant, chosen once
const code = toBase62(id + OFFSET)
</details>

Step 6: Handling Collisions

If we use hashing, we need to handle collisions.

async function createShortUrl(originalUrl) {
    let code = generateCode(originalUrl)
    let attempts = 0

    while (attempts < 5) {
        const existing = await db.findByCode(code)

        if (!existing) {
            // Code available
            await db.insert({ code, originalUrl })
            return code
        }

        if (existing.originalUrl === originalUrl) {
            // Same URL, reuse code
            return code
        }

        // Collision: add suffix and retry
        code = generateCode(originalUrl + Date.now() + attempts)
        attempts++
    }

    throw new Error('Could not generate unique code')
}

Step 7: Caching Strategy

With 4,000 reads/second, we need caching.

Redis to the Rescue

const Redis = require('ioredis')
const redis = new Redis()

async function getOriginalUrl(code) {
    // First look in cache
    const cached = await redis.get(`url:${code}`)
    if (cached) {
        return cached
    }

    // If not there, look in DB
    const url = await db.findByCode(code)
    if (url) {
        // Cache for 1 hour
        await redis.setex(`url:${code}`, 3600, url.originalUrl)
        return url.originalUrl
    }

    return null
}

Which URLs to cache?

Think first: Do we cache everything?

<details> <summary>See strategy</summary>

Not everything deserves caching:

  1. Popular URLs (> 10 clicks/hour): Long cache (1 hour)
  2. New URLs: Short cache (5 minutes)
  3. Expired URLs: Don't cache

Implementation with LRU:

// Redis with memory limit
// maxmemory 1gb
// maxmemory-policy allkeys-lru

This automatically keeps the most accessed URLs in cache.

</details>
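The `allkeys-lru` policy can feel abstract, so here is the eviction idea in a few lines of plain JavaScript — a toy cache exploiting Map insertion order, not what Redis actually runs:

```javascript
// Toy LRU cache: when full, evict the least recently used entry.
class LRUCache {
    constructor(capacity) {
        this.capacity = capacity
        this.map = new Map() // Map preserves insertion order
    }

    get(key) {
        if (!this.map.has(key)) return null
        const value = this.map.get(key)
        // Re-insert to mark this key as most recently used
        this.map.delete(key)
        this.map.set(key, value)
        return value
    }

    set(key, value) {
        if (this.map.has(key)) this.map.delete(key)
        this.map.set(key, value)
        if (this.map.size > this.capacity) {
            // Evict the oldest (first) key
            this.map.delete(this.map.keys().next().value)
        }
    }
}
```

With a capacity of 2: caching `a` and `b`, then reading `a`, then caching `c` evicts `b` — the popular URLs survive, exactly what we want for redirects.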

Step 8: Rate Limiting

Protect yourself from abuse.

const rateLimit = require('express-rate-limit')

// Limit URL creation
const createLimiter = rateLimit({
    windowMs: 15 * 60 * 1000, // 15 minutes
    max: 100, // 100 URLs per window
    message: { error: 'Too many URLs created. Try again later.' }
})

// Limit redirects (anti-bot)
const redirectLimiter = rateLimit({
    windowMs: 60 * 1000, // 1 minute
    max: 1000, // 1000 redirects
    message: 'Rate limit exceeded'
})

app.post('/api/shorten', createLimiter, shortenHandler)
app.get('/:code', redirectLimiter, redirectHandler)
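express-rate-limit counts requests in a fixed window. Another common design is a token bucket, which allows short bursts while capping the sustained rate; a minimal sketch (class name and numbers are illustrative):

```javascript
// Token bucket: each request spends a token; tokens refill over time.
class TokenBucket {
    constructor(capacity, refillPerSecond) {
        this.capacity = capacity
        this.tokens = capacity
        this.refillPerSecond = refillPerSecond
        this.lastRefill = Date.now()
    }

    allow(now = Date.now()) {
        const elapsed = (now - this.lastRefill) / 1000
        this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond)
        this.lastRefill = now
        if (this.tokens >= 1) {
            this.tokens -= 1
            return true
        }
        return false
    }
}

// e.g. allow bursts of 100 creations, refilling ~0.1/s (100 per ~15 min)
const createBucket = new TokenBucket(100, 0.1)
console.log(createBucket.allow()) // true
```

In production you would keep one bucket per client (keyed by IP or API key) in Redis so all app servers share the same counts.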

Step 9: Scalability Discussion

Scenario: 1,000 requests/day

What do you need?

  • A basic server (1 CPU, 1GB RAM)
  • Local PostgreSQL
  • No cache necessary

Scenario: 100,000 requests/day

What changes?

<details> <summary>See answer</summary>
  • Add Redis for caching
  • Server with more resources (2 CPU, 4GB RAM)
  • Consider read replica for DB
</details>

Scenario: 1,000,000 requests/day

What changes?

<details> <summary>See answer</summary>
  • Load Balancer (Nginx or AWS ALB)
  • Multiple application servers
  • Redis Cluster for distributed cache
  • Read replicas for PostgreSQL
  • Consider CDN for static redirects
</details>

Scenario: 100,000,000 requests/day

Enterprise architecture:

<details> <summary>See answer</summary>
  • Database sharding by code range
  • Multiple geographic regions
  • CDN edge workers for redirects
  • Kafka/RabbitMQ for async analytics
  • Kubernetes for orchestration
</details>

Architecture Diagram

                         ┌─────────────────┐
                         │      CDN        │
                         │   (CloudFlare)  │
                         └────────┬────────┘
                                  │
                         ┌────────▼────────┐
                         │  Load Balancer  │
                         │     (Nginx)     │
                         └────────┬────────┘
                                  │
         ┌────────────────────────┼────────────────────────┐
         │                        │                        │
┌────────▼────────┐      ┌────────▼────────┐      ┌────────▼────────┐
│   App Server    │      │   App Server    │      │   App Server    │
│   (Node.js)     │      │   (Node.js)     │      │   (Node.js)     │
└────────┬────────┘      └────────┬────────┘      └────────┬────────┘
         │                        │                        │
         └────────────────────────┼────────────────────────┘
                                  │
         ┌────────────────────────┼────────────────────────┐
         │                        │                        │
┌────────▼────────┐      ┌────────▼────────┐      ┌────────▼────────┐
│     Redis       │      │   PostgreSQL    │      │   PostgreSQL    │
│    (Cache)      │      │    (Primary)    │      │   (Replica)     │
└─────────────────┘      └─────────────────┘      └─────────────────┘

Implementation: Starter Code

Project Structure

url-shortener/
├── docker-compose.yml
├── package.json
├── src/
│   ├── index.js
│   ├── routes/
│   │   └── urls.js
│   ├── services/
│   │   ├── urlService.js
│   │   └── cacheService.js
│   └── db/
│       └── postgres.js
└── .env.example

docker-compose.yml

version: '3.8'

services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgres://user:pass@db:5432/urlshortener
      - REDIS_URL=redis://cache:6379
    depends_on:
      - db
      - cache

  db:
    image: postgres:15
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: urlshortener
    volumes:
      - postgres_data:/var/lib/postgresql/data

  cache:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:

src/index.js

const express = require('express')
const { Pool } = require('pg')
const Redis = require('ioredis')

const app = express()
app.use(express.json())

// Connections
const db = new Pool({ connectionString: process.env.DATABASE_URL })
const redis = new Redis(process.env.REDIS_URL)

// Base62 encoding
const CHARS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
const toBase62 = (num) => {
    if (num === 0) return CHARS[0]
    let result = ''
    while (num > 0) {
        result = CHARS[num % 62] + result
        num = Math.floor(num / 62)
    }
    return result
}

// POST /api/shorten - Create short URL
app.post('/api/shorten', async (req, res) => {
    try {
        const { url } = req.body

        // Basic validation: require an http(s) scheme
        if (!url || !/^https?:\/\/./.test(url)) {
            return res.status(400).json({ error: 'Invalid URL' })
        }

        // Insert and get ID
        const result = await db.query(
            'INSERT INTO urls (original_url) VALUES ($1) RETURNING id',
            [url]
        )

        const code = toBase62(result.rows[0].id + 10000000)

        // Update with the code
        await db.query('UPDATE urls SET code = $1 WHERE id = $2', [code, result.rows[0].id])

        // Cache
        await redis.setex(`url:${code}`, 3600, url)

        res.json({
            shortUrl: `${process.env.BASE_URL || 'http://localhost:3000'}/${code}`,
            code
        })
    } catch (error) {
        console.error(error)
        res.status(500).json({ error: 'Internal error' })
    }
})

// GET /:code - Redirect
app.get('/:code', async (req, res) => {
    try {
        const { code } = req.params

        // Look in cache
        let originalUrl = await redis.get(`url:${code}`)

        if (!originalUrl) {
            // Look in DB
            const result = await db.query(
                'SELECT original_url FROM urls WHERE code = $1',
                [code]
            )

            if (result.rows.length === 0) {
                return res.status(404).json({ error: 'URL not found' })
            }

            originalUrl = result.rows[0].original_url

            // Cache for next visits
            await redis.setex(`url:${code}`, 3600, originalUrl)
        }

        // Increment counter (fire-and-forget; catch so a DB error
        // doesn't become an unhandled promise rejection)
        db.query('UPDATE urls SET click_count = click_count + 1 WHERE code = $1', [code])
            .catch((err) => console.error('click count update failed', err))

        res.redirect(302, originalUrl)
    } catch (error) {
        console.error(error)
        res.status(500).json({ error: 'Internal error' })
    }
})

// Start server
const PORT = process.env.PORT || 3000
app.listen(PORT, () => {
    console.log(`Server running on port ${PORT}`)
})

Initialize Database

-- init.sql
CREATE TABLE IF NOT EXISTS urls (
    id BIGSERIAL PRIMARY KEY,
    code VARCHAR(10) UNIQUE,
    original_url TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    click_count BIGINT DEFAULT 0
);

CREATE INDEX IF NOT EXISTS idx_urls_code ON urls(code);

Exercises for You

  1. Add expiration: Modify the schema and code so URLs expire
  2. Analytics: Store visitor information (IP, User-Agent, referrer)
  3. Custom URLs: Allow users to choose their own code
  4. API Key: Implement authentication for creating URLs
  5. Dashboard: Create an interface to view statistics

Final Reflection

Designing architecture is about trade-offs:

  • Simplicity vs Scalability: Start simple, scale when you need to
  • Consistency vs Availability: For URLs, availability matters more
  • Cost vs Performance: Redis costs, but saves on servers

The best architecture is one that solves today's problem without blocking tomorrow's solutions.


Ready to build yours?

Clone this code, modify it, and deploy it. The best way to learn architecture is by doing it.