๐Ÿ—๏ธ

Architecture Workshop

๐Ÿ‘จโ€๐Ÿณ Chefโฑ๏ธ 90 minutes

๐Ÿ“‹ Suggested prerequisites

  • โ€ขBasic backend knowledge
  • โ€ขUnderstanding of REST APIs
  • โ€ขFamiliarity with databases

Architecture Workshop: Design a URL Shortener

Welcome to this hands-on workshop where you'll design the architecture of a real system.

We're not just reading theory. We're going to think, decide, and build.


The Problem

Your client wants a service like bit.ly:

  • Users input a long URL
  • They receive a short URL (e.g., luxia.us/abc123)
  • Visiting the short URL redirects to the original

Sounds simple, right? Let's see how deep the rabbit hole goes.


Step 1: Requirements

Before writing code, we need to understand what we're building.

Functional Requirements

Think first: What should the user be able to do?

<details> <summary>See answer</summary>
  1. Create a short URL from a long one
  2. Automatically redirect when visiting the short URL
  3. (Optional) View click statistics
  4. (Optional) Custom URLs (e.g., luxia.us/my-link)
  5. (Optional) URL expiration
</details>

Non-Functional Requirements

Think first: What performance characteristics do we need?

<details> <summary>See answer</summary>
  1. High availability: The service must always be up
  2. Low latency: Redirects in < 100ms
  3. Scalable: Millions of URLs without degradation
  4. Durability: URLs must not be lost
</details>

Step 2: Capacity Estimation

This is the part that separates junior from senior engineers.

Assumptions:

  • 100 million URLs created per month
  • Read:write ratio of 100:1 (100 reads per URL created)

Calculations

URLs per second (writes):

100M / 30 days / 24 hours / 3600 seconds ≈ 40 URLs/second

Reads per second:

40 × 100 = 4,000 reads/second
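These back-of-the-envelope numbers are easy to sanity-check in a few lines of Node (using the assumptions above: 100M URLs/month and a 100:1 read:write ratio):

```javascript
// Sanity-check the capacity estimates
const urlsPerMonth = 100_000_000
const secondsPerMonth = 30 * 24 * 3600 // 2,592,000

const writesPerSecond = urlsPerMonth / secondsPerMonth
const readsPerSecond = writesPerSecond * 100

console.log(writesPerSecond.toFixed(1)) // 38.6 — rounded up to ~40 in the text
console.log(Math.round(readsPerSecond)) // 3858 — ~4,000 in the text
```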

Storage (5 years)

Think first: How much space do we need?

<details> <summary>See calculation</summary>
  • URLs per month: 100M
  • URLs in 5 years: 100M × 12 × 5 = 6 billion
  • Average size per record: ~500 bytes (URL + metadata)
  • Total: 6B records × 500 bytes = 3 TB

Not that much. A modern disk handles it.

</details>
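The storage estimate checks out the same way:

```javascript
// Verify the 5-year storage estimate
// (assumptions from the text: 100M URLs/month, ~500 bytes per record)
const urlsPerMonth = 100_000_000
const totalUrls = urlsPerMonth * 12 * 5       // 6 billion records
const bytesPerRecord = 500
const totalBytes = totalUrls * bytesPerRecord

console.log(totalBytes / 1e12) // 3 (TB)
```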

Step 3: API Design

Let's keep it simple with REST.

Endpoints

POST /api/shorten
Body: { "url": "https://example.com/very-long-page" }
Response: { "shortUrl": "https://luxia.us/abc123", "code": "abc123" }

GET /:code
Response: 301 Redirect to original URL

Why 301 and not 302?

Think first: What's the difference?

<details> <summary>See answer</summary>
  • 301 (Moved Permanently): Browser caches the redirect
  • 302 (Found): Every visit goes through our server

For a URL shortener, 302 is better if we want to:

  • Count every visit
  • Be able to change the destination URL
  • Add expiration

For maximum performance (less load): 301

</details>
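The decision rule above can be captured in a tiny (hypothetical) helper — if any feature requires seeing every visit, you need 302:

```javascript
// Illustrative only: pick the redirect status from the features you need.
// If the browser may cache the redirect (301), we never see repeat visits.
function redirectStatus({ countClicks = false, mutableDestination = false, expiring = false } = {}) {
    return (countClicks || mutableDestination || expiring) ? 302 : 301
}

console.log(redirectStatus({ countClicks: true })) // 302
console.log(redirectStatus())                      // 301
```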

Step 4: Database Schema

Option 1: SQL (PostgreSQL)

CREATE TABLE urls (
    id BIGSERIAL PRIMARY KEY,
    code VARCHAR(10) UNIQUE NOT NULL,
    original_url TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP,
    click_count BIGINT DEFAULT 0
);

-- Note: the UNIQUE constraint on code already creates an index in
-- PostgreSQL, so an explicit CREATE INDEX on code would be redundant.

Option 2: NoSQL (MongoDB)

{
    _id: ObjectId,
    code: "abc123",
    originalUrl: "https://...",
    createdAt: ISODate,
    expiresAt: ISODate,
    clicks: 0
}

Which would you choose?

Think first: What factors would you consider?

<details> <summary>See analysis</summary>

For this case, SQL is probably better:

  1. Simple and fixed schema
  2. We need transactions to avoid collisions
  3. Queries are simple (only by code)
  4. PostgreSQL scales well into billions of rows with proper indexing

NoSQL would be better if:

  • Variable schema per URL
  • You need horizontal sharding from the start
  • You have experience with MongoDB
</details>

Step 5: Encoding Algorithm

The heart of the system: how do we generate the short code?

Option A: Base62 with ID

const CHARS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

function toBase62(num) {
    if (num === 0) return CHARS[0]
    let result = ''
    while (num > 0) {
        result = CHARS[num % 62] + result
        num = Math.floor(num / 62)
    }
    return result
}

// Example:
// toBase62(1) = "1"
// toBase62(62) = "10"
// toBase62(1000000) = "4c92"

With 6 characters: 62^6 ≈ 56.8 billion possible URLs
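A quick check on that number, and on how long six characters would last at the ~40 writes/second we estimated in Step 2:

```javascript
// Keyspace: 62 characters, 6 positions
const keyspace = 62 ** 6
console.log(keyspace) // 56800235584 (~57 billion)

// At ~40 new URLs/second (the Step 2 estimate), when do we run out?
const writesPerSecond = 40
const years = keyspace / writesPerSecond / (3600 * 24 * 365)
console.log(Math.round(years)) // ~45 years
```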

Option B: Hash + Truncation

const crypto = require('crypto')

function generateCode(url) {
    const hash = crypto.createHash('md5').update(url).digest('base64')
    return hash.substring(0, 7).replace(/[+/=]/g, 'x')
}

Which has fewer collisions?

Think first: Which would you choose and why?

<details> <summary>See analysis</summary>

Base62 with auto-incremental ID:

  • ✅ Zero collisions (each ID is unique)
  • ✅ Predictable and efficient
  • ❌ Reveals how many URLs exist (ID 1000 = a thousand URLs)
  • ❌ Easy to enumerate URLs

Truncated hash:

  • ✅ Doesn't reveal system information
  • ✅ Same input = same output (useful for deduplication)
  • ❌ Possible collisions (must verify)
  • ❌ More processing

Hybrid solution: Base62 with ID + random offset

const OFFSET = 100000000 // Large arbitrary constant, chosen once
const code = toBase62(id + OFFSET)
</details>

Step 6: Handling Collisions

If we use hashing, we need to handle collisions.

async function createShortUrl(originalUrl) {
    let code = generateCode(originalUrl)
    let attempts = 0

    while (attempts < 5) {
        const existing = await db.findByCode(code)

        if (!existing) {
            // Code available
            await db.insert({ code, originalUrl })
            return code
        }

        if (existing.originalUrl === originalUrl) {
            // Same URL, reuse code
            return code
        }

        // Collision: add suffix and retry
        code = generateCode(originalUrl + Date.now() + attempts)
        attempts++
    }

    throw new Error('Could not generate unique code')
}

Step 7: Caching Strategy

With 4,000 reads/second, we need caching.

Redis to the Rescue

const Redis = require('ioredis')
const redis = new Redis()

async function getOriginalUrl(code) {
    // First look in cache
    const cached = await redis.get(`url:${code}`)
    if (cached) {
        return cached
    }

    // If not there, look in DB
    const url = await db.findByCode(code)
    if (url) {
        // Cache for 1 hour
        await redis.setex(`url:${code}`, 3600, url.originalUrl)
        return url.originalUrl
    }

    return null
}

Which URLs to cache?

Think first: Do we cache everything?

<details> <summary>See strategy</summary>

Not everything deserves caching:

  1. Popular URLs (> 10 clicks/hour): Long cache (1 hour)
  2. New URLs: Short cache (5 minutes)
  3. Expired URLs: Don't cache

Implementation with LRU:

// Redis with memory limit
// maxmemory 1gb
// maxmemory-policy allkeys-lru

This automatically keeps the most accessed URLs in cache.

</details>
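The `allkeys-lru` policy can feel abstract, so here is the eviction idea in a few lines of plain JavaScript — a toy cache exploiting Map insertion order, not what Redis actually runs:

```javascript
// Toy LRU cache: when full, evict the least recently used entry.
class LRUCache {
    constructor(capacity) {
        this.capacity = capacity
        this.map = new Map() // Map preserves insertion order
    }

    get(key) {
        if (!this.map.has(key)) return null
        const value = this.map.get(key)
        // Re-insert to mark this key as most recently used
        this.map.delete(key)
        this.map.set(key, value)
        return value
    }

    set(key, value) {
        if (this.map.has(key)) this.map.delete(key)
        this.map.set(key, value)
        if (this.map.size > this.capacity) {
            // Evict the oldest (first) key
            this.map.delete(this.map.keys().next().value)
        }
    }
}
```

With a capacity of 2: caching `a` and `b`, then reading `a`, then caching `c` evicts `b` — the popular URLs survive, exactly what we want for redirects.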

Step 8: Rate Limiting

Protect yourself from abuse.

const rateLimit = require('express-rate-limit')

// Limit URL creation
const createLimiter = rateLimit({
    windowMs: 15 * 60 * 1000, // 15 minutes
    max: 100, // 100 URLs per window
    message: { error: 'Too many URLs created. Try again later.' }
})

// Limit redirects (anti-bot)
const redirectLimiter = rateLimit({
    windowMs: 60 * 1000, // 1 minute
    max: 1000, // 1000 redirects
    message: 'Rate limit exceeded'
})

app.post('/api/shorten', createLimiter, shortenHandler)
app.get('/:code', redirectLimiter, redirectHandler)
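express-rate-limit counts requests in a fixed window. Another common design is a token bucket, which allows short bursts while capping the sustained rate; a minimal sketch (class name and numbers are illustrative):

```javascript
// Token bucket: each request spends a token; tokens refill over time.
class TokenBucket {
    constructor(capacity, refillPerSecond) {
        this.capacity = capacity
        this.tokens = capacity
        this.refillPerSecond = refillPerSecond
        this.lastRefill = Date.now()
    }

    allow(now = Date.now()) {
        const elapsed = (now - this.lastRefill) / 1000
        this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond)
        this.lastRefill = now
        if (this.tokens >= 1) {
            this.tokens -= 1
            return true
        }
        return false
    }
}

// e.g. allow bursts of 100 creations, refilling ~0.1/s (100 per ~15 min)
const createBucket = new TokenBucket(100, 0.1)
console.log(createBucket.allow()) // true
```

In production you would keep one bucket per client (keyed by IP or API key) in Redis so all app servers share the same counts.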

Step 9: Scalability Discussion

Scenario: 1,000 requests/day

What do you need?

  • A basic server (1 CPU, 1GB RAM)
  • Local PostgreSQL
  • No cache necessary

Scenario: 100,000 requests/day

What changes?

<details> <summary>See answer</summary>
  • Add Redis for caching
  • Server with more resources (2 CPU, 4GB RAM)
  • Consider read replica for DB
</details>

Scenario: 1,000,000 requests/day

What changes?

<details> <summary>See answer</summary>
  • Load Balancer (Nginx or AWS ALB)
  • Multiple application servers
  • Redis Cluster for distributed cache
  • Read replicas for PostgreSQL
  • Consider CDN for static redirects
</details>

Scenario: 100,000,000 requests/day

Enterprise architecture:

<details> <summary>See answer</summary>
  • Database sharding by code range
  • Multiple geographic regions
  • CDN edge workers for redirects
  • Kafka/RabbitMQ for async analytics
  • Kubernetes for orchestration
</details>

Architecture Diagram

                         ┌─────────────────┐
                         │      CDN        │
                         │   (CloudFlare)  │
                         └────────┬────────┘
                                  │
                         ┌────────▼────────┐
                         │  Load Balancer  │
                         │     (Nginx)     │
                         └────────┬────────┘
                                  │
         ┌────────────────────────┼────────────────────────┐
         │                        │                        │
┌────────▼────────┐      ┌────────▼────────┐      ┌────────▼────────┐
│   App Server    │      │   App Server    │      │   App Server    │
│   (Node.js)     │      │   (Node.js)     │      │   (Node.js)     │
└────────┬────────┘      └────────┬────────┘      └────────┬────────┘
         │                        │                        │
         └────────────────────────┼────────────────────────┘
                                  │
         ┌────────────────────────┼────────────────────────┐
         │                        │                        │
┌────────▼────────┐      ┌────────▼────────┐      ┌────────▼────────┐
│     Redis       │      │   PostgreSQL    │      │   PostgreSQL    │
│    (Cache)      │      │    (Primary)    │      │   (Replica)     │
└─────────────────┘      └─────────────────┘      └─────────────────┘

Implementation: Starter Code

Project Structure

url-shortener/
├── docker-compose.yml
├── package.json
├── src/
│   ├── index.js
│   ├── routes/
│   │   └── urls.js
│   ├── services/
│   │   ├── urlService.js
│   │   └── cacheService.js
│   └── db/
│       └── postgres.js
└── .env.example

docker-compose.yml

version: '3.8'

services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgres://user:pass@db:5432/urlshortener
      - REDIS_URL=redis://cache:6379
    depends_on:
      - db
      - cache

  db:
    image: postgres:15
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: urlshortener
    volumes:
      - postgres_data:/var/lib/postgresql/data

  cache:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:

src/index.js

const express = require('express')
const { Pool } = require('pg')
const Redis = require('ioredis')

const app = express()
app.use(express.json())

// Connections
const db = new Pool({ connectionString: process.env.DATABASE_URL })
const redis = new Redis(process.env.REDIS_URL)

// Base62 encoding
const CHARS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
const toBase62 = (num) => {
    if (num === 0) return CHARS[0]
    let result = ''
    while (num > 0) {
        result = CHARS[num % 62] + result
        num = Math.floor(num / 62)
    }
    return result
}

// POST /api/shorten - Create short URL
app.post('/api/shorten', async (req, res) => {
    try {
        const { url } = req.body

        // Basic validation: require an http(s) scheme
        if (!url || !/^https?:\/\/./.test(url)) {
            return res.status(400).json({ error: 'Invalid URL' })
        }

        // Insert and get ID
        const result = await db.query(
            'INSERT INTO urls (original_url) VALUES ($1) RETURNING id',
            [url]
        )

        const code = toBase62(result.rows[0].id + 10000000)

        // Update with the code
        await db.query('UPDATE urls SET code = $1 WHERE id = $2', [code, result.rows[0].id])

        // Cache
        await redis.setex(`url:${code}`, 3600, url)

        res.json({
            shortUrl: `${process.env.BASE_URL || 'http://localhost:3000'}/${code}`,
            code
        })
    } catch (error) {
        console.error(error)
        res.status(500).json({ error: 'Internal error' })
    }
})

// GET /:code - Redirect
app.get('/:code', async (req, res) => {
    try {
        const { code } = req.params

        // Look in cache
        let originalUrl = await redis.get(`url:${code}`)

        if (!originalUrl) {
            // Look in DB
            const result = await db.query(
                'SELECT original_url FROM urls WHERE code = $1',
                [code]
            )

            if (result.rows.length === 0) {
                return res.status(404).json({ error: 'URL not found' })
            }

            originalUrl = result.rows[0].original_url

            // Cache for next visits
            await redis.setex(`url:${code}`, 3600, originalUrl)
        }

        // Increment counter (fire-and-forget; catch so a DB error
        // doesn't become an unhandled promise rejection)
        db.query('UPDATE urls SET click_count = click_count + 1 WHERE code = $1', [code])
            .catch((err) => console.error('click count update failed', err))

        res.redirect(302, originalUrl)
    } catch (error) {
        console.error(error)
        res.status(500).json({ error: 'Internal error' })
    }
})

// Start server
const PORT = process.env.PORT || 3000
app.listen(PORT, () => {
    console.log(`Server running on port ${PORT}`)
})

Initialize Database

-- init.sql
CREATE TABLE IF NOT EXISTS urls (
    id BIGSERIAL PRIMARY KEY,
    code VARCHAR(10) UNIQUE,
    original_url TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    click_count BIGINT DEFAULT 0
);

CREATE INDEX IF NOT EXISTS idx_urls_code ON urls(code);

Exercises for You

  1. Add expiration: Modify the schema and code so URLs expire
  2. Analytics: Store visitor information (IP, User-Agent, referrer)
  3. Custom URLs: Allow users to choose their own code
  4. API Key: Implement authentication for creating URLs
  5. Dashboard: Create an interface to view statistics

Final Reflection

Designing architecture is about trade-offs:

  • Simplicity vs Scalability: Start simple, scale when you need to
  • Consistency vs Availability: For URLs, availability matters more
  • Cost vs Performance: Redis costs, but saves on servers

The best architecture is one that solves today's problem without blocking tomorrow's solutions.


Ready to build yours?

Clone this code, modify it, and deploy it. The best way to learn architecture is by doing it.