## What you'll build
A Python web scraper that automatically extracts data from websites. It uses requests and BeautifulSoup to make HTTP requests, parse HTML, and save the results to JSON, which makes it handy for research and data collection.
## Installation
```bash
uv add requests beautifulsoup4
# or
pip install requests beautifulsoup4
```
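To confirm the install worked, here is a quick sanity check (the version numbers printed will vary):

```python
# Quick sanity check that both packages import correctly
import requests
import bs4

print(requests.__version__, bs4.__version__)
```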
## Step 1: Ask an AI for the scraper
```text
I need a web scraper in Python that:
- Uses requests and BeautifulSoup
- Extracts titles and links from a page
- Handles connection errors
- Saves results to JSON
- Has a delay between requests (be respectful)
- Includes an appropriate User-Agent
Give me the complete code.
```
## Typical code
```python
import requests
from bs4 import BeautifulSoup
import json
import time

def scrape_page(url: str) -> list[dict]:
    """Fetch a page and return the text and href of every link on it."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (educational scraper)'
    }
    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Error: {e}")
        return []

    soup = BeautifulSoup(response.text, 'html.parser')
    results = []
    for link in soup.find_all('a', href=True):
        results.append({
            'text': link.get_text(strip=True),
            'href': link['href']
        })
    return results

# Usage: pause between requests so you don't overload the server
data = []
for url in ['https://example.com']:
    data.extend(scrape_page(url))
    time.sleep(1)  # delay between requests, as the prompt asked

with open('results.json', 'w') as f:
    json.dump(data, f, indent=2)
```
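The prompt also asked for page titles, which the generated code above doesn't capture. A minimal sketch that adds it (the function name `scrape_title` is ours, not part of the generated code):

```python
# Sketch: grab the page <title>, which the prompt asked for but the
# code above omits. Reuses the same headers and timeout as scrape_page.
def scrape_title(url: str) -> str | None:
    headers = {'User-Agent': 'Mozilla/5.0 (educational scraper)'}
    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
    except requests.RequestException:
        return None
    soup = BeautifulSoup(response.text, 'html.parser')
    return soup.title.get_text(strip=True) if soup.title else None
```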
## Best practices
| Practice | Why |
|---|---|
| Delay between requests | Don't overload the server |
| User-Agent header | Identify yourself to site operators |
| Check robots.txt | Respect the site's crawl rules |
| Error handling | Recover gracefully from timeouts and bad responses |
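Checking robots.txt needs no extra packages; here is a minimal sketch using Python's standard library (the URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Ask the site's robots.txt whether our user agent may fetch a URL
rp = RobotFileParser()
rp.set_url('https://example.com/robots.txt')
rp.read()

if rp.can_fetch('Mozilla/5.0 (educational scraper)', 'https://example.com/'):
    print('Allowed to scrape this page')
else:
    print('robots.txt disallows this page; skip it')
```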
## Scraping ethics
⚠️ Always check the site's terms of service. Some prohibit scraping.
## Next level
→ Public AI Chat with Auth → Chef Level