Examples

Learn by example! This section contains working scripts demonstrating various web scraping techniques.

Available Examples

Simple Navigation

Basic example showing page navigation and data extraction.

What you'll learn:

  - Navigating to a URL
  - Waiting for page load
  - Extracting page title and heading
  - Using scrape_data()
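As a quick sketch of this pattern, the helper below navigates and pulls the title and first heading. It assumes the Playwright `page` object your `main()` receives; the function name and the `h1` selector are illustrative, not part of the tool's API.

```python
async def get_title_and_heading(page, url):
    """Navigate to url and return the page title and first h1 text."""
    await page.goto(url)
    # Wait until network activity settles before reading the DOM
    await page.wait_for_load_state('networkidle')
    title = await page.title()
    heading = await page.locator('h1').first.text_content()
    return {'title': title, 'heading': heading}
```

Inside `main()` you would pass the result to `scrape_data()`.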

Clicking Links by Text

Navigate to 9to5Linux and click on a link containing specific text.

What you'll learn:

  - Finding elements by text content
  - Clicking links
  - Handling navigation after clicks
  - Conditional element checks
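A minimal sketch of the click-by-text pattern, including the conditional check: the function name is illustrative, and it assumes Playwright's `:has-text()` selector on the `page` object your script receives.

```python
async def click_link_by_text(page, text):
    """Click the first link containing `text`; return whether one was found."""
    link = page.locator(f'a:has-text("{text}")')
    if await link.count() == 0:
        # Conditional check: bail out instead of raising on a missing link
        return False
    await link.first.click()
    # Handle the navigation triggered by the click
    await page.wait_for_load_state('networkidle')
    return True
```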

Manual Screenshots

Take screenshots at specific points in your workflow.

What you'll learn:

  - Using capture_screenshot()
  - Screenshot before/after actions
  - Debugging with screenshots
  - Error state documentation
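The before/after pattern can be sketched as a small wrapper. Here `capture` stands in for the `capture_screenshot()` helper (passed in so the sketch is self-contained); the function name and selector are illustrative.

```python
async def click_with_screenshots(page, selector, capture):
    """Capture a screenshot before and after clicking `selector`."""
    await capture(f"Before clicking {selector}")
    await page.locator(selector).click()
    await capture(f"After clicking {selector}")
```

In a real script you would call `capture_screenshot()` directly at each point instead of passing it in.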

Debug Logging

Track script execution with detailed logging.

What you'll learn:

  - Using debug_log()
  - Tracking progress
  - Identifying where scripts hang
  - Best practices for logging
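To see why timestamps help you find where a script hangs, here is a minimal stand-in for `debug_log()` (the real helper is provided by the tool; this sketch only illustrates the idea).

```python
import datetime

def timestamped_log(message, _log=print):
    """Prefix each message with a timestamp so gaps between log lines
    reveal where the script spends time or hangs."""
    line = f"[{datetime.datetime.now().isoformat(timespec='seconds')}] {message}"
    _log(line)
    return line
```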

Downloading Files

Download files from web pages during script execution.

What you'll learn:

  - Using the download_file() function
  - Downloading from URLs
  - Downloading by clicking links
  - Downloading multiple files
  - Error handling for downloads
  - Combining downloads with data scraping
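The error-handling pattern for multiple downloads can be sketched independently of the browser. Here `download` stands in for the `download_file()` helper; the function name is illustrative.

```python
def download_all(download, urls):
    """Try each URL with the given `download` callable, collecting
    failures instead of aborting on the first error."""
    failures = {}
    for url in urls:
        try:
            download(url)
        except Exception as exc:
            # Record the failure and keep going with the remaining URLs
            failures[url] = str(exc)
    return failures
```

The returned failures dict can be passed to `scrape_data()` so the run records what could not be fetched.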

Importing Data

Use imported data to make scripts reusable with different inputs.

What you'll learn:

  - Using the Import Data feature
  - Accessing the imported_data variable
  - Parameterizing scripts
  - Processing multiple URLs
  - Configuration via imported data
  - Error handling with imported data
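A common defensive step is normalizing whatever arrives in `imported_data` before looping over it. This sketch assumes imports are either a single URL string or a list; the function name is illustrative.

```python
def urls_from_import(imported_data):
    """Accept a single URL string or a list of URLs from imported_data,
    returning a clean list and skipping blank or non-string entries."""
    if isinstance(imported_data, str):
        imported_data = [imported_data]
    return [u.strip() for u in imported_data
            if isinstance(u, str) and u.strip()]
```

Inside `main()` you would then loop over the result, calling `page.goto(url)` and `scrape_data()` per URL.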

Running Examples

All examples are located in the /examples directory of the repository.

To run an example:

  1. Go to the Editor
  2. Copy the example code
  3. Paste into the editor
  4. Click "Run"

Example Template

Use this template as a starting point for your scripts:

import datetime

async def main(page):
    """
    Description of what this script does
    """

    # Step 1: Navigate
    debug_log("Starting navigation")
    await page.goto('https://example.com')
    await page.wait_for_load_state('networkidle')

    # Step 2: Interact
    debug_log("Looking for elements")
    await capture_screenshot("Initial state")

    # Your scraping logic here

    # Step 3: Extract data
    debug_log("Extracting data")
    scrape_data({
        'scraped_at': datetime.datetime.now().isoformat(),
        # Your data here
    })

    debug_log("Complete!")

Common Use Cases

E-commerce Price Monitoring

async def main(page):
    await page.goto('https://shop.example.com/product/123')

    price = await page.locator('.price').text_content()
    title = await page.locator('h1.product-title').text_content()

    scrape_data({
        'product': title,
        'price': price,
        'url': page.url
    })

Form Submission

async def main(page):
    await page.goto('https://example.com/search')

    # Fill and submit form
    await page.locator('input[name="q"]').fill('search term')
    await page.locator('button[type="submit"]').click()

    # Wait for results
    await page.wait_for_selector('.search-results')
    await capture_screenshot("Search results")

    # Extract results
    results = await page.locator('.result-item').all()
    for result in results:
        title = await result.locator('.title').text_content()
        scrape_data({'result': title})

Login and Navigate

async def main(page):
    # Login
    await page.goto('https://example.com/login')
    await page.locator('#username').fill('user@example.com')
    await page.locator('#password').fill('password')
    await page.locator('button[type="submit"]').click()

    # Wait for dashboard
    await page.wait_for_url('**/dashboard')
    await capture_screenshot("Logged in")

    # Navigate to data page
    await page.goto('https://example.com/data')
    # ... extract data

Pagination

async def main(page):
    await page.goto('https://example.com/listings')

    page_num = 1
    while True:
        debug_log(f"Processing page {page_num}")

        # Extract items on current page
        items = await page.locator('.item').all()
        for item in items:
            title = await item.locator('.title').text_content()
            scrape_data({'title': title, 'page': page_num})

        # Check for next button
        next_button = page.locator('a.next-page')
        if await next_button.count() == 0:
            break

        await next_button.click()
        await page.wait_for_load_state('networkidle')
        page_num += 1

Tips for Writing Examples

  1. Add comments - Explain what each section does
  2. Use debug_log() - Make execution flow clear
  3. Take screenshots - Capture important states
  4. Handle errors - Use try/except for robustness
  5. Extract meaningful data - Show real-world use cases

Contributing Examples

Have a useful script? Share it with the community!

See Contributing for guidelines.