Importing Data

Import external data into your scripts to make them reusable and parameterizable. Run the same script with different inputs without editing code.

Quick Start

1. Click "Import Data"

In the editor toolbar, click the "Import Data" button.

2. Paste JSON Data

A modal will appear. Paste your JSON data:

{
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
  ],
  "max_items": 10
}

3. Click Import

Click the "Import" button. The modal will close and the "Import Data" button will turn green with a checkmark (✓).

4. Use in Your Script

Access the data via the imported_data variable:

async def main(page):
    if imported_data:
        for url in imported_data['urls']:
            await page.goto(url)
            # ... scrape data

The imported_data Variable

Type: dict | list | str | int | float | bool | None

  • None if no data has been imported
  • Otherwise, contains the parsed JSON data you imported
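
Because `imported_data` holds whatever the top-level JSON value parses to, it behaves like the result of Python's `json.loads` (that the editor uses equivalent parsing semantics is an assumption; the illustration below uses the standard `json` module directly):

```python
import json

# JSON objects parse to dicts, arrays to lists, and
# primitives to str / int / float / bool / None.
assert isinstance(json.loads('{"key": "value"}'), dict)
assert isinstance(json.loads('["item1", "item2"]'), list)
assert isinstance(json.loads('"just a string"'), str)
assert json.loads('42') == 42
assert json.loads('true') is True
assert json.loads('null') is None
```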

Always check if data exists:

async def main(page):
    if imported_data:
        debug_log("Using imported data")
        # Use imported data
    else:
        debug_log("No data imported, using defaults")
        # Use default values

Common Use Cases

Multiple URLs

Scrape a list of URLs without hardcoding them:

Imported Data:

{
  "urls": [
    "https://news.ycombinator.com",
    "https://reddit.com/r/python",
    "https://github.com/trending"
  ]
}

Script:

async def main(page):
    if not imported_data:
        debug_log("No URLs provided")
        return

    results = []
    for url in imported_data['urls']:
        debug_log(f"Scraping: {url}")
        await page.goto(url)
        await page.wait_for_load_state('networkidle')

        title = await page.title()
        results.append({'url': url, 'title': title})

    scrape_data({'results': results})

Search Queries

Run multiple searches with different parameters:

Imported Data:

{
  "queries": [
    "playwright python",
    "web scraping tutorial",
    "selenium alternatives"
  ],
  "search_engine": "google"
}

Script:

async def main(page):
    if not imported_data:
        debug_log("No search queries provided")
        return

    await page.goto('https://www.google.com')

    for query in imported_data['queries']:
        debug_log(f"Searching for: {query}")

        # Fill search box
        await page.fill('input[name="q"]', query)
        await page.press('input[name="q"]', 'Enter')
        await page.wait_for_load_state('networkidle')

        # Get result count
        results_text = await page.locator('#result-stats').text_content()

        scrape_data({
            'query': query,
            'results_text': results_text
        })

        # Go back for next search
        await page.goto('https://www.google.com')

Configuration Parameters

Parameterize scraping behavior:

Imported Data:

{
  "target_url": "https://example.com/products",
  "max_pages": 5,
  "wait_time": 2,
  "capture_screenshots": true,
  "filters": {
    "category": "electronics",
    "min_price": 100
  }
}

Script:

import asyncio

async def main(page):
    if not imported_data:
        config = {
            'target_url': 'https://example.com',
            'max_pages': 1,
            'wait_time': 1,
            'capture_screenshots': False
        }
    else:
        config = imported_data

    debug_log(f"Config: {config}")

    await page.goto(config['target_url'])

    for page_num in range(config['max_pages']):
        debug_log(f"Processing page {page_num + 1}")

        if config.get('capture_screenshots'):
            await capture_screenshot(f"Page {page_num + 1}")

        # ... scrape data

        # .get() avoids a KeyError if the imported data omits 'wait_time'
        await asyncio.sleep(config.get('wait_time', 1))

Product IDs

Scrape specific products:

Imported Data:

{
  "product_ids": ["12345", "67890", "11111"],
  "store": "example-store",
  "include_reviews": true
}

Script:

async def main(page):
    if not imported_data:
        debug_log("No product IDs provided")
        return

    base_url = f"https://{imported_data['store']}.com/product/"

    for product_id in imported_data['product_ids']:
        url = f"{base_url}{product_id}"
        debug_log(f"Scraping product: {product_id}")

        await page.goto(url)
        await page.wait_for_load_state('networkidle')

        # Get product details
        name = await page.locator('.product-name').text_content()
        price = await page.locator('.product-price').text_content()

        product_data = {
            'id': product_id,
            'name': name,
            'price': price
        }

        # Get reviews if requested
        if imported_data.get('include_reviews'):
            reviews = await page.locator('.review').all()
            product_data['review_count'] = len(reviews)

        scrape_data(product_data)

Data Persistence

Session-Based

Imported data persists during your editor session:

  • ✅ Stays loaded when you run the script multiple times
  • ✅ Remains even if you edit the script
  • ❌ Clears when you close the browser tab
  • ❌ Clears when you refresh the page

Clearing Data

To clear imported data:

  1. Click "Import Data" button
  2. Delete all text in the textarea
  3. Click "Import"
  4. The button will return to gray (no checkmark)

Updating Data

To update imported data:

  1. Click "Import Data" button
  2. The current data will be shown
  3. Edit the JSON
  4. Click "Import"

Best Practices

1. Always Validate

Check if data exists and has expected structure:

async def main(page):
    # Check if data exists
    if not imported_data:
        debug_log("ERROR: No data imported")
        return

    # Validate required fields
    if 'urls' not in imported_data:
        debug_log("ERROR: Missing 'urls' field")
        return

    if not isinstance(imported_data['urls'], list):
        debug_log("ERROR: 'urls' must be a list")
        return

    # Proceed with scraping
    for url in imported_data['urls']:
        await page.goto(url)
        # ...

2. Provide Defaults

Make scripts work with or without imported data:

async def main(page):
    # Fall back to an empty dict when nothing was imported,
    # then let .get() supply per-field defaults
    data = imported_data or {}
    urls = data.get('urls', ['https://example.com'])
    max_items = data.get('max_items', 10)

    debug_log(f"Processing {len(urls)} URLs, max {max_items} items each")
    # ...
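
When a script takes several parameters, merging imported values over a single defaults dict scales better than one fallback expression per field. A minimal sketch (`build_config` and `DEFAULTS` are illustrative names, not part of the editor API; it assumes `imported_data` is either None or a flat dict of overrides):

```python
DEFAULTS = {
    'urls': ['https://example.com'],
    'max_items': 10,
}

def build_config(imported_data):
    # In a dict merge, later entries win: imported values override
    # DEFAULTS, while any missing keys keep their default.
    return {**DEFAULTS, **(imported_data or {})}
```

`build_config(None)` yields the defaults unchanged; `build_config({'max_items': 3})` overrides only `max_items` and leaves `urls` at its default.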

3. Log What You're Using

Make debugging easier by logging what was imported:

async def main(page):
    if imported_data:
        debug_log(f"Imported data: {imported_data}")
    else:
        debug_log("No data imported, using defaults")

4. Use Type Checking

Handle different data types safely:

async def main(page):
    if not imported_data:
        return

    # Handle list of URLs
    if isinstance(imported_data, list):
        urls = imported_data
    # Handle dict with URLs array
    elif isinstance(imported_data, dict) and 'urls' in imported_data:
        urls = imported_data['urls']
    # Handle single URL string
    elif isinstance(imported_data, str):
        urls = [imported_data]
    else:
        debug_log("ERROR: Unexpected data format")
        return

    for url in urls:
        await page.goto(url)
        # ...

JSON Format

Valid JSON Examples

Object:

{"key": "value", "number": 42}

Array:

["item1", "item2", "item3"]

Nested:

{
  "users": [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25}
  ],
  "settings": {
    "enabled": true,
    "count": 10
  }
}

Primitive:

"just a string"
42
true

Common JSON Errors

Trailing commas:

{
  "key": "value",   Extra comma
}

Single quotes:

{'key': 'value'}   ❌ Must use double quotes

Unquoted keys:

{key: "value"}   Keys must be quoted

Comments:

{
  // This is invalid  ❌ No comments in JSON
  "key": "value"
}
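
All of these errors can be caught before pasting by round-tripping the text through Python's `json` module, which reports the position of the first problem. A quick local sketch (`check_json` is a hypothetical helper, not part of the editor):

```python
import json

def check_json(text):
    """Return (True, parsed_value) if text is valid JSON,
    otherwise (False, error_message)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as e:
        # JSONDecodeError carries the line, column, and reason
        return False, f"line {e.lineno}, column {e.colno}: {e.msg}"

ok, detail = check_json('{"key": "value",}')   # trailing comma
# ok is False; detail describes where parsing failed
```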

Troubleshooting

"Invalid JSON" Error

Problem: The modal shows "Invalid JSON" when you try to import.

Solution:

  • Check for trailing commas
  • Use double quotes, not single quotes
  • Validate your JSON at jsonlint.com

Data Not Available in Script

Problem: imported_data is None even though you imported data.

Solution:

  • Check that the Import Data button shows a green checkmark (✓)
  • Try importing the data again
  • Check the browser console for errors

Script Works in Editor but Fails in Execution History

Problem: Script works when you run it, but viewing past executions shows different behavior.

Explanation: Imported data is session-based and not stored with execution history. If you want to track what data was used, include it in scrape_data():

async def main(page):
    if imported_data:
        # Store imported data in execution record
        scrape_data({
            'imported_data': imported_data,
            'results': []  # Your actual results
        })

See Also