Importing Data¶
Import external data into your scripts to make them reusable and parameterizable. Run the same script with different inputs without editing code.
Quick Start¶
1. Click "Import Data"¶
In the editor toolbar, click the "Import Data" button.
2. Paste JSON Data¶
A modal will appear. Paste your JSON data:
{
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
  ],
  "max_items": 10
}
3. Click Import¶
Click the "Import" button. The modal will close and the "Import Data" button will turn green with a checkmark (✓).
4. Use in Your Script¶
Access the data via the imported_data variable:
async def main(page):
    if imported_data:
        for url in imported_data['urls']:
            await page.goto(url)
            # ... scrape data
The imported_data Variable¶
Type: dict | list | str | int | float | bool | None
- None if no data has been imported
- Otherwise, contains the parsed JSON data you imported
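Since the editor parses standard JSON, the value you get back follows the usual JSON-to-Python mapping. The stdlib json module applies the same rules, so you can preview the mapping locally:

```python
import json

# JSON objects become dicts, arrays become lists,
# and primitives map to str/int/float/bool/None
assert json.loads('{"max_items": 10}') == {'max_items': 10}
assert json.loads('["a", "b"]') == ['a', 'b']
assert json.loads('true') is True
assert json.loads('null') is None
```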
Always check if data exists:
async def main(page):
    if imported_data:
        debug_log("Using imported data")
        # Use imported data
    else:
        debug_log("No data imported, using defaults")
        # Use default values
Common Use Cases¶
Multiple URLs¶
Scrape a list of URLs without hardcoding them:
Imported Data:
{
  "urls": [
    "https://news.ycombinator.com",
    "https://reddit.com/r/python",
    "https://github.com/trending"
  ]
}
Script:
async def main(page):
    if not imported_data:
        debug_log("No URLs provided")
        return

    results = []
    for url in imported_data['urls']:
        debug_log(f"Scraping: {url}")
        await page.goto(url)
        await page.wait_for_load_state('networkidle')
        title = await page.title()
        results.append({'url': url, 'title': title})

    scrape_data({'results': results})
Search Queries¶
Run multiple searches with different parameters:
Imported Data:
{
  "queries": [
    "playwright python",
    "web scraping tutorial",
    "selenium alternatives"
  ],
  "search_engine": "google"
}
Script:
async def main(page):
    if not imported_data:
        debug_log("No search queries provided")
        return

    await page.goto('https://www.google.com')
    for query in imported_data['queries']:
        debug_log(f"Searching for: {query}")

        # Fill search box
        await page.fill('input[name="q"]', query)
        await page.press('input[name="q"]', 'Enter')
        await page.wait_for_load_state('networkidle')

        # Get result count
        results_text = await page.locator('#result-stats').text_content()
        scrape_data({
            'query': query,
            'results_text': results_text
        })

        # Go back for next search
        await page.goto('https://www.google.com')
Configuration Parameters¶
Parameterize scraping behavior:
Imported Data:
{
  "target_url": "https://example.com/products",
  "max_pages": 5,
  "wait_time": 2,
  "capture_screenshots": true,
  "filters": {
    "category": "electronics",
    "min_price": 100
  }
}
Script:
import asyncio

async def main(page):
    if not imported_data:
        config = {
            'target_url': 'https://example.com',
            'max_pages': 1,
            'wait_time': 1,
            'capture_screenshots': False
        }
    else:
        config = imported_data

    debug_log(f"Config: {config}")
    await page.goto(config['target_url'])
    for page_num in range(config['max_pages']):
        debug_log(f"Processing page {page_num + 1}")
        if config.get('capture_screenshots'):
            await capture_screenshot(f"Page {page_num + 1}")
        # ... scrape data
        await asyncio.sleep(config['wait_time'])
Product IDs¶
Scrape specific products:
Imported Data:
{
  "store": "examplestore",
  "product_ids": ["A100", "B200", "C300"],
  "include_reviews": true
}
Script:
async def main(page):
    if not imported_data:
        debug_log("No product IDs provided")
        return

    base_url = f"https://{imported_data['store']}.com/product/"
    for product_id in imported_data['product_ids']:
        url = f"{base_url}{product_id}"
        debug_log(f"Scraping product: {product_id}")
        await page.goto(url)
        await page.wait_for_load_state('networkidle')

        # Get product details
        name = await page.locator('.product-name').text_content()
        price = await page.locator('.product-price').text_content()
        product_data = {
            'id': product_id,
            'name': name,
            'price': price
        }

        # Get reviews if requested
        if imported_data.get('include_reviews'):
            reviews = await page.locator('.review').all()
            product_data['review_count'] = len(reviews)

        scrape_data(product_data)
Data Persistence¶
Session-Based¶
Imported data persists during your editor session:
- ✅ Stays loaded when you run the script multiple times
- ✅ Remains even if you edit the script
- ❌ Clears when you close the browser tab
- ❌ Clears when you refresh the page
Clearing Data¶
To clear imported data:
- Click "Import Data" button
- Delete all text in the textarea
- Click "Import"
- The button will return to gray (no checkmark)
Updating Data¶
To update imported data:
- Click "Import Data" button
- The current data will be shown
- Edit the JSON
- Click "Import"
Best Practices¶
1. Always Validate¶
Check if data exists and has expected structure:
async def main(page):
    # Check if data exists
    if not imported_data:
        debug_log("ERROR: No data imported")
        return

    # Validate required fields
    if 'urls' not in imported_data:
        debug_log("ERROR: Missing 'urls' field")
        return
    if not isinstance(imported_data['urls'], list):
        debug_log("ERROR: 'urls' must be a list")
        return

    # Proceed with scraping
    for url in imported_data['urls']:
        await page.goto(url)
        # ...
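The same checks can be collected into a small reusable helper. This is a sketch, not part of the editor API; `require_list_field` is a name chosen here for illustration:

```python
def require_list_field(data, field):
    """Return data[field] if data is a dict containing a list; otherwise None."""
    if not isinstance(data, dict):
        return None
    value = data.get(field)
    if not isinstance(value, list):
        return None
    return value

# Typical use inside main(): bail out early when validation fails
urls = require_list_field({'urls': ['https://example.com']}, 'urls')
```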
2. Provide Defaults¶
Make scripts work with or without imported data:
async def main(page):
    # Use imported data or defaults
    urls = imported_data.get('urls', ['https://example.com']) if imported_data else ['https://example.com']
    max_items = imported_data.get('max_items', 10) if imported_data else 10

    debug_log(f"Processing {len(urls)} URLs, max {max_items} items each")
    # ...
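A more compact equivalent merges a defaults dict with whatever was imported, so every fallback lives in one place. This is a general Python pattern, not editor-specific; `with_defaults` is a helper name introduced here:

```python
def with_defaults(imported, defaults):
    """Merge imported data over defaults; imported keys win, None falls back."""
    if not isinstance(imported, dict):
        return dict(defaults)
    return {**defaults, **imported}

# Only max_items was imported, so urls keeps its default
config = with_defaults(
    {'max_items': 25},
    {'urls': ['https://example.com'], 'max_items': 10}
)
```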
3. Log What You're Using¶
Help with debugging by logging imported data:
async def main(page):
    if imported_data:
        debug_log(f"Imported data: {imported_data}")
    else:
        debug_log("No data imported, using defaults")
4. Use Type Checking¶
Handle different data types safely:
async def main(page):
    if not imported_data:
        return

    # Handle list of URLs
    if isinstance(imported_data, list):
        urls = imported_data
    # Handle dict with URLs array
    elif isinstance(imported_data, dict) and 'urls' in imported_data:
        urls = imported_data['urls']
    # Handle single URL string
    elif isinstance(imported_data, str):
        urls = [imported_data]
    else:
        debug_log("ERROR: Unexpected data format")
        return

    for url in urls:
        await page.goto(url)
        # ...
JSON Format¶
Valid JSON Examples¶
Object:
{"target_url": "https://example.com", "max_items": 10}
Array:
["https://example.com/page1", "https://example.com/page2"]
Nested:
{
  "users": [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25}
  ],
  "settings": {
    "enabled": true,
    "count": 10
  }
}
Primitive:
42
(A bare string, number, or boolean is also valid JSON.)
Common JSON Errors¶
Trailing commas:
{"urls": ["a", "b",]} ❌
Single quotes:
{'urls': ['a', 'b']} ❌
Unquoted keys:
{urls: ["a", "b"]} ❌
Comments:
{"urls": ["a"]} // comments are not allowed ❌
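You can reproduce these failures locally with Python's stdlib json module; each malformed snippet raises json.JSONDecodeError with a message pointing at the problem:

```python
import json

bad_inputs = [
    '{"urls": ["a", "b",]}',   # trailing comma
    "{'urls': ['a']}",         # single quotes
    '{urls: ["a"]}',           # unquoted key
    '{"urls": ["a"]} // hi',   # comment after the value
]

for text in bad_inputs:
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        print(f"Invalid: {text!r} -> {e.msg}")
```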
Troubleshooting¶
"Invalid JSON" Error¶
Problem: The modal shows "Invalid JSON" when you try to import.
Solution:
- Check for trailing commas
- Use double quotes, not single quotes
- Validate your JSON at jsonlint.com
Data Not Available in Script¶
Problem: imported_data is None even though you imported data.
Solution:
- Check that the Import Data button shows a green checkmark (✓)
- Try importing the data again
- Check browser console for errors
Script Works in Editor but Fails in Execution History¶
Problem: Script works when you run it, but viewing past executions shows different behavior.
Explanation: Imported data is session-based and not stored with execution history. If you want to track what data was used, include it in scrape_data():
async def main(page):
    if imported_data:
        # Store imported data in execution record
        scrape_data({
            'imported_data': imported_data,
            'results': []  # Your actual results
        })