Importing Data¶
Import external data into your scripts to make them reusable and parameterizable. Run the same script with different inputs without editing code.
Quick Start¶
1. Click "Import Data"¶
In the editor toolbar, click the "Import Data" button.
2. Paste JSON Data¶
A modal will appear. Paste your JSON data:
{
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
  ],
  "max_items": 10
}
3. Click Import¶
Click the "Import" button. The modal will close and the "Import Data" button will turn green with a checkmark (✓).
4. Use in Your Script¶
Access the data via the imported_data variable:
async def main(page):
    if imported_data:
        for url in imported_data['urls']:
            await page.goto(url)
            # ... scrape data
The imported_data Variable¶
Type: dict | list | str | int | float | bool | None
- None if no data has been imported
- Otherwise, contains the parsed JSON data you imported
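Since the editor parses standard JSON, the value you get back follows the usual JSON-to-Python mapping. The stdlib json module applies the same rules, so you can preview the mapping locally:

```python
import json

# JSON objects become dicts, arrays become lists,
# and primitives map to str/int/float/bool/None
assert json.loads('{"max_items": 10}') == {'max_items': 10}
assert json.loads('["a", "b"]') == ['a', 'b']
assert json.loads('true') is True
assert json.loads('null') is None
```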
Always check if data exists:
async def main(page):
    if imported_data:
        debug_log("Using imported data")
        # Use imported data
    else:
        debug_log("No data imported, using defaults")
        # Use default values
Common Use Cases¶
Multiple URLs¶
Scrape a list of URLs without hardcoding them:
Imported Data:
{
  "urls": [
    "https://news.ycombinator.com",
    "https://reddit.com/r/python",
    "https://github.com/trending"
  ]
}
Script:
async def main(page):
    if not imported_data:
        debug_log("No URLs provided")
        return

    results = []
    for url in imported_data['urls']:
        debug_log(f"Scraping: {url}")
        await page.goto(url)
        await page.wait_for_load_state('networkidle')
        title = await page.title()
        results.append({'url': url, 'title': title})

    scrape_data({'results': results})
Search Queries¶
Run multiple searches with different parameters:
Imported Data:
{
  "queries": [
    "playwright python",
    "web scraping tutorial",
    "selenium alternatives"
  ],
  "search_engine": "google"
}
Script:
async def main(page):
    if not imported_data:
        debug_log("No search queries provided")
        return

    await page.goto('https://www.google.com')
    for query in imported_data['queries']:
        debug_log(f"Searching for: {query}")

        # Fill search box
        await page.fill('input[name="q"]', query)
        await page.press('input[name="q"]', 'Enter')
        await page.wait_for_load_state('networkidle')

        # Get result count
        results_text = await page.locator('#result-stats').text_content()
        scrape_data({
            'query': query,
            'results_text': results_text
        })

        # Go back for next search
        await page.goto('https://www.google.com')
Configuration Parameters¶
Parameterize scraping behavior:
Imported Data:
{
  "target_url": "https://example.com/products",
  "max_pages": 5,
  "wait_time": 2,
  "capture_screenshots": true,
  "filters": {
    "category": "electronics",
    "min_price": 100
  }
}
Script:
import asyncio

async def main(page):
    if not imported_data:
        config = {
            'target_url': 'https://example.com',
            'max_pages': 1,
            'wait_time': 1,
            'capture_screenshots': False
        }
    else:
        config = imported_data

    debug_log(f"Config: {config}")
    await page.goto(config['target_url'])
    for page_num in range(config['max_pages']):
        debug_log(f"Processing page {page_num + 1}")
        if config.get('capture_screenshots'):
            await capture_screenshot(f"Page {page_num + 1}")
        # ... scrape data
        await asyncio.sleep(config['wait_time'])
Product IDs¶
Scrape specific products:
Imported Data:
{
  "store": "examplestore",
  "product_ids": ["A100", "B200", "C300"],
  "include_reviews": true
}
Script:
async def main(page):
    if not imported_data:
        debug_log("No product IDs provided")
        return

    base_url = f"https://{imported_data['store']}.com/product/"
    for product_id in imported_data['product_ids']:
        url = f"{base_url}{product_id}"
        debug_log(f"Scraping product: {product_id}")
        await page.goto(url)
        await page.wait_for_load_state('networkidle')

        # Get product details
        name = await page.locator('.product-name').text_content()
        price = await page.locator('.product-price').text_content()
        product_data = {
            'id': product_id,
            'name': name,
            'price': price
        }

        # Get reviews if requested
        if imported_data.get('include_reviews'):
            reviews = await page.locator('.review').all()
            product_data['review_count'] = len(reviews)

        scrape_data(product_data)
Data Persistence¶
Session-Based¶
Imported data persists during your editor session:
- ✅ Stays loaded when you run the script multiple times
- ✅ Remains even if you edit the script
- ❌ Clears when you close the browser tab
- ❌ Clears when you refresh the page
Clearing Data¶
To clear imported data:
- Click "Import Data" button
- Delete all text in the textarea
- Click "Import"
- The button will return to gray (no checkmark)
Updating Data¶
To update imported data:
- Click "Import Data" button
- The current data will be shown
- Edit the JSON
- Click "Import"
Best Practices¶
1. Always Validate¶
Check if data exists and has expected structure:
async def main(page):
    # Check if data exists
    if not imported_data:
        debug_log("ERROR: No data imported")
        return

    # Validate required fields
    if 'urls' not in imported_data:
        debug_log("ERROR: Missing 'urls' field")
        return
    if not isinstance(imported_data['urls'], list):
        debug_log("ERROR: 'urls' must be a list")
        return

    # Proceed with scraping
    for url in imported_data['urls']:
        await page.goto(url)
        # ...
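The same checks can be collected into a small reusable helper. This is a sketch, not part of the editor API; `require_list_field` is a name chosen here for illustration:

```python
def require_list_field(data, field):
    """Return data[field] if data is a dict containing a list; otherwise None."""
    if not isinstance(data, dict):
        return None
    value = data.get(field)
    if not isinstance(value, list):
        return None
    return value

# Typical use inside main(): bail out early when validation fails
urls = require_list_field({'urls': ['https://example.com']}, 'urls')
```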
2. Provide Defaults¶
Make scripts work with or without imported data:
async def main(page):
    # Use imported data or defaults
    urls = imported_data.get('urls', ['https://example.com']) if imported_data else ['https://example.com']
    max_items = imported_data.get('max_items', 10) if imported_data else 10

    debug_log(f"Processing {len(urls)} URLs, max {max_items} items each")
    # ...
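A more compact equivalent merges a defaults dict with whatever was imported, so every fallback lives in one place. This is a general Python pattern, not editor-specific; `with_defaults` is a helper name introduced here:

```python
def with_defaults(imported, defaults):
    """Merge imported data over defaults; imported keys win, None falls back."""
    if not isinstance(imported, dict):
        return dict(defaults)
    return {**defaults, **imported}

# Only max_items was imported, so urls keeps its default
config = with_defaults(
    {'max_items': 25},
    {'urls': ['https://example.com'], 'max_items': 10}
)
```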
3. Log What You're Using¶
Help with debugging by logging imported data:
async def main(page):
    if imported_data:
        debug_log(f"Imported data: {imported_data}")
    else:
        debug_log("No data imported, using defaults")
4. Use Type Checking¶
Handle different data types safely:
async def main(page):
    if not imported_data:
        return

    # Handle list of URLs
    if isinstance(imported_data, list):
        urls = imported_data
    # Handle dict with URLs array
    elif isinstance(imported_data, dict) and 'urls' in imported_data:
        urls = imported_data['urls']
    # Handle single URL string
    elif isinstance(imported_data, str):
        urls = [imported_data]
    else:
        debug_log("ERROR: Unexpected data format")
        return

    for url in urls:
        await page.goto(url)
        # ...
JSON Format¶
Valid JSON Examples¶
Object:
{"target_url": "https://example.com", "max_items": 10}
Array:
["https://example.com/page1", "https://example.com/page2"]
Nested:
{
  "users": [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25}
  ],
  "settings": {
    "enabled": true,
    "count": 10
  }
}
Primitive:
42
(A bare string, number, or boolean is also valid JSON.)
Common JSON Errors¶
Trailing commas:
{"urls": ["a", "b",]} ❌
Single quotes:
{'urls': ['a', 'b']} ❌
Unquoted keys:
{urls: ["a", "b"]} ❌
Comments:
{"urls": ["a"]} // comments are not allowed ❌
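You can reproduce these failures locally with Python's stdlib json module; each malformed snippet raises json.JSONDecodeError with a message pointing at the problem:

```python
import json

bad_inputs = [
    '{"urls": ["a", "b",]}',   # trailing comma
    "{'urls': ['a']}",         # single quotes
    '{urls: ["a"]}',           # unquoted key
    '{"urls": ["a"]} // hi',   # comment after the value
]

for text in bad_inputs:
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        print(f"Invalid: {text!r} -> {e.msg}")
```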
Troubleshooting¶
"Invalid JSON" Error¶
Problem: The modal shows "Invalid JSON" when you try to import.
Solution:
- Check for trailing commas
- Use double quotes, not single quotes
- Validate your JSON at jsonlint.com
Data Not Available in Script¶
Problem: imported_data is None even though you imported data.
Solution:
- Check that the Import Data button shows a green checkmark (✓)
- Try importing the data again
- Check browser console for errors
Script Works in Editor but Fails in Execution History¶
Problem: Script works when you run it, but viewing past executions shows different behavior.
Explanation: Imported data is session-based and not stored with execution history. If you want to track what data was used, include it in scrape_data():
async def main(page):
    if imported_data:
        # Store imported data in execution record
        scrape_data({
            'imported_data': imported_data,
            'results': []  # Your actual results
        })