scraper

Web scraping and data extraction with browser automation, JavaScript rendering, and intelligent parsing for structured data collection.
Overview
The scraper primitive provides powerful web scraping capabilities including headless browser automation, CSS/XPath selectors, and automatic pagination for extracting data from websites.
Quick Example
```typescript
import { scraper } from 'sdk.do'

// Simple scraping
const data = await scraper.scrape({
  url: 'https://example.com/products',
  selectors: {
    title: 'h1.product-title',
    price: '.price',
    description: '.description',
  },
})

// Scrape a list of items
const products = await scraper.scrapeList({
  url: 'https://example.com/products',
  itemSelector: '.product',
  fields: {
    name: 'h2',
    price: '.price',
    image: 'img@src',
  },
  pagination: {
    selector: 'a.next-page',
    maxPages: 10,
  },
})
```
```typescript
// Browser automation
const result = await scraper.withBrowser(async (page) => {
  await page.goto('https://example.com')
  await page.click('.load-more')
  await page.waitForSelector('.products-loaded')
  return await page.evaluate(() =>
    Array.from(document.querySelectorAll('.product')).map((el) => el.textContent),
  )
})
```

Core Capabilities
- Headless Browser - Full Chrome/Firefox automation
- CSS/XPath Selectors - Flexible element selection
- JavaScript Rendering - Scrape dynamic content
- Pagination - Automatic multi-page scraping
- Rate Limiting - Respect robots.txt and rate limits
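The `scrapeList` example above uses an `img@src` field spec to extract an attribute value rather than text content. A minimal sketch of how such a `selector@attribute` spec might be interpreted, assuming a split on the last `@`; `parseFieldSpec` and the `FieldSpec` type are illustrative names, not part of the sdk.do API:

```typescript
// A field spec resolves to a CSS selector plus an optional attribute
// to read; without an attribute, the element's text content is used.
type FieldSpec = { selector: string; attribute?: string }

function parseFieldSpec(spec: string): FieldSpec {
  const at = spec.lastIndexOf('@')
  if (at === -1) return { selector: spec } // plain selector: use text content
  return { selector: spec.slice(0, at), attribute: spec.slice(at + 1) }
}

parseFieldSpec('img@src') // { selector: 'img', attribute: 'src' }
parseFieldSpec('.price')  // { selector: '.price' }
```

Splitting on the last `@` keeps the sketch robust to selectors that themselves contain `@`, such as attribute selectors.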
Access Methods
SDK
TypeScript/JavaScript library for scraping
```typescript
await scraper.scrape({ url: 'https://example.com', selectors: { title: 'h1' } })
```

CLI
Command-line tool for web scraping
```shell
do scraper scrape https://example.com --selector "title:h1"
```

API
REST/RPC endpoints for scraping operations
```shell
curl -X POST https://api.do/v1/scraper/scrape -d '{"url":"https://example.com"}'
```

MCP
Model Context Protocol for AI-driven scraping
Scrape https://example.com and extract the title from h1 element