extract
Data extraction and transformation
Extract structured data from text, documents, and other sources using pattern matching, schema mapping, and automatic type conversion.
Overview
The extract primitive parses unstructured text, pulls entities such as people and organizations out of documents, and transforms data between formats, inferring schemas automatically.
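This page does not describe how sdk.do infers schemas. The TypeScript sketch below is a rough illustration of what schema inference involves: it maps each value in a sample record to one of the type names used in the examples on this page ('string', 'number', 'date', 'array'), plus a few extras. The `inferType` and `inferSchema` helpers are hypothetical and are not part of sdk.do.

```typescript
// Hypothetical sketch of schema inference; not the sdk.do implementation.
// Type names mirror the schema strings used in the examples on this page.
type FieldType = 'string' | 'number' | 'boolean' | 'date' | 'array' | 'object' | 'null'

// Matches ISO 8601 dates such as "2024-03-15" or "2024-03-15T10:30:00Z".
const ISO_DATE = /^\d{4}-\d{2}-\d{2}(T[\d:.]+(Z|[+-]\d{2}:\d{2})?)?$/

// Infer the type of a single value.
function inferType(value: unknown): FieldType {
  if (value === null || value === undefined) return 'null'
  if (Array.isArray(value)) return 'array'
  if (value instanceof Date) return 'date'
  switch (typeof value) {
    case 'number':
      return 'number'
    case 'boolean':
      return 'boolean'
    case 'string':
      // Treat ISO-formatted strings that parse to a valid date as dates.
      return ISO_DATE.test(value) && !isNaN(Date.parse(value)) ? 'date' : 'string'
    case 'object':
      return 'object'
    default:
      return 'string'
  }
}

// Infer a flat schema (field name -> type name) from one sample record.
function inferSchema(record: Record<string, unknown>): Record<string, FieldType> {
  const schema: Record<string, FieldType> = {}
  for (const key of Object.keys(record)) {
    schema[key] = inferType(record[key])
  }
  return schema
}

const sample = { invoiceNumber: 'INV-1042', amount: 1250, date: '2024-03-15', items: [] }
console.log(inferSchema(sample))
// → { invoiceNumber: 'string', amount: 'number', date: 'date', items: 'array' }
```

Inferring from a single record is the simplest case; a real extractor would likely sample several records and reconcile conflicting types.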
Quick Example
import { extract } from 'sdk.do'

// Extract structured data from text
const data = await extract.fromText({
  text: 'John Smith works at Acme Corp as a Software Engineer',
  schema: {
    name: 'string',
    company: 'string',
    title: 'string',
  },
})

// Extract fields from a PDF document
const invoice = await extract.fromPDF({
  file: './invoice.pdf',
  type: 'invoice',
  fields: {
    invoiceNumber: 'string',
    amount: 'number',
    date: 'date',
    items: 'array',
  },
})

// Extract entities with AI (sample input text)
const document = 'Jane Doe met Acme Corp executives in Berlin on 12 March 2024.'
const entities = await extract.withAI({
  content: document,
  extract: ['people', 'organizations', 'locations', 'dates'],
  model: 'gpt-5',
})
Core Capabilities
- Pattern Matching - Regex and semantic pattern extraction
- Schema Mapping - Transform data between formats
- Entity Extraction - Identify people, places, organizations
- Format Conversion - Parse JSON, XML, CSV, PDF, HTML
- AI-Powered - LLM-based extraction for complex data
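The page does not say how pattern matching and type conversion fit together inside sdk.do. A minimal TypeScript sketch of the general idea, assuming a regular expression whose capture groups line up with an ordered list of fields: each captured substring is converted to its declared type ('string', 'number', or 'date', as in the `fromPDF` example). `extractWithPattern`, `coerce`, `INVOICE_PATTERN`, and the sample invoice line are hypothetical, not sdk.do APIs.

```typescript
// Hypothetical helpers (not part of sdk.do) showing regex extraction with type conversion.
type FieldType = 'string' | 'number' | 'date'
type FieldValue = string | number | Date

// Convert a captured substring to the type declared for its field.
function coerce(raw: string, type: FieldType): FieldValue {
  switch (type) {
    case 'number':
      // Drop currency symbols and thousands separators: "$1,250.00" -> 1250
      return Number(raw.replace(/[^0-9.-]/g, ''))
    case 'date':
      return new Date(raw)
    default:
      return raw.trim()
  }
}

// Run `pattern` against `text` and assign capture group i + 1 to fields[i],
// converting each value to its declared type. Returns null when nothing matches.
function extractWithPattern(
  text: string,
  pattern: RegExp,
  fields: Array<[string, FieldType]>,
): Record<string, FieldValue> | null {
  const match = pattern.exec(text)
  if (!match) return null
  const result: Record<string, FieldValue> = {}
  fields.forEach(([name, type], i) => {
    const raw = match[i + 1]
    if (raw !== undefined) result[name] = coerce(raw, type)
  })
  return result
}

// Capture groups, in order: invoice number, ISO date, amount with currency symbol.
const INVOICE_PATTERN = /Invoice ([A-Z]+-\d+) dated (\d{4}-\d{2}-\d{2}), total due (\$[\d,]+\.\d{2})/

const invoice = extractWithPattern('Invoice INV-1042 dated 2024-03-15, total due $1,250.00', INVOICE_PATTERN, [
  ['invoiceNumber', 'string'],
  ['date', 'date'],
  ['amount', 'number'],
])
console.log(invoice?.amount) // → 1250
```

Positional groups keep the sketch compatible with older compile targets; named capture groups such as `(?<amount>...)` (ES2018 and later) would let fields be matched by name instead of by order.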
Access Methods
SDK
TypeScript/JavaScript library for data extraction
await extract.fromText({ text: 'John Smith at Acme', schema: { name: 'string', company: 'string' } })
CLI
Command-line tool for extraction operations
do extract pdf invoice.pdf --type invoice --output invoice.json
API
REST/RPC endpoints for extraction services
curl -X POST https://api.do/v1/extract -H 'Content-Type: application/json' -d '{"text":"John at Acme","schema":{"name":"string"}}'
MCP
Model Context Protocol for AI-driven extraction
Extract name and company from "John Smith works at Acme Corp"
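With MCP, an assistant turns a natural-language request like the one above into a tool call. In the Model Context Protocol, tool invocations are JSON-RPC 2.0 requests with the method `tools/call`, carrying a tool `name` and an `arguments` object. The TypeScript sketch below builds such a request. The tool name `extract` and its `text` and `schema` arguments are assumptions modeled on the REST request body, since this page does not list the actual MCP tool definitions; `buildExtractToolCall` is likewise hypothetical.

```typescript
// Shape of a JSON-RPC 2.0 request as used by MCP's tools/call method.
interface ToolCallRequest {
  jsonrpc: '2.0'
  id: number
  method: 'tools/call'
  params: {
    name: string
    arguments: Record<string, unknown>
  }
}

// Build an MCP tools/call request. The tool name "extract" and the
// text/schema argument names are assumptions, not documented sdk.do values.
function buildExtractToolCall(id: number, text: string, schema: Record<string, string>): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: {
      name: 'extract', // hypothetical tool name
      arguments: { text, schema },
    },
  }
}

const request = buildExtractToolCall(1, 'John Smith works at Acme Corp', { name: 'string', company: 'string' })
// Serialized form an MCP client would send over its transport (stdio or HTTP).
console.log(JSON.stringify(request))
```

In practice the MCP client library handles framing and request IDs; the sketch only shows the payload shape.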