extract

Extract structured data from text, documents, and other sources using pattern matching, schema mapping, and automatic type conversion.

Overview

The extract primitive parses unstructured text, extracts entities from documents, and transforms data between formats with automatic schema inference.

Quick Example

import { extract } from 'sdk.do'

// Extract structured data from text
const data = await extract.fromText({
  text: 'John Smith works at Acme Corp as a Software Engineer',
  schema: {
    name: 'string',
    company: 'string',
    title: 'string',
  },
})

// Extract from documents
const invoices = await extract.fromPDF({
  file: './invoice.pdf',
  type: 'invoice',
  fields: {
    invoiceNumber: 'string',
    amount: 'number',
    date: 'date',
    items: 'array',
  },
})

// Extract with AI
const entities = await extract.withAI({
  content: document, // assumes `document` holds the text to analyze, defined elsewhere
  extract: ['people', 'organizations', 'locations', 'dates'],
  model: 'gpt-5',
})
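
The schemas above name field types such as 'string', 'number', 'date', and 'array'. As a rough illustration of what automatic type conversion against such a schema could look like, here is a minimal sketch. The helpers `coerceValue` and `applySchema` are hypothetical and are not part of sdk.do; the parsing rules (stripping currency symbols, splitting arrays on commas) are assumptions made for the example.

```typescript
// Hypothetical helpers, not part of sdk.do: coerce raw string fields to the
// types named in a schema such as { amount: 'number', date: 'date' }.
type FieldType = 'string' | 'number' | 'date' | 'array'
type Schema = Record<string, FieldType>

function coerceValue(raw: string, type: FieldType): unknown {
  switch (type) {
    case 'string':
      return raw.trim()
    case 'number': {
      // Assumption: drop currency symbols and thousands separators before parsing.
      const cleaned = raw.replace(/[^0-9.+-]/g, '')
      const n = Number(cleaned)
      if (cleaned === '' || Number.isNaN(n)) throw new Error(`Not a number: ${raw}`)
      return n
    }
    case 'date': {
      const d = new Date(raw)
      if (Number.isNaN(d.getTime())) throw new Error(`Not a date: ${raw}`)
      return d
    }
    case 'array':
      // Assumption: array values arrive as a comma-separated string.
      return raw
        .split(',')
        .map((item) => item.trim())
        .filter((item) => item.length > 0)
  }
}

function applySchema(raw: Record<string, string>, schema: Schema): Record<string, unknown> {
  const out: Record<string, unknown> = {}
  for (const [field, type] of Object.entries(schema)) {
    const value = raw[field]
    if (value === undefined) continue // leave missing fields out rather than guess
    out[field] = coerceValue(value, type)
  }
  return out
}

const record = applySchema(
  { invoiceNumber: ' INV-001 ', amount: '$1,250.50', date: '2024-03-15', items: 'widget, gadget' },
  { invoiceNumber: 'string', amount: 'number', date: 'date', items: 'array' },
)
```

Throwing on unparseable values, rather than returning null, is a design choice for this sketch: it surfaces bad input early instead of passing a silently wrong record downstream.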

Core Capabilities

Pattern Matching - Regex and semantic pattern extraction
Schema Mapping - Transform data between formats
Entity Extraction - Identify people, places, organizations
Format Conversion - Parse JSON, XML, CSV, PDF, HTML
AI-Powered - LLM-based extraction for complex data
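
To make the regex side of pattern matching concrete, the following is a minimal standalone sketch. It does not use or describe the sdk.do implementation; the `patterns` table and the `extractPatterns` function are hypothetical names, and the regular expressions are deliberately simple rather than exhaustive.

```typescript
// Hypothetical sketch of regex-based pattern extraction; not the sdk.do implementation.
const patterns: Record<string, RegExp> = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  isoDate: /\b\d{4}-\d{2}-\d{2}\b/g,
  amount: /\$\d+(?:,\d{3})*(?:\.\d{2})?/g,
}

function extractPatterns(text: string): Record<string, string[]> {
  const result: Record<string, string[]> = {}
  for (const [name, pattern] of Object.entries(patterns)) {
    // With the global flag, match() returns every full match, or null if none.
    result[name] = text.match(pattern) ?? []
  }
  return result
}

const found = extractPatterns('Invoice sent to billing@acme.com on 2024-03-15 for $1,250.50.')
```

Each pattern name maps to a list of matches, so a caller can tell "no match" (an empty list) apart from a missing pattern.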

Access Methods

SDK

TypeScript/JavaScript library for data extraction

await extract.fromText({ text: 'John Smith at Acme', schema: { name: 'string', company: 'string' } })

→ SDK Documentation

CLI

Command-line tool for extraction operations

do extract pdf invoice.pdf --type invoice --output invoice.json

→ CLI Documentation

API

REST/RPC endpoints for extraction services

curl -X POST https://api.do/v1/extract -H 'Content-Type: application/json' -d '{"text":"John at Acme","schema":{"name":"string"}}'

→ API Documentation
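
For callers working in TypeScript rather than the shell, the same request could be assembled roughly as follows. This is a sketch under stated assumptions: the endpoint and payload are taken from the curl example, while the `Content-Type` header, the absence of authentication, and the helper name `buildExtractRequest` are assumptions and are not confirmed by this page.

```typescript
// Sketch only: mirrors the curl call for the extract endpoint.
// Assumptions: JSON content type, no authentication header.
interface ExtractRequestBody {
  text: string
  schema: Record<string, string>
}

interface ExtractRequest {
  url: string
  init: { method: string; headers: Record<string, string>; body: string }
}

// Hypothetical helper: builds the request without sending it, so it can be
// inspected or tested offline.
function buildExtractRequest(body: ExtractRequestBody): ExtractRequest {
  return {
    url: 'https://api.do/v1/extract',
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  }
}

const request = buildExtractRequest({ text: 'John at Acme', schema: { name: 'string' } })
// To send it: const response = await fetch(request.url, request.init)
```

Building the request separately from sending it keeps the example runnable without network access; consult the API documentation for the actual authentication and response format.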

MCP

Model Context Protocol for AI-driven extraction

Extract name and company from "John Smith works at Acme Corp"

→ MCP Documentation

Related

scraper - Web scraping
transform - Data transformation
fetch - Data retrieval
