Structured Extraction

Extract structured data from any website

Define a JSON schema, send a URL. Get clean, typed data back — powered by your own LLM key. No more brittle CSS selectors.

Schema → JSON

Send a schema, get structured data

Define what you want. Purify + your LLM extracts it from any page.

Request
curl -X POST https://purify.verifly.pro/api/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "schema": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "title": { "type": "string" },
          "url": { "type": "string" },
          "points": { "type": "number" },
          "comments": { "type": "number" }
        }
      }
    },
    "llm_api_key": "sk-your-openai-key"
  }'
Response
{
  "success": true,
  "data": [
    {
      "title": "Show HN: Purify – Web to Markdown API",
      "url": "https://github.com/Easonliuliang/purify",
      "points": 342,
      "comments": 89
    },
    {
      "title": "Why LLMs need clean input data",
      "url": "https://example.com/llm-data",
      "points": 256,
      "comments": 67
    }
  ],
  "processing_time_ms": 1240
}
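
The same request can be made from code. The sketch below mirrors the curl example in Python using only the standard library. The endpoint, headers, and body fields come from the example above; the function names are illustrative, and treating a missing or false `success` field as a failure is an assumption, since the error response format is not shown here.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
EXTRACT_URL = "https://purify.verifly.pro/api/v1/extract"


def build_extract_request(page_url, schema, api_key, llm_api_key):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {"url": page_url, "schema": schema, "llm_api_key": llm_api_key}
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_extract_response(raw_body):
    """Return the `data` field of a response body.

    Assumption: a failed extraction reports `success` as false or omits it.
    """
    body = json.loads(raw_body)
    if not body.get("success"):
        raise RuntimeError(f"extraction failed: {body}")
    return body["data"]


def extract(page_url, schema, api_key, llm_api_key, timeout=30):
    """Send the request and return the extracted data."""
    request = build_extract_request(page_url, schema, api_key, llm_api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_extract_response(response.read())
```

Calling `extract("https://news.ycombinator.com", schema, "YOUR_API_KEY", "sk-your-openai-key")` with the schema from the request above should return a list of story objects shaped like the sample response.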
Use cases

Built for AI workflows

From RAG pipelines to autonomous agents — structured extraction powers your data layer.

RAG Pipelines

Extract structured content from documentation, articles, and knowledge bases for your retrieval-augmented generation system.

AI Agents

Give your agents the ability to understand and extract data from any website, not just APIs with JSON endpoints.

Data Enrichment

Augment your CRM, product database, or research corpus with structured web data at scale.

Features

Extraction, done right

Everything you need to turn unstructured web pages into structured data.

Schema-driven Extraction

Define your JSON schema. Purify + your LLM extracts exactly the fields you need.

Bring Your Own Key (BYOK)

Use your own OpenAI, Anthropic, or other LLM API key. We never store or proxy your key.

Batch Processing

Submit multiple URLs in one request. Get structured JSON arrays back.

Consistent Output

Same schema, same structure every time. No more maintaining brittle CSS selectors.

RAG Pipeline Ready

Feed structured data directly into your vector store or knowledge graph.

AI Agent Integration

Built-in MCP server lets your agents extract structured data in a single tool call.
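
If you want to confirm on your side that a response matches the schema you sent, a small check like the one below can run before the data enters your pipeline. It is a minimal sketch that covers only the JSON Schema keywords used in the request example (`type`, `items`, `properties`), not a full validator, and the function name `conforms` is illustrative.

```python
def conforms(value, schema):
    """Check `value` against the small JSON Schema subset used in the example."""
    kind = schema.get("type")
    if kind == "array":
        item_schema = schema.get("items", {})
        return isinstance(value, list) and all(
            conforms(item, item_schema) for item in value
        )
    if kind == "object":
        if not isinstance(value, dict):
            return False
        properties = schema.get("properties", {})
        # Only fields present in the value are checked; without a "required"
        # keyword, JSON Schema allows listed properties to be absent.
        return all(
            conforms(value[name], sub_schema)
            for name, sub_schema in properties.items()
            if name in value
        )
    if kind == "string":
        return isinstance(value, str)
    if kind == "number":
        # bool is a subclass of int in Python but is not a JSON number.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return True  # unknown or missing "type": accept the value


STORY_LIST_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "url": {"type": "string"},
            "points": {"type": "number"},
            "comments": {"type": "number"},
        },
    },
}
```

For production use, a complete JSON Schema validator is a better fit than this hand-rolled subset.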

Bring Your Own Key (BYOK)

Structured extraction requires an LLM to interpret the schema. Pass your own OpenAI, Anthropic, or other compatible API key in the request. We never store, log, or proxy your key; it goes directly to your LLM provider.
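
Because the key travels in the request body, it helps to keep it out of source code and out of your own logs. The sketch below shows one way to do that, assuming the key lives in an environment variable. The variable name `OPENAI_API_KEY` and the helper names are illustrative choices; only the `llm_api_key` field comes from the request example.

```python
import os


def llm_key_from_env(env=os.environ, var="OPENAI_API_KEY"):
    """Read the LLM key from the environment instead of hardcoding it."""
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"set {var} before calling the extract endpoint")
    return key


def with_llm_key(payload, env=os.environ):
    """Return a copy of the request body with `llm_api_key` filled in."""
    return {**payload, "llm_api_key": llm_key_from_env(env)}


def redacted(payload):
    """Return a copy of the request body that is safe to log."""
    if "llm_api_key" in payload:
        return {**payload, "llm_api_key": "***"}
    return dict(payload)
```

Log `redacted(body)` rather than `body` when debugging requests, so the key never lands in log files.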

Start extracting structured data

Free tier includes 1,000 extraction requests/month. No credit card required.