Open Source · Apache 2.0

Stop wasting 99% of your AI tokens on HTML junk

Purify turns any web page into pure Markdown before it hits your LLM. Spend $29/mo, save $5,000+ in AI token costs.

Without Purify: ~$15,000 (50k pages × $0.30 token cost)
With Purify Pro: ~$129 ($29 API + $100 token cost)
170× ROI on Pro plan

Works with your AI stack

Claude · GPT-4 · Gemini · LLaMA · Mistral · LangChain · CrewAI · AutoGPT · OpenClaw · Perplexity · Cursor · Vercel AI SDK · Dify
Real-world benchmarks

Dramatic token savings — verified

Purify strips navigation, ads, and boilerplate. Your LLM only sees what matters.
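Purify's actual extraction pipeline isn't described on this page. As a rough illustration of the idea only, here is a toy sketch built on Python's standard `html.parser`. It assumes that dropping everything inside `<nav>`, `<header>`, `<footer>`, `<aside>`, `<script>`, `<style>` and `<noscript>` is a fair stand-in for "boilerplate", and it turns headings into Markdown `#` lines. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Toy assumption: everything inside these tags counts as boilerplate.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class BoilerplateStripper(HTMLParser):
    """Collects visible text outside SKIP_TAGS; headings become '#' lines."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0     # > 0 while inside a skipped element
        self.heading_level = 0  # > 0 while inside <h1>..<h6>
        self.lines: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in HEADINGS:
            self.heading_level = HEADINGS[tag]

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)  # tolerate stray end tags
        elif tag in HEADINGS:
            self.heading_level = 0

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace runs
        if self.skip_depth or not text:
            return
        prefix = "#" * self.heading_level + " " if self.heading_level else ""
        self.lines.append(prefix + text)


def html_to_text(html: str) -> str:
    """Return the non-boilerplate text blocks separated by blank lines."""
    parser = BoilerplateStripper()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)
```

Fed `<nav>Home</nav><h1>Title</h1><p>Body text.</p><script>...</script>`, it keeps only `# Title` and `Body text.`. A production extractor also has to deal with inline formatting, links, tables, and pages that don't use semantic tags at all.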

| Website | Raw tokens | After Purify | Savings |
|---|---:|---:|---:|
| 小红书 (Xiaohongshu) | 158,742 | 353 | -99.8% |
| sspai.com (少数派) | 32,895 | 187 | -99.4% |
| GitHub Repository | 99,181 | 1,370 | -98.6% |
| New York Times | 103,744 | 2,130 | -98.0% |
| Anthropic API Docs | 129,066 | 4,837 | -96.3% |
| BBC News Homepage | 97,540 | 6,969 | -92.9% |
| arXiv Paper | 26,684 | 3,129 | -88.3% |
| Wikipedia (LLM) | 245,276 | 76,325 | -68.9% |

Measured using tiktoken (GPT-4 tokenizer). Raw = unmodified HTML. After Purify = extracted Markdown.
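The Savings column is simply 1 - (after / raw), expressed as a percentage. The sketch below reproduces two rows of the table and includes a helper that counts tokens with tiktoken's GPT-4 encoding (`pip install tiktoken`). The helper names are illustrative, not part of Purify.

```python
def savings_pct(raw_tokens: int, purified_tokens: int) -> float:
    """Percentage of tokens removed, as in the Savings column."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return (1 - purified_tokens / raw_tokens) * 100


def count_tokens(text: str) -> int:
    """Count GPT-4 tokens with tiktoken."""
    import tiktoken  # imported lazily so savings_pct works without tiktoken installed

    enc = tiktoken.encoding_for_model("gpt-4")
    return len(enc.encode(text))


# Recompute two benchmark rows from their token counts.
for site, raw, after in [("小红书 (Xiaohongshu)", 158_742, 353),
                         ("Wikipedia (LLM)", 245_276, 76_325)]:
    print(f"{site}: -{savings_pct(raw, after):.1f}%")
```

To check a page yourself, pass `count_tokens(raw_html)` and `count_tokens(markdown)` to `savings_pct`.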

Quick start

Up and running in 60 seconds

Three steps from zero to clean Markdown.

01

Install Purify

Download a single binary. No Node.js, no Python, no Docker. Just one file.

```shell
# macOS / Linux
curl -sSL https://purify.verifly.pro/install.sh | sh

# Or download directly from GitHub
wget https://github.com/Easonliuliang/purify/releases/latest/download/purify
chmod +x purify  # a downloaded file is not executable by default
```
02

Send a URL

POST any URL to the API. Works with dynamic JavaScript-heavy sites too.

```shell
curl -X POST https://purify.verifly.pro/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com/Easonliuliang/purify"}'
```
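To make the same call from Python without extra dependencies, a minimal standard-library sketch follows. It assumes only what the curl example shows: the endpoint, the Bearer token header, and a JSON body with a `url` field. The function names and the 60-second timeout are arbitrary choices, not part of the Purify API.

```python
import json
import urllib.request

API_URL = "https://purify.verifly.pro/api/v1/scrape"


def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def scrape(url: str, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response."""
    req = build_scrape_request(url, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Usage: `scrape("https://github.com/Easonliuliang/purify", "YOUR_API_KEY")`. HTTP errors surface as `urllib.error.HTTPError`.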
03

Get clean Markdown

Receive structured, token-efficient Markdown. Ready for your LLM or AI agent.

```json
{
  "success": true,
  "markdown": "# Purify\n\nTurn any web page into clean Markdown...\n",
  "tokens_saved": "98.7%",
  "processing_time_ms": 420
}
```
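The response shown above can be consumed as follows. This sketch assumes `tokens_saved` is always a percentage string such as `"98.7%"` and that failures come back with `"success": false`. The error payload format isn't documented here, so the whole body is included in the exception.

```python
import json


def extract_markdown(payload: str) -> tuple[str, float]:
    """Return (markdown, tokens_saved_percent) from a scrape response body."""
    data = json.loads(payload)
    if not data.get("success"):
        # Assumption: errors set success to false; their exact shape is undocumented.
        raise RuntimeError(f"scrape failed: {data}")
    saved = float(str(data.get("tokens_saved", "0")).rstrip("%"))  # "98.7%" -> 98.7
    return data["markdown"], saved
```

The returned Markdown string can go straight into your prompt or agent context.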
MCP Native

Connect to any AI agent in seconds

Purify ships a built-in MCP server. Add one entry to your assistant's MCP config file, and it can scrape the web.

MCP Config

Claude / Cursor

Add Purify as an MCP server in Claude Desktop or Cursor. Your AI assistant can scrape any web page on demand.

Claude Desktop on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

```json
{
  "mcpServers": {
    "purify": {
      "command": "/path/to/purify-mcp",
      "env": {
        "PURIFY_API_URL": "https://purify.verifly.pro",
        "PURIFY_API_KEY": "YOUR_API_KEY"
      }
    }
  }
}
```
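If the config file already lists other MCP servers, pasting the snippet over it would remove them. The sketch below merges a `purify` entry into an existing config and leaves other keys untouched. The path is the macOS location given above, and the function names are hypothetical. You will most likely need to restart Claude Desktop afterwards.

```python
import json
from pathlib import Path

# macOS location from the example above; other platforms store the file elsewhere.
CONFIG_PATH = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"


def add_purify_server(config: dict, binary: str, api_key: str,
                      api_url: str = "https://purify.verifly.pro") -> dict:
    """Return a copy of config with a 'purify' entry under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input isn't mutated
    servers["purify"] = {
        "command": binary,
        "env": {"PURIFY_API_URL": api_url, "PURIFY_API_KEY": api_key},
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, binary: str, api_key: str) -> None:
    """Read the existing file (if any), merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_purify_server(config, binary, api_key), indent=2))
```

Usage: `update_config_file(CONFIG_PATH, "/path/to/purify-mcp", "YOUR_API_KEY")`.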
Why Purify

Everything you need, nothing you don't

A single binary that does web scraping extremely well.

99% Token Savings

Strip HTML junk before it reaches your LLM. In the benchmarks above, savings range from 68.9% (Wikipedia) to 99.8% (Xiaohongshu).

Single Binary

One file, zero dependencies. No Docker, Node.js, or Python required.

Built-in MCP Server

Native Model Context Protocol support. Connect Claude, GPT, and other AI agents directly.

Open Source

Apache 2.0 licensed. Self-host with no usage limits, or use our managed cloud API.

JavaScript Rendering

A headless browser renders dynamic single-page apps (SPAs) before the content is extracted.

Fast Response

Optimized pipeline delivers clean Markdown in under 500ms for most pages.

Comparison

Built different

No cloud lock-in. No bloat. A single binary that does one thing extremely well.

| Feature | Purify | Firecrawl | Jina Reader |
|---|---|---|---|
| Deployment | Single binary | Cloud only | Cloud only |
| License | Apache 2.0 | AGPL / Proprietary | Apache 2.0 |
| Self-hostable | ✓ | | |
| Built-in MCP Server | ✓ | | |
| Zero dependencies | ✓ | | |
| Token savings | Up to 99.8% | ~70–80% | ~70–80% |
| Open source | ✓ | | |

Frequently asked questions

What is Purify?
Purify is a web scraping API that converts any web page into clean, token-efficient Markdown optimized for LLMs and AI agents. It strips out navigation, ads, scripts, and boilerplate, saving you up to 99% on AI token costs.

How is Purify different from other scraping tools?
Purify is a single binary with zero dependencies. It's fully open source (Apache 2.0), self-hostable, and ships with a built-in MCP server. No cloud lock-in, no bloat.

Can I self-host Purify?
Absolutely. Download the binary and run it anywhere; no Docker, Node.js, or Python required. Self-hosted instances have no usage limits.

What is MCP?
The Model Context Protocol (MCP) is an open standard that lets AI agents interact with external tools. Purify's built-in MCP server lets Claude, GPT, and other agents scrape URLs directly.
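Under the hood, MCP messages are JSON-RPC 2.0 requests. An agent discovers a server's tools with `tools/list` and invokes one with `tools/call`. The sketch below shows what those two messages look like. The tool name `scrape` and its `url` argument are hypothetical placeholders, not Purify's documented tool schema. The initial `initialize` handshake is omitted.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # MCP requires request ids not to be reused within a session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request, the message format MCP is built on."""
    msg = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)  # single line, no embedded newlines


# After the initialize handshake (omitted), an agent lists the tools, then calls one.
list_tools = jsonrpc_request("tools/list")
# Hypothetical: "scrape" and "url" stand in for whatever tools/list actually returns.
call_scrape = jsonrpc_request(
    "tools/call",
    {"name": "scrape", "arguments": {"url": "https://example.com"}},
)
print(list_tools)
print(call_scrape)
```

Clients such as Claude Desktop build and send these messages for you. The sketch only shows what travels between the agent and the `purify-mcp` process.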

How much can I save?
If you process 50,000 pages/month, raw HTML would cost ~$15,000 in GPT-4 tokens. With Purify, you'd pay ~$129 total ($29 API + ~$100 tokens). That's a 170× ROI.
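The cost figures reduce to simple arithmetic. In this tiny calculator, the per-page costs are the ones quoted on this page ($0.30 per raw page, and ~$100 / 50,000 pages = $0.002 per page after Purify). They are not measured prices, so substitute your own model pricing and volumes.

```python
def monthly_cost(pages: int, token_cost_per_page: float, api_fee: float = 0.0) -> float:
    """Monthly spend: LLM token cost per page times page count, plus a flat API fee."""
    return pages * token_cost_per_page + api_fee


PAGES = 50_000
# Per-page figures quoted on this page; replace with your own pricing.
raw = monthly_cost(PAGES, 0.30)                       # raw HTML into GPT-4
purified = monthly_cost(PAGES, 0.002, api_fee=29.0)   # ~$100 of tokens + $29/mo Pro plan

print(f"Raw HTML:    ~${raw:,.0f}")
print(f"With Purify: ~${purified:,.0f}")
print(f"Difference:  ~${raw - purified:,.0f}")
```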

Does Purify work with JavaScript-heavy sites?
Yes. Purify uses a headless browser to fully render dynamic pages before extracting content.

Ready to purify the web?

Join developers building smarter AI agents — start free, no credit card needed.