scrapling-fetch

scrapling-fetch Overview

What is scrapling-fetch-mcp?

scrapling-fetch-mcp is an MCP (Model Context Protocol) server that lets AI assistants, such as Claude, access text content from websites that employ bot detection and anti-automation measures. It acts as a bridge between the AI and websites that are typically inaccessible to automated tools, allowing the AI to retrieve documentation and reference materials.

How to use scrapling-fetch-mcp?

To use scrapling-fetch-mcp, you need Python 3.10+ and the uv package manager. With these prerequisites in place, install the dependencies with uv tool install scrapling followed by scrapling install, and then install the server itself with uv tool install scrapling-fetch-mcp. To integrate with Claude, add an entry to your Claude client's MCP server settings that specifies uvx as the command and scrapling-fetch-mcp as the argument.

Once set up, AI assistants can invoke the provided tools, s-fetch-page and s-fetch-pattern, to interact with websites. For example, to fetch a complete page, the AI would use <mcp:invoke name="s-fetch-page"><mcp:parameter name="url">...</mcp:parameter></mcp:invoke>. To extract specific content using a regex pattern, it would use <mcp:invoke name="s-fetch-pattern"><mcp:parameter name="url">...</mcp:parameter><mcp:parameter name="search_pattern">...</mcp:parameter></mcp:invoke>.

Key Features of scrapling-fetch-mcp

Bot-Protected Website Access: Designed specifically to bypass anti-automation measures on websites.
Two Core Tools: Provides s-fetch-page for retrieving entire web pages with pagination and s-fetch-pattern for extracting content matching regex patterns with surrounding context.
Protection Levels: Offers three modes for fetching content:
- basic: Fast retrieval for less protected sites.
- stealth: Balanced protection for most sites.
- max-stealth: Maximum protection for heavily protected sites.
Content Targeting Options: Allows for retrieving full pages or specific content based on regular expressions.
Pagination Support: s-fetch-page supports start_index and max_length for handling large documents.
Contextual Extraction: s-fetch-pattern provides context_chars to include surrounding text with matched patterns.

Use Cases of scrapling-fetch-mcp

AI Assistant Research: Enables AI assistants to access and summarize documentation, articles, and reference materials from websites that typically block automated access.
Information Retrieval: Useful for locating specific pieces of information, such as passages that mention API keys or definitions of terms, within large or complex web pages.
Content Summarization: Allows AI to fetch entire web pages for summarization, even from bot-protected sources.
Educational Purposes: Can be used by AI to gather information for learning and answering user queries from a wider range of online sources.

FAQ about scrapling-fetch-mcp

Q: What kind of content is scrapling-fetch-mcp designed for? A: It is designed primarily for text content, such as documentation, articles, and reference materials. It is not intended for general-purpose site scraping or data harvesting.

Q: How do I choose the right protection level? A: Start with basic mode for faster retrieval. If that fails, escalate to stealth, and then max-stealth for heavily protected sites.
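The escalation order described in this answer can be sketched as a small loop. This is only an illustration of the strategy, not code from the project: fetch_with_escalation and fake_fetch are hypothetical names, and the fetcher is a stand-in for whatever actually performs the request.

```python
from typing import Callable, Optional, Tuple

# Modes in the order the documentation suggests trying them.
MODES = ("basic", "stealth", "max-stealth")


def fetch_with_escalation(
    url: str,
    fetch: Callable[[str, str], Optional[str]],
) -> Tuple[str, str]:
    """Try each mode in turn and return (mode, text) for the first success.

    `fetch` is a hypothetical callable taking (url, mode) and returning the
    page text, or None when the request was blocked.
    """
    for mode in MODES:
        text = fetch(url, mode)
        if text:  # None or empty text counts as a failed attempt
            return mode, text
    raise RuntimeError(f"all modes failed for {url}")


def fake_fetch(url: str, mode: str) -> Optional[str]:
    """Stand-in fetcher: pretends the site blocks 'basic' requests."""
    return "page text" if mode != "basic" else None


if __name__ == "__main__":
    print(fetch_with_escalation("https://example.com/docs", fake_fetch))
```

Starting with the cheapest mode keeps typical requests fast and pays the cost of the slower modes only when a site actually requires them.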

Q: Can scrapling-fetch-mcp handle large documents? A: Yes, s-fetch-page supports pagination parameters (start_index and max_length) to retrieve large documents in parts.
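As a rough illustration of how start_index and max_length could be used to walk through a large document, here is a minimal sketch that operates on an in-memory string. It assumes character-based offsets, and the default max_length of 5000 is a placeholder; the server's actual behavior and defaults are not specified here. The helper names paginate and read_all are hypothetical.

```python
from typing import List, Optional, Tuple


def paginate(text: str, start_index: int = 0, max_length: int = 5000) -> Tuple[str, Optional[int]]:
    """Return one chunk of `text` plus the start index of the next chunk.

    The second element is None once the end of the text has been reached.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    chunk = text[start_index:start_index + max_length]
    next_index = start_index + len(chunk)
    return chunk, (next_index if next_index < len(text) else None)


def read_all(text: str, max_length: int) -> List[str]:
    """Collect every chunk by feeding each next index back in as start_index."""
    chunks = []
    start: Optional[int] = 0
    while start is not None:
        chunk, start = paginate(text, start, max_length)
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    print(read_all("abcdefghij", 4))
```

A caller would repeat the request with the returned index as the new start_index until no further index is reported.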

Q: Can it work with sites requiring authentication? A: The project documentation states that it "May not work with sites requiring authentication."

Q: Is it suitable for high-volume scraping? A: No, it is explicitly stated that it is "Not designed for high-volume scraping or data harvesting."

scrapling-fetch's README

Scrapling Fetch MCP

An MCP server that helps AI assistants access text content from websites that implement bot detection, bridging the gap between what you can see in your browser and what the AI can access.

Intended Use

This tool is optimized for low-volume retrieval of documentation and reference materials (text/HTML only) from websites that implement bot detection. It has not been designed or tested for general-purpose site scraping or data harvesting.

Note: This project was developed in collaboration with Claude Sonnet 3.7, using LLM Context.

Installation

Requirements:
- Python 3.10+
- uv package manager

Install dependencies and the tool:

uv tool install scrapling
scrapling install
uv tool install scrapling-fetch-mcp

Setup with Claude

Add this configuration to your Claude client's MCP server configuration:

{
  "mcpServers": {
    "Cyber-Chitta": {
      "command": "uvx",
      "args": ["scrapling-fetch-mcp"]
    }
  }
}
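If the client configuration already lists other servers, the entry has to be merged in rather than pasted over the whole file. The following is a minimal sketch of that merge on an in-memory dictionary. It assumes the key under mcpServers is a label you choose yourself; add_server, "scrapling-fetch", and "other-server" are hypothetical names used only for illustration.

```python
import json


def add_server(config: dict, name: str) -> dict:
    """Return a copy of `config` with the scrapling-fetch-mcp entry under `name`."""
    servers = dict(config.get("mcpServers", {}))  # keep existing servers
    servers[name] = {"command": "uvx", "args": ["scrapling-fetch-mcp"]}
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    # Hypothetical existing configuration with one unrelated server.
    existing = {"mcpServers": {"other-server": {"command": "npx", "args": ["other"]}}}
    print(json.dumps(add_server(existing, "scrapling-fetch"), indent=2))
```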

Available Tools

This package provides two distinct tools:

s-fetch-page: Retrieves complete web pages with pagination support
s-fetch-pattern: Extracts content matching regex patterns with surrounding context

Example Usage

Fetching a Complete Page

Human: Please fetch and summarize the documentation at https://example.com/docs

Claude: I'll help you with that. Let me fetch the documentation.

<mcp:function_calls>
<mcp:invoke name="s-fetch-page">
<mcp:parameter name="url">https://example.com/docs</mcp:parameter>
<mcp:parameter name="mode">basic</mcp:parameter>
</mcp:invoke>
</mcp:function_calls>

Based on the documentation I retrieved, here's a summary...

Extracting Specific Content with Pattern Matching

Human: Please find all mentions of "API keys" on the documentation page.

Claude: I'll search for that specific information.

<mcp:function_calls>
<mcp:invoke name="s-fetch-pattern">
<mcp:parameter name="url">https://example.com/docs</mcp:parameter>
<mcp:parameter name="mode">basic</mcp:parameter>
<mcp:parameter name="search_pattern">API\s+keys?</mcp:parameter>
<mcp:parameter name="context_chars">150</mcp:parameter>
</mcp:invoke>
</mcp:function_calls>

I found several mentions of API keys in the documentation:
...

Functionality Options

Protection Levels:
- basic: Fast retrieval (1-2 seconds) but lower success with heavily protected sites
- stealth: Balanced protection (3-8 seconds) that works with most sites
- max-stealth: Maximum protection (10+ seconds) for heavily protected sites

Content Targeting Options:
- s-fetch-page: Retrieve entire pages with pagination support (using start_index and max_length)
- s-fetch-pattern: Extract specific content using regular expressions (with search_pattern and context_chars)
  - Results include position information for follow-up queries with s-fetch-page
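To make the roles of search_pattern and context_chars concrete, here is a minimal sketch of pattern extraction with surrounding context on a plain string. It is not the server's implementation: the function name find_with_context, the result keys, and the default of 200 characters are assumptions made for illustration.

```python
import re
from typing import Dict, List, Union


def find_with_context(
    text: str, search_pattern: str, context_chars: int = 200
) -> List[Dict[str, Union[str, int]]]:
    """Return each regex match with its position and surrounding text."""
    results = []
    for match in re.finditer(search_pattern, text):
        # Clamp the context window to the bounds of the text.
        start = max(0, match.start() - context_chars)
        end = min(len(text), match.end() + context_chars)
        results.append({
            "match": match.group(0),
            "position": match.start(),  # character offset of the match
            "context": text[start:end],
        })
    return results


if __name__ == "__main__":
    sample = "Use API keys to sign in. Each API key expires."
    for hit in find_with_context(sample, r"API\s+keys?", context_chars=5):
        print(hit)
```

Under these assumptions, a reported position could serve as the start_index of a follow-up s-fetch-page request to read the text around a match in full.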

Tips for Best Results

Start with basic mode and only escalate to higher protection levels if needed
For large documents, use the pagination parameters with s-fetch-page
Use s-fetch-pattern when looking for specific information on large pages
The AI will automatically adjust its approach based on the site's protection level

Limitations

Designed only for text content: Specifically for documentation, articles, and reference materials
Not designed for high-volume scraping or data harvesting
May not work with sites requiring authentication
Performance varies by site complexity

License

Apache License 2.0
