by cyberchitta
scrapling-fetch-mcp helps AI assistants access text content from bot-protected websites by fetching HTML/markdown from sites with anti-automation measures.
scrapling-fetch-mcp is an MCP (Model Context Protocol) server that lets AI assistants, such as Claude, access text content from websites that employ bot detection and anti-automation measures. It acts as a bridge between the AI and sites that automated tools usually cannot reach, so the AI can retrieve documentation and reference materials.
To use scrapling-fetch-mcp, you need Python 3.10+ and the uv package manager. After installing these prerequisites, install the tool with uv tool install scrapling-fetch-mcp. To integrate with Claude, add an entry to your Claude client's MCP server settings that specifies uvx as the command and scrapling-fetch-mcp as the argument.
Once set up, AI assistants can invoke the provided tools, s-fetch-page and s-fetch-pattern, to interact with websites. For example, to fetch a complete page, the AI would use <mcp:invoke name="s-fetch-page"><mcp:parameter name="url">...</mcp:parameter></mcp:invoke>. To extract specific content using a regex pattern, it would use <mcp:invoke name="s-fetch-pattern"><mcp:parameter name="url">...</mcp:parameter><mcp:parameter name="search_pattern">...</mcp:parameter></mcp:invoke>.
It provides two tools: s-fetch-page for retrieving entire web pages with pagination, and s-fetch-pattern for extracting content matching regex patterns with surrounding context.
Protection levels:
basic: Fast retrieval for less protected sites.
stealth: Balanced protection for most sites.
max-stealth: Maximum protection for heavily protected sites.
s-fetch-page supports start_index and max_length for handling large documents.
s-fetch-pattern provides context_chars to include surrounding text with matched patterns.
Q: What kind of content is scrapling-fetch-mcp designed for?
A: It is designed primarily for text content, such as documentation, articles, and reference materials. It is not intended for general-purpose site scraping or data harvesting.
Q: How do I choose the right protection level?
A: Start with basic mode for faster retrieval. If that fails, escalate to stealth, and then to max-stealth for heavily protected sites.
Q: Can scrapling-fetch-mcp handle large documents?
A: Yes, s-fetch-page supports pagination parameters (start_index and max_length) to retrieve large documents in parts.
Q: Can it work with sites requiring authentication?
A: The project documentation states that it "May not work with sites requiring authentication."
Q: Is it suitable for high-volume scraping?
A: No, the documentation explicitly states that it is "Not designed for high-volume scraping or data harvesting."
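The escalation order described in the FAQ (basic, then stealth, then max-stealth) can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the server's own code: `fetch` is a hypothetical callable standing in for an s-fetch-page call, and `fake_fetch` is a stand-in that pretends a site blocks basic mode.

```python
from typing import Callable, Optional

# Mode names come from the documentation; the order is cheapest first.
MODES = ("basic", "stealth", "max-stealth")


def fetch_with_escalation(
    fetch: Callable[[str, str], Optional[str]],
    url: str,
) -> tuple[str, str]:
    """Try each mode in order and return (mode, content) for the first success.

    `fetch` is a hypothetical callable standing in for an s-fetch-page call;
    it should return the page text, or None (or raise) when it is blocked.
    """
    errors = []
    for mode in MODES:
        try:
            content = fetch(url, mode)
        except Exception as exc:  # treat any failure as "blocked, escalate"
            errors.append(f"{mode}: {exc}")
            continue
        if content:
            return mode, content
        errors.append(f"{mode}: empty response")
    raise RuntimeError("all modes failed: " + "; ".join(errors))


def fake_fetch(url: str, mode: str) -> Optional[str]:
    """Stand-in fetcher: pretends the site blocks everything below stealth."""
    return "<html>docs</html>" if mode != "basic" else None


mode_used, page = fetch_with_escalation(fake_fetch, "https://example.com/docs")
print(mode_used)
```

Trying the cheapest mode first keeps typical requests fast and reserves the slower modes for sites that actually need them.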
An MCP server that helps AI assistants access text content from websites that implement bot detection, bridging the gap between what you can see in your browser and what the AI can access.
This tool is optimized for low-volume retrieval of documentation and reference materials (text/HTML only) from websites that implement bot detection. It has not been designed or tested for general-purpose site scraping or data harvesting.
Note: This project was developed in collaboration with Claude Sonnet 3.7, using LLM Context.
Requirements: Python 3.10+ and the uv package manager.
Install dependencies and the tool:
uv tool install scrapling
scrapling install
uv tool install scrapling-fetch-mcp
Add this configuration to your Claude client's MCP server configuration:
{
  "mcpServers": {
    "Cyber-Chitta": {
      "command": "uvx",
      "args": ["scrapling-fetch-mcp"]
    }
  }
}
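If the client configuration already lists other MCP servers, the entry above has to be merged in rather than pasted over the file. A minimal Python sketch of that merge follows. The location of the configuration file differs between clients and is deliberately not assumed here; `add_server` and the `other-server` entry are hypothetical names used only for illustration.

```python
import json


def add_server(config: dict, name: str = "Cyber-Chitta") -> dict:
    """Return a copy of `config` with the scrapling-fetch-mcp entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": "uvx", "args": ["scrapling-fetch-mcp"]}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
updated = add_server(existing)
print(json.dumps(updated, indent=2))
```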
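This package provides two distinct tools:
s-fetch-page: retrieves entire web pages, with pagination support.
s-fetch-pattern: extracts content matching a regex pattern, with surrounding context.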
Human: Please fetch and summarize the documentation at https://example.com/docs
Claude: I'll help you with that. Let me fetch the documentation.
<mcp:function_calls>
<mcp:invoke name="s-fetch-page">
<mcp:parameter name="url">https://example.com/docs</mcp:parameter>
<mcp:parameter name="mode">basic</mcp:parameter>
</mcp:invoke>
</mcp:function_calls>
Based on the documentation I retrieved, here's a summary...
Human: Please find all mentions of "API keys" on the documentation page.
Claude: I'll search for that specific information.
<mcp:function_calls>
<mcp:invoke name="s-fetch-pattern">
<mcp:parameter name="url">https://example.com/docs</mcp:parameter>
<mcp:parameter name="mode">basic</mcp:parameter>
<mcp:parameter name="search_pattern">API\s+keys?</mcp:parameter>
<mcp:parameter name="context_chars">150</mcp:parameter>
</mcp:invoke>
</mcp:function_calls>
I found several mentions of API keys in the documentation:
...
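To make the search_pattern and context_chars parameters from the example above concrete, here is a small Python sketch of regex matching with surrounding context. It illustrates what the parameters mean and is not the server's actual implementation; `find_with_context` is a hypothetical helper, and the sample text is made up.

```python
import re


def find_with_context(text: str, search_pattern: str, context_chars: int = 150) -> list[str]:
    """Return each regex match with up to `context_chars` characters on each side."""
    snippets = []
    for match in re.finditer(search_pattern, text):
        # Clamp the window so it never runs past either end of the text.
        start = max(0, match.start() - context_chars)
        end = min(len(text), match.end() + context_chars)
        snippets.append(text[start:end])
    return snippets


sample = "Create API keys in the dashboard. Rotate each API key every 90 days."
snippets = find_with_context(sample, r"API\s+keys?", context_chars=10)
for snippet in snippets:
    print(repr(snippet))
```

A larger context_chars value returns more surrounding text per match, which helps when the match alone is too terse to interpret.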
Protection Levels:
basic: Fast retrieval (1-2 seconds) but lower success with heavily protected sites
stealth: Balanced protection (3-8 seconds) that works with most sites
max-stealth: Maximum protection (10+ seconds) for heavily protected sites
Content Targeting Options:
Pagination with s-fetch-page (start_index and max_length)
Pattern search with s-fetch-pattern (search_pattern and context_chars)
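Reading a large document in parts with start_index and max_length amounts to a simple loop. The Python sketch below shows one way to drive it, under stated assumptions: `fetch_page` is a hypothetical wrapper around an s-fetch-page call, the default of 5000 characters is arbitrary, and stopping on an empty chunk is an assumption, since the real tool may signal remaining content differently.

```python
from typing import Callable, Iterator


def iter_pages(
    fetch_page: Callable[[str, int, int], str],
    url: str,
    max_length: int = 5000,  # arbitrary chunk size chosen for this sketch
) -> Iterator[str]:
    """Yield successive chunks of a document until an empty chunk comes back.

    `fetch_page(url, start_index, max_length)` is a hypothetical wrapper around
    an s-fetch-page call that returns at most `max_length` characters.
    """
    start_index = 0
    while True:
        chunk = fetch_page(url, start_index, max_length)
        if not chunk:
            break
        yield chunk
        start_index += len(chunk)


DOCUMENT = "x" * 12_000  # stand-in for a large page


def fake_fetch_page(url: str, start_index: int, max_length: int) -> str:
    """Stand-in fetcher that slices a local string instead of hitting a site."""
    return DOCUMENT[start_index:start_index + max_length]


chunks = list(iter_pages(fake_fetch_page, "https://example.com/docs"))
print([len(chunk) for chunk in chunks])
```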
Use s-fetch-page to retrieve complete pages
Start with basic mode and only escalate to higher protection levels if needed
For large documents, paginate with s-fetch-page
Use s-fetch-pattern when looking for specific information on large pages
License: Apache 2