PDF reader MCP

What is MCP PDF Reader Enhanced?

MCP PDF Reader Enhanced is a comprehensive Model Context Protocol (MCP) server designed to provide advanced functionalities for interacting with PDF files. It allows users to extract text, search for specific content, and retrieve metadata from PDF documents.

How to use MCP PDF Reader Enhanced?

To use MCP PDF Reader Enhanced, you first need to install it using npm install. Once installed, you can integrate it with your Cursor settings by adding the provided JSON configuration. The server exposes three main tools:

read-pdf: For extracting text from PDF files with options for page ranges, metadata inclusion, and text cleaning.
- Example: {"file": "/path/to/document.pdf", "clean_text": true}
search-pdf: For searching specific text within PDF documents with options for case sensitivity and whole-word matching.
- Example: {"file": "/path/to/document.pdf", "query": "important term", "case_sensitive": true}
pdf-metadata: For extracting comprehensive metadata from PDF files without text extraction.
- Example: {"file": "/path/to/document.pdf"}

Key Features of MCP PDF Reader Enhanced

Core Functionality

Text Extraction: Extract text content from PDF files with customizable options.
Text Search: Search for specific text within PDFs with advanced options.
Metadata Extraction: Retrieve comprehensive PDF metadata.
Page-specific Processing: Extract content from specific page ranges.
Text Cleaning: Normalize and clean extracted text.
File Size Limits: Protection against overly large files (50MB limit).
Async Processing: Non-blocking file operations.

Advanced Features

Multiple Tools: 3 specialized tools for different PDF operations.
Smart Search: Case-sensitive, whole-word, and regex search options.
Rich Metadata: Extract author, title, creation date, keywords, and more.
Performance: Efficient processing with size limits and error handling.
Security: File validation and path sanitization.

Use Cases of MCP PDF Reader Enhanced

Information Retrieval: Quickly extract specific information from large PDF documents.
Document Analysis: Analyze PDF content for research, legal, or business purposes.
Automated Workflows: Integrate PDF text processing into automated data pipelines.
Content Management: Extract and organize content from PDF archives.

FAQ about MCP PDF Reader Enhanced

Q: What are the system requirements?
- A: Node.js 18.0.0 or higher, sufficient memory for PDF file size + processing overhead, and temporary storage space for file operations.
Q: Can it handle scanned PDFs?
- A: Not yet, but OCR support is a planned future enhancement.
Q: Is there a limit to the file size?
- A: Yes, there is a 50MB file size limit to prevent overly large files from being processed.
Q: Can I extract images or tables?
- A: Image and table extraction are planned future enhancements.
Q: How can I contribute?
- A: Contributions are welcome in areas like additional PDF processing library integration, performance optimizations, new extraction features, better error handling, and test coverage.

MCP PDF Reader Enhanced

A comprehensive Model Context Protocol (MCP) server that provides advanced PDF text extraction, search, and analysis functionality.

Features

Core Functionality

✅ Text Extraction: Extract text content from PDF files with customizable options
✅ Text Search: Search for specific text within PDFs with advanced options
✅ Metadata Extraction: Retrieve comprehensive PDF metadata
✅ Page-specific Processing: Extract content from specific page ranges
✅ Text Cleaning: Normalize and clean extracted text
✅ File Size Limits: Protection against overly large files (50MB limit)
✅ Async Processing: Non-blocking file operations

Advanced Features

🔄 Multiple Tools: 3 specialized tools for different PDF operations
🔍 Smart Search: Case-sensitive, whole-word, and regex search options
📊 Rich Metadata: Extract author, title, creation date, keywords, and more
⚡ Performance: Efficient processing with size limits and error handling
🛡️ Security: File validation and path sanitization

Installation

npm install

Tools Available

1. `read-pdf` - Enhanced PDF Reading

Extract text from PDF files with customizable options.

Parameters:

file (string, required): Path to the PDF file
pages (string, optional): Page range (e.g., '1-5', '1,3,5', 'all'). Default: 'all'
include_metadata (boolean, optional): Include PDF metadata. Default: true
clean_text (boolean, optional): Clean and normalize text. Default: false

Example Usage:

// Basic extraction
{ "file": "/path/to/document.pdf" }

// Extract with clean text and no metadata
{ 
  "file": "/path/to/document.pdf", 
  "clean_text": true, 
  "include_metadata": false 
}

2. `search-pdf` - Search Within PDFs

Search for specific text within PDF documents.

Parameters:

file (string, required): Path to the PDF file
query (string, required): Text to search for
case_sensitive (boolean, optional): Case sensitive search. Default: false
whole_word (boolean, optional): Match whole words only. Default: false

Example Usage:

// Case-insensitive search
{ "file": "/path/to/document.pdf", "query": "important term" }

// Whole word, case-sensitive search
{ 
  "file": "/path/to/document.pdf", 
  "query": "API", 
  "case_sensitive": true, 
  "whole_word": true 
}

3. `pdf-metadata` - Extract Metadata Only

Get comprehensive metadata from PDF files without extracting text.

Parameters:

file (string, required): Path to the PDF file

Returns:

Filename, file size, page count
Author, title, subject, creator, producer
Creation/modification dates, keywords
Encryption status, PDF version

Configuration

Cursor Integration

Add to your Cursor settings:

{
  "mcp": {
    "servers": {
      "mcp-gp-pdf-reader": {
        "command": "node",
        "args": ["/absolute/path/to/mcp_gp_pdf_reader/index.js"]
      }
    }
  }
}

Future Enhancements

Planned Features

🔮 OCR Support: Extract text from scanned/image-based PDFs
🔮 Image Extraction: Extract images from PDF documents
🔮 Table Detection: Identify and extract tabular data
🔮 Form Data: Extract form fields and values
🔮 Password Support: Handle password-protected PDFs
🔮 Batch Processing: Process multiple PDFs simultaneously
🔮 Caching: Cache parsed results for better performance
🔮 Page-by-Page: True page-specific text extraction

Technical Improvements

🔧 Streaming: Handle very large PDFs with streaming
🔧 Progress Tracking: Progress indicators for long operations
🔧 Resource Management: Better memory usage optimization
🔧 Configuration API: Runtime configuration updates

Usage Examples

Basic Text Extraction

# Via MCP client
"Extract all text from /documents/report.pdf"

Searching PDFs

# Via MCP client  
"Search for 'quarterly results' in /documents/financial-report.pdf"

Getting Metadata

# Via MCP client
"Get metadata from /documents/contract.pdf"

Development

Requirements

Node.js 18.0.0 or higher
Memory: Sufficient for PDF file size + processing overhead
Storage: Temporary space for file operations

Contributing

This MCP server is designed to be extensible. Key areas for contribution:

Additional PDF processing libraries integration
Performance optimizations
New extraction features
Better error handling
Test coverage

License

MIT License

PDF reader MCP Overview

What is MCP PDF Reader Enhanced?

How to use MCP PDF Reader Enhanced?

Key Features of MCP PDF Reader Enhanced

Core Functionality

Advanced Features

Use Cases of MCP PDF Reader Enhanced

FAQ about MCP PDF Reader Enhanced

PDF reader MCP's README

MCP PDF Reader Enhanced

Features

Core Functionality

Advanced Features

Installation

Tools Available

1. read-pdf - Enhanced PDF Reading

2. search-pdf - Search Within PDFs

3. pdf-metadata - Extract Metadata Only

Configuration

Cursor Integration

Future Enhancements

Planned Features

Technical Improvements

Usage Examples

Basic Text Extraction

Searching PDFs

Getting Metadata

Development

Requirements

Contributing

License

PDF reader MCP Reviews

Login Required

Related MCP Servers

Vizro

AntV Chart

Data Exploration

Wren Engine

QuickChart

BigQuery

Mcp Vegalite Server

Google Analytics

Mcp Tinybird

Actions

PDF reader MCP's Information

1. `read-pdf` - Enhanced PDF Reading

2. `search-pdf` - Search Within PDFs

3. `pdf-metadata` - Extract Metadata Only