by gpetraroli
An MCP server for advanced PDF text extraction, search, and analysis.
MCP PDF Reader Enhanced is a Model Context Protocol (MCP) server for working with PDF files. It lets users extract text, search for specific content, and retrieve metadata from PDF documents.
To use MCP PDF Reader Enhanced, first install it by running npm install. Once installed, add the provided JSON configuration to your Cursor settings. The server exposes three main tools:
read-pdf: For extracting text from PDF files with options for page ranges, metadata inclusion, and text cleaning.
{"file": "/path/to/document.pdf", "clean_text": true}
search-pdf: For searching specific text within PDF documents with options for case sensitivity and whole-word matching.
{"file": "/path/to/document.pdf", "query": "important term", "case_sensitive": true}
pdf-metadata: For extracting comprehensive metadata from PDF files without text extraction.
{"file": "/path/to/document.pdf"}
A comprehensive Model Context Protocol (MCP) server that provides advanced PDF text extraction, search, and analysis functionality.
Install with:
npm install
read-pdf - Enhanced PDF Reading
Extract text from PDF files with customizable options.
Parameters:
file (string, required): Path to the PDF file
pages (string, optional): Page range (e.g., '1-5', '1,3,5', 'all'). Default: 'all'
include_metadata (boolean, optional): Include PDF metadata. Default: true
clean_text (boolean, optional): Clean and normalize text. Default: false
Example Usage:
// Basic extraction
{ "file": "/path/to/document.pdf" }
// Extract with clean text and no metadata
{
  "file": "/path/to/document.pdf",
  "clean_text": true,
  "include_metadata": false
}
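The pages parameter accepts ranges ('1-5'), comma-separated lists ('1,3,5'), or 'all'. The sketch below shows one way such a string could be expanded into page numbers. It is an illustration only: parsePageRange and its totalPages argument are hypothetical names, and the server's actual parsing and error handling may differ.

```javascript
// Hypothetical helper, not taken from the server's source.
// Expands a page spec such as "1-5", "1,3,5", or "all" into a sorted
// list of 1-based page numbers, given the document's page count.
function parsePageRange(spec, totalPages) {
  const text = String(spec ?? "all").trim().toLowerCase();
  if (text === "all") {
    return Array.from({ length: totalPages }, (_, i) => i + 1);
  }

  const pages = new Set();
  for (const part of text.split(",")) {
    // Each segment is either a single page ("3") or a range ("1-5").
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start || end > totalPages) {
      throw new Error(`Page range out of bounds: "${part.trim()}"`);
    }
    for (let page = start; page <= end; page++) {
      pages.add(page);
    }
  }
  // The Set removes duplicates from overlapping segments such as "1-3,2".
  return [...pages].sort((a, b) => a - b);
}

console.log(parsePageRange("1-3,5", 10)); // [ 1, 2, 3, 5 ]
```

This sketch rejects out-of-range pages rather than silently clamping them, which is a design assumption and not documented behavior of the server.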
search-pdf - Search Within PDFs
Search for specific text within PDF documents.
Parameters:
file (string, required): Path to the PDF file
query (string, required): Text to search for
case_sensitive (boolean, optional): Case sensitive search. Default: false
whole_word (boolean, optional): Match whole words only. Default: false
Example Usage:
// Case-insensitive search
{ "file": "/path/to/document.pdf", "query": "important term" }
// Whole word, case-sensitive search
{
  "file": "/path/to/document.pdf",
  "query": "API",
  "case_sensitive": true,
  "whole_word": true
}
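To make the effect of case_sensitive and whole_word concrete, here is a minimal sketch of how the two flags could map onto a regular expression. buildSearchPattern and findMatches are hypothetical helpers written for this illustration. The server's own matching logic is not shown in this document and may work differently.

```javascript
// Hypothetical helpers: they illustrate the flag semantics, not the server's code.
function buildSearchPattern(query, { caseSensitive = false, wholeWord = false } = {}) {
  // Escape regex metacharacters so the query is matched literally.
  const escaped = query.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // \b only behaves as a word boundary next to letters, digits, or "_",
  // so whole-word matching is most meaningful for plain-word queries.
  const source = wholeWord ? `\\b${escaped}\\b` : escaped;
  return new RegExp(source, caseSensitive ? "g" : "gi");
}

function findMatches(text, query, options) {
  if (query === "") return []; // an empty query matches nothing here
  const pattern = buildSearchPattern(query, options);
  return [...text.matchAll(pattern)].map((m) => ({ index: m.index, match: m[0] }));
}

const sample = "The API docs describe the api and its APIs.";
console.log(findMatches(sample, "API").length); // 3
console.log(findMatches(sample, "API", { caseSensitive: true, wholeWord: true }).length); // 1
```

With the defaults, all three occurrences match, including the one inside "APIs". With both flags set, only the standalone uppercase "API" remains.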
pdf-metadata - Extract Metadata Only
Get comprehensive metadata from PDF files without extracting text.
Parameters:
file (string, required): Path to the PDF file
Returns: the PDF's metadata, without any extracted text.
Add to your Cursor settings:
{
  "mcp": {
    "servers": {
      "mcp-gp-pdf-reader": {
        "command": "node",
        "args": ["/absolute/path/to/mcp_gp_pdf_reader/index.js"]
      }
    }
  }
}
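Once the server is registered, the client talks to it over MCP, which is built on JSON-RPC 2.0. A tool is invoked with a tools/call request whose params carry the tool name and its arguments. The sketch below only builds such a message. buildToolCall is a hypothetical helper name, and the file path and query are illustrative values. In practice the MCP client constructs and sends this request for you.

```javascript
// Builds a JSON-RPC 2.0 "tools/call" request in the shape MCP uses.
// buildToolCall is a hypothetical helper written for this illustration.
function buildToolCall(id, toolName, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Example: a search-pdf call using the parameters documented above.
const request = buildToolCall(1, "search-pdf", {
  file: "/documents/financial-report.pdf",
  query: "quarterly results",
});

console.log(JSON.stringify(request, null, 2));
```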
# Via MCP client
"Extract all text from /documents/report.pdf"
# Via MCP client
"Search for 'quarterly results' in /documents/financial-report.pdf"
# Via MCP client
"Get metadata from /documents/contract.pdf"
This MCP server is designed to be extensible, and contributions are welcome.
MIT License