by aci-labs
ms-fabric-mcp (Microsoft Fabric MCP Server) is a comprehensive Python-based server that implements the Model Context Protocol (MCP) to facilitate interaction with Microsoft Fabric APIs. It integrates Large Language Models (LLMs) to enhance PySpark notebook development, testing, and optimization within the Microsoft Fabric environment.
To use ms-fabric-mcp, you need Python 3.12+, Azure credentials, uv, and the Azure CLI; Node.js is optional and only needed for the MCP Inspector. After cloning the repository and setting up the virtual environment with uv sync and pip install -r requirements.txt, connect to Microsoft Fabric using az login --scope https://api.fabric.microsoft.com/.default. The server can be run via STDIO or HTTP. For STDIO, use uv run --with mcp mcp dev fabric_mcp.py to start the server with the MCP Inspector at http://localhost:6274. For HTTP, use uv run python .\fabric_mcp.py --port 8081. VSCode integration is supported by configuring launch.json for either the STDIO or HTTP server type.
Q: What are common issues and how to troubleshoot them?
A: Common issues include authentication problems (ensure az login with the correct scope), context issues (use clear_context() to reset session state), and workspace verification (check names and permissions). Validation tools and performance analysis can also help in troubleshooting.
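For example, if the session context appears stale, a minimal reset might look like this (the workspace name is illustrative):
clear_context()  # reset session state
set_workspace(workspace="Analytics-Workspace")  # re-establish the workspace context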
Q: What PySpark templates are available?
A: Basic templates include basic, etl, analytics, and ml. Advanced templates are fabric_integration and streaming.
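Templates are selected via the template_type parameter of the notebook-creation tools, for example (workspace and notebook names are illustrative):
create_pyspark_notebook(
    workspace="Analytics-Workspace",
    notebook_name="Churn-ML",
    template_type="ml"
)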
Q: How does ms-fabric-mcp help with performance optimization?
A: It provides tools for comprehensive performance analysis, identifies anti-patterns and bottlenecks, suggests specific optimizations, generates optimized code alternatives, and offers before/after comparisons. It also encourages best practices like caching DataFrames, using broadcast for small tables, and partitioning large datasets.
Q: How does the LLM integration work?
A: The LLM acts as an intelligent interface. Developers request assistance in their IDE, the LLM analyzes the request using context and reasoning, calls MCP server tools, and then the results flow back through the LLM with intelligent formatting, providing contextual and smart responses.
A comprehensive Python-based MCP (Model Context Protocol) server for interacting with Microsoft Fabric APIs, featuring advanced PySpark notebook development, testing, and optimization capabilities with LLM integration.
Clone the repository:
git clone https://github.com/your-repo/fabric-mcp.git
cd fabric-mcp
Set up virtual environment:
uv sync
Install dependencies:
pip install -r requirements.txt
Authenticate with Microsoft Fabric:
az login --scope https://api.fabric.microsoft.com/.default
Run the server via STDIO:
uv run --with mcp mcp dev fabric_mcp.py
This starts the server with the MCP Inspector at http://localhost:6274.
Add to your launch.json:
{
  "mcp": {
    "servers": {
      "ms-fabric-mcp": {
        "type": "stdio",
        "command": "<FullPathToProjectFolder>\\.venv\\Scripts\\python.exe",
        "args": ["<FullPathToProjectFolder>\\fabric_mcp.py"]
      }
    }
  }
}
Run the server via HTTP:
uv run python .\fabric_mcp.py --port 8081
Add to your launch.json:
{
  "mcp": {
    "servers": {
      "ms-fabric-mcp": {
        "type": "http",
        "url": "http://<localhost or remote IP>:8081/mcp/",
        "headers": {
          "Accept": "application/json,text/event-stream"
        }
      }
    }
  }
}
list_workspaces
List all available Fabric workspaces.
# Usage in LLM: "List all my Fabric workspaces"
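A direct call, which presumably takes no arguments, looks like:
list_workspaces()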
set_workspace
Set the current workspace context for the session.
set_workspace(workspace="Analytics-Workspace")
list_lakehouses
List all lakehouses in a workspace.
list_lakehouses(workspace="Analytics-Workspace")
create_lakehouse
Create a new lakehouse.
create_lakehouse(
name="Sales-Data-Lake",
workspace="Analytics-Workspace",
description="Sales data lakehouse"
)
set_lakehouse
Set current lakehouse context.
set_lakehouse(lakehouse="Sales-Data-Lake")
list_warehouses
List all warehouses in a workspace.
list_warehouses(workspace="Analytics-Workspace")
create_warehouse
Create a new warehouse.
create_warehouse(
name="Sales-DW",
workspace="Analytics-Workspace",
description="Sales data warehouse"
)
set_warehouse
Set current warehouse context.
set_warehouse(warehouse="Sales-DW")
list_tables
List all tables in a lakehouse.
list_tables(workspace="Analytics-Workspace", lakehouse="Sales-Data-Lake")
get_lakehouse_table_schema
Get schema for a specific table.
get_lakehouse_table_schema(
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake",
table_name="transactions"
)
get_all_lakehouse_schemas
Get schemas for all tables in a lakehouse.
get_all_lakehouse_schemas(
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake"
)
set_table
Set current table context.
set_table(table_name="transactions")
get_sql_endpoint
Get SQL endpoint for lakehouse or warehouse.
get_sql_endpoint(
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake",
type="lakehouse"
)
run_query
Execute SQL queries.
run_query(
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake",
query="SELECT COUNT(*) FROM transactions",
type="lakehouse"
)
load_data_from_url
Load data from URL into tables.
load_data_from_url(
url="https://example.com/data.csv",
destination_table="new_data",
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake"
)
list_reports
List all reports in a workspace.
list_reports(workspace="Analytics-Workspace")
get_report
Get specific report details.
get_report(workspace="Analytics-Workspace", report_id="report-id")
list_semantic_models
List semantic models in workspace.
list_semantic_models(workspace="Analytics-Workspace")
get_semantic_model
Get specific semantic model.
get_semantic_model(workspace="Analytics-Workspace", model_id="model-id")
list_notebooks
List all notebooks in a workspace.
list_notebooks(workspace="Analytics-Workspace")
get_notebook_content
Retrieve notebook content.
get_notebook_content(
workspace="Analytics-Workspace",
notebook_id="notebook-id"
)
update_notebook_cell
Update specific notebook cells.
update_notebook_cell(
workspace="Analytics-Workspace",
notebook_id="notebook-id",
cell_index=0,
cell_content="print('Hello, Fabric!')",
cell_type="code"
)
create_pyspark_notebook
Create notebooks from basic templates.
create_pyspark_notebook(
workspace="Analytics-Workspace",
notebook_name="Data-Analysis",
template_type="analytics" # Options: basic, etl, analytics, ml
)
create_fabric_notebook
Create Fabric-optimized notebooks.
create_fabric_notebook(
workspace="Analytics-Workspace",
notebook_name="Fabric-Pipeline",
template_type="fabric_integration" # Options: fabric_integration, streaming
)
generate_pyspark_code
Generate code for common operations.
generate_pyspark_code(
operation="read_table",
source_table="sales.transactions",
columns="id,amount,date"
)
# Available operations:
# - read_table, write_table, transform, join, aggregate
# - schema_inference, data_quality, performance_optimization
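The same tool covers the other listed operations; for example, a data-quality check might be requested like this (the exact parameter set per operation is assumed, not documented here):
generate_pyspark_code(
    operation="data_quality",
    source_table="sales.transactions"  # hypothetical parameters for this operation
)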
generate_fabric_code
Generate Fabric-specific code.
generate_fabric_code(
operation="read_lakehouse",
lakehouse_name="Sales-Data-Lake",
table_name="transactions"
)
# Available operations:
# - read_lakehouse, write_lakehouse, merge_delta, performance_monitor
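For reference, the merge_delta operation targets standard Delta Lake upsert logic; a typical merge of this kind (illustrative only, not the tool's exact output) looks like:
from delta.tables import DeltaTable

# Upsert incoming rows into an existing Delta table; updates_df and the join key are illustrative
target = DeltaTable.forName(spark, "transactions")
(target.alias("t")
    .merge(updates_df.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())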
validate_pyspark_code
Validate PySpark code syntax and best practices.
validate_pyspark_code(code="""
df = spark.table('transactions')
df.show()
""")
validate_fabric_code
Validate Fabric compatibility.
validate_fabric_code(code="""
df = spark.table('lakehouse.transactions')
df.write.format('delta').saveAsTable('summary')
""")
analyze_notebook_performance
Comprehensive performance analysis.
analyze_notebook_performance(
workspace="Analytics-Workspace",
notebook_id="notebook-id"
)
clear_context
Clear current session context.
clear_context()
# ✅ Use managed tables
df = spark.table("lakehouse.my_table")
# ✅ Use Delta Lake format
df.write.format("delta").mode("overwrite").saveAsTable("my_table")
# ✅ Leverage notebookutils
import notebookutils as nbu
workspace_id = nbu.runtime.context.workspaceId
# ✅ Cache frequently used DataFrames
df.cache()
# ✅ Use broadcast for small tables
from pyspark.sql.functions import broadcast
result = large_df.join(broadcast(small_df), "key")
# ✅ Partition large datasets
df.write.partitionBy("year", "month").saveAsTable("partitioned_table")
# ✅ Define explicit schemas
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
schema = StructType([
StructField("id", IntegerType(), True),
StructField("name", StringType(), True)
])
# ✅ Handle null values
from pyspark.sql.functions import col
df.filter(col("column").isNotNull())
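An explicit schema (as defined above) is typically applied when reading untyped sources; a minimal sketch, with an illustrative file path:
df = spark.read.schema(schema).csv("Files/raw/customers.csv", header=True)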
Human: "Create a PySpark notebook that reads sales data, cleans it, and optimizes performance"
LLM Response:
1. Creates Fabric-optimized notebook with ETL template
2. Generates lakehouse reading code
3. Adds data cleaning transformations
4. Includes performance optimization patterns
5. Validates code for best practices
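The generated notebook typically ends up containing code along these lines (a minimal sketch; table and column names are illustrative, not the tool's exact output):
from pyspark.sql.functions import col, trim

# Read sales data from the lakehouse
sales_df = spark.table("lakehouse.transactions")

# Basic cleaning: drop duplicates, remove null keys, normalize a text column
clean_df = (sales_df
    .dropDuplicates(["id"])
    .filter(col("id").isNotNull())
    .withColumn("region", trim(col("region"))))

# Cache before repeated downstream use, then persist as a managed Delta table
clean_df.cache()
clean_df.write.format("delta").mode("overwrite").saveAsTable("transactions_clean")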
Human: "My PySpark notebook is slow. Help me optimize it."
LLM Response:
1. Analyzes notebook performance (scoring 0-100)
2. Identifies anti-patterns and bottlenecks
3. Suggests specific optimizations
4. Generates optimized code alternatives
5. Provides before/after comparisons
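A typical before/after suggestion from this kind of analysis might look like the following (illustrative, not the tool's literal output; column names are assumed):
# Before: collect() pulls all rows to the driver and aggregates them in Python
rows = df.collect()
totals = {}
for r in rows:
    totals[r["customer_id"]] = totals.get(r["customer_id"], 0) + r["amount"]

# After: keep the work distributed and let Spark aggregate
from pyspark.sql.functions import sum as spark_sum
totals_df = df.groupBy("customer_id").agg(spark_sum("amount").alias("total_amount"))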
Troubleshooting: authentication issues are usually resolved by running az login with the correct scope, and context issues by calling clear_context() to reset session state. The validation and performance-analysis tools provide additional diagnostics.
This project welcomes contributions! Please see our contributing guidelines for details.
This project is licensed under the MIT License. See the LICENSE file for details.
Inspired by: https://github.com/Augustab/microsoft_fabric_mcp/tree/main
Ready to supercharge your Microsoft Fabric development with intelligent PySpark assistance!