by pingcap
Provides a unified data platform for AI applications, enabling vector, full‑text, and hybrid search directly on TiDB, with automatic embedding of text and images, advanced filtering, reranking, and full transaction support.
TiDB Python AI SDK empowers developers to build AI‑enabled applications on TiDB by offering seamless integration of vector, full‑text, and hybrid search capabilities. It automatically embeds textual and visual data, stores embeddings as vector fields, and supports complex queries with filters, rerankers, and transactional guarantees.
Install with pip install pytidb, or pip install "pytidb[models]" for built-in embedding functions and rerankers. Connect with TiDBClient.connect, register providers via tidb_client.configure_embedding_provider, and define tables as TableModel subclasses, specifying text fields and vector fields generated by EmbeddingFunction. Query with table.search(query, search_type=...), choosing vector, fulltext, or hybrid search. Results can be returned as lists, Pydantic models, pandas DataFrames, etc. Use tidb_client.session() for atomic operations. Filtering supports a rich set of operators ($eq, $gt, $in, $and, $or, …).
Q: Do I need to host my own embedding model?
A: No. The SDK can call external providers such as OpenAI or Jina via API keys. You can also plug in custom models (see the sketch after this FAQ).
Q: Is the API stable?
A: The package is under rapid development; pin a specific version (e.g., pytidb==0.0.12) for production.
Q: Can I use the SDK with a self-hosted TiDB cluster?
A: Yes. Provide the host, port, username, password, and database name when connecting.
Q: How are transactions handled?
A: Use tidb_client.session() as a context manager to execute multiple statements atomically, then commit() or rollback().
Q: What output formats are supported for search results?
A: Lists of dictionaries, Pydantic model instances, pandas DataFrames, or plain Python objects via .to_list(), .to_pydantic(), .to_pandas(), etc.
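To use a provider other than OpenAI, register its API key and reference its models by prefix. The sketch below is an assumption-heavy example: the "jina_ai" provider name mirrors the model prefix used in the image search section further down, the JINA_AI_API_KEY variable name is hypothetical, and it relies on the tidb_client created in the quickstart below.
import os
from pytidb.embeddings import EmbeddingFunction

# Assumption: the provider identifier matches the "jina_ai/" model prefix used
# elsewhere in this README; the environment variable name is hypothetical.
tidb_client.configure_embedding_provider("jina_ai", api_key=os.getenv("JINA_AI_API_KEY"))
jina_text_embed_fn = EmbeddingFunction("jina_ai/jina-embeddings-v4")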
Python SDK for TiDB AI: A unified data platform empowering developers to build next-generation AI applications.
[!NOTE] This Python package is under rapid development and its API may change. It is recommended to use a fixed version when installing, e.g., pytidb==0.0.12.
pip install pytidb
# To use built-in embedding functions and rerankers:
pip install "pytidb[models]"
# To convert query results to pandas DataFrame:
pip install pandas
Create a free TiDB cluster at tidbcloud.com.
import os
from pytidb import TiDBClient
tidb_client = TiDBClient.connect(
    host=os.getenv("TIDB_HOST"),
    port=int(os.getenv("TIDB_PORT")),
    username=os.getenv("TIDB_USERNAME"),
    password=os.getenv("TIDB_PASSWORD"),
    database=os.getenv("TIDB_DATABASE"),
    ensure_db=True,
)
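As a quick sanity check after connecting, you can run a simple statement. This is a minimal sketch that reuses the query()/scalar() helpers shown in the transactions section below.
# Verify the connection by asking TiDB which database we are using.
current_db = tidb_client.query("SELECT DATABASE()").scalar()
print(f"Connected to database: {current_db}")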
PyTiDB automatically embeds text fields (e.g., text) and stores the vector embedding in a vector field (e.g., text_vec).
Create a table with an embedding function:
from pytidb.schema import TableModel, Field, FullTextField
from pytidb.embeddings import EmbeddingFunction
# Set API key for embedding provider.
tidb_client.configure_embedding_provider("openai", api_key=os.getenv("OPENAI_API_KEY"))
class Chunk(TableModel):
    __tablename__ = "chunks"

    id: int = Field(primary_key=True)
    text: str = FullTextField()
    text_vec: list[float] = EmbeddingFunction(
        "openai/text-embedding-3-small"
    ).VectorField(source_field="text")  # 👈 Defines the vector field.
    user_id: int = Field()
table = tidb_client.create_table(schema=Chunk, if_exists="skip")
Bulk insert data:
table.bulk_insert([
    Chunk(id=2, text="bar", user_id=2),  # 👈 The text field is embedded and saved to text_vec automatically.
    Chunk(id=3, text="baz", user_id=3),
    Chunk(id=4, text="qux", user_id=4),
])
Vector Search
Vector search finds the most relevant records based on semantic similarity, so you don't need to include all keywords explicitly in your query.
results = (
    table.search("<query>")  # 👈 The query is embedded automatically.
    .filter({"user_id": 2})
    .limit(2)
    .to_list()
)
# Output: A list of dicts.
See the Vector Search example for more details.
Full-text Search
Full-text search tokenizes the query and finds the most relevant records by matching exact keywords.
results = (
    table.search("<query>", search_type="fulltext")
    .limit(2)
    .to_pydantic()
)
# Output: A list of Pydantic model instances.
See the Full-text Search example for more details.
Hybrid Search
Hybrid search combines exact matching from full-text search with semantic understanding from vector search, delivering more relevant and reliable results.
df = (
    table.search("<query>", search_type="hybrid")
    .limit(2)
    .to_pandas()
)
# Output: A pandas DataFrame.
See the Hybrid Search example for more details.
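The pytidb[models] extra also ships rerankers, which can reorder hybrid results by relevance. The names below (the Reranker class, the jina_ai/jina-reranker-m0 model string, and the .rerank() step) are assumptions based on the features advertised above; check the package documentation for the exact API in your installed version.
import os
from pytidb.rerankers import Reranker  # Assumed import path.

# Hypothetical reranker setup; the model name and environment variable are assumptions.
jina_reranker = Reranker("jina_ai/jina-reranker-m0", api_key=os.getenv("JINA_AI_API_KEY"))

results = (
    table.search("<query>", search_type="hybrid")
    .rerank(jina_reranker, "text")  # Rerank results on the original text field.
    .limit(2)
    .to_list()
)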
Image Search
Image search lets you find visually similar images using natural language descriptions or another image as a reference.
from PIL import Image
from pytidb.schema import TableModel, Field
from pytidb.embeddings import EmbeddingFunction
# Define a multi-modal embedding model.
jina_embed_fn = EmbeddingFunction("jina_ai/jina-embeddings-v4")

class Pet(TableModel):
    __tablename__ = "pets"

    id: int = Field(primary_key=True)
    image_uri: str = Field()
    image_vec: list[float] = jina_embed_fn.VectorField(
        source_field="image_uri",
        source_type="image",
    )
table = tidb_client.create_table(schema=Pet, if_exists="skip")
# Insert sample images ...
table.insert(Pet(image_uri="path/to/shiba_inu_14.jpg"))
# Search for images using natural language
results = table.search("shiba inu dog").limit(1).to_list()
# Search for images using an image ...
query_image = Image.open("shiba_inu_15.jpg")
results = table.search(query_image).limit(1).to_pydantic()
See the Image Search example for more details.
PyTiDB supports a variety of operators for flexible filtering (a combined example follows the table):
Operator | Description | Example
---|---|---
$eq | Equal to | {"field": {"$eq": "hello"}}
$gt | Greater than | {"field": {"$gt": 1}}
$gte | Greater than or equal | {"field": {"$gte": 1}}
$lt | Less than | {"field": {"$lt": 1}}
$lte | Less than or equal | {"field": {"$lte": 1}}
$in | In array | {"field": {"$in": [1, 2, 3]}}
$nin | Not in array | {"field": {"$nin": [1, 2, 3]}}
$and | Logical AND | {"$and": [{"field1": 1}, {"field2": 2}]}
$or | Logical OR | {"$or": [{"field1": 1}, {"field2": 2}]}
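These operators plug into the same .filter() method used in the vector search example above. A short sketch over the Chunk table defined earlier:
# Chunks belonging to user 1 or 2 whose id is greater than 1.
results = (
    table.search("<query>")
    .filter({"$and": [{"user_id": {"$in": [1, 2]}}, {"id": {"$gt": 1}}]})
    .limit(5)
    .to_list()
)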
from pytidb import Session
from pytidb.sql import select

# Create a table to store user data:
class User(TableModel):
    __tablename__ = "users"

    id: int = Field(primary_key=True)
    name: str = Field(max_length=20)

user_table = tidb_client.create_table(schema=User, if_exists="skip")

# Use the db_engine from TiDBClient when creating a Session.
with Session(tidb_client.db_engine) as session:
    query = (
        select(Chunk)
        .join(User, Chunk.user_id == User.id)
        .where(User.name == "Alice")
    )
    chunks = session.exec(query).all()

[(c.id, c.text, c.user_id) for c in chunks]
PyTiDB supports transaction management, helping you avoid race conditions and ensure data consistency.
with tidb_client.session() as session:
    initial_total_balance = tidb_client.query("SELECT SUM(balance) FROM players").scalar()

    # Transfer 10 coins from player 1 to player 2.
    tidb_client.execute("UPDATE players SET balance = balance - 10 WHERE id = 1")
    tidb_client.execute("UPDATE players SET balance = balance + 10 WHERE id = 2")

    session.commit()
    # or session.rollback()

    final_total_balance = tidb_client.query("SELECT SUM(balance) FROM players").scalar()
    assert final_total_balance == initial_total_balance
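If any statement in the block fails, you can roll back instead of committing. This is a minimal sketch using the same session()/execute() helpers; the explicit rollback follows the commit()/rollback() pattern described in the FAQ above.
with tidb_client.session() as session:
    try:
        tidb_client.execute("UPDATE players SET balance = balance - 10 WHERE id = 1")
        tidb_client.execute("UPDATE players SET balance = balance + 10 WHERE id = 2")
        session.commit()
    except Exception:
        # Undo the partial transfer if either UPDATE fails.
        session.rollback()
        raise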
[!TIP] Click the button below to install TiDB MCP Server in Cursor. Then, confirm by clicking Install when prompted.
Discover more MCP servers with similar functionality and use cases
by googleapis
Provides a configurable MCP server that abstracts connection pooling, authentication, observability, and tool management to accelerate development of database‑backed AI tools.
by bytebase
DBHub is a universal database gateway that implements the Model Context Protocol (MCP) server interface, enabling MCP-compatible clients to interact with various databases.
by neo4j-contrib
Provides Model Context Protocol servers for interacting with Neo4j databases, managing Aura instances, and handling personal knowledge graph memory through natural‑language interfaces.
by mongodb-js
Provides a Model Context Protocol server that connects to MongoDB databases and Atlas clusters, exposing a rich set of tools for querying, managing, and administering data and infrastructure.
by benborla
A Model Context Protocol (MCP) server that provides read-only access to MySQL databases, enabling Large Language Models (LLMs) to inspect database schemas and execute read-only queries.
by ClickHouse
Provides tools that let AI assistants run read‑only SQL queries against ClickHouse clusters or the embedded chDB engine, plus a health‑check endpoint for service monitoring.
by elastic
Provides direct, natural‑language access to Elasticsearch indices via the Model Context Protocol, allowing AI agents to query and explore data without writing DSL.
by motherduckdb
Provides an MCP server that enables SQL analytics on DuckDB and MotherDuck databases, allowing AI assistants and IDEs to execute queries via a unified interface.
by redis
Provides a natural language interface for agentic applications to manage and search data in Redis efficiently.