by pingcap
Provides a unified data platform for AI applications, enabling vector, full‑text, and hybrid search directly on TiDB, with automatic embedding of text and images, advanced filtering, reranking, and full transaction support.
TiDB Python AI SDK empowers developers to build AI‑enabled applications on TiDB by offering seamless integration of vector, full‑text, and hybrid search capabilities. It automatically embeds textual and visual data, stores embeddings as vector fields, and supports complex queries with filters, rerankers, and transactional guarantees.
Install with pip install pytidb, or pip install "pytidb[models]" for built-in embedding functions and rerankers. Connect with TiDBClient.connect, register providers via tidb_client.configure_embedding_provider, and define tables as TableModel subclasses, specifying text fields and vector fields generated by EmbeddingFunction. Query with table.search(query, search_type=...), choosing vector, fulltext, or hybrid search. Results can be returned as lists, Pydantic models, pandas DataFrames, etc. Use tidb_client.session() for atomic operations. Filtering supports a rich set of operators ($eq, $gt, $in, $and, $or, …).
Q: Do I need to host my own embedding model?
A: No. The SDK can call external providers such as OpenAI or Jina via API keys. You can also plug in custom models (see the sketch after this FAQ).
Q: Is the API stable?
A: The package is under rapid development; pin a specific version (e.g., pytidb==0.0.12) for production.
Q: Can I use the SDK with a self-hosted TiDB cluster?
A: Yes. Provide the host, port, username, password, and database name when connecting.
Q: How are transactions handled?
A: Use tidb_client.session() as a context manager to execute multiple statements atomically, then commit() or rollback().
Q: What output formats are supported for search results?
A: Lists of dictionaries, Pydantic model instances, pandas DataFrames, or plain Python objects via .to_list(), .to_pydantic(), .to_pandas(), etc.
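To use a provider other than OpenAI, register its API key and reference its models by prefix. The sketch below is an assumption-heavy example: the "jina_ai" provider name mirrors the model prefix used in the image search section further down, the JINA_AI_API_KEY variable name is hypothetical, and it relies on the tidb_client created in the quickstart below.
import os
from pytidb.embeddings import EmbeddingFunction

# Assumption: the provider identifier matches the "jina_ai/" model prefix used
# elsewhere in this README; the environment variable name is hypothetical.
tidb_client.configure_embedding_provider("jina_ai", api_key=os.getenv("JINA_AI_API_KEY"))
jina_text_embed_fn = EmbeddingFunction("jina_ai/jina-embeddings-v4")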
Python SDK for TiDB AI: A unified data platform empowering developers to build next-generation AI applications.
[!NOTE] This Python package is under rapid development and its API may change. It is recommended to use a fixed version when installing, e.g., pytidb==0.0.12.
pip install pytidb
# To use built-in embedding functions and rerankers:
pip install "pytidb[models]"
# To convert query results to pandas DataFrame:
pip install pandas
Create a free TiDB cluster at tidbcloud.com.
import os
from pytidb import TiDBClient
tidb_client = TiDBClient.connect(
    host=os.getenv("TIDB_HOST"),
    port=int(os.getenv("TIDB_PORT")),
    username=os.getenv("TIDB_USERNAME"),
    password=os.getenv("TIDB_PASSWORD"),
    database=os.getenv("TIDB_DATABASE"),
    ensure_db=True,
)
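As a quick sanity check after connecting, you can run a simple statement. This is a minimal sketch that reuses the query()/scalar() helpers shown in the transactions section below.
# Verify the connection by asking TiDB which database we are using.
current_db = tidb_client.query("SELECT DATABASE()").scalar()
print(f"Connected to database: {current_db}")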
PyTiDB automatically embeds text fields (e.g., text) and stores the vector embedding in a vector field (e.g., text_vec).
Create a table with an embedding function:
from pytidb.schema import TableModel, Field, FullTextField
from pytidb.embeddings import EmbeddingFunction
# Set API key for embedding provider.
tidb_client.configure_embedding_provider("openai", api_key=os.getenv("OPENAI_API_KEY"))
class Chunk(TableModel):
    __tablename__ = "chunks"

    id: int = Field(primary_key=True)
    text: str = FullTextField()
    text_vec: list[float] = EmbeddingFunction(
        "openai/text-embedding-3-small"
    ).VectorField(source_field="text")  # 👈 Defines the vector field.
    user_id: int = Field()
table = tidb_client.create_table(schema=Chunk, if_exists="skip")
Bulk insert data:
table.bulk_insert([
    Chunk(id=2, text="bar", user_id=2),  # 👈 The text field is embedded and saved to text_vec automatically.
    Chunk(id=3, text="baz", user_id=3),
    Chunk(id=4, text="qux", user_id=4),
])
Vector Search
Vector search finds the most relevant records based on semantic similarity, so you don't need to include all keywords explicitly in your query.
results = (
    table.search("<query>")  # 👈 The query is embedded automatically.
    .filter({"user_id": 2})
    .limit(2)
    .to_list()
)
# Output: A list of dicts.
See the Vector Search example for more details.
Full-text Search
Full-text search tokenizes the query and finds the most relevant records by matching exact keywords.
results = (
    table.search("<query>", search_type="fulltext")
    .limit(2)
    .to_pydantic()
)
# Output: A list of Pydantic model instances.
See the Full-text Search example for more details.
Hybrid Search
Hybrid search combines exact matching from full-text search with semantic understanding from vector search, delivering more relevant and reliable results.
df = (
    table.search("<query>", search_type="hybrid")
    .limit(2)
    .to_pandas()
)
# Output: A pandas DataFrame.
See the Hybrid Search example for more details.
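The pytidb[models] extra also ships rerankers, which can reorder hybrid results by relevance. The names below (the Reranker class, the jina_ai/jina-reranker-m0 model string, and the .rerank() step) are assumptions based on the features advertised above; check the package documentation for the exact API in your installed version.
import os
from pytidb.rerankers import Reranker  # Assumed import path.

# Hypothetical reranker setup; the model name and environment variable are assumptions.
jina_reranker = Reranker("jina_ai/jina-reranker-m0", api_key=os.getenv("JINA_AI_API_KEY"))

results = (
    table.search("<query>", search_type="hybrid")
    .rerank(jina_reranker, "text")  # Rerank results on the original text field.
    .limit(2)
    .to_list()
)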
Image Search
Image search lets you find visually similar images using natural language descriptions or another image as a reference.
from PIL import Image
from pytidb.schema import TableModel, Field
from pytidb.embeddings import EmbeddingFunction
# Define a multi-modal embedding model.
jina_embed_fn = EmbeddingFunction("jina_ai/jina-embeddings-v4")

class Pet(TableModel):
    __tablename__ = "pets"

    id: int = Field(primary_key=True)
    image_uri: str = Field()
    image_vec: list[float] = jina_embed_fn.VectorField(
        source_field="image_uri",
        source_type="image",
    )
table = tidb_client.create_table(schema=Pet, if_exists="skip")
# Insert sample images ...
table.insert(Pet(image_uri="path/to/shiba_inu_14.jpg"))
# Search for images using natural language
results = table.search("shiba inu dog").limit(1).to_list()
# Search for images using an image ...
query_image = Image.open("shiba_inu_15.jpg")
results = table.search(query_image).limit(1).to_pydantic()
See the Image Search example for more details.
PyTiDB supports a variety of operators for flexible filtering (a combined example follows the table):
Operator | Description | Example
---|---|---
$eq | Equal to | {"field": {"$eq": "hello"}}
$gt | Greater than | {"field": {"$gt": 1}}
$gte | Greater than or equal | {"field": {"$gte": 1}}
$lt | Less than | {"field": {"$lt": 1}}
$lte | Less than or equal | {"field": {"$lte": 1}}
$in | In array | {"field": {"$in": [1, 2, 3]}}
$nin | Not in array | {"field": {"$nin": [1, 2, 3]}}
$and | Logical AND | {"$and": [{"field1": 1}, {"field2": 2}]}
$or | Logical OR | {"$or": [{"field1": 1}, {"field2": 2}]}
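These operators plug into the same .filter() method used in the vector search example above. A short sketch over the Chunk table defined earlier:
# Chunks belonging to user 1 or 2 whose id is greater than 1.
results = (
    table.search("<query>")
    .filter({"$and": [{"user_id": {"$in": [1, 2]}}, {"id": {"$gt": 1}}]})
    .limit(5)
    .to_list()
)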
from pytidb import Session
from pytidb.sql import select

# Create a table to store user data:
class User(TableModel):
    __tablename__ = "users"

    id: int = Field(primary_key=True)
    name: str = Field(max_length=20)

user_table = tidb_client.create_table(schema=User, if_exists="skip")

# Use the db_engine from TiDBClient when creating a Session.
with Session(tidb_client.db_engine) as session:
    query = (
        select(Chunk)
        .join(User, Chunk.user_id == User.id)
        .where(User.name == "Alice")
    )
    chunks = session.exec(query).all()

[(c.id, c.text, c.user_id) for c in chunks]
PyTiDB supports transaction management, helping you avoid race conditions and ensure data consistency.
with tidb_client.session() as session:
    initial_total_balance = tidb_client.query("SELECT SUM(balance) FROM players").scalar()

    # Transfer 10 coins from player 1 to player 2.
    tidb_client.execute("UPDATE players SET balance = balance - 10 WHERE id = 1")
    tidb_client.execute("UPDATE players SET balance = balance + 10 WHERE id = 2")

    session.commit()
    # or session.rollback()

    final_total_balance = tidb_client.query("SELECT SUM(balance) FROM players").scalar()
    assert final_total_balance == initial_total_balance
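If any statement in the block fails, you can roll back instead of committing. This is a minimal sketch using the same session()/execute() helpers; the explicit rollback follows the commit()/rollback() pattern described in the FAQ above.
with tidb_client.session() as session:
    try:
        tidb_client.execute("UPDATE players SET balance = balance - 10 WHERE id = 1")
        tidb_client.execute("UPDATE players SET balance = balance + 10 WHERE id = 2")
        session.commit()
    except Exception:
        # Undo the partial transfer if either UPDATE fails.
        session.rollback()
        raise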
[!TIP] Click the button below to install TiDB MCP Server in Cursor. Then, confirm by clicking Install when prompted.
Discover more MCP servers with similar functionality and use cases
by googleapis
Provides a configurable MCP server that abstracts connection pooling, authentication, observability, and tool management to accelerate development of database‑backed AI tools.
by bytebase
DBHub is a universal database gateway that implements the Model Context Protocol (MCP) server interface, enabling MCP-compatible clients to interact with various databases.
by neo4j-contrib
Provides Model Context Protocol servers for interacting with Neo4j databases, managing Aura instances, and handling personal knowledge graph memory through natural‑language interfaces.
by mongodb-js
Provides a Model Context Protocol server that connects to MongoDB databases and Atlas clusters, exposing a rich set of tools for querying, managing, and administering data and infrastructure.
by benborla
A Model Context Protocol (MCP) server that provides read-only access to MySQL databases, enabling Large Language Models (LLMs) to inspect database schemas and execute read-only queries.
by ClickHouse
Provides tools that let AI assistants run read‑only SQL queries against ClickHouse clusters or the embedded chDB engine, plus a health‑check endpoint for service monitoring.
by elastic
Provides direct, natural‑language access to Elasticsearch indices via the Model Context Protocol, allowing AI agents to query and explore data without writing DSL.
by motherduckdb
Provides an MCP server that enables SQL analytics on DuckDB and MotherDuck databases, allowing AI assistants and IDEs to execute queries via a unified interface.
by redis
Provides a natural language interface for agentic applications to manage and search data in Redis efficiently.