Introduction - ToraDB

ToraDB is a retrieval-focused database for local workloads. It provides on-disk tables, BM25 sparse search, dense ANN search (HNSW / DiskANN), and hybrid fusion, along with a Python SDK, a CLI, and retrieval SQL.

The defining feature: ToraDB is the only retrieval database that shows its work. Every search can return a structured provenance trace that records which documents were considered at each tier, why they were kept or dropped, and where time was spent. This turns RAG debugging from guesswork into a precise, auditable workflow.

What you can do

Store documents locally with Parquet segments and index sidecars
Run text, vector, and hybrid search from Python or SQL
Inspect retrieval provenance: per-tier candidate lists, drop reasons, latency
Aggregate with GROUP BY, build indexes, and compact segments on disk
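
To make "hybrid search" concrete, here is a minimal sketch of one common way to fuse a sparse (BM25) ranking with a dense (vector) ranking: reciprocal rank fusion. This page does not say which fusion method ToraDB uses, so treat this as an illustration of the general idea rather than ToraDB's implementation. The function name, the document IDs, and the constant k=60 are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first.

    Hypothetical helper for illustration; not part of the ToraDB SDK.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Each list contributes 1 / (k + rank) for every document it contains.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up rankings standing in for a BM25 result and a vector result.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents seen by only one retriever still appear further down.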

Get started

Install

pip install toradb from PyPI, or build from source.

Quickstart

Create a table, ingest docs, and run your first query.

Provenance

See exactly why each document was returned — or why it wasn’t.

API reference

Python SDK, SQL grammar, and CLI commands.

Example

import json
import toradb

db = toradb.local("./my_db")
docs = db.create_table("docs", mode="text")
docs.add([
    "Nikola Tesla invented the alternating current induction motor",
    "Marie Curie studied radioactivity",
])

# Search with provenance — see exactly what happened inside
results = docs.search("Tesla motor", top_k=3, explain=True)
print(results.to_pandas())

prov = json.loads(results.provenance)
print(f"BM25 found: {len(prov['tier1']['bm25_candidates'])} candidates")
print(f"Dropped:    {[d['reason'] for d in prov['tier2']['drops']]}")
print(f"Latency:    {prov['total_latency_ms']:.1f}ms")
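
Once the trace is parsed, it is an ordinary Python dict, so it can be summarized with standard-library tools. The sketch below tallies drop reasons using only the keys shown in the example above (tier1.bm25_candidates, tier2.drops[].reason, total_latency_ms). The sample trace, its document IDs, and its reason strings are invented for illustration, and summarize_provenance is a hypothetical helper, not part of the ToraDB SDK; the real trace may contain more fields.

```python
from collections import Counter


def summarize_provenance(prov):
    """Condense a parsed provenance dict into a few headline numbers.

    Hypothetical helper; relies only on the keys used in the example above.
    """
    candidates = prov["tier1"]["bm25_candidates"]
    drops = prov["tier2"]["drops"]
    return {
        "candidates": len(candidates),
        "dropped": len(drops),
        # Count how often each drop reason occurs.
        "drop_reasons": dict(Counter(d["reason"] for d in drops)),
        "total_latency_ms": prov["total_latency_ms"],
    }


# Invented sample trace; the reason strings are placeholders, not ToraDB output.
sample_prov = {
    "tier1": {"bm25_candidates": ["doc_1", "doc_2", "doc_3", "doc_4"]},
    "tier2": {
        "drops": [
            {"doc": "doc_3", "reason": "low_score"},
            {"doc": "doc_4", "reason": "low_score"},
            {"doc": "doc_2", "reason": "duplicate"},
        ]
    },
    "total_latency_ms": 4.2,
}

summary = summarize_provenance(sample_prov)
print(summary)
```

With a real query, pass json.loads(results.provenance) in place of sample_prov.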

License

ToraDB is licensed under the Apache License, Version 2.0.
