ToraDB is a retrieval-focused database for local workloads. It provides on-disk tables, BM25 sparse search, dense approximate nearest-neighbor (ANN) search (HNSW / DiskANN), and hybrid fusion, plus a Python SDK, a CLI, and retrieval SQL. Its defining feature is that it shows its work. Every search can return a structured provenance trace: which documents were considered at each tier, why each was kept or dropped, and where time was spent. This turns RAG debugging from guesswork into a precise, auditable workflow.
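To make "BM25 sparse search" concrete, here is a minimal sketch of textbook Okapi BM25 scoring in plain Python. It is not ToraDB's implementation. The whitespace tokenizer, the `k1=1.2` / `b=0.75` defaults, the IDF variant, and the helper name `bm25_scores` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each doc against the query with textbook Okapi BM25 (illustrative only)."""
    # Naive lowercase + whitespace tokenization; a real engine uses a proper analyzer.
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF with "+1" inside the log so it never goes negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


docs = [
    "Nikola Tesla invented the alternating current induction motor",
    "Marie Curie studied radioactivity",
]
print(bm25_scores("Tesla motor", docs))
```

Only the first document shares terms with the query, so it receives a positive score and the second scores zero.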

What you can do

  • Store documents locally with Parquet segments and index sidecars
  • Run text, vector, and hybrid search from Python or SQL
  • Inspect retrieval provenance: per-tier candidate lists, drop reasons, latency
  • Aggregate with GROUP BY, build indexes, and compact segments on disk
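Hybrid search combines a sparse (BM25) ranking with a dense (ANN) ranking. This page does not specify how ToraDB fuses them. The sketch below uses Reciprocal Rank Fusion (RRF), a common rank-based technique, purely as a stand-in. The function name, the `k=60` constant, and the document ids are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion (RRF)."""
    fused = {}
    for ranking in rankings:
        # Ranks are 1-based; each list contributes 1 / (k + rank) per doc.
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-retriever rankings for the same query.
bm25_ranking = ["d1", "d3", "d2"]
dense_ranking = ["d3", "d2", "d4"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 4))
```

Because RRF only uses ranks, it sidesteps the problem that BM25 scores and vector similarities live on different scales. Documents found by both retrievers (`d3`, `d2`) rise above those found by only one.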

Get started

Install

Install from PyPI with pip install toradb, or build from source.

Quickstart

Create a table, ingest docs, and run your first query.

Provenance

See exactly why each document was returned — or why it wasn’t.

API reference

Python SDK, SQL grammar, and CLI commands.

Example

import json
import toradb

db = toradb.local("./my_db")
docs = db.create_table("docs", mode="text")
docs.add([
    "Nikola Tesla invented the alternating current induction motor",
    "Marie Curie studied radioactivity",
])

# Search with provenance — see exactly what happened inside
results = docs.search("Tesla motor", top_k=3, explain=True)
print(results.to_pandas())

# results.provenance is a JSON string; parse it into a dict before inspecting tiers
prov = json.loads(results.provenance)
print(f"BM25 found: {len(prov['tier1']['bm25_candidates'])} candidates")
print(f"Dropped:    {[d['reason'] for d in prov['tier2']['drops']]}")
print(f"Latency:    {prov['total_latency_ms']:.1f}ms")
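When debugging many queries, it can help to wrap the provenance inspection from the example in a small helper. The sketch below reads only the keys the example already uses (`tier1.bm25_candidates`, `tier2.drops[*].reason`, `total_latency_ms`). The helper name `summarize_provenance` is hypothetical, and the sample trace is made up: its `doc_id` fields and all of its values are placeholders, not real ToraDB output.

```python
import json
from collections import Counter


def summarize_provenance(prov_json):
    """Condense a provenance JSON string into counts and latency (hypothetical helper)."""
    prov = json.loads(prov_json)
    candidates = prov["tier1"]["bm25_candidates"]
    drops = prov["tier2"]["drops"]
    return {
        "bm25_candidates": len(candidates),
        # Tally how often each drop reason occurred in tier 2.
        "drop_reasons": Counter(d["reason"] for d in drops),
        "total_latency_ms": prov["total_latency_ms"],
    }


# Made-up trace: "doc_id" and every value here are placeholders for illustration.
sample = json.dumps({
    "tier1": {"bm25_candidates": [{"doc_id": 0}, {"doc_id": 1}]},
    "tier2": {"drops": [{"doc_id": 1, "reason": "below_threshold"}]},
    "total_latency_ms": 2.4,
})
summary = summarize_provenance(sample)
print(summary)
```

With a real search, you would pass `results.provenance` directly, e.g. `summarize_provenance(results.provenance)`, and then aggregate the `drop_reasons` counters across a batch of queries.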

License

ToraDB is licensed under the Apache License, Version 2.0.