
When to reindex

Reindex after:
  • Bulk ingest of many documents
  • Changing embedding dimensions or index type
  • Search fails to return results that a freshly built index would return

Python / SQL

db.sql("CREATE INDEX text_idx ON articles (text) USING BM25")
db.sql("CREATE INDEX ann_idx ON papers (embedding) USING DISKANN")
Or use the reindex method:
db.reindex("articles", using="BM25")
db.reindex("emb", using="DISKANN", column="embedding")
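
If index names, tables, or columns come from configuration rather than literals, it can help to build the CREATE INDEX statement in one place and reject unknown index types before anything reaches the database. The sketch below is a hypothetical helper and is not part of toradb; it only formats the statement shape shown above and assumes the three index types listed on this page are the complete set.

```python
import re

# Index types named on this page; assumed here to be the full set.
INDEX_TYPES = {"BM25", "HNSW", "DISKANN"}

# Conservative identifier check so arbitrary text is never spliced into SQL.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_index_sql(index_name, table, column, using):
    """Return a CREATE INDEX statement in the form used above (hypothetical helper)."""
    using = using.upper()
    if using not in INDEX_TYPES:
        raise ValueError(f"unknown index type: {using!r}")
    for ident in (index_name, table, column):
        if not _IDENT.fullmatch(ident):
            raise ValueError(f"invalid identifier: {ident!r}")
    return f"CREATE INDEX {index_name} ON {table} ({column}) USING {using}"


statement = create_index_sql("text_idx", "articles", "text", "bm25")
print(statement)
```

The returned string can then be passed to db.sql(...) as in the examples above.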

CLI

toradb reindex ./my_db articles --using BM25 --column text
toradb reindex ./my_db emb --using DISKANN --column embedding
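
To run the same command from a script, for example after a nightly bulk ingest, one option is to assemble the argument list in Python and hand it to subprocess. This is a minimal sketch: reindex_argv is a hypothetical wrapper, and it uses only the arguments shown in the two commands above.

```python
import shlex


def reindex_argv(db_path, table, using, column):
    """Build the argument list for `toradb reindex` (hypothetical wrapper)."""
    return ["toradb", "reindex", db_path, table, "--using", using, "--column", column]


argv = reindex_argv("./my_db", "articles", "BM25", "text")
# Print the command as a shell line for logging or review.
print(shlex.join(argv))

# To execute it, assuming toradb is on PATH:
#   import subprocess
#   subprocess.run(argv, check=True)
```

Passing a list rather than a single shell string avoids quoting problems when the database path contains spaces.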

Index types

USING     Use for
BM25      Sparse lexical search
HNSW      In-memory dense graph
DISKANN   On-disk ANN for large embedding tables

DiskANN graphs need enough embedded documents to build (typically 32+ rows with valid embedding vectors). Smaller tables may fall back or skip the graph build.
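
Before requesting a DISKANN index, it may be worth counting how many rows actually carry a usable vector. The sketch below is an illustration only: the threshold of 32 comes from the note above ("typically", so not a guarantee), and the definition of a "valid" vector used here (non-null, expected length, finite numbers) is an assumption, not toradb's documented rule.

```python
import math

# "Typically 32+" per the note above; treat as a rule of thumb.
MIN_DISKANN_ROWS = 32


def is_valid_embedding(vec, dim):
    """Assumed validity rule: present, correct length, all values finite."""
    if vec is None or len(vec) != dim:
        return False
    return all(isinstance(x, (int, float)) and math.isfinite(x) for x in vec)


def enough_rows_for_diskann(embeddings, dim, min_rows=MIN_DISKANN_ROWS):
    """Return True if at least min_rows embeddings pass the validity check."""
    valid = sum(1 for vec in embeddings if is_valid_embedding(vec, dim))
    return valid >= min_rows


rows = [[0.1, 0.2, 0.3]] * 31 + [None]  # 31 usable vectors, one missing
print(enough_rows_for_diskann(rows, dim=3))
```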

Compact segments

Merge small Parquet segments to reduce file count:
COMPACT TABLE articles
COMPACT TABLE articles FULL
db.sql("COMPACT TABLE articles FULL")
Optional environment tuning (advanced):
  • TORADB_COMPACT_MAX_SEGMENTS
  • TORADB_COMPACT_MIN_MERGE
  • TORADB_COMPACT_SMALL_BYTES
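
One way to set these variables from Python is to write them into the process environment before the database is opened. This is a sketch under assumptions: the values below are placeholders rather than recommended settings, their units and exact meaning are not documented on this page, and it is assumed (not confirmed here) that toradb reads them from the environment of the running process.

```python
import os

# Variable names from the list above.
COMPACT_ENV_VARS = {
    "TORADB_COMPACT_MAX_SEGMENTS",
    "TORADB_COMPACT_MIN_MERGE",
    "TORADB_COMPACT_SMALL_BYTES",
}


def apply_compaction_env(settings, environ=os.environ):
    """Copy compaction settings into environ, rejecting unknown names (hypothetical helper)."""
    for name, value in settings.items():
        if name not in COMPACT_ENV_VARS:
            raise KeyError(f"not a compaction variable: {name}")
        environ[name] = str(value)  # environment values must be strings


# Placeholder values for illustration only.
env = {}
apply_compaction_env({"TORADB_COMPACT_MAX_SEGMENTS": 8}, environ=env)
print(env)
```

Rejecting unknown names catches typos such as a misspelled variable, which would otherwise be silently ignored.
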
After compacting, run DESCRIBE table to confirm segment counts. See On-disk layout.