A ToraDB database is a directory you pass to toradb.local("./my_db"). Inside, each table has Parquet segments and optional index sidecars.

Inspect with DESCRIBE

toradb sql ./examples/_demo_db "DESCRIBE articles"
Typical output includes:
  • Row count and vector dimension (if any)
  • Segment count and paths
  • Sidecars: bm25, vectors, hnsw, diskann, segment_*
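To cross-check what DESCRIBE reports against the files on disk, a small script can tally a table directory by file suffix. This is a minimal sketch using only the Python standard library; it does not call ToraDB. The helper name summarize_table_dir and the assumption that a table lives in a subdirectory named after it are illustrative, not documented behavior.

```python
from collections import Counter
from pathlib import Path


def summarize_table_dir(table_dir):
    """Count files under table_dir, keyed by their full suffix chain.

    For example, "seg0.bm25.bin" is counted under ".bm25.bin".
    """
    counts = Counter()
    for path in Path(table_dir).rglob("*"):
        if path.is_file():
            # Join all suffixes so multi-part extensions stay together.
            key = "".join(path.suffixes) or "(no suffix)"
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical path: assumes the table directory is named after the table.
    summary = summarize_table_dir("./examples/_demo_db/articles")
    for suffix, n in sorted(summary.items()):
        print(f"{suffix}: {n}")
```

The tally only groups files by extension, so it makes no claim about which sidecar each file belongs to; DESCRIBE remains the authoritative view.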

Components

| Component | Purpose |
| --- | --- |
| Parquet segments | Columnar document storage per flush (id, text, optional typed metadata columns, metadata_json, embedding) |
| WAL | Replay safety for recent writes |
| bm25 sidecar | Sparse lexical index (TBM3 mmap-friendly *.bm25.bin) |
| {seg}.bm25.lex.bin | Per-segment term lexicon for query pruning |
| bm25.route.bin | Table-level term → segment routing index |
| hnsw sidecar | In-memory-style dense graph index |
| diskann sidecar | On-disk ANN graph for large embedding tables |
| Materialized views | Cached search/analytics results (see SQL materialized views) |
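
The relationship between the per-segment lexicons and the table-level routing index can be pictured with a toy in-memory model. The sketch below is illustrative only: it does not read ToraDB's binary formats, and build_route and route_query are hypothetical names. It assumes a route maps each term to the set of segments whose lexicon contains it, and that a query only needs to visit segments containing at least one query term.

```python
from collections import defaultdict


def build_route(lexicons):
    """Invert per-segment lexicons into a term -> segment-ids map."""
    route = defaultdict(set)
    for segment_id, terms in lexicons.items():
        for term in terms:
            route[term].add(segment_id)
    return dict(route)


def route_query(route, query_terms):
    """Return the segments that contain at least one query term."""
    segments = set()
    for term in query_terms:
        segments |= route.get(term, set())
    return sorted(segments)


# Toy lexicons: segment id -> set of terms seen in that segment.
lexicons = {
    "seg_0": {"parquet", "index"},
    "seg_1": {"vector", "index"},
    "seg_2": {"graph"},
}
route = build_route(lexicons)
print(route_query(route, ["vector", "graph"]))  # ['seg_1', 'seg_2']
```

In this model, seg_0 is never opened for the query above, which is the kind of pruning a routing index is meant to enable.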

Persistence model

  1. Ingest — documents append to the active segment; indexes update incrementally where supported.
  2. Flush — segments are written as Parquet under the table directory.
  3. Reindex — rebuild a specific index type (BM25, HNSW, DISKANN) after bulk loads or schema changes.
  4. Compact — merge small segments to reduce file count (COMPACT TABLE; see Reindex and compact). COMPACT TABLE … FULL also migrates legacy JSON-only segments to native typed metadata columns.
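
The merge policy behind COMPACT TABLE is not described here, so the following is only a generic sketch of the idea in step 4: grouping adjacent small segments so that fewer, larger files remain. The function plan_compaction and its target_rows parameter are hypothetical and are not part of ToraDB.

```python
def plan_compaction(segment_rows, target_rows):
    """Greedily group adjacent segments until each group reaches target_rows.

    segment_rows: row count of each segment, in on-disk order.
    Returns a list of groups, each a list of segment indexes to merge.
    """
    groups, current, current_rows = [], [], 0
    for index, rows in enumerate(segment_rows):
        current.append(index)
        current_rows += rows
        if current_rows >= target_rows:
            groups.append(current)
            current, current_rows = [], 0
    if current:
        groups.append(current)  # leftover tail that stayed below the target
    return groups


# Five segments collapse into two merged groups.
print(plan_compaction([100, 50, 900, 20, 30], target_rows=500))
# [[0, 1, 2], [3, 4]]
```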

Segment-only bulk ingest

Large tables use index_mode: segment_only in manifest.json. In this mode, the per-segment *.bm25.bin (TBM3) indexes and *.bm25.lex.bin lexicons are built when the ingest finishes. The optional query_mode: routed setting uses bm25.route.bin to skip irrelevant segments at query time. Environment variables for the demo and CLI:
| Variable | Purpose |
| --- | --- |
| TORADB_CACHE_INDEX_BYTES | Fixed LRU budget for mmap segment BM25 blobs |
| TORADB_CACHE_AUTO | Set to 1 to size the cache from a RAM fraction (serving profile) |
| TORADB_CACHE_INDEX_FRACTION | RAM fraction for the auto cache (default 0.20) |
| TORADB_CACHE_AUTO_CAP_BYTES | Maximum auto cache size in bytes (default 8 GB) |
| TORADB_WARMUP_ON_START | Background search warmup after API start (set to 1 in the serving demo) |
| TORADB_INGEST_THREADS | Parallelism for index build |
| TORADB_PROFILE | Demo preset: ingest or serving (demo/run.sh) |
See Production serving profiles for ingest vs serving presets.
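
The cache variables suggest a simple sizing rule: in auto mode, take a fraction of RAM and clamp it to the cap. The sketch below shows that arithmetic under stated assumptions; it is not ToraDB's implementation. The function resolve_cache_bytes is hypothetical. Two further assumptions are that auto mode takes precedence over the fixed budget and that the 8 GB default means 8 GiB.

```python
import os

GIB = 1024 ** 3  # assumption: the documented "8GB" default is treated as 8 GiB


def resolve_cache_bytes(total_ram_bytes, env=None):
    """Return a cache budget in bytes, or None if nothing is configured."""
    env = os.environ if env is None else env
    if env.get("TORADB_CACHE_AUTO") == "1":
        # Auto mode: fraction of RAM, clamped to the cap.
        fraction = float(env.get("TORADB_CACHE_INDEX_FRACTION", "0.20"))
        cap = int(env.get("TORADB_CACHE_AUTO_CAP_BYTES", str(8 * GIB)))
        return min(int(total_ram_bytes * fraction), cap)
    # Fixed mode: use the explicit byte budget if one is set.
    fixed = env.get("TORADB_CACHE_INDEX_BYTES")
    return int(fixed) if fixed is not None else None


# On a 64 GiB host, 20% of RAM is 12.8 GiB, so the 8 GiB cap applies.
print(resolve_cache_bytes(64 * GIB, {"TORADB_CACHE_AUTO": "1"}))  # 8589934592
```

Passing env as a plain dict keeps the function easy to test without touching the real process environment.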
Run examples/full_example.py once to populate examples/_demo_db, then use DESCRIBE on each table to see real sidecar names.