SQL retrieval - ToraDB

Sparse search

SELECT id FROM articles
SPARSE SEARCH body BM25('Nikola Tesla alternating current')
LIMIT 3

Vector search

SELECT id FROM papers
VECTOR SEARCH embedding ANN([0.9, 0.1, 0.0, 0.0])
LIMIT 2

ANN(...) accepts either a float list or a text literal. A text literal is used to build a lexical proxy embedding when no vector is stored.

BM25 tuning

Override the BM25 k1 and b parameters per query:

SELECT id FROM articles
SPARSE SEARCH body BM25('Nikola Tesla', k1=1.5, b=0.6)
LIMIT 5

k1 controls term-frequency saturation; b controls length normalization. Both are optional and default to the index settings.
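To make the two parameters concrete, here is a minimal sketch of the per-term weight in the textbook Okapi BM25 formula. Whether ToraDB uses exactly this variant is an assumption, and the function name and sample values are illustrative only.

```python
def bm25_term_weight(tf, doc_len, avg_doc_len, k1=1.5, b=0.6, idf=1.0):
    """Textbook Okapi BM25 weight for one term in one document.

    Assumption: this is the standard formula, not necessarily ToraDB's
    exact scoring code.
    """
    # Length normalization: b=0 ignores document length, b=1 applies it fully.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Saturation: as tf grows, the weight approaches idf * (k1 + 1).
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * norm)


if __name__ == "__main__":
    # Repeating a term helps less and less as tf increases.
    for tf in (1, 2, 5, 20):
        print(tf, bm25_term_weight(tf, doc_len=100, avg_doc_len=100))
```

A larger k1 lets repeated terms keep adding to the score for longer; a larger b penalizes documents that are longer than average more strongly.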

Combined sparse + vector

SELECT id FROM papers
SPARSE SEARCH title BM25('Tesla coil')
VECTOR SEARCH embedding ANN([0.9, 0.1, 0.0, 0.0])
LIMIT 5

ORDER BY score

SELECT id FROM docs
SPARSE SEARCH body BM25('Nikola Tesla motor')
ORDER BY score DESC
LIMIT 3

ORDER BY score ASC is also supported. You can also order by a metadata column or a scalar function; see Functions & aliases.

Ranking knobs

Field boosts and temporal decay re-rank retrieved candidates after the base relevance score is computed. Both widen the candidate window so that the re-rank covers the full result set, not just the returned page.

BOOST

BOOST(field, factor) multiplies a document’s score by factor when field is present and non-empty. Repeat the clause for multiple fields (boosts multiply).

SELECT id FROM articles
SPARSE SEARCH body BM25('Nikola Tesla')
BOOST(title, 2.0)
BOOST(verified, 1.5)
LIMIT 10
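The multiplication rule can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not ToraDB's code: the helper names are hypothetical, and treating None, empty strings, and empty containers as "empty" is an assumption (the handling of values such as False is not specified here).

```python
def _is_empty(value):
    # Assumption: "non-empty" excludes None, "", and empty containers.
    return value is None or value == "" or value == [] or value == {} or value == ()


def apply_boosts(base_score, doc, boosts):
    """Multiply base_score by each factor whose field is present and non-empty."""
    score = base_score
    for field, factor in boosts:
        if field in doc and not _is_empty(doc[field]):
            score *= factor  # repeated BOOST clauses multiply together
    return score


if __name__ == "__main__":
    doc = {"title": "AC motors", "verified": True}
    print(apply_boosts(2.0, doc, [("title", 2.0), ("verified", 1.5)]))
```

With both fields populated, a base score of 2.0 is multiplied by 2.0 and then by 1.5; a document with an empty title would receive only the verified boost.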

DECAY

DECAY(field, half_life=days) applies exponential time decay based on a timestamp field: score *= 0.5 ^ (age_days / half_life_days). The half_life keyword is optional.

SELECT id FROM articles
SPARSE SEARCH body BM25('Nikola Tesla')
DECAY(published_at, half_life=30)
LIMIT 10
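The decay formula is easy to check by hand. The sketch below computes the multiplier 0.5 ^ (age_days / half_life_days) for a given timestamp; the function name is hypothetical, and clamping future timestamps to an age of zero is an assumption, since that case is not described here.

```python
from datetime import datetime, timedelta, timezone


def decay_factor(published_at, now, half_life_days):
    """Return the multiplier 0.5 ** (age_days / half_life_days)."""
    age_days = (now - published_at).total_seconds() / 86400.0
    # Assumption: a timestamp in the future is treated as age 0 (no decay).
    age_days = max(age_days, 0.0)
    return 0.5 ** (age_days / half_life_days)


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    for days in (0, 30, 60, 90):
        published = now - timedelta(days=days)
        print(days, decay_factor(published, now, half_life_days=30))
```

With half_life=30, a document published 30 days ago keeps half of its score, one published 60 days ago keeps a quarter, and so on.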

Combine boosts and decay freely. With EXPLAIN, the per-document score breakdown (base / boost / decay / final) appears in the provenance trace.

Facets

FACETS(col1, col2, ...) returns per-value counts over the full matched set (not just the returned page). Counts are exposed as facets on the result (SearchResults in Python).

SELECT id FROM articles
SPARSE SEARCH body BM25('Nikola Tesla')
FACETS(tag, author)
LIMIT 10

Each facet field returns up to the top 20 values by count (ties broken by value).
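The top-20 rule can be illustrated with a small Python sketch over in-memory documents. It is not the engine's implementation: the function name is hypothetical, and breaking ties in ascending value order is an assumption, since the tie direction is not stated here.

```python
from collections import Counter


def facet_counts(docs, field, top_n=20):
    """Count values of `field` across all matched docs and keep the top_n."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1
    # Highest count first; ties broken by value (ascending is an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    matched = [{"tag": "ac"}, {"tag": "dc"}, {"tag": "ac"}, {"tag": "motor"}]
    print(facet_counts(matched, "tag"))
```

The counts are taken over every matched document, so they stay the same regardless of which LIMIT/OFFSET page is returned.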

HIGHLIGHT

HIGHLIGHT(len) adds a snippet column with the matched query terms wrapped in <em>...</em>, truncated to len characters.

SELECT id FROM articles
SPARSE SEARCH body BM25('alternating current')
HIGHLIGHT(160)
LIMIT 5
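As a rough picture of what the snippet column contains, the sketch below wraps query terms in <em> tags using Python's re module. It is a simplification under stated assumptions: the function name is hypothetical, the snippet always starts at the beginning of the text, and the length limit is applied to the plain text before the tags are added. The engine's actual tokenization and truncation rules may differ.

```python
import re


def highlight(text, query, length):
    """Truncate text to `length` characters, then wrap query terms in <em>."""
    snippet = text[:length]  # assumption: the limit counts plain-text characters
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return snippet
    # Whole-word, case-insensitive match on any query term.
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(r"<em>\1</em>", snippet)


if __name__ == "__main__":
    body = "Tesla championed alternating current over direct current."
    print(highlight(body, "alternating current", 160))
```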

OFFSET / LIMIT

SELECT id FROM docs
SPARSE SEARCH body BM25('Nikola Tesla')
LIMIT 2 OFFSET 2

EXPLAIN

EXPLAIN SELECT id FROM docs
SPARSE SEARCH body BM25('Tesla')
LIMIT 5

EXPLAIN on a retrieval query executes the search and returns results alongside a structured JSON provenance trace in explain_text. The trace shows which documents were considered and dropped at each retrieval tier.

import json

result = db.sql("""
    EXPLAIN SELECT id FROM docs
    SPARSE SEARCH body BM25('Tesla')
    LIMIT 5
""")

# result.explain_text holds the JSON provenance record as a string
prov = json.loads(result.explain_text)
print(prov["tier1"]["bm25_candidates"])  # BM25 candidates recorded for tier 1
print(prov["tier2"]["drops"])            # documents dropped in tier 2

For non-retrieval queries (analytics, COUNT(*)), EXPLAIN still returns a plan string without executing the query. EXPLAIN cannot be combined with STREAM. See Retrieval provenance for the full provenance schema.

DISTRIBUTED

SELECT id FROM docs
DISTRIBUTED SPARSE SEARCH body BM25('motor')
LIMIT 10

DISTRIBUTED scans segment shards in parallel on a single node.

STREAM (Python)

Use Database.sql_stream or toradb.sql.sql_stream to page through results. STREAM is not a separate SQL keyword in all clients; the engine honors LIMIT/OFFSET per page.
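The signature of sql_stream is not shown on this page, so the sketch below does not call it. Instead it illustrates the LIMIT/OFFSET paging that the engine honors, using a caller-supplied run_query function. Both iter_pages and run_query are hypothetical names; in practice run_query might wrap a call such as db.sql and return the rows.

```python
def iter_pages(run_query, base_sql, page_size):
    """Yield successive pages of rows by appending LIMIT/OFFSET to base_sql.

    run_query is a hypothetical callable: it takes a SQL string and returns
    an iterable of rows.
    """
    offset = 0
    while True:
        rows = list(run_query(f"{base_sql}\nLIMIT {page_size} OFFSET {offset}"))
        if not rows:
            break  # nothing left to fetch
        yield rows
        if len(rows) < page_size:
            break  # a short page means this was the last one
        offset += page_size
```

A caller would loop with something like `for page in iter_pages(run, "SELECT id FROM docs\nSPARSE SEARCH body BM25('Tesla')", 50)` and process each page as it arrives.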
