Enabling provenance
Pass explain=True to Table.search.
Provenance record schema
Retrieval tiers
| Tier | What happens |
|---|---|
| Tier 1 | BM25 sparse candidates + HNSW/DiskANN dense candidates are gathered independently |
| Tier 2 | Candidates are merged via RRF (Reciprocal Rank Fusion); CRAG filtering and budget cuts apply |
| Tier 3 | Final top-k selection; quantization re-scoring if TurboQuant sidecars are present |
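The Tier 2 merge can be illustrated with a minimal sketch of Reciprocal Rank Fusion. This is the textbook formulation, not necessarily the exact code path used here: the function name rrf_merge, the smoothing constant k=60 (a conventional default), and the sample ID lists are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant; 60 is a common default and is assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every ID it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical Tier 1 outputs: one sparse ranking and one dense ranking.
bm25_ids = ["a", "b", "c"]
dense_ids = ["b", "d", "a"]
fused = rrf_merge([bm25_ids, dense_ids])
```

Documents that appear in both lists accumulate two contributions, which is why agreement between the sparse and dense retrievers tends to push a candidate up the fused ranking.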
Drop stages
| Stage | Meaning |
|---|---|
| metadata_filter | Excluded by a WHERE clause predicate |
| tier1_budget_cut | Fell below Tier 1 candidate budget (tier1_budget × 50) |
| tier2_budget_cut | Fell below Tier 2 candidate budget |
| crag_filter | Removed by CRAG median score filter |
| tier3_budget_cut | Fell below final top-k budget |
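To build intuition for why a candidate might be tagged crag_filter, here is one plausible reading of a median score filter, written as a standalone sketch. The function name, the (doc_id, score) input shape, and the choice to keep scores equal to the median are assumptions; the actual rule applied by the engine may differ.

```python
from statistics import median


def median_score_filter(candidates):
    """Split (doc_id, score) pairs into kept and dropped around the median.

    Assumption: candidates scoring at or above the median are kept and the
    rest are dropped. This is an illustration, not the engine's exact rule.
    """
    if not candidates:
        return [], []
    cutoff = median(score for _, score in candidates)
    kept = [c for c in candidates if c[1] >= cutoff]
    dropped = [c for c in candidates if c[1] < cutoff]
    return kept, dropped


# Hypothetical fused scores for four candidates.
kept, dropped = median_score_filter(
    [("a", 0.9), ("b", 0.5), ("c", 0.1), ("d", 0.7)]
)
```

Under this reading a candidate's fate depends on the rest of the candidate set, so the same document can survive one query and be dropped in another.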
Common debugging patterns
Find a specific document’s fate
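The record layout is not reproduced here, so the sketch below makes no assumption about field names. It walks a provenance record (any nested dict or list parsed from JSON) and reports every path at which a given document ID appears. The helper find_paths and the sample record with its keys (tiers, candidates, dropped, id, stage) are hypothetical and exist only to exercise the idea.

```python
def find_paths(node, target, path="$"):
    """Yield a JSON-style path for every place `target` appears as a value."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_paths(value, target, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_paths(value, target, f"{path}[{index}]")
    elif node == target:
        yield path


# Hypothetical record shape, for illustration only.
sample_record = {
    "tiers": [{"name": "tier1", "candidates": ["doc-1", "doc-7"]}],
    "dropped": [{"id": "doc-7", "stage": "crag_filter"}],
}
paths = list(find_paths(sample_record, "doc-7"))
```

Once the paths are known, the neighbouring fields at each path show which tier last saw the document and, if it was dropped, which stage removed it.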
Compare BM25 vs HNSW agreement
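Agreement between the sparse and dense retrievers can be quantified once the two Tier 1 candidate ID lists have been pulled out of a provenance record. How to extract those lists depends on the record schema, which is not shown here, so this sketch starts from two plain lists. The function name overlap_stats and the sample IDs are assumptions.

```python
def overlap_stats(bm25_ids, dense_ids):
    """Summarize how much two candidate ID lists agree."""
    bm25, dense = set(bm25_ids), set(dense_ids)
    shared = bm25 & dense
    union = bm25 | dense
    return {
        "shared": sorted(shared),
        "bm25_only": sorted(bm25 - dense),
        "dense_only": sorted(dense - bm25),
        # Jaccard similarity: 1.0 means identical sets, 0.0 means disjoint.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical candidate lists from the two Tier 1 retrievers.
stats = overlap_stats(["a", "b", "c"], ["b", "c", "d"])
```

A low Jaccard value means the two retrievers are surfacing largely different documents, which is often the first thing to check when a document expected in the results only shows up on one side.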
Measure tier latency (future)
latency_us on each TierTrace is reserved for per-tier timing — currently emitted as 0 and will be populated in a future release. Use total_latency_ms for end-to-end wall time.
Persistent search log
Every search with explain=True appends a JSON record to <db>/<table>/_search_log.ndjson. This file is newline-delimited JSON (one record per line).
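Because the log is newline-delimited JSON, it can be read with the standard library alone. The sketch below parses one record per line and averages total_latency_ms. It assumes that total_latency_ms is a top-level key of each record, which is not confirmed here; the function names are also illustrative.

```python
import json
from pathlib import Path


def read_search_log(path):
    """Yield one parsed record per non-empty line of an NDJSON file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def mean_total_latency_ms(records):
    """Average total_latency_ms over the records that carry it.

    Assumption: the field sits at the top level of each record.
    Returns None when no record has the field.
    """
    values = [r["total_latency_ms"] for r in records if "total_latency_ms" in r]
    return sum(values) / len(values) if values else None
```

A typical call would pass the path to a table's _search_log.ndjson file to read_search_log and hand the resulting records to mean_total_latency_ms. Reading line by line keeps memory use flat even when the log has grown large.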
SQL provenance
EXPLAIN on a retrieval query in SQL executes the search and returns the provenance JSON as the explanation text.
For non-retrieval queries (for example COUNT(*), GROUP BY), EXPLAIN still returns a plan string rather than provenance.