column_types in manifest.json) so WHERE, IN, and BETWEEN compare values numerically or chronologically instead of lexically.
When types are declared, new Parquet segments use a hybrid native layout: typed Arrow columns for declared fields plus metadata_json for any other keys.
Declare types at create time
text, int, float, bool, date, timestamp, json, uuid, vector (with optional dimension, e.g. vector(384) or vector[384]).
Vector dimensions are stored in manifest.json as vector:384 (bare vector when no dimension is given).
Fix legacy tables
REWRITE updates the manifest and runs COMPACT TABLE … FULL so existing segments are rewritten to the native hybrid layout. Without REWRITE, the manifest updates immediately but legacy segments stay on disk until you compact:
manifest.json sidecar (column_types).
Compound filters
Combine predicates withAND / OR and parentheses:
Filter behavior
| Column type | Example | Behavior |
|---|---|---|
int | WHERE rank > 9 | 10 and 100 match; 9 does not |
int | WHERE rank IN (9, 10) | Typed equality, not string match |
date | WHERE published >= '2024-01-01' | Chronological compare |
| untyped | WHERE score > 10 | Legacy numeric heuristic on strings |
- Analytics
SELECTwithGROUP BY/ aggregates - Retrieval
SELECTwithSPARSE SEARCH/VECTOR SEARCHand aWHEREclause (post-filter on candidates)
Introspection
column_types when present.
On-disk layout (native hybrid)
Segment columns (whencolumn_types is non-empty, schema_version ≥ 4):
| Column | Role |
|---|---|
id, text, embedding | Document identity and body (unchanged) |
Declared fields (e.g. rank, published) | Native Parquet types (Int64, Date32, …) |
metadata_json | Overflow metadata not listed in CREATE TABLE |
