Typed metadata columns

ToraDB attaches a type manifest per table (column_types in manifest.json) so WHERE, IN, and BETWEEN compare values numerically or chronologically instead of lexically. When types are declared, new Parquet segments use a hybrid native layout: typed Arrow columns for declared fields plus metadata_json for any other keys.

Declare types at create time

CREATE TABLE articles (id uuid, title text, year int, published date) USING text

Python:

db.sql(
    "CREATE TABLE articles (year int, published date) USING text"
)
# or
db.create_table(
    "articles",
    mode="text",
    schema={"year": "int", "published": "date"},
)

Supported type names: text, int, float, bool, date, timestamp, json, uuid, vector (with optional dimension, e.g. vector(384) or vector[384]). Vector dimensions are stored in manifest.json as vector:384 (bare vector when no dimension is given).

Fix legacy tables

ALTER TABLE articles ALTER COLUMN year TYPE int
ALTER TABLE papers ALTER COLUMN embedding TYPE vector(384) REWRITE

REWRITE updates the manifest and runs COMPACT TABLE … FULL so existing segments are rewritten to the native hybrid layout. Without REWRITE, the manifest updates immediately but legacy segments stay on disk until you compact:

COMPACT TABLE articles FULL

Types are persisted in the table manifest.json sidecar (column_types).

Compound filters

Combine predicates with AND / OR and parentheses:

SELECT slot, COUNT(*) FROM docs
WHERE rank > 9 AND slot = 'A'
GROUP BY slot

SELECT id FROM docs
WHERE (rank > 9 OR rank < 2) AND slot = 'science'
LIMIT 20

Filter behavior

Column type	Example	Behavior
`int`	`WHERE rank > 9`	`10` and `100` match; `9` does not
`int`	`WHERE rank IN (9, 10)`	Typed equality, not string match
`date`	`WHERE published >= '2024-01-01'`	Chronological compare
untyped	`WHERE score > 10`	Legacy numeric heuristic on strings

Typed filters apply to:

Analytics SELECT with GROUP BY / aggregates
Retrieval SELECT with SPARSE SEARCH / VECTOR SEARCH and a WHERE clause (post-filter on candidates)

Introspection

DESCRIBE articles
SHOW CREATE TABLE articles

Both include declared column_types when present.

On-disk layout (native hybrid)

Segment columns (when column_types is non-empty, schema_version ≥ 4):

Column	Role
`id`, `text`, `embedding`	Document identity and body (unchanged)
Declared fields (e.g. `rank`, `published`)	Native Parquet types (`Int64`, `Date32`, …)
`metadata_json`	Overflow metadata not listed in `CREATE TABLE`

Legacy segments (four columns only) remain readable. Migrate them with:

COMPACT TABLE articles FULL

After compact, declared fields are stored natively; filters and scans use the same SQL as before.

​Declare types at create time

​Fix legacy tables

​Compound filters

​Filter behavior

​Introspection

​On-disk layout (native hybrid)