ToraDB attaches a type manifest per table (column_types in manifest.json) so WHERE, IN, and BETWEEN compare values numerically or chronologically instead of lexically. When types are declared, new Parquet segments use a hybrid native layout: typed Arrow columns for declared fields plus metadata_json for any other keys.
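Here is a minimal sketch in plain Python (no ToraDB calls) of what changes when a column is typed. The helpers lexical_gt, typed_gt and on_or_after are hypothetical. They only mimic the comparison semantics described above and are not ToraDB's internals.

```python
from datetime import date

# Values as they might sit in untyped, string-encoded metadata.
ranks = ["9", "10", "100"]
published = ["2023-12-31", "2024-01-01", "2024-03-15"]


def lexical_gt(values, bound):
    """Compare as strings, character by character ("10" sorts before "9")."""
    return [v for v in values if v > bound]


def typed_gt(values, bound):
    """Compare as integers, as for a column declared `int`."""
    return [v for v in values if int(v) > int(bound)]


def on_or_after(values, bound):
    """Compare as calendar dates, as for a column declared `date`."""
    cutoff = date.fromisoformat(bound)
    return [v for v in values if date.fromisoformat(v) >= cutoff]


print(lexical_gt(ranks, "9"))  # []
print(typed_gt(ranks, "9"))  # ['10', '100']
print(on_or_after(published, "2024-01-01"))  # ['2024-01-01', '2024-03-15']
```

Lexically, neither "10" nor "100" is greater than "9", because the first character '1' sorts before '9'. Typed comparison gives the intended result.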

Declare types at create time

CREATE TABLE articles (id uuid, title text, year int, published date) USING text
Python:
db.sql(
    "CREATE TABLE articles (year int, published date) USING text"
)
# or
db.create_table(
    "articles",
    mode="text",
    schema={"year": "int", "published": "date"},
)
Supported type names: text, int, float, bool, date, timestamp, json, uuid, vector (with optional dimension, e.g. vector(384) or vector[384]). Vector dimensions are stored in manifest.json as vector:384 (bare vector when no dimension is given).
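The mapping from declared type names to manifest strings can be sketched as follows. manifest_type is a hypothetical helper, not part of ToraDB's API. This sketch also assumes type names are matched exactly as written, with no case folding.

```python
import re

# Scalar type names listed above.
SCALAR_TYPES = {"text", "int", "float", "bool", "date", "timestamp", "json", "uuid"}

# Matches: vector, vector(384), vector[384]
VECTOR_RE = re.compile(r"vector(?:\((\d+)\)|\[(\d+)\])?")


def manifest_type(declared: str) -> str:
    """Map a declared type name to its manifest.json string (illustrative)."""
    name = declared.strip()
    if name in SCALAR_TYPES:
        return name
    m = VECTOR_RE.fullmatch(name)
    if m:
        dim = m.group(1) or m.group(2)  # whichever bracket style was used
        return f"vector:{dim}" if dim else "vector"
    raise ValueError(f"unsupported type: {declared!r}")


print(manifest_type("vector(384)"))  # vector:384
print(manifest_type("vector[384]"))  # vector:384
print(manifest_type("vector"))  # vector
```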

Fix legacy tables

ALTER TABLE articles ALTER COLUMN year TYPE int
ALTER TABLE papers ALTER COLUMN embedding TYPE vector(384) REWRITE
REWRITE updates the manifest and runs COMPACT TABLE … FULL so existing segments are rewritten to the native hybrid layout. Without REWRITE, the manifest updates immediately, but existing segments keep their legacy layout until you compact:
COMPACT TABLE articles FULL
Types are persisted in the table manifest.json sidecar (column_types).
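The next sketch shows what an ALTER ... TYPE without REWRITE amounts to on disk: only the sidecar's column_types map changes, and no segment file is touched. set_column_type is hypothetical. The manifest here is assumed to be a JSON object with a column_types key; real manifests likely carry other fields too. Use ALTER TABLE to change types. This code only illustrates what the sidecar records.

```python
import json
import tempfile
from pathlib import Path


def set_column_type(manifest_path: Path, column: str, type_name: str) -> dict:
    """Record a declared type in manifest.json (illustrative only).

    Segment files are left alone, mirroring ALTER ... TYPE without REWRITE.
    """
    manifest = json.loads(manifest_path.read_text())
    manifest.setdefault("column_types", {})[column] = type_name
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest


with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "manifest.json"
    path.write_text(json.dumps({"column_types": {}}))  # assumed minimal manifest
    set_column_type(path, "year", "int")
    result = set_column_type(path, "embedding", "vector:384")

print(result["column_types"])  # {'year': 'int', 'embedding': 'vector:384'}
```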

Compound filters

Combine predicates with AND / OR and parentheses:
SELECT slot, COUNT(*) FROM docs
WHERE rank > 9 AND slot = 'A'
GROUP BY slot

SELECT id FROM docs
WHERE (rank > 9 OR rank < 2) AND slot = 'science'
LIMIT 20
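The parentheses in the second query matter. This plain-Python sketch assumes ToraDB follows the usual SQL precedence, where AND binds tighter than OR. The rows and the function names grouped and ungrouped are hypothetical, and rank is treated as a declared int column.

```python
rows = [
    {"id": "a", "rank": 10, "slot": "science"},
    {"id": "b", "rank": 1, "slot": "science"},
    {"id": "c", "rank": 5, "slot": "science"},
    {"id": "d", "rank": 12, "slot": "history"},
]


def grouped(row):
    # WHERE (rank > 9 OR rank < 2) AND slot = 'science'
    return (row["rank"] > 9 or row["rank"] < 2) and row["slot"] == "science"


def ungrouped(row):
    # WHERE rank > 9 OR rank < 2 AND slot = 'science'
    # parses as: rank > 9 OR (rank < 2 AND slot = 'science')
    return row["rank"] > 9 or (row["rank"] < 2 and row["slot"] == "science")


print([r["id"] for r in rows if grouped(r)])  # ['a', 'b']
print([r["id"] for r in rows if ungrouped(r)])  # ['a', 'b', 'd']
```

Without the parentheses, row d matches on rank alone even though its slot is not 'science'.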

Filter behavior

| Column type | Example | Behavior |
|---|---|---|
| int | WHERE rank > 9 | 10 and 100 match; 9 does not |
| int | WHERE rank IN (9, 10) | Typed equality, not string match |
| date | WHERE published >= '2024-01-01' | Chronological compare |
| untyped | WHERE score > 10 | Legacy numeric heuristic on strings |
Typed filters apply to:
  • Analytics SELECT with GROUP BY / aggregates
  • Retrieval SELECT with SPARSE SEARCH / VECTOR SEARCH and a WHERE clause (post-filter on candidates)
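For retrieval queries, the WHERE clause is applied as a post-filter on the search candidates. The sketch below, with hypothetical candidates and a hypothetical post_filter helper, illustrates a general property of post-filtering: a selective filter can leave fewer rows than LIMIT. This page does not say whether ToraDB enlarges its candidate pool to compensate.

```python
from datetime import date

# Hypothetical search candidates, best score first.
candidates = [
    {"id": "p1", "score": 0.91, "published": date(2023, 6, 1)},
    {"id": "p2", "score": 0.88, "published": date(2024, 2, 10)},
    {"id": "p3", "score": 0.80, "published": date(2024, 5, 3)},
    {"id": "p4", "score": 0.75, "published": date(2022, 1, 9)},
]


def post_filter(cands, predicate, limit):
    """Keep ranked order, drop candidates failing the WHERE, then apply LIMIT."""
    kept = [c for c in cands if predicate(c)]
    return kept[:limit]


def since_2024(c):
    # WHERE published >= '2024-01-01' on a declared date column
    return c["published"] >= date(2024, 1, 1)


top = post_filter(candidates, since_2024, limit=3)
print([c["id"] for c in top])  # ['p2', 'p3']
```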

Introspection

DESCRIBE articles
SHOW CREATE TABLE articles
Both include declared column_types when present.
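To see how declared types map back to DDL, here is a hypothetical sketch that renders a column_types map as a CREATE TABLE statement in the same form used at the top of this page. It is not the actual output of SHOW CREATE TABLE, which this page does not show. The manifest form vector:N is turned back into vector(N).

```python
def create_table_sql(table: str, column_types: dict, mode: str = "text") -> str:
    """Render CREATE TABLE from a column_types map (illustrative only)."""
    cols = []
    for name, type_name in column_types.items():
        if type_name.startswith("vector:"):
            type_name = f"vector({type_name.split(':', 1)[1]})"
        cols.append(f"{name} {type_name}")
    return f"CREATE TABLE {table} ({', '.join(cols)}) USING {mode}"


print(create_table_sql("articles", {"year": "int", "published": "date"}))
# CREATE TABLE articles (year int, published date) USING text
```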

On-disk layout (native hybrid)

Segment columns (when column_types is non-empty, schema_version ≥ 4):
| Column | Role |
|---|---|
| id, text, embedding | Document identity and body (unchanged) |
| Declared fields (e.g. rank, published) | Native Parquet types (Int64, Date32, …) |
| metadata_json | Overflow metadata not listed in CREATE TABLE |
Legacy segments (four columns only) remain readable. Migrate them with:
COMPACT TABLE articles FULL
After compact, declared fields are stored natively; filters and scans use the same SQL as before.
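The migration can be pictured per row. This sketch assumes that a legacy row's four columns are id, text, embedding and metadata_json, and that declared fields live inside metadata_json as strings. to_hybrid_row is hypothetical. Dates are converted to the Arrow Date32 representation, which counts days since 1970-01-01. Types other than int, float and date are passed through unchanged in this sketch.

```python
import json
from datetime import date

EPOCH = date(1970, 1, 1)


def to_hybrid_row(legacy: dict, column_types: dict) -> dict:
    """Move declared keys out of metadata_json into typed columns (illustrative)."""
    meta = json.loads(legacy["metadata_json"])
    row = {k: legacy[k] for k in ("id", "text", "embedding")}
    for col, type_name in column_types.items():
        value = meta.pop(col, None)  # declared keys leave the overflow JSON
        if value is None:
            row[col] = None
        elif type_name == "int":
            row[col] = int(value)  # Int64 column
        elif type_name == "float":
            row[col] = float(value)
        elif type_name == "date":
            row[col] = (date.fromisoformat(value) - EPOCH).days  # Date32 days
        else:
            row[col] = value  # other types left as-is in this sketch
    row["metadata_json"] = json.dumps(meta)  # only undeclared keys remain
    return row


legacy = {
    "id": "a1",
    "text": "Typed filters in practice",
    "embedding": [0.1, 0.2],
    "metadata_json": json.dumps({"rank": "10", "published": "2024-01-01", "source": "arxiv"}),
}
row = to_hybrid_row(legacy, {"rank": "int", "published": "date"})
print(row["rank"], row["published"], row["metadata_json"])
# 10 19723 {"source": "arxiv"}
```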