Jammi AI

Jammi is an embeddable AI engine that brings model inference into your data pipeline. Register data sources, run SQL queries, generate embeddings, search with vector similarity, fine-tune models on your domain, and evaluate results — all without leaving your application.

What Jammi does

Query local data with SQL — register Parquet, CSV, and JSON files, run full SQL via DataFusion
Federate external databases — query PostgreSQL and MySQL alongside local files
Generate embeddings — load any BERT-family model from HuggingFace Hub (or local safetensors / ONNX), persist results to Parquet with sidecar ANN indexes
Vector search — ANN similarity search over embedding tables with automatic fallback to brute-force; search returns a table directly, with the same shape whether the engine is embedded or remote
Compound query — join sources, filter, and run a model over a relation (the annotate SQL table function), in-process or over the Flight SQL lane in one round-trip; a fluent QueryBuilder composes the same operations in Rust
Evidence provenance — retrieved_by and annotated_by tracking on the fluent query builder’s results
Fine-tuning — LoRA adapters with contrastive loss to improve embeddings for your domain
Evaluation — retrieval metrics (recall@k, precision@k, MRR, nDCG), classification (accuracy, F1), and A/B model comparison
Per-row error handling — null or invalid text produces error status per row, not a batch failure
Model caching — LRU eviction, ref-counted guards, single-flight loading
GPU scheduling — memory-budget admission control with RAII permits
Crash recovery — on restart, recovers result tables stuck in “building” state
Inference observability — attach observers to hook into every output batch
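
The per-row error handling item above can be pictured with a minimal sketch. This is plain Python, not Jammi's implementation: `embed_rows`, `fake_encoder`, and the `status` / `error` field names are hypothetical, chosen only to show a null or blank input producing an error status on its own row while the rest of the batch succeeds.

```python
from typing import Callable, List, Optional


def fake_encoder(text: str) -> List[float]:
    """Stand-in encoder (hypothetical): maps text to a tiny 2-dim vector."""
    return [float(len(text)), float(sum(map(ord, text)) % 7)]


def embed_rows(texts: List[Optional[str]],
               encode: Callable[[str], List[float]]) -> List[dict]:
    """Encode each row independently; a bad row gets an error status."""
    out = []
    for i, text in enumerate(texts):
        if text is None or not text.strip():
            # The failure is recorded on this row only; the loop continues.
            out.append({"row": i, "status": "error",
                        "error": "null or empty text", "vector": None})
            continue
        out.append({"row": i, "status": "ok", "error": None,
                    "vector": encode(text)})
    return out


results = embed_rows(["quantum computing", None, "  ", "lasers"], fake_encoder)
statuses = [r["status"] for r in results]
```

The caller can then filter on the status field instead of wrapping the whole batch in a single try/except.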

Three ways to use Jammi

Interface	Best for	Install
Rust library	Embedding Jammi into Rust applications	`cargo add jammi-ai`
Python package	Data science, notebooks, scripts	`pip install jammi-ai`
CLI	Shell workflows, quick queries, ops	`cargo install jammi-cli`

All three interfaces share the same engine, configuration, and storage format. Embeddings generated from Python are queryable from the CLI, and vice versa.

For multi-language access or BI tool integration, the jammi-server binary starts an Arrow Flight SQL server — any Arrow client can connect and query via standard SQL. The jammi CLI is a strict gRPC client of that server.

Crates

Crate	Purpose
`jammi-db`	Query engine, configuration, catalog, source management, Parquet storage, ANN indexes
`jammi-ai`	Model loading, inference execution, embedding pipeline, vector search, evidence model, fine-tuning, evaluation
`jammi-server`	Arrow Flight SQL server and HTTP health endpoint
`jammi-cli`	Command-line interface
`jammi-python`	Python bindings via PyO3

jammi-db has no dependency on jammi-ai. You can use it standalone for SQL queries over local data without pulling in the AI layer.

Design Philosophy

Jammi is an engine of generic primitives, not a substrate-platform. This page states what that means, where the line falls, and how the engine is meant to be consumed and deployed. It is principle-level on purpose — it pins no versions, index types, or other details that move, so it stays true as the implementation evolves.

The one rule everything else follows from

Jammi names no consumer. Not in code, config, docs, tests, fixtures, or scripts. References point one way only: a consumer may depend on Jammi; Jammi depends on no consumer. A consumer’s name anywhere in the engine repo is a bug.

A real consumer need is a fine forcing function for the roadmap — but the thing that lands in the engine is the generic primitive the need pointed at, with the name filed off. Two unrelated consumers independently reaching for the same primitive is the strongest evidence it is right; being able to justify a primitive only by naming one consumer is the strongest evidence it is wrong.

The discipline test

Before any capability enters the engine:

Would a user who has never heard of any particular consumer reach for this on its own?

Justify it against unrelated, hypothetical consumers — a feature store, an ad-attribution chain, a clinical-trial data fabric, a personal-knowledge search tool. If it survives only with a real name attached, it is domain pull masquerading as a primitive, and it belongs in that consumer’s own repo, built on a published Jammi version.

Where the line falls

Stays in Jammi (engine primitives)	Lives in the consumer’s repo (composition)
DataFusion SQL surface	Domain tables (the consumer’s own entities)
Catalog primitives — typed status enums, append-only migrations	Domain status enums and their lifecycle meaning
Storage primitives — Parquet result-tables, sidecar ANN, mutable companion tables	Domain audit / recapture / gating semantics
Source registration and federation	Domain interop adapters (foreign format → a registered source)
AI primitives — embeddings, inference, search, fine-tune, eval	Domain agents / reactors and what they decide
Data-driven provenance channels	Domain audit columns (what “signed-by”, “owned-by” mean)
Trigger stream — Arrow batches, SQL-predicate filters	Domain event taxonomies and their semantics
Tenant session scope	Domain ownership models — registries, ownership lanes, publish/install/bind
Server surfaces (Flight SQL, gRPC)	Domain operation contracts (typed verbs, signed transitions)

Left column: every cell is something a user with no knowledge of any specific consumer would still want. Right column: every cell is a composition a consumer builds in its own repo out of the left column. The substrate-platform shape — append-only typed substrate + pluggable reactors + read-back loops at decision moments — recurs across consumers, but it recurs in their domain layers, not here. If Jammi codified that shape, the next consumer, in a domain we have not met, would have to bend to fit or fork.

Leak-guards

Domain pull leaks in through “almost-generic” primitives that quietly assume a consumer’s semantics. Three guards, all the same shape — the primitive transports / persists / merges; the semantics live above it:

The trigger stream knows nothing about the payload. A topic is a name; a message is an Arrow batch; a subscription is a SQL predicate over batch columns. No typed-event taxonomy, no required headers (actor, timestamp, signature), no ordering guarantee beyond per-topic FIFO. Adding a “typed event” with mandatory headers is where it would break.
Mutable tables expose CRUD through DML, nothing more. No built-in transition log, no automatic versioning, no lifecycle-column convention. A consumer that wants append-only-with-history builds it from two table registrations. Adding a LifecycleTable wrapper is where it would break.
Provenance channels merge declared columns at query time. The engine never writes to a provenance column. What a channel column means — signed-by, retrieved-by, scored-by, attributed-to — is the caller’s vocabulary. Adding channel-writing helpers (record_actor(), sign_with()) is where it would break.
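
The first guard can be made concrete with a small sketch of a payload-agnostic stream. It is an illustration under stated assumptions, not the engine's broker: `TinyTriggerStream` is a hypothetical name, a dict of column lists stands in for an Arrow batch, and a Python callable stands in for the SQL predicate.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List

Batch = Dict[str, list]  # column name -> column values (stand-in for an Arrow batch)


class TinyTriggerStream:
    """Payload-agnostic topics with per-topic FIFO order."""

    def __init__(self) -> None:
        self._topics: Dict[str, Deque[Batch]] = defaultdict(deque)

    def publish(self, topic: str, batch: Batch) -> None:
        # The stream only transports the batch; it never inspects its meaning.
        self._topics[topic].append(batch)

    def collect(self, topic: str,
                predicate: Callable[[dict], bool]) -> List[dict]:
        """Return matching rows in publish order (FIFO within the topic)."""
        rows = []
        for batch in self._topics[topic]:
            columns = list(batch)
            for values in zip(*(batch[c] for c in columns)):
                row = dict(zip(columns, values))
                if predicate(row):
                    rows.append(row)
        return rows


stream = TinyTriggerStream()
stream.publish("events.demo", {"id": [1, 2], "score": [0.2, 0.9]})
stream.publish("events.demo", {"id": [3], "score": [0.7]})
# Stand-in for a subscription predicate such as `score > 0.5`.
matched = stream.collect("events.demo", lambda row: row["score"] > 0.5)
```

Nothing in the class knows what `id` or `score` mean, and no header is required; any actor, timestamp, or signature column would be just another column the caller chose to include.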

How embeddings are consumed: `search`

There is one consumption verb. The embedding producers differ only in the encoder; the moment vectors land in a result-table plus its sidecar index, consumption is identical: search(source, query, k) returns top-k ids and scores — ANN over the sidecar index, with an exact scan as the fallback when no index is present.
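
The exact-scan fallback mentioned above is simple enough to sketch. The following is a minimal illustration of brute-force top-k cosine search in plain Python, not Jammi's implementation; `exact_search` and the in-memory `table` are hypothetical stand-ins for a result-table without a sidecar index.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_search(vectors: Dict[str, Sequence[float]],
                 query: Sequence[float], k: int) -> List[Tuple[str, float]]:
    """Score every stored vector and keep the k most similar ids."""
    scored = ((cosine(vec, query), key) for key, vec in vectors.items())
    top = heapq.nlargest(k, scored)
    return [(key, score) for score, key in top]


table = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.7, 0.7],
}
hits = exact_search(table, query=[1.0, 0.1], k=2)
```

An ANN index returns an approximation of this same ranking at lower cost, which is why both paths can sit behind one verb.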

search is the curated path because it is where the engine adds value: the ANN index, the exact-scan fallback, and the evidence/provenance attached to results. The raw vector itself is a column in a SQL-addressable result-table, so it is reachable through the generic SQL surface (SELECT <vector_column> FROM <result_table> …) — the same as selecting any other column. That generic read path is legitimate, and making it ergonomic and documented is ordinary engine work; it is not a violation of anything here. It is the right home for the rare genuine need — exporting embeddings, a custom-metric re-rank, debugging.

What the engine does not add is a dedicated vector-retrieval verb on the embedding/search API. A consumer should not pull vectors back to reconstruct a ranking the engine already computes, or to compare across models (re-encode for that). A bespoke get_vector verb with no caller is speculative domain-convenience: it competes with search as “how you consume embeddings” and owes a per-id contract across every storage backend. So the line is precise — the capability is the SQL surface, not a verb; the verb is what fails the discipline test.

How it deploys: one binary, pluggable backends

The same engine binary serves every topology. Differences are configuration (which backend driver) and process count (1 vs N) — never a topology-specific code path or a server-only feature the library cannot do.

Four canonical shapes — points on a configuration surface, not tiers to graduate through:

A — Single-process embedded. Library mode, SQLite catalog, local Parquet, in-memory trigger stream, in-process model cache. Notebooks, CLI, single-machine, laptop dev.
B — Single-tenant server. One process, Flight SQL + gRPC trigger stream exposed, optional Postgres catalog for HA, local disk + object-store backup. Physical-isolation requirements, on-prem.
C — Multi-tenant server. One process (or stateless fleet), Postgres catalog, shared object store, shared trigger broker; tenant scope filters every catalog query. SaaS hosting many tenants — e.g. an edge-function deployment with the engine as a container sidecar reached over gRPC, or a managed multi-tenant service.
D — Disaggregated. Catalog process ↔ stateless query workers ↔ GPU-resident inference workers ↔ trigger broker, each scaling independently; the same binary in every role. Very high scale, specialized GPU pools, split compliance posture.

The five pluggable backends — the entire deployment-knob surface:

Backend	Embedded default	Production driver(s)
Catalog	SQLite	Postgres
Result-table storage	Local filesystem	S3 / GCS / R2 / Azure Blob (via the `object_store` crate)
Mutable companion tables	SQLite	Postgres
Trigger broker	In-memory	Kafka / NATS / Redis Streams / a cloud queue
Model artifact source	Local cache + HF Hub	Mirror, private registry, object-store-backed store

Everything else — load balancing, ingress, TLS, secrets, IAM, observability stack, orchestration, autoscaling — is the consumer’s runtime, not the engine’s.

Three properties this preserves, and that a consumer evaluating Jammi should be able to test:

The library is never less capable than the server. Anything the server does, the library does in-process. No feature gated to “clustered mode.”
The default deployment fits on a laptop. SQLite + local filesystem + in-memory trigger stream + HF Hub cache is a complete deployment, no cloud service required.
Production is a configuration change, not a fork. Moving from Shape A to Shape C swaps backend drivers; the engine code, schema, catalog discipline, and trigger-stream contract are unchanged.
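
The third property can be pictured as a diff between two driver selections. The sketch below uses hypothetical key names and driver labels purely for illustration; they are not Jammi's configuration schema. It only shows that Shape A and Shape C choose different drivers for the same five knobs.

```python
# Hypothetical key names and driver labels, used only for illustration;
# they are not Jammi's configuration schema.
SHAPE_A = {
    "catalog": "sqlite",
    "result_storage": "local_filesystem",
    "mutable_tables": "sqlite",
    "trigger_broker": "in_memory",
    "model_source": "local_cache+hf_hub",
}

# Shape C swaps drivers; the set of knobs stays the same.
SHAPE_C = {
    **SHAPE_A,
    "catalog": "postgres",
    "result_storage": "object_store",
    "mutable_tables": "postgres",
    "trigger_broker": "shared_broker",
}


def changed_backends(before: dict, after: dict) -> dict:
    """Return {knob: (old_driver, new_driver)} for every knob that differs."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}


diff = changed_backends(SHAPE_A, SHAPE_C)
```

If a move between shapes ever required a key outside this set, that would be a sign of a topology-specific code path.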

Positioning

With these primitives the engine is, precisely: an embeddable AI engine — federated SQL, durable result-tables with ANN indexes, mutable companion tables, embeddings / inference / search / fine-tune / eval, evidence provenance on every row, and a trigger-stream event surface. That is a general-utility data-and-AI engine. It is deliberately not a substrate-platform engine — because the substrate-platform shape belongs to the consumers that are that shape, and the next consumer may not be.

Installation

Rust

Add Jammi to your Cargo.toml:

[dependencies]
jammi-db = "0.25"
jammi-ai = "0.25"
tokio = { version = "1", features = ["full"] }

CLI

The jammi CLI registers sources, runs SQL, and administers a running jammi-server; the server itself is the separate jammi-server binary. There are three ways to get the CLI.

`cargo install` (CPU)

Builds from source on your machine. Needs the build dependencies below.

cargo install jammi-cli

The installed binary is jammi.

Prebuilt binary (CPU)

Download a stripped, ready-to-run binary from the GitHub releases. No build toolchain required. Assets are published per release:

jammi-<version>-x86_64-unknown-linux-gnu.tar.gz — Linux x86-64 (built against glibc 2.28, so it runs on any Linux with glibc 2.28 or newer)
jammi-<version>-aarch64-apple-darwin.tar.gz — macOS on Apple silicon

tar -xzf jammi-0.25.0-x86_64-unknown-linux-gnu.tar.gz
./jammi --help

GPU (CUDA 12)

GPU inference ships as a container image, not a bare binary. The jammi-ai-server-cu12 image runs jammi-server as its entrypoint and also carries the jammi admin CLI; it is turnkey:

docker run --gpus all \
  -p 8080:8080 -p 8081:8081 \
  ghcr.io/f-inverse/jammi-ai-server-cu12:latest

That runs jammi-server with zero config. See Deploy as a Server for GPU configuration and persistence.

Alternatively, install the CUDA server as a pip wheel — it ships the same jammi-server binary and pulls the CUDA runtime from nvidia-*-cu12 wheels, so no system CUDA install is required (only an NVIDIA driver on the host):

pip install jammi-server-cu12
jammi-server

The jammi-ai wheel (the embedded engine) is CPU-only; GPU inference runs in the server, which Python reaches via jammi.connect("grpc://…").

Build dependencies (Linux)

If building from source, you need C and C++ compilers, pkg-config, and protoc:

# Debian/Ubuntu
apt-get install protobuf-compiler gcc g++ pkg-config

# RHEL/AlmaLinux
yum install protobuf-compiler gcc gcc-c++ pkg-config

All other native libraries (lzma, zstd, zlib, sqlite) are vendored and compiled from source automatically. The build tools listed above are pre-installed in the devcontainer and CI images.

Python

pip install jammi-ai

Requires Python 3.8+. Pre-built wheels are available for Linux, macOS, and Windows.

From source

git clone https://github.com/f-inverse/jammi-ai.git
cd jammi-ai
cargo build --release

The CLI binary is at target/release/jammi (a strict gRPC client) and the server binary at target/release/jammi-server.

For the Python package from source:

pip install maturin
maturin develop --release

Runtime requirements

Jammi has no mandatory runtime dependencies beyond the binary itself.

Optional:

CUDA toolkit + cuDNN for GPU inference (CPU works out of the box)
HuggingFace Hub access for downloading models (first run downloads ~90MB for MiniLM, cached thereafter)
PostgreSQL / MySQL client libraries if using federated database sources

Set HF_TOKEN for gated models, or HF_HOME to control the cache location.
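
To check where models will be cached before a first run, a small helper like the one below can resolve the cache root. It is a sketch that follows the defaults documented for the Hugging Face Hub client (`HF_HOME`, else `$XDG_CACHE_HOME/huggingface`, else `~/.cache/huggingface`); that Jammi resolves the location the same way is an assumption, and `hf_home` / `has_token` are hypothetical helper names.

```python
import os
from pathlib import Path
from typing import Mapping


def hf_home(env: Mapping[str, str]) -> Path:
    """Resolve the Hugging Face cache root from environment variables."""
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser()
    cache_root = env.get("XDG_CACHE_HOME") or "~/.cache"
    return (Path(cache_root) / "huggingface").expanduser()


def has_token(env: Mapping[str, str]) -> bool:
    """Gated models need a non-empty HF_TOKEN."""
    return bool(env.get("HF_TOKEN", "").strip())


print(hf_home(os.environ), has_token(os.environ))
```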

Quickstart: Rust

This walkthrough registers a local data file, runs a SQL query, generates embeddings, and performs a semantic search — all in one program.

Full example

use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::config::JammiConfig;
use jammi_db::source::{FileFormat, SourceConnection, SourceType};
use jammi_db::store::CachePolicy;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = JammiConfig::load(None)?;
    let session = Arc::new(InferenceSession::new(config).await?);

    // 1. Register a data source
    session.add_source("patents", SourceType::File, SourceConnection {
        url: Some("file:///path/to/patents.parquet".into()),
        format: Some(FileFormat::Parquet),
        ..Default::default()
    }).await?;

    // 2. Query with SQL
    let rows = session.sql(
        "SELECT id, title, year FROM patents.public.patents WHERE year > 2020 LIMIT 5"
    ).await?;
    for batch in &rows {
        println!("{batch:?}");
    }

    // 3. Generate embeddings
    let (record, _outcome) = session.generate_text_embeddings(
        "patents",
        "sentence-transformers/all-MiniLM-L6-v2",
        &["title".to_string()],
        "id",
        CachePolicy::Bypass,
    ).await?;
    println!("Embedded {} rows", record.row_count);

    // 4. Semantic search
    let query = session.encode_text_query(
        "sentence-transformers/all-MiniLM-L6-v2",
        "quantum computing applications",
    ).await?;

    let results = session.search("patents", query, 5, None, None).await?
        .sort("similarity", true)?
        .run().await?;

    for batch in &results {
        println!("{batch:?}");
    }

    Ok(())
}

The first run downloads the model from HuggingFace Hub (~90MB). Subsequent runs load from cache.

What’s happening

JammiConfig::load(None) loads config from jammi.toml, $JAMMI_CONFIG, or defaults
InferenceSession wraps the query engine with model loading, caching, and GPU scheduling
add_source registers a file in the catalog — it survives session restarts
sql runs any SQL query via DataFusion, returns Vec<RecordBatch>
generate_text_embeddings runs the model over every row, persists vectors to Parquet with a sidecar ANN index
encode_text_query encodes a text string into the same vector space
search finds the nearest neighbors, hydrates all source columns, and returns results with similarity scores

Next steps

Query Your Data with SQL — SQL features, joins, aggregations
Generate Embeddings — persistence, multiple models, crash recovery
Semantic Search — SearchBuilder API, filtering, evidence provenance

Quickstart: Python

The full quickstart — install, connect, register, search — lives in the repo’s cookbook tree under cookbook/quickstart/ with a runnable quickstart.py that’s exercised end-to-end on every PR by tests/cookbook_smoke.py. This page mirrors the cookbook’s overview so the mdBook site renders a self-contained quickstart; the cookbook is the source of truth.

Goal: a fresh user goes from pip install jammi-ai to a successful vector query in five minutes. The end-to-end script is cookbook/quickstart/quickstart.py: copy it, run it, then read the four step-by-step pages in the cookbook for the explanation.

Steps

Install — pip install jammi-ai
Connect — open a session against a local artifact dir
Register a source — attach a Parquet file
Generate embeddings + search — build a vector index and run a similarity query

Run it

python cookbook/quickstart/quickstart.py

Expected output: a header row followed by the top three matches with their cosine similarity scores. The script exits 0 in under 30 seconds on CPU.

Production substitution

The script uses the local cookbook/fixtures/tiny_bert/ model (32-dim, 88 KB, single-layer) so the example needs no network access. In a real workload you would swap in a Hub model — for example sentence-transformers/all-MiniLM-L6-v2 (384-dim, English) — by changing the MODEL constant. Everything else stays the same.

Quickstart: CLI

The jammi CLI is a strict gRPC client: it talks to a running jammi-server over the wire and never touches the catalog or storage in-process. Start a server (see Deploy as a Server), then point the CLI at it with --target.

Register a source and query it

# Register a remote source (the URL is resolved server-side)
jammi --target grpc://127.0.0.1:8081 \
  sources add patents --url /path/to/patents.parquet --format parquet

# List registered sources
jammi --target grpc://127.0.0.1:8081 sources list

# Run a SQL query
jammi --target grpc://127.0.0.1:8081 \
  query "SELECT id, title, year FROM patents.public.patents WHERE year > 2020 LIMIT 5"

# Show the execution plan
jammi --target grpc://127.0.0.1:8081 \
  explain "SELECT * FROM patents.public.patents WHERE year > 2020"

The default --target is grpc://127.0.0.1:8081, so a CLI talking to a local server can omit the flag.

Check the server

# Report version, compiled features, storage backends, and mounted services.
# A successful response also confirms reachability.
jammi status

Available commands

Command	Description
`jammi status`	Report the server’s capabilities and confirm reachability
`jammi sources list`	List registered data sources
`jammi sources add <NAME> --url <URL> --format <FMT>`	Register a source
`jammi models list`	List registered models
`jammi query "<SQL>"`	Run a SQL query and print results
`jammi explain "<SQL>"`	Show the execution plan for a query
`jammi channels …`	Manage evidence channels
`jammi mutable …`	Manage mutable companion tables
`jammi trigger …`	Manage trigger-stream topics

Global options

jammi --target <ENDPOINT> <command>   # Server endpoint (default grpc://127.0.0.1:8081)
jammi --tenant <UUID> <command>       # Bind a tenant scope for the session

--target accepts grpc://host:port (plaintext), grpcs://host:port (TLS), http(s)://host:port, or a bare host:port. --tenant binds a tenant scope before any verb runs, so every read and write is scoped to that tenant.
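
The accepted endpoint forms can be summarized with a small parser. This is an illustrative sketch, not the CLI's own parsing code: `parse_target` and `Target` are hypothetical names, and treating `http://` and a bare host:port as plaintext is an assumption.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class Target(NamedTuple):
    host: str
    port: int
    tls: bool


_TLS_BY_SCHEME = {"grpc": False, "http": False, "grpcs": True, "https": True}


def parse_target(value: str) -> Target:
    """Normalize the documented --target forms into host, port, and TLS flag."""
    if "://" not in value:
        # Bare host:port; treating it as plaintext is an assumption.
        value = "grpc://" + value
    parts = urlsplit(value)
    if parts.scheme not in _TLS_BY_SCHEME:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"expected host:port in {value!r}")
    return Target(parts.hostname, parts.port, _TLS_BY_SCHEME[parts.scheme])


default = parse_target("grpc://127.0.0.1:8081")
```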

Next steps

Deploy as a Server — jammi-server, configuration, preloading models
Configuration — full config reference

Runnable Recipes

Every recipe under cookbook/recipes/ ships as a runnable example.py next to a markdown README and is wired into CI via tests/cookbook_smoke.py — a broken recipe blocks the merge. These recipes are the OSS source of truth; this page mirrors each README below.

For the long-form, measured companion, see The Cookbook: a Theory↔Computation book that shows one recipe = one equation in the graph-signal-processing monograph = one line of the GNN canon, executed against committed goldens. The two are complementary: these How-To Guides are the short, dual-language (Rust + Python), compile-tested “how do I call this verb” reference; The Cookbook is the long-form, Python, executed-and-measured narrative.

The recipes shipped at MVP:

Recipe	Demonstrates
`mutable_tables`	Create/insert/select/drop on a mutable companion table
`trigger_streams`	Publish + subscribe on a topic via the in-process broker
`eval_embeddings`	recall@k, MRR, nDCG against a golden set
`image_search`	Image-to-image search with PatentCLIP + Recall@K / MRR eval
`eval_inference`	Accuracy + macro F1 against gold labels
`eval_inference_ner`	Entity-level precision / recall / F1 against gold spans
`fine_tune`	LoRA fine-tune end-to-end
`flight_sql`	Query a remote `jammi-server` over Arrow Flight SQL
`audio_search`	Audio-to-audio search with a CLAP encoder
`search_audit`	Per-query provenance audit of a search
`session_lifecycle`	Ephemeral session storage with scoped cleanup

Mutable tables

End-to-end create / insert / select / drop on a Jammi mutable table — the OSS primitive for state that needs to live alongside read-only result tables.

When to use this pattern. You need a writable table that sits in the same SQL catalog as your registered sources and embedding tables — for caching enriched rows, holding cursor state, recording user feedback, or any “small table I want to UPDATE / DELETE / INSERT from SQL” workload — without standing up an external Postgres.

What `example.py` does

Connects to a temporary artifact dir
Creates a notes mutable table with an int64 primary key + utf8 body column
Inserts three rows through DataFusion DML (INSERT INTO ...)
Verifies count and ordering via SELECT
Drops the table, then asserts a SELECT after the drop raises
Demonstrates the idempotent drop_mutable_table(..., if_exists=True)
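
The lifecycle above is ordinary SQL table semantics, so it can be previewed without Jammi at all. The sketch below walks the same create / insert / select / drop sequence using Python's built-in sqlite3 module as a stand-in; it does not call the Jammi API, and the SQL dialect and exception type will differ from what example.py exercises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create a table with an integer primary key and a text body column.
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")

# Insert three rows through plain SQL DML.
conn.executemany(
    "INSERT INTO notes (id, body) VALUES (?, ?)",
    [(1, "first"), (2, "second"), (3, "third")],
)

# Verify count and ordering.
rows = conn.execute("SELECT id, body FROM notes ORDER BY id").fetchall()

# Drop the table; a SELECT afterwards must fail.
conn.execute("DROP TABLE notes")
try:
    conn.execute("SELECT * FROM notes")
    select_after_drop_failed = False
except sqlite3.OperationalError:
    select_after_drop_failed = True

# An idempotent drop succeeds even though the table is already gone.
conn.execute("DROP TABLE IF EXISTS notes")
conn.close()
```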

API surface exercised

Database.create_mutable_table(name, *, schema, primary_key, ...)
Database.sql("INSERT INTO mutable.public.<name> ...")
Database.sql("SELECT ... FROM mutable.public.<name>")
Database.drop_mutable_table(name, *, if_exists=False)

The DataFusion namespace for mutable tables is always mutable.public.<name> — distinct from registered sources, which live under <source>.public.<source>.

Run it

python cookbook/recipes/mutable_tables/example.py

Exits 0 on success, prints mutable_tables: OK on the last line.

Trigger streams

End-to-end publish + subscribe on a Jammi topic, plus the registration and listing surface. Uses the embedded in-process broker — no NATS or external broker needed.

When to use this pattern. You need a low-friction event bus inside your application — for fan-out to downstream consumers, fan-in from batch jobs, or replay-from-offset semantics — without bringing up Kafka or NATS in dev/test. The same surface scales out to NATS JetStream by flipping a config flag at deploy time.

What `example.py` does

Connects to a temporary artifact dir
Registers a topic events.demo with a typed schema and broker metadata
Confirms list_topics() returns the new topic
Publishes a 3-row batch through publish_topic — captures the broker-assigned offset
Subscribes with from_offset=0 and round-trips the same rows back
Drops the topic, confirms it’s gone from list_topics()
Demonstrates idempotent drop_topic(..., if_exists=True) and strict-mode failure when dropping a missing topic

API surface exercised

Database.register_topic(name, *, schema, broker_metadata=None)
Database.list_topics()
Database.publish_topic(name, *, batch) — returns the assigned offset
Database.subscribe_collect(name, *, from_offset, max_batches)
Database.drop_topic(name, *, if_exists=False)

The subscribe_collect path drives the replay-from-backing-table flow when from_offset=0; the live-tail flow is exercised in the broker integration suite.
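
Replay-from-offset is easiest to see on a toy append-only log. The sketch below is an illustration in plain Python, not the broker's implementation: `TopicLog` is a hypothetical class, a dict of column lists stands in for an Arrow batch, and assigning one offset per published batch (rather than per row) is an assumption.

```python
from typing import Dict, List

Batch = Dict[str, list]  # stand-in for an Arrow record batch


class TopicLog:
    """Append-only log: each published batch gets the next offset."""

    def __init__(self) -> None:
        self._batches: List[Batch] = []

    def publish(self, batch: Batch) -> int:
        self._batches.append(batch)
        return len(self._batches) - 1  # offset assigned to this batch

    def collect(self, from_offset: int, max_batches: int) -> List[Batch]:
        """Replay up to max_batches stored batches starting at from_offset."""
        return self._batches[from_offset:from_offset + max_batches]


log = TopicLog()
first = log.publish({"id": [1, 2, 3], "kind": ["a", "b", "c"]})
second = log.publish({"id": [4], "kind": ["d"]})

replayed = log.collect(from_offset=0, max_batches=1)
tail = log.collect(from_offset=second, max_batches=10)
```

Because the log is never rewritten, a subscriber that remembers its last offset can resume without the publisher tracking it.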

Run it

python cookbook/recipes/trigger_streams/example.py

Exits 0 on success, prints trigger_streams: OK on the last line.

Evaluate retrieval quality

Measure recall@k, precision@k, MRR, and nDCG of an embedding index against a golden relevance set.

When to use this pattern. You have a corpus and a small set of (query, expected document) judgments, and you need a number that tells you “is my new encoder better than the one I shipped last month?” The same loop powers nightly regression dashboards and A/B model comparison.

What `example.py` does

Connects to a temporary artifact dir
Registers the tiny corpus as a Parquet source
Builds 32-dim embeddings over the content column with the local tiny_bert fixture
Reads cookbook/fixtures/tiny_golden.json, expands it into the (query_id, query_text, relevant_id) CSV shape eval_embeddings consumes, and registers it as a golden source
Calls db.eval_embeddings(source="corpus", golden_source="golden.public.golden", k=5)
Asserts each aggregate metric is in [0.0, 1.0] and the per-query records carry their golden-set query_id

API surface exercised

Database.generate_embeddings(*, source, model, columns, key, modality="text")
Database.eval_embeddings(*, source, golden_source, model=None, k=10)

The returned dict carries aggregate (mean across queries — recall_at_k, precision_at_k, mrr, ndcg) and per_query (one entry per query with query_id and a metrics sub-dict of the same four names, un-averaged).
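
For readers who want to sanity-check the numbers, the four metrics can be reproduced by hand. The sketch below uses the common binary-relevance definitions; whether Jammi uses exactly these formulas (for example, k as the precision denominator) is an assumption, and `query_metrics` / `aggregate` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence, Set


def query_metrics(ranked: Sequence[str], relevant: Set[str], k: int) -> Dict[str, float]:
    """Binary-relevance metrics for one query's ranked result ids."""
    top = list(ranked[:k])
    hits = [1 if doc in relevant else 0 for doc in top]
    first = next((i for i, h in enumerate(hits) if h), None)
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return {
        "recall_at_k": sum(hits) / len(relevant) if relevant else 0.0,
        "precision_at_k": sum(hits) / k,
        "mrr": 0.0 if first is None else 1.0 / (first + 1),
        "ndcg": dcg / ideal if ideal else 0.0,
    }


def aggregate(per_query: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean of each metric across queries."""
    return {name: sum(q[name] for q in per_query) / len(per_query)
            for name in per_query[0]}


per_query = [
    query_metrics(["d1", "d7", "d3"], {"d1", "d3"}, k=3),
    query_metrics(["d9", "d2", "d5"], {"d2"}, k=3),
]
summary = aggregate(per_query)
```

In the second query the only relevant document sits at rank 2, so its reciprocal rank is 0.5 and the mean MRR over both queries is 0.75.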

Golden source shape

eval_embeddings requires a registered source with these columns:

column	type	example
`query_id`	utf8	`q1`
`query_text`	utf8	`quantum computing applications`
`relevant_id`	utf8	`1` (matches `corpus.id` as a string)
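
Producing this shape from a judgments file is a flattening step: one output row per (query, relevant document) pair. The sketch below shows that expansion with the standard library. The JSON layout (`relevant_ids` as a list per query) is hypothetical; the real tiny_golden.json fixture may be organized differently.

```python
import csv
import io
import json

# Hypothetical judgments file content; the real fixture's layout may differ.
GOLDEN_JSON = """
[
  {"query_id": "q1", "query_text": "quantum computing applications",
   "relevant_ids": [1, 4]},
  {"query_id": "q2", "query_text": "laser cooling", "relevant_ids": [2]}
]
"""


def golden_rows(judgments):
    """Yield one (query_id, query_text, relevant_id) row per relevant document."""
    for item in judgments:
        for doc_id in item["relevant_ids"]:
            # relevant_id is utf8, so corpus ids are written as strings.
            yield item["query_id"], item["query_text"], str(doc_id)


rows = list(golden_rows(json.loads(GOLDEN_JSON)))

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["query_id", "query_text", "relevant_id"])
writer.writerows(rows)
golden_csv = buffer.getvalue()
```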

Image queries are supported via a query_image BLOB column instead of query_text; cross-modal eval is out of scope for this recipe.

Run it

python cookbook/recipes/eval_embeddings/example.py

Exits 0 on success, prints the metrics dict + eval_embeddings: OK.

Image search

Run image-to-image semantic search over a corpus with an OpenCLIP-format vision model, then measure retrieval quality.

When to use this pattern. You have a corpus of images (figures, drawings, photos) and want to find the ones most similar to a query image — and a number that tells you how good the retrieval is. This is the image counterpart of the text eval_embeddings recipe.

Flow

Load a small image corpus (inline image bytes in a Parquet source)
Generate L2-normalized vision embeddings over the image column
Search the index with an encoded image query (cosine ANN)
Eval retrieval quality (Recall@K / MRR) against a held-out golden set

Model

The example uses PatentCLIP as the reference model, since federal patent-figure search is the use case driving this recipe:

JAMMI_IMAGE_MODEL=patentclip/PatentCLIP_Vit_B \
    python cookbook/recipes/image_search/example.py

patentclip/PatentCLIP_Vit_B is pulled from the Hugging Face Hub on first use and produces 512-dim L2-normalized embeddings. Any OpenCLIP-format model works the same way — OpenAI CLIP, LAION CLIP-ViT-B-32-*, EVA-CLIP, etc. — the encoder is auto-detected from the model’s open_clip_config.json.

By default (no env var) the recipe runs against the hermetic cookbook/fixtures/tiny_open_clip fixture, so it runs offline in CI within a few seconds. That fixture has random weights, so its retrieval numbers are meaningless: it exercises the full pipeline, not model quality. Use PatentCLIP (or any real model) for real numbers.

What `example.py` does

Connects to a temporary artifact dir
Reads the 20 committed 224×224 PNGs under cookbook/fixtures/tiny_image_corpus/ into a Parquet corpus source (image_id, image bytes)
db.generate_embeddings(source="corpus", model=MODEL, columns=["image"], key="image_id", modality="image")
db.encode_query(model=MODEL, query=png_bytes, modality="image") → db.search("corpus", query=vec, k=5) (returns a pyarrow.Table)
Builds the image-query golden source from tiny_image_golden.json and calls db.eval_embeddings(source="corpus", golden_source="golden.public.golden", k=5)
Prints the aggregate Recall@K / precision@K / MRR / nDCG and the per-query records. It reports the metrics; it does not assert a quality bar.

Stepwise scripts

example.py runs all four phases in one process (this is the version wired into tests/cookbook_smoke.py). The numbered scripts decompose the same flow and share a persistent workdir, so run them in order:

python cookbook/recipes/image_search/01-load-corpus.py
python cookbook/recipes/image_search/02-generate-embeddings.py
python cookbook/recipes/image_search/03-search.py
python cookbook/recipes/image_search/04-eval.py

API surface exercised

Database.generate_embeddings(*, source, model, columns, key, modality="image")
Database.encode_query(*, model, query, modality="image") → list[float]
Database.search(source, *, query, k, filter=None, select=None) → pyarrow.Table
Database.eval_embeddings(*, source, golden_source, model=None, k=10)

Input schema

column	type	notes
`image_id`	utf8	per-row key
`image`	binary	raw PNG/JPEG/TIFF bytes (decoded by the encoder)

Preprocessing (pad-to-square, no center crop, normalization, L2-normalized output) is handled inside the encoder per the model’s preprocess_cfg.
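
Two of these steps are easy to illustrate numerically. The sketch below is not the encoder's code: it computes the geometry of a pad-to-square (centering the image on the canvas is an assumption) and shows why L2-normalized outputs let a plain dot product serve as cosine similarity. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pad_to_square(width: int, height: int) -> Tuple[int, int, int]:
    """Return (side, left, top): canvas side and offsets that center the image."""
    side = max(width, height)
    return side, (side - width) // 2, (side - height) // 2


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


side, left, top = pad_to_square(width=300, height=200)
u = l2_normalize([3.0, 4.0])
v = l2_normalize([4.0, 3.0])
similarity = dot(u, v)  # cosine similarity, since both vectors are unit length
```

Padding keeps the whole figure in view, whereas a center crop would discard the edges of a wide or tall image.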

Golden source shape (image mode)

eval_embeddings switches to image-query mode when the golden source carries a query_image (binary) column instead of query_text:

column	type	example
`query_id`	utf8	`q_circle`
`query_image`	binary	raw PNG bytes of the query image
`relevant_id`	utf8	`img_circle_0` (matches `image_id`)

Fixtures

cookbook/fixtures/tiny_image_corpus/ — 20 synthetic 224×224 PNGs in 5 shape families (circle / triangle / square / hexagon / grating), 4 per family, plus a held-out query image per family under queries/. Rendered programmatically by cookbook/fixtures/generate.py — no real patent imagery (licensing).
cookbook/fixtures/tiny_image_golden.json — per-query → expected corpus IDs (same shape family).
cookbook/fixtures/tiny_open_clip/ — tiny offline OpenCLIP fixture used as the default CI model.

Run it

python cookbook/recipes/image_search/example.py

Exits 0 on success, prints the top-K and the metrics dict + image_search: OK.

Evaluate inference (classification)

Run a classifier over a registered source and score its predictions against gold labels.

When to use this pattern. You have a labelled holdout set and you want a single number — accuracy, macro F1, per-class F1 — to compare two classifiers, or to track drift over time on the same classifier.

What `example.py` does

Connects to a temporary artifact dir
Registers the tiny corpus as corpus (parquet)
Registers tiny_labels.csv as golden (csv) — (id, label) rows
Runs db.eval_inference with the local tiny_modernbert_classifier fixture against the content column
Prints the returned aggregate accuracy, macro f1, per-class metrics, and the count of per-record predictions
Asserts every reported rate is in [0.0, 1.0]

API surface exercised

Database.eval_inference(*, model, source, columns, task, golden_source, label_column)

The returned dict carries aggregate (tagged by "task" — currently "classification") with accuracy, f1, and per_class, plus per_record (one entry per aligned {record_id, predicted, gold}).

The task argument is the string form of the inference task — "classification" here. For NER, see ../eval_inference_ner/.
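To make the aggregate fields concrete, here is a minimal, library-independent sketch of how accuracy, per-class F1, and macro F1 can be derived from per_record-style entries. The helper name classification_metrics and the sample records are illustrative assumptions; Jammi's own implementation and averaging rules may differ.

```python
from collections import Counter


def classification_metrics(per_record):
    """Compute accuracy, per-class F1, and macro F1 from aligned records.

    Each record is a dict with "predicted" and "gold" label strings,
    mirroring the per_record shape described above (assumed layout).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    correct = 0
    for rec in per_record:
        pred, gold = rec["predicted"], rec["gold"]
        if pred == gold:
            correct += 1
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted class gains a false positive
            fn[gold] += 1  # the gold class gains a false negative

    per_class = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class[label] = 2 * tp[label] / denom if denom else 0.0

    accuracy = correct / len(per_record) if per_record else 0.0
    macro_f1 = sum(per_class.values()) / len(per_class) if per_class else 0.0
    return {"accuracy": accuracy, "f1": macro_f1, "per_class": per_class}


# Illustrative records only; not output from the recipe.
records = [
    {"record_id": "1", "predicted": "physics", "gold": "physics"},
    {"record_id": "2", "predicted": "biology", "gold": "physics"},
    {"record_id": "3", "predicted": "biology", "gold": "biology"},
    {"record_id": "4", "predicted": "biology", "gold": "biology"},
]
metrics = classification_metrics(records)
```

Recomputing the numbers this way is a quick cross-check when two classifiers report similar accuracy but differ on a minority class.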

Golden source shape

eval_inference requires a registered source with these columns:

column	type	example
`id`	utf8	`"1"`
`<label_column>`	utf8	`physics`

label_column is the kwarg you pass at call time — label in this recipe. Every id in the golden source must resolve to a row in the input source; rows without a gold label are silently dropped from the metric.

Run it

python cookbook/recipes/eval_inference/example.py

Exits 0 on success, prints the metrics dict + eval_inference: OK.

Evaluate inference (NER)

Run a token-classification model over a registered source and score its predicted entity spans against gold spans.

When to use this pattern. You have a labelled NER holdout set (one gold span per row) and you want strict entity-level precision, recall, and F1 — both overall and per entity type — to compare two NER models or to track regressions on the same one.

What `example.py` does

Connects to a temporary artifact dir
Registers tiny_ner_corpus.parquet as corpus (parquet)
Registers tiny_ner_gold.csv as golden (csv) — one row per gold entity span: (id, label, start, end)
Runs db.eval_inference with the local tiny_modernbert_ner fixture against the text column, task="ner"
Prints the returned aggregate precision, recall, f1, the per-type breakdown, and the count of per-record predictions
Asserts every reported rate is in [0.0, 1.0]

API surface exercised

Database.eval_inference(*, model, source, columns, task, golden_source, label_column)

The returned dict carries aggregate (tagged by "task" — "ner" for this recipe) with precision, recall, f1, and per_type (one breakdown per entity type the model emitted or the gold set carried), plus per_record (one entry per aligned {record_id, predicted, gold} where predicted and gold are entity-span lists, each tagged "task": "ner").

The task argument is the string form of the inference task — "ner" here. For classification, see ../eval_inference/.

Golden source shape

eval_inference with task="ner" requires a registered source with these columns — one row per entity span (multiple spans on the same id accumulate into one per-row gold set):

column	type	example
`id`	utf8	`"1"`
`<label_column>`	utf8	`PER`
`start`	i64	`0`
`end`	i64	`13`

label_column is the kwarg you pass at call time — label in this recipe. start is inclusive, end is exclusive, both byte offsets into the source row’s text column. The label set must match the shipped model’s id2label minus the B-/I- prefixes — tiny_modernbert_ner knows PER and ORG only.

Rows in the source without a matching gold id are silently dropped from the metric (same alignment rule the classification recipe uses).
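Two details above are easy to get wrong: offsets are byte offsets (not character offsets), and scoring is strict. The following sketch illustrates both under stated assumptions. The helper names, the sample text, and the span tuples are hypothetical, and the scoring shown is the usual exact-match definition rather than a copy of Jammi's code.

```python
def byte_slice(text, start, end):
    """Return the substring covered by UTF-8 byte offsets [start, end)."""
    return text.encode("utf-8")[start:end].decode("utf-8")


def strict_span_scores(predicted, gold):
    """Strict entity-level precision, recall, and F1.

    predicted, gold: dict mapping row id -> set of (label, start, end).
    A prediction counts only if label, start, and end all match exactly.
    Rows with no gold entry are skipped, mirroring the alignment rule above.
    """
    tp = fp = fn = 0
    for row_id, gold_spans in gold.items():
        pred_spans = predicted.get(row_id, set())
        tp += len(pred_spans & gold_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# 22 characters but 24 UTF-8 bytes: the two accented letters take 2 bytes each.
text = "Zo\u00eb M\u00fcller joined Acme"
gold = {"1": {("PER", 0, 12), ("ORG", 20, 24)}}
predicted = {
    "1": {("PER", 0, 12), ("ORG", 13, 24)},  # ORG span starts too early
    "2": {("PER", 0, 3)},                    # no gold row, so it is ignored
}
scores = strict_span_scores(predicted, gold)
```

If gold offsets were produced with character indexing, non-ASCII rows will silently mismatch under strict scoring, so converting them to byte offsets before registering the golden source is worth the extra step.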

Run it

python cookbook/recipes/eval_inference_ner/example.py

Exits 0 on success, prints the metrics dict + eval_inference (ner): OK.

Fine-tune an encoder

Run a LoRA fine-tune on top of an existing text encoder, poll the job to completion, and use the resulting checkpoint to encode a query.

When to use this pattern. Your domain (legal contracts, medical abstracts, patent claims, internal product docs) doesn’t match the distribution the base encoder was trained on, and you have a few hundred to a few thousand labelled or contrastive pairs. LoRA gets you ~80% of the lift of a full fine-tune at a fraction of the cost; the resulting adapter is small enough to ship as an attachment to the base model rather than a re-distributed full checkpoint.

What `example.py` does

Connects to a temporary artifact dir
Registers tiny_pairs.csv (30 contrastive pairs) as training
Calls db.fine_tune(...) with the local tiny_bert base, a small LoRA rank, and one epoch — kept fast for CI
Waits for terminal status via job.wait()
Asserts the resulting model_id starts with jammi:fine-tuned:
Encodes a query through the fine-tuned model to confirm it loads

API surface exercised

Database.fine_tune(*, source, base_model, columns, method, task=..., ...)
TrainingJob.wait()
TrainingJob.job_id, TrainingJob.model_id
Database.encode_query(*, model, query, modality="text")

The full keyword list on fine_tune covers LoRA rank/alpha/dropout, learning rate, epochs, batch size, max sequence length, validation fraction, early-stopping patience/metric, warmup, gradient accumulation, backbone dtype, weight decay, and gradient clipping — the recipe uses the defaults for everything except rank and epochs.

Performance note

This recipe is excluded from the per-PR smoke matrix because even at one epoch it takes ~30 seconds on CPU. The nightly cron job, which sets JAMMI_COOKBOOK_SLOW=1, includes it. Override the gate locally:

JAMMI_COOKBOOK_SLOW=1 python tests/cookbook_smoke.py

Run it

python cookbook/recipes/fine_tune/example.py

Exits 0 on success, prints job_id, model_id, and fine_tune: OK.

Connect via Flight SQL

Run a query against a remote jammi-server over Arrow Flight SQL.

When to use this pattern. You’re connecting from a non-Python client (Tableau, dbt, JDBC tools, Rust binaries), or you want to expose Jammi to multiple readers without each one holding an embedded session. dbt-flightsql, the official Flight SQL JDBC driver, and BI tools all speak this protocol natively.

What `example.py` does

Spawns target/release/jammi-server as a child process pointed at a temp artifact_dir
Polls the health endpoint (http://127.0.0.1:8080/healthz) until the server is ready (5 s budget)
Opens a pyarrow.flight.FlightClient against grpc://127.0.0.1:8081
Submits SELECT 1 AS one over Flight SQL and confirms the response
Tears down the server process cleanly

This recipe is gated out of the per-PR CI matrix — it depends on the jammi-server binary being built (cargo build --release -p jammi-server), and the build cost dominates the test wall-clock. The nightly cookbook job builds the binary and runs the recipe behind JAMMI_COOKBOOK_SLOW=1.

Prerequisites

cargo build --release -p jammi-server — produces target/release/jammi-server
pip install pyarrow (already a jammi-ai dependency)

The script auto-detects JAMMI_BIN (env var) or falls back to the workspace’s target/release/jammi-server.

API surface exercised

pyarrow.flight.FlightClient.execute(query) over the Flight SQL command dialect
jammi-server — the OSS deployment-shape binary entrypoint

Run it

cargo build --release -p jammi-server      # one-time build
python cookbook/recipes/flight_sql/example.py

Exits 0 on success, prints the query result + flight_sql: OK.

Audio search

Run audio-to-audio similarity search over a corpus with a CLAP-format audio model, measure retrieval quality, and domain-tune the audio embeddings on caller-supplied triplets.

When to use this pattern. You have a corpus of sounds (clips, stems, loops, recordings) and want to find the ones most similar to a query clip, along with a metric that tells you how good the retrieval is. This is the audio counterpart of the image eval_embeddings recipe; audio is the third embedding modality the engine supports, alongside text and images.

Flow

Load a small audio corpus (inline audio bytes in a Parquet source)
Generate L2-normalized audio embeddings over the audio column
Search the index with an encoded audio query (cosine ANN)
Eval retrieval quality (Recall@K / MRR) against a held-out golden set
Fine-tune a projection head on audio triplets and re-eval (tuned ≠ base)

Model

Any HuggingFace CLAP audio model works — its config.json declares model_type = "clap_audio_model" (or lists ClapModel / ClapAudioModelWithProjection in architectures), its checkpoint exposes the audio_model.audio_encoder.* + audio_projection.* HTSAT-Swin tower keys, and a preprocessor_config.json carries the feature-extractor geometry. The encoder is auto-detected from that config, exactly as the image recipe auto-detects OpenCLIP:

JAMMI_AUDIO_MODEL=<hf-repo-id-or-local-path> \
    python cookbook/recipes/audio_search/example.py

By default (no env var) the recipe uses the hermetic cookbook/fixtures/htsat_clap_tiny fixture, so it runs offline in CI in a few seconds. That fixture has random weights, so its retrieval numbers are meaningless: it exercises the full pipeline, not model quality. Point JAMMI_AUDIO_MODEL at a real CLAP checkpoint for real numbers.

What `example.py` does

Connects to a temporary artifact dir
Reads the 20 committed mono WAV clips under cookbook/fixtures/tiny_audio_corpus/ into a Parquet corpus source (clip_id, audio bytes)
db.generate_embeddings(source="corpus", model=MODEL, columns=["audio"], key="clip_id", modality="audio")
db.encode_query(model=MODEL, query=wav_bytes, modality="audio") → db.search("corpus", query=vec, k=5) (returns a pyarrow.Table)
Builds the audio-query golden source from tiny_audio_golden.json and calls db.eval_embeddings(source="corpus", golden_source="golden.public.golden", k=5)
Prints the base aggregate Recall@K / precision@K / MRR / nDCG and the per-query records. It reports the metrics; it does not assert a quality bar.
Builds synthetic (anchor, positive, negative) audio triplets from the corpus (positive = same timbre family, negative = a different family) and calls db.fine_tune(source="triplets", base_model=MODEL, columns=["anchor","positive","negative"], method="lora", task="audio_embedding", ...). Empty target_modules ⇒ a trainable projection head on the frozen CLAP audio tower (the cheap, low-risk lightweight mode). It then re-embeds the corpus with the tuned model, re-evals, and prints base-vs-tuned metrics for narrative. For correctness it re-encodes the same query clip through the tuned model and asserts the embedding vector changed (max elementwise |Δ| > 1e-4 versus the base encoding) — the real invariant fine-tuning guarantees, and a deterministic check. (Asserting on the coarse top-k metrics instead is flaky: on this tiny eval set the rankings rarely flip even when the vectors move.) It proves the adapter alters audio retrieval — not that it improves it; the random-weight fixture’s direction is not meaningful, real lift comes from a real checkpoint.

The pairing semantics (what a “positive” means) are the caller’s training data, not the trainer’s: the trainer only minimizes the contrastive triplet loss over whatever clips you pair.
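The correctness check in the last step of the list above (the tuned encoding must differ from the base encoding by more than 1e-4 in at least one component) can be sketched in a few lines. The function names and sample vectors are illustrative assumptions; in the recipe the two vectors would come from encoding the same clip with the base and the tuned model.

```python
def max_abs_delta(base_vec, tuned_vec):
    """Largest elementwise absolute difference between two embeddings."""
    if len(base_vec) != len(tuned_vec):
        raise ValueError("embeddings must have the same dimensionality")
    return max(abs(a - b) for a, b in zip(base_vec, tuned_vec))


def embedding_changed(base_vec, tuned_vec, tol=1e-4):
    """True when fine-tuning moved the vector by more than tol somewhere."""
    return max_abs_delta(base_vec, tuned_vec) > tol


# Illustrative vectors only.
base = [0.6, 0.8]
unchanged = embedding_changed(base, [0.6, 0.8])
moved = embedding_changed(base, [0.8, 0.6])
```

This check is deterministic for a fixed model and input, which is why it is a better CI assertion than a change in top-k metrics on a tiny eval set.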

Stepwise scripts

example.py runs every phase in one process (this is the version wired into tests/cookbook_smoke.py). The numbered scripts decompose the search-and-eval flow and share a persistent workdir, so run them in order:

python cookbook/recipes/audio_search/01-load-corpus.py
python cookbook/recipes/audio_search/02-generate-embeddings.py
python cookbook/recipes/audio_search/03-search.py
python cookbook/recipes/audio_search/04-eval.py

API surface exercised

Database.generate_embeddings(*, source, model, columns, key, modality="audio")
Database.encode_query(*, model, query, modality="audio") → list[float]
Database.search(source, *, query, k, filter=None, select=None) → pyarrow.Table
Database.eval_embeddings(*, source, golden_source, model=None, k=10)
Database.fine_tune(*, source, base_model, columns, method, task="audio_embedding", ...) → TrainingJob

Audio triplet schema (fine-tune input)

column	type	notes
`anchor`	binary	encoded audio clip
`positive`	binary	a clip the caller deems related
`negative`	binary	a clip the caller deems unrelated

Same column shape as text triplets — task="audio_embedding" is what tells the loader to read the three columns as encoded audio rather than text.

Input schema

column	type	notes
`clip_id`	utf8	per-row key
`audio`	binary	raw WAV/FLAC/MP3/Ogg bytes (decoded by the encoder)

Preprocessing (decode → resample to the model’s sample rate → CLAP fusion log-mel spectrogram → HTSAT-Swin tower → L2-normalized output) is handled inside the encoder per the model’s preprocessor_config.json feature-extractor geometry. The audio column may also hold file-path strings instead of inline bytes.

Golden source shape (audio mode)

eval_embeddings switches to audio-query mode when the golden source carries a query_audio (binary) column instead of query_text / query_image:

column	type	example
`query_id`	utf8	`q_sine`
`query_audio`	binary	raw WAV bytes of the query clip
`relevant_id`	utf8	`clip_sine_0` (matches `clip_id`)
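As a rough guide to what the retrieval metrics measure, the sketch below computes Recall@K and MRR from golden rows shaped like the table above and a per-query ranked list of corpus IDs. It uses the textbook definitions and assumes each ranked list holds unique IDs; the helper names and sample IDs are hypothetical, and Jammi's exact conventions (for example whether MRR is cut off at K) may differ.

```python
from collections import defaultdict


def group_golden(rows):
    """Turn (query_id, relevant_id) rows into query_id -> set of relevant ids."""
    relevant = defaultdict(set)
    for query_id, relevant_id in rows:
        relevant[query_id].add(relevant_id)
    return dict(relevant)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant ids that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids, k):
    """1 / rank of the first relevant result within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def evaluate(rankings, golden_rows, k):
    relevant = group_golden(golden_rows)
    if not relevant:
        return {"recall_at_k": 0.0, "mrr": 0.0}
    recalls = [recall_at_k(rankings.get(q, []), ids, k) for q, ids in relevant.items()]
    rrs = [reciprocal_rank(rankings.get(q, []), ids, k) for q, ids in relevant.items()]
    return {"recall_at_k": sum(recalls) / len(recalls), "mrr": sum(rrs) / len(rrs)}


# Illustrative data only; not output from the recipe.
golden_rows = [
    ("q_sine", "clip_sine_0"),
    ("q_sine", "clip_sine_1"),
    ("q_square", "clip_square_0"),
]
rankings = {
    "q_sine": ["clip_sine_1", "clip_noise_0", "clip_sine_0"],
    "q_square": ["clip_saw_0", "clip_square_0", "clip_sine_2"],
}
report = evaluate(rankings, golden_rows, k=2)
```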

Fixtures

cookbook/fixtures/tiny_audio_corpus/ — 20 synthetic mono WAV clips in 5 timbre families (sine / harmonic / square / saw / noise), 4 per family, plus a held-out query clip per family under queries/. Synthesised programmatically by cookbook/fixtures/generate.py — no recorded audio (licensing), no tenant data.
cookbook/fixtures/tiny_audio_golden.json — per-query → expected corpus IDs (same timbre family).
cookbook/fixtures/htsat_clap_tiny/ — tiny offline HTSAT-Swin CLAP fixture used as the default CI model, generated by tests/fixtures/generate_htsat_clap.py.

Run it

python cookbook/recipes/audio_search/example.py

Exits 0 on success, prints the top-K and the metrics dict + audio_search: OK.

Per-query search audit

Record a tamper-evident audit row for every search: what was queried, with what model, what came back, and when. The substrate signs each record, stores it tenant-scoped, and publishes it to a trigger topic — so you do not hand-roll an audit schema, a signature scheme, and a stream integration in every project.

This is the primitive every audited-ML deployment (financial, healthcare, federal, legal) needs to answer “show me exactly what this model returned for this query, and prove the record hasn’t been altered.”

What this recipe shows

Build a PerQueryAudit record (query id, model id/version, query lineage, top-K result ids, retrieval scores).
db.audit.log([...]) — the substrate injects tenant_id, signs the record with a per-tenant HMAC-SHA256 key, stores it, and publishes it.
db.audit.fetch_by_query_id(...) / db.audit.fetch_recent(...) — typed reads, tenant-scoped.
record.verify() — re-derive the key and check the signature.
Plain SQL over mutable.public."_jammi_search_audit" — same tenant scope.
db.subscribe_collect("jammi.audit.search.v1", ...) — every logged record is also delivered on a trigger topic for alerting / analytics / warehouse sinks.

Run it

The audit master key is required — the substrate refuses to sign without it:

export JAMMI_AUDIT_MASTER_KEY=$(python -c "import secrets; print(secrets.token_hex(32))")
python cookbook/recipes/search_audit/example.py

The key derives a distinct signing secret per tenant via HKDF-SHA256 and is deterministic across restarts, so signatures written today verify after a redeploy. Source it from your secret manager — never hard-code it.
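For readers who want to see the mechanics, here is a minimal standard-library sketch of the scheme described above: HKDF-SHA256 (RFC 5869) derives one key per tenant from the master key, and HMAC-SHA256 signs a canonical encoding of the record. The context label "audit-signing:", the JSON canonicalization, and the record fields are assumptions made for illustration; the substrate's actual derivation label and wire format are not documented here.

```python
import hashlib
import hmac
import json


def hkdf_sha256(master_key, info, length=32, salt=b""):
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    prk = hmac.new(salt or b"\x00" * 32, master_key, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def tenant_key(master_key, tenant_id):
    # Hypothetical context label; the real one may differ.
    return hkdf_sha256(master_key, b"audit-signing:" + tenant_id.encode("utf-8"))


def canonical_bytes(record):
    # Assumed canonical form: sorted keys, compact separators.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(record, key):
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify(record, signature, key):
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sign(record, key), signature)


master = bytes.fromhex("00" * 32)  # placeholder only; load the real key from a secret manager
record = {
    "query_id": "q-1",
    "model_id": "example-model",
    "top_k_result_ids": ["a", "b"],
    "retrieval_scores": [0.9, 0.7],
}
key_a = tenant_key(master, "tenant-a")
signature = sign(record, key_a)
```

Because the derivation is a pure function of the master key and the tenant id, re-deriving the key after a restart yields the same bytes, which is what lets old signatures verify after a redeploy.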

Key points

Lineage is capped. query_lineage JSON may not exceed 8 KiB (override with JAMMI_AUDIT_MAX_LINEAGE_BYTES). Store image hashes and row IDs, not raw payloads — compliance posture is structural, not advisory.
top_k_result_ids and retrieval_scores must be the same length. This is checked when you construct the record.
The table is reserved. _jammi_search_audit is created implicitly on the first log; you cannot create or directly INSERT into it (that would bypass signing). Read it freely via SQL.
Tenant isolation is automatic. A record logged under tenant A is invisible to tenant B, through both the typed API and raw SQL.
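The first two constraints above can be checked on the caller's side before a record is built. This is a hedged sketch, not the PerQueryAudit constructor: the function name is hypothetical, and measuring the lineage as the UTF-8 length of its serialized JSON is an assumption about how the cap is applied.

```python
import json

DEFAULT_MAX_LINEAGE_BYTES = 8 * 1024  # the 8 KiB cap described above


def validate_audit_fields(top_k_result_ids, retrieval_scores, query_lineage,
                          max_lineage_bytes=DEFAULT_MAX_LINEAGE_BYTES):
    """Raise ValueError if the fields break either documented constraint.

    Returns the serialized lineage size in bytes when everything is valid.
    """
    if len(top_k_result_ids) != len(retrieval_scores):
        raise ValueError(
            f"{len(top_k_result_ids)} result ids but {len(retrieval_scores)} scores"
        )
    lineage_bytes = len(json.dumps(query_lineage).encode("utf-8"))
    if lineage_bytes > max_lineage_bytes:
        raise ValueError(
            f"query_lineage is {lineage_bytes} bytes; cap is {max_lineage_bytes}"
        )
    return lineage_bytes


# Store hashes and row ids in the lineage, never the raw payload.
lineage = {"image_sha256": "ab" * 32, "row_ids": ["r1", "r2"]}
size = validate_audit_fields(["doc_1", "doc_2"], [0.91, 0.74], lineage)
```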

Ephemeral session storage

A session-scoped storage context whose tables are auto-deleted when the session ends — on explicit close(), on context-manager exit, or when the 60-second timeout scanner force-closes a session past its deadline. Every transition publishes to the jammi.audit.session_lifecycle.v1 trigger topic, giving an audit-log aggregator durable proof that the data was deleted.

Run it:

python cookbook/recipes/session_lifecycle/example.py

When to use it

Use an ephemeral session for sensitive transient data that must not outlive the request that produced it: uploaded images, derived embeddings, draft model inputs. The session is always tenant-scoped — tenant A can never see tenant B’s ephemeral tables.

When NOT to use it

Do not store long-lived data in an ephemeral session. The audit record, the persistent corpus, and anything compliance needs to read later belong in ordinary mutable tables. The pattern is: keep the throwaway working set (raw bytes, embeddings) in the ephemeral session, and write only durable lineage (hashes, ids, scores) to a persistent table — before you close the session, while the working data still exists.

API

with db.ephemeral_session(timeout_seconds=3600) as ephem:
    ephem.create_ephemeral_table("imgs", schema=schema, primary_key=["image_id"])
    ephem.insert("imgs", batch=table)
    rows = ephem.sql("imgs", "SELECT image_hash FROM {table}")
# close() runs on exit: tables dropped, `closed` event published

{table} in a sql query is replaced by the tenant-scoped reference to the named ephemeral table. The context manager is the recommended path; Drop is best-effort. Lifecycle events (opened, closed, timed_out, partial_deletion_failure) carry the session id, tenant, table count, and deleted-row count.

Query Your Data with SQL

Register data files as named sources, then query them with full SQL. Sources are persisted in the catalog and survive session restarts.

Register a source

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
use jammi_db::source::{FileFormat, SourceConnection, SourceType};

session.add_source("patents", SourceType::File, SourceConnection {
    url: Some("file:///data/patents.parquet".into()),
    format: Some(FileFormat::Parquet),
    ..Default::default()
}).await?;
Ok(()) }
}

Python

db.add_source("patents", path="/data/patents.parquet", format="parquet")

CLI

jammi sources add patents --path /data/patents.parquet --format parquet

Supported formats

Format	Rust	Python/CLI	Notes
Parquet	`FileFormat::Parquet`	`"parquet"`	Columnar, compressed, recommended for large datasets
CSV	`FileFormat::Csv`	`"csv"`	Auto-detected schema
JSON	`FileFormat::Json`	`"json"`	Line-delimited JSON

Run a SQL query

Sources are accessible via three-part SQL names: <source_id>.public.<table_name>. The table name is derived from the file name (e.g., patents.parquet becomes patents).

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
let results = session.sql(
    "SELECT id, title, year FROM patents.public.patents WHERE year > 2020 ORDER BY year"
).await?;

for batch in &results {
    println!("{batch:?}");
}
Ok(()) }
}

Python

table = db.sql("SELECT id, title, year FROM patents.public.patents WHERE year > 2020 ORDER BY year")
print(table.to_pandas())

CLI

jammi query "SELECT id, title, year FROM patents.public.patents WHERE year > 2020 ORDER BY year"
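The naming rule above (source ID, then public, then the file name without its extension) can be written down as a small helper when you generate queries programmatically. The function name is hypothetical, and the sketch only covers the simple case the text describes; how Jammi treats file names with several dots or characters that are not valid SQL identifiers is not covered here.

```python
from pathlib import PurePosixPath


def qualified_table_name(source_id, file_path):
    """Build <source_id>.public.<table_name> for a file-backed source."""
    table_name = PurePosixPath(file_path).stem  # "patents.parquet" -> "patents"
    return f"{source_id}.public.{table_name}"


name = qualified_table_name("patents", "/data/patents.parquet")
query = f"SELECT id, title, year FROM {name} WHERE year > 2020 ORDER BY year"
```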

Aggregations

SELECT category, COUNT(*) as count, AVG(citation_count) as avg_citations
FROM patents.public.patents
WHERE year > 2020
GROUP BY category
ORDER BY count DESC

Joins across sources

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::source::{FileFormat, SourceConnection, SourceType};
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
session.add_source("companies", SourceType::File, SourceConnection {
    url: Some("file:///data/companies.csv".into()),
    format: Some(FileFormat::Csv),
    ..Default::default()
}).await?;

let results = session.sql("
    SELECT p.title, c.company_name
    FROM patents.public.patents p
    JOIN companies.public.companies c ON p.assignee_id = c.id
").await?;
Ok(()) }
}

Python

db.add_source("companies", path="/data/companies.csv", format="csv")

table = db.sql("""
    SELECT p.title, c.company_name
    FROM patents.public.patents p
    JOIN companies.public.companies c ON p.assignee_id = c.id
""")

Source lifecycle

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
// List registered sources
let sources = session.catalog().list_sources().await?;

// Remove a source
session.remove_source("patents").await?;
Ok(()) }
}

CLI

jammi sources list

Sources persist in the SQLite catalog at <artifact_dir>/catalog.db. Registering the same source ID twice returns an error — remove it first.

Execution plans

Use EXPLAIN (or the CLI explain command) to see how DataFusion will execute your query:

jammi explain "SELECT * FROM patents.public.patents WHERE year > 2020"

Generate Embeddings

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Constructing the Graph.

Generate vector embeddings by running a model over text columns from a registered source. Results are persisted to Parquet with sidecar ANN indexes for fast similarity search.

Basic usage

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::store::CachePolicy;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
let (record, _outcome) = session.generate_text_embeddings(
    "patents",
    "sentence-transformers/all-MiniLM-L6-v2",
    &["abstract".to_string()],
    "id",
    CachePolicy::Bypass,
).await?;

println!("Embedded {} rows, {} dimensions", record.row_count, record.dimensions.unwrap());
Ok(()) }
}

Python

db.generate_embeddings(
    source="patents",
    model="sentence-transformers/all-MiniLM-L6-v2",
    columns=["abstract"],
    key="id",
    modality="text",
)

What gets created

Each call creates a timestamped Parquet file plus a sidecar ANN index bundle:

{artifact_dir}/jammi_db/
├── patents__embedding__all-MiniLM-L6-v2__20260325T120000.parquet
├── patents__embedding__all-MiniLM-L6-v2__20260325T120000.usearch
├── patents__embedding__all-MiniLM-L6-v2__20260325T120000.rowmap
└── patents__embedding__all-MiniLM-L6-v2__20260325T120000.manifest.json

Parquet file — source of truth. Contains _row_id, _source_id, _model_id, vector. Readable by external tools (DuckDB, Polars, pandas).
.usearch — USearch HNSW graph for ANN search.
.rowmap — maps internal USearch keys to _row_id strings.
.manifest.json — metadata (dimensions, count, metric, backend).

The sidecar files are disposable: deleting them makes search fall back to brute-force exact search. The Parquet file is the only durable state; the sidecars are derived from it.
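To illustrate what brute-force exact search means here, the sketch below scores every stored vector against the query and keeps the top k. Because stored vectors are unit length, the dot product with a normalized query equals cosine similarity. The function name and sample rows are hypothetical; this shows the technique, not the engine's implementation.

```python
import math


def brute_force_search(query, rows, k):
    """Exact top-k cosine search over (row_id, unit_vector) pairs."""
    norm = math.sqrt(sum(x * x for x in query))
    if norm == 0.0:
        raise ValueError("query vector has zero norm")
    q = [x / norm for x in query]
    # Dot product equals cosine similarity because both sides are unit length.
    scored = [(sum(a * b for a, b in zip(q, vec)), row_id) for row_id, vec in rows]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, ties by id
    return [(row_id, score) for score, row_id in scored[:k]]


# Illustrative 2-dimensional unit vectors.
rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.6, 0.8])]
top = brute_force_search([1.0, 0.1], rows, k=2)
top_ids = [row_id for row_id, _ in top]
```

The cost is linear in the number of rows, which is acceptable for small tables and is the reason the ANN sidecar exists for large ones.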

Embedding table schema

Column	Type	Description
`_row_id`	Utf8	Key column value cast to string
`_source_id`	Utf8	Source identifier
`_model_id`	Utf8	Model identifier
`vector`	FixedSizeList(Float32, N)	L2-normalized embedding vector

Failed rows (null or empty text) are excluded — only successfully embedded rows appear in the output.

Multiple text columns

Pass multiple column names to concatenate them (space-separated) before embedding:

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::store::CachePolicy;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
session.generate_text_embeddings(
    "papers",
    "sentence-transformers/all-MiniLM-L6-v2",
    &["title".to_string(), "abstract".to_string()],
    "doi",
    CachePolicy::Bypass,
).await?;
Ok(()) }
}

Python

db.generate_embeddings(
    source="papers",
    model="sentence-transformers/all-MiniLM-L6-v2",
    columns=["title", "abstract"],
    key="doi",
    modality="text",
)

Multiple embedding tables

Each call creates a new table. Multiple tables can coexist for the same source (different models, different columns):

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::store::CachePolicy;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
session.generate_text_embeddings("patents", "sentence-transformers/all-MiniLM-L6-v2", &["abstract".into()], "id", CachePolicy::Bypass).await?;
session.generate_text_embeddings("patents", "BAAI/bge-small-en-v1.5", &["title".into()], "id", CachePolicy::Bypass).await?;
Ok(()) }
}

When searching, the latest ready embedding table is used by default.

Import precomputed embeddings

When the vectors already exist — computed by an offline batch, migrated from another store, or upserted from a remote encoder — register them directly as a ready embedding table instead of re-running the model. The input is a Parquet object with a _row_id (Utf8) column and a vector (FixedSizeList<Float32> of width dimensions) column, one row per key.

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
use jammi_db::storage::StorageUrl;

let vectors = StorageUrl::parse("file:///data/precomputed.parquet")?;
let record = session.import_embeddings(
    "patents",
    "sentence-transformers/all-MiniLM-L6-v2",
    &vectors,
    "id",
    &["abstract".to_string()],
    384,
).await?;

println!("Imported {} rows, {} dimensions", record.row_count, record.dimensions.unwrap());
Ok(()) }
}

Python

db.import_embeddings(
    source="patents",
    model="sentence-transformers/all-MiniLM-L6-v2",
    vectors_url="file:///data/precomputed.parquet",
    key="id",
    text_columns=["abstract"],
    dimensions=384,
)

The result is indistinguishable from a generated table — same (_row_id, _source_id, _model_id, vector) schema, same sidecar ANN index — so search queries it exactly like any other embedding table. Three behaviours are specific to import:

Vectors are L2-normalized on import. Every embedding table holds unit vectors (the cosine ANN sidecar assumes it), so each incoming vector is normalized and a zero-norm vector is rejected — it cannot be cosine-searched.
The model is validated, not loaded. model is parsed to its canonical form and recorded as the table’s derivation provenance; import never loads the encoder or downloads weights, so it needs no GPU. key and text_columns are recorded as catalog provenance (which source column the keys came from, which content columns produced the vectors); the physical key stays _row_id.
The table is recompute-inert. The engine did not compute these vectors, so a recompute request on an imported table is refused with a typed error rather than re-run from a guess based on its columns.

The input vectors are read fully into memory; a streaming variant is future work.
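If you prepare the input file yourself, it can help to apply the same checks up front. The sketch below mirrors the normalization rule described above: verify the width, L2-normalize, and reject zero-norm vectors. The function name is hypothetical, and the width check is an assumption about how a mismatch with dimensions would be treated.

```python
import math


def prepare_imported_vector(vector, dimensions):
    """Return the unit-length version of vector, or raise ValueError."""
    if len(vector) != dimensions:
        raise ValueError(f"expected {dimensions} values, got {len(vector)}")
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("zero-norm vector cannot be cosine-searched")
    return [x / norm for x in vector]


unit = prepare_imported_vector([3.0, 4.0], dimensions=2)
```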

Supported models

Any encoder model on HuggingFace Hub with safetensors weights. Supported architectures:

BERT family — BERT, RoBERTa, DistilBERT, CamemBERT, XLM-RoBERTa:

sentence-transformers/all-MiniLM-L6-v2 (384-dim, fast)
sentence-transformers/all-mpnet-base-v2 (768-dim, higher quality)
BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5

ModernBERT — modernized encoder with rotary embeddings, 8192-token context, GeGLU:

answerdotai/ModernBERT-base (768-dim)
answerdotai/ModernBERT-large (1024-dim)

Or any local directory with config.json + model.safetensors + tokenizer.json. The architecture is detected automatically from model_type in config.json.

Use a local model:

#![allow(unused)]
fn main() {
extern crate jammi_ai;
use jammi_ai::model::ModelSource;
let model = ModelSource::local("/path/to/my-model");
}

Pooling

Pooling — how per-token hidden states collapse into one sentence vector — is model-declared, not hardcoded. On load, the engine reads the model’s 1_Pooling/config.json (the sentence-transformers convention) and pools with the strategy it declares: pooling_mode_cls_token selects CLS pooling (first token — the mode BGE, GTE, and many E5-family models require), and pooling_mode_mean_tokens (or pooling_mode_mean_sqrt_len_tokens, which is exactly equivalent after the mandatory L2 normalization) selects mean pooling. Max and weighted-mean pooling are also supported.

A model whose repository ships no 1_Pooling/ directory — many bare BERT checkpoints — falls back to mean pooling, the historical sentence-transformers default. A model whose 1_Pooling/config.json declares a mode the engine cannot represent (e.g. last-token pooling, or more than one enabled mode at once) fails to load rather than silently pooling incorrectly.
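The selection rules and the equivalence claim above can be checked with a small standalone sketch. It reads a dict shaped like 1_Pooling/config.json, picks a mode, pools masked token vectors, and shows that mean and mean-sqrt-len pooling agree after L2 normalization. The function names are hypothetical, and weighted-mean pooling is left out of this sketch for brevity even though the engine supports it.

```python
import math

POOLING_FLAGS = {
    "pooling_mode_cls_token": "cls",
    "pooling_mode_mean_tokens": "mean",
    "pooling_mode_mean_sqrt_len_tokens": "mean_sqrt_len",
    "pooling_mode_max_tokens": "max",
}


def select_pooling(config):
    """config: parsed 1_Pooling/config.json, or None when the file is absent."""
    if config is None:
        return "mean"  # fallback for checkpoints that ship no pooling config
    enabled = [k for k, v in config.items() if k.startswith("pooling_mode_") and v]
    if len(enabled) != 1 or enabled[0] not in POOLING_FLAGS:
        raise ValueError(f"unsupported pooling declaration: {enabled}")
    return POOLING_FLAGS[enabled[0]]


def pool(token_vectors, attention_mask, mode):
    """Collapse per-token vectors into one vector, ignoring masked tokens."""
    if mode == "cls":
        return list(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if mode == "max":
        return [max(col) for col in zip(*kept)]
    sums = [sum(col) for col in zip(*kept)]
    divisor = len(kept) if mode == "mean" else math.sqrt(len(kept))
    return [s / divisor for s in sums]


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mask = [1, 1, 0]  # the last token is padding
mean_vec = l2_normalize(pool(tokens, mask, "mean"))
sqrt_vec = l2_normalize(pool(tokens, mask, "mean_sqrt_len"))
```

The two modes differ only by a positive scalar (1/n versus 1/sqrt(n)), and normalization removes any positive scalar, which is why mean_vec and sqrt_vec match.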

Raw inference (no persistence)

To get embeddings as RecordBatch without writing to disk:

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::store::CachePolicy;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
use jammi_ai::model::{ModelSource, ModelTask};

let model = ModelSource::hf("sentence-transformers/all-MiniLM-L6-v2");
let (_results, _outcome) = session
    .infer(
        "patents",
        &model,
        ModelTask::TextEmbedding,
        &["abstract".into()],
        "id",
        CachePolicy::Bypass,
    )
    .await?;
Ok(()) }
}

Python

results = db.infer(
    source="patents",
    model="sentence-transformers/all-MiniLM-L6-v2",
    columns=["abstract"],
    task="text_embedding",
    key="id",
)

Each RecordBatch has prefix columns (_row_id, _source, _model, _status, _error, _latency_ms) plus task-specific columns (e.g., vector for embeddings).

Error handling

Inference never panics on bad input. _status/_error track per-row input validation, applied before the model ever runs:

Condition	`_status`	`_error`	`vector`
Valid text	`"ok"`	null	384-dim float vector
Null text	`"error"`	`"Empty or null text input"`	null
Empty text	`"error"`	`"Empty or null text input"`	null

The batch continues processing even when individual rows fail this validation. A failure in the model forward pass itself (a broken kernel, a contiguity/PTX/dtype mismatch, or a model incapable of the requested task) is always systemic: every row fails identically, so it is never a per-row event. Such a failure fails the whole infer/embedding call with an error, rather than being returned as an all-"error" relation or an empty “ready” embedding table.
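The per-row validation in the table above amounts to a pre-filter that runs before the model sees anything. A minimal sketch follows, with a hypothetical function name; it treats only None and the empty string as invalid, since the source does not say whether whitespace-only text counts as empty.

```python
EMPTY_TEXT_ERROR = "Empty or null text input"


def validate_text_rows(rows):
    """Split (row_id, text) pairs into per-row statuses and model inputs.

    Returns (statuses, valid_rows): statuses has one entry per input row,
    valid_rows holds only the rows that should reach the model.
    """
    statuses, valid_rows = [], []
    for row_id, text in rows:
        if text is None or text == "":
            statuses.append({"_row_id": row_id, "_status": "error", "_error": EMPTY_TEXT_ERROR})
        else:
            statuses.append({"_row_id": row_id, "_status": "ok", "_error": None})
            valid_rows.append((row_id, text))
    return statuses, valid_rows


statuses, valid_rows = validate_text_rows([("1", "A valid abstract"), ("2", None), ("3", "")])
```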

Dynamic batch sizing

The runner starts with the configured inference.batch_size (default: 32). If an out-of-memory error occurs:

Halve the batch size
Retry (up to 3 times)
If OOM persists at batch size 1, the call fails with an error

The reduced batch size is sticky for the remainder of the stream.
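One way to read the steps above is as a retry loop around the forward pass. The sketch below is an interpretation, not the runner's code: OutOfMemory is a stand-in exception, run_batches and fake_forward are hypothetical names, and counting the three retries per batch is an assumption.

```python
class OutOfMemory(RuntimeError):
    """Stand-in for whatever OOM error the runtime raises (hypothetical)."""


def run_batches(items, forward, batch_size=32, max_retries=3):
    """Run forward over items, halving the batch size on OOM.

    The reduced size is kept for all later batches (sticky).
    Returns (results, final_batch_size).
    """
    results = []
    start = 0
    while start < len(items):
        retries = 0
        while True:
            batch = items[start:start + batch_size]
            try:
                results.extend(forward(batch))
                break
            except OutOfMemory:
                if batch_size == 1 or retries == max_retries:
                    raise  # out of options: surface the error to the caller
                batch_size = max(1, batch_size // 2)
                retries += 1
        start += len(batch)
    return results, batch_size


attempted_sizes = []


def fake_forward(batch):
    # Simulated model: anything above 8 rows does not fit in memory.
    attempted_sizes.append(len(batch))
    if len(batch) > 8:
        raise OutOfMemory("simulated")
    return [x * 2 for x in batch]


outputs, final_batch_size = run_batches(list(range(40)), fake_forward)
```

In this simulation the first batch is attempted at 32 and 16 rows before succeeding at 8, and every later batch starts at 8 without probing larger sizes again.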

Crash recovery

If the process dies mid-generation, the table is left in “building” status. On the next session start, recovery runs automatically:

Parquet missing — mark as failed
Parquet corrupt — delete file, mark as failed
Parquet valid but stuck in “building” — promote to “ready”, rebuild ANN index

No data is lost if the Parquet file was fully written.
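The three recovery cases can be sketched as a decision function. How the engine decides that a file is valid is not specified above, so this sketch substitutes a cheap proxy: a complete Parquet file starts and ends with the 4-byte magic PAR1. The function names and action strings are hypothetical.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"


def looks_complete(path):
    """Cheap proxy for validity: magic bytes at both ends of the file."""
    if os.path.getsize(path) < 12:  # two magics plus a 4-byte footer length
        return False
    with open(path, "rb") as fh:
        head = fh.read(4)
        fh.seek(-4, os.SEEK_END)
        tail = fh.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def recovery_action(path):
    """Map a table stuck in the building state to one of the three outcomes."""
    if not os.path.exists(path):
        return "mark_failed"
    if not looks_complete(path):
        return "delete_and_mark_failed"
    return "promote_to_ready_and_rebuild_index"


with tempfile.TemporaryDirectory() as workdir:
    missing = os.path.join(workdir, "missing.parquet")
    truncated = os.path.join(workdir, "truncated.parquet")
    complete = os.path.join(workdir, "complete.parquet")
    with open(truncated, "wb") as fh:
        fh.write(PARQUET_MAGIC + b"partial row group")  # the writer died mid-file
    with open(complete, "wb") as fh:
        # Not a real Parquet file; it only carries the magic bytes for the demo.
        fh.write(PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC)
    actions = [recovery_action(p) for p in (missing, truncated, complete)]
```

A production check would parse the footer as well; the magic-byte test only catches truncation, not corruption in the middle of the file.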

DataFusion integration

Result tables are automatically registered in DataFusion and queryable via SQL:

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::catalog::result_repo::ResultTableRecord;
async fn ex(session: &InferenceSession, record: &ResultTableRecord) -> jammi_db::error::Result<()> {
let results = session.sql(&format!(
    "SELECT _row_id, _source_id FROM \"jammi.{}\" LIMIT 10",
    record.table_name
)).await?;
Ok(()) }
}

Generate Image Embeddings

Generate vector embeddings from images using an OpenCLIP-compatible vision model. Results are persisted to Parquet with sidecar ANN indexes, identical to text embeddings — the same search(), evaluation, and SQL tools work on both.

The OpenCLIP family is cross-modal: the vision tower and the text tower in the same checkpoint produce embeddings in a shared latent space, so a text query encoded with the same model can search image embeddings directly. See Search Text Against Images (Cross-Modal) for the full text-to-image recipe.

Basic usage

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::store::CachePolicy;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
let (record, _outcome) = session.generate_image_embeddings(
    "figures",
    "laion/CLIP-ViT-B-32-laion2B-s34B-b79K",
    "image",       // column containing image data
    "figure_id",   // key column
    CachePolicy::Bypass,
).await?;

println!("Embedded {} images, {} dimensions", record.row_count, record.dimensions.unwrap());
Ok(()) }
}

Python

db.generate_embeddings(
    source="figures",
    model="laion/CLIP-ViT-B-32-laion2B-s34B-b79K",
    columns=["image"],
    key="figure_id",
    modality="image",
)

Image column format

The image column can be either:

Binary — inline image bytes (PNG, JPEG, TIFF) stored directly in Parquet
Utf8 — file paths pointing to images on disk

Image preprocessing

Each image is automatically preprocessed before embedding:

Pad to square — white canvas, image centered (preserves aspect ratio)
Resize — bicubic interpolation to the model’s input size (224x224 for CLIP)
Normalize — per-channel normalization using constants from the model’s config

Preprocessing parameters (mean, std, image size) are model-driven — parsed from the model’s config file, not hardcoded.
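
As a rough illustration of the pad and normalize steps, the sketch below works on a tiny image held as nested lists of RGB tuples. It is not the engine's implementation: the bicubic resize is omitted, the function names are hypothetical, the mean/std values are placeholders, and scaling channels to 0-1 before normalizing is an assumption based on common CLIP preprocessing.

```python
def pad_to_square(image, fill=(255, 255, 255)):
    """Center the image on a white square canvas, preserving aspect ratio."""
    height, width = len(image), len(image[0])
    side = max(height, width)
    top = (side - height) // 2
    left = (side - width) // 2
    canvas = [[fill for _ in range(side)] for _ in range(side)]
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            canvas[top + y][left + x] = pixel
    return canvas


def normalize(image, mean, std):
    """Scale 0-255 channels to 0-1, then apply per-channel (x - mean) / std."""
    return [
        [tuple((c / 255.0 - m) / s for c, m, s in zip(pixel, mean, std))
         for pixel in row]
        for row in image
    ]


# A black image with 2 rows and 4 columns becomes a 4x4 canvas with white bars.
image = [[(0, 0, 0)] * 4 for _ in range(2)]
square = pad_to_square(image)
# Placeholder constants; the real values come from the model's config.
tensor = normalize(square, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
```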

Encode a single image

To embed one image without persistence (e.g., for a query):

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> Result<(), Box<dyn std::error::Error>> {
let image_bytes = std::fs::read("query.png")?;
let vector = session
    .encode_image_query("laion/CLIP-ViT-B-32-laion2B-s34B-b79K", &image_bytes)
    .await?;
// vector: Vec<f32>, L2-normalized, dimensionality = model's embed_dim
Ok(()) }
}

Python

with open("query.png", "rb") as f:
    image_bytes = f.read()

vector = db.encode_query(model="laion/CLIP-ViT-B-32-laion2B-s34B-b79K", query=image_bytes, modality="image")

Supported models

OpenCLIP-compatible models with safetensors weights. The repo must carry:

open_clip_config.json with model_cfg.vision_cfg (and model_cfg.text_cfg if you want cross-modal text queries)
open_clip_model.safetensors with OpenCLIP weight key naming (visual.* for vision, root-level for text)
Either a tokenizer.json or the OpenCLIP-native bpe_simple_vocab_16e6.txt.gz (only required for text-side queries)

The architecture (ViT width, layers, heads, patch size, pooling strategy), the shared latent dimensionality (embed_dim), and the preprocessing config (mean, std, image size) are detected automatically from the config — no per-model code path.

Known-working models include patentclip/PatentCLIP_Vit_B (512-dim, the patent-figure use case — uses global average pooling), OpenAI CLIP, and the LAION CLIP-ViT-B-32-* variants. For an end-to-end image-to-image search + retrieval-eval walkthrough, see the runnable image_search recipe.

Output schema

Same as text embeddings:

Column	Type	Description
`_row_id`	Utf8	Key value
`_source_id`	Utf8	Source identifier
`_model_id`	Utf8	Model identifier
`vector`	FixedSizeList(Float32, N)	L2-normalized embedding vector (N = `embed_dim`)

Search

Image embeddings work with the same search() API as text embeddings:

vector = db.encode_query(model="laion/CLIP-ViT-B-32-laion2B-s34B-b79K", query=query_bytes, modality="image")
results = db.search("figures", query=vector, k=10)  # pyarrow.Table

search returns a pyarrow.Table directly; for compound retrieval (join / annotate(...)) use db.sql(...).

Error handling

Condition	`_status`	`_error`
Valid image	`"ok"`	null
Null image	`"error"`	`"Null or missing image input"`
Corrupt image	`"error"`	`"Failed to decode image at row N: ..."`

Search Text Against Images (Cross-Modal)

OpenCLIP-family models carry both a vision tower and a text tower in the same checkpoint, with both towers projecting into a shared latent space. That means a text query embedded with the text tower lives in the same vector space as image embeddings produced by the vision tower — vector search against an image corpus accepts a text query directly, no separate text encoder, no projection bridge.

This recipe shows the full path: index images with the vision tower, embed a text query with the text tower, run search().

1. Index the image corpus with the vision tower

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::store::CachePolicy;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
session.generate_image_embeddings(
    "figures",
    "laion/CLIP-ViT-B-32-laion2B-s34B-b79K",
    "image",       // column containing image data
    "figure_id",   // key column
    CachePolicy::Bypass,
).await?;
Ok(()) }
}

Python

db.generate_embeddings(
    source="figures",
    model="laion/CLIP-ViT-B-32-laion2B-s34B-b79K",
    columns=["image"],
    key="figure_id",
    modality="image",
)

2. Embed a text query with the same model’s text tower

encode_query (encode_text_query in Rust) dispatches to the OpenCLIP text tower when the model ID resolves to an OpenCLIP checkpoint. The output vector's dimensionality matches embed_dim, the same dimensionality the image embeddings carry.

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
let query_vec = session
    .encode_text_query(
        "laion/CLIP-ViT-B-32-laion2B-s34B-b79K",
        "a red circle on a white background",
    )
    .await?;
// query_vec: Vec<f32>, L2-normalized, same length as the image embedding vector
Ok(()) }
}

Python

query_vec = db.encode_query(model="laion/CLIP-ViT-B-32-laion2B-s34B-b79K", query="a red circle on a white background")

3. Search image embeddings with the text vector

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
async fn ex(session: Arc<InferenceSession>, query_vec: Vec<f32>) -> jammi_db::error::Result<()> {
let results = session.search("figures", query_vec, 10, None, None).await?.run().await?;
Ok(()) }
}

Python

results = db.search("figures", query=query_vec, k=10)  # pyarrow.Table

search returns a pyarrow.Table directly, carrying your source’s columns (figure_id) alongside the similarity score; pass filter= / select= to refine. For compound retrieval — joining sources or running a model over the results with annotate(...) — use db.sql(...) (see Compound Retrieval and Inference over Flight SQL); it composes identically for cross-modal search.

Why this works

Both towers in an OpenCLIP checkpoint emit vectors of size embed_dim (the shared latent dimensionality declared at the top of open_clip_config.json). The vision tower applies a visual.proj matrix after pooling its patch tokens; the text tower applies a text_projection matrix after pooling at the <|endoftext|> token. The two projections are jointly trained so the cosine similarity between a text vector and an image vector reflects semantic alignment.

If you embed text and images with separate models (e.g. a BERT encoder + a vision model that wasn’t jointly trained with it), the resulting vectors don’t share a latent space and the similarities are meaningless. Cross-modal search only works when both modalities are projected by the same CLIP-style joint training.
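
To make the shared-space argument concrete, here is a small sketch of how a text vector ranks image vectors once both are L2-normalized. The three-dimensional vectors and the `fig-1` / `fig-2` keys are toy values chosen for illustration, not output from any model.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine(a, b):
    """For L2-normalized inputs the cosine similarity is the dot product."""
    return sum(x * y for x, y in zip(a, b))


# Toy vectors standing in for embed_dim-sized embeddings.
text_vec = l2_normalize([1.0, 2.0, 0.0])
image_vecs = {
    "fig-1": l2_normalize([2.0, 4.0, 0.0]),  # same direction as the query
    "fig-2": l2_normalize([0.0, 0.0, 3.0]),  # orthogonal to the query
}
ranked = sorted(
    image_vecs,
    key=lambda key: cosine(text_vec, image_vecs[key]),
    reverse=True,
)
```

The ranking is only meaningful because both vectors are assumed to come from jointly trained towers; the arithmetic itself would run on any pair of equal-length vectors.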

Model requirements

Same as Generate Image Embeddings, plus:

open_clip_config.json must contain a populated model_cfg.text_cfg (with width, layers, and either heads or a width that is a multiple of 64).
The safetensors checkpoint must contain the text-tower keys: token_embedding.weight, positional_embedding, transformer.resblocks.*, ln_final.*, and text_projection.
A tokenizer must be available — either an HF-converted tokenizer.json or the OpenCLIP-native bpe_simple_vocab_16e6.txt.gz.

Classify Text

Run a classification model over text columns to assign labels and confidence scores. Any HuggingFace model with id2label in its config works out of the box.

Basic usage

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::store::CachePolicy;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
use jammi_ai::model::{ModelSource, ModelTask};

let model = ModelSource::hf("answerdotai/ModernBERT-base-classification");
let (_results, _outcome) = session.infer(
    "patents",
    &model,
    ModelTask::Classification,
    &["abstract".to_string()],
    "id",
    CachePolicy::Bypass,
).await?;
Ok(()) }
}

Python

results = db.infer(
    source="patents",
    model="answerdotai/ModernBERT-base-classification",
    columns=["abstract"],
    task="classification",
    key="id",
)

Output schema

Each RecordBatch has prefix columns plus classification-specific columns:

Column	Type	Description
`_row_id`	Utf8	Key column value
`_source`	Utf8	Source identifier
`_model`	Utf8	Model identifier
`_status`	Utf8	`"ok"` or `"error"`
`_error`	Utf8 (nullable)	Error message if failed
`_latency_ms`	Float32	Inference latency
`label`	Utf8 (nullable)	Predicted class label
`confidence`	Float32 (nullable)	Confidence score (0-1)
`all_scores_json`	Utf8 (nullable)	JSON with all class scores

Supported model architectures

Classification models must have id2label in their config.json. Supported architectures:

BERT family — BERT, RoBERTa, DistilBERT, CamemBERT, XLM-RoBERTa:

Loads classifier.weight + classifier.bias from safetensors
CLS token pooling + linear classifier + softmax

ModernBERT — uses the built-in ModernBertForSequenceClassification:

CLS or MEAN pooling (configured via classifier_pooling in config)
Head (dense + GELU + LayerNorm) + classifier + softmax
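
The final softmax step, and how it could populate the label, confidence, and all_scores_json columns, can be sketched as follows. The `classify` helper and the `id2label` map are hypothetical, and the exact JSON layout of all_scores_json is an assumption.

```python
import json
import math


def classify(logits, id2label):
    """Softmax over logits, then report the top label and all class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    scores = {id2label[i]: p for i, p in enumerate(probs)}
    return {
        "label": id2label[best],
        "confidence": probs[best],
        "all_scores_json": json.dumps(scores),  # assumed layout
    }


id2label = {0: "physics", 1: "biology"}  # hypothetical label map
row = classify([2.0, 0.0], id2label)
```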

Fine-tuning for classification

Train a LoRA adapter with a classification head on your labeled data:

Prepare training data

text,label
"quantum error correction","physics"
"CRISPR gene editing","biology"

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
use jammi_ai::fine_tune::FineTuneMethod;
use jammi_db::ModelTask;

let job = session.fine_tune(
    "training",
    "sentence-transformers/all-MiniLM-L6-v2",
    &["text".into(), "label".into()],
    FineTuneMethod::Lora,
    ModelTask::Classification,
    None,
).await?;

job.wait().await?;
Ok(()) }
}

Python

job = db.fine_tune(
    source="training",
    base_model="sentence-transformers/all-MiniLM-L6-v2",
    columns=["text", "label"],
    method="lora",
    task="classification",
)
job.wait()

Fine-tuning trains a LoRA projection plus a linear classification head using cross-entropy loss. Both are saved to adapter.safetensors.

Error handling

Same per-row error tracking as embeddings:

Condition	`_status`	`label`	`confidence`
Valid text	`"ok"`	Predicted label	0-1 score
Null/empty text	`"error"`	null	null

Extract Entities (NER)

Run a Named Entity Recognition model over text columns to extract person names, organizations, locations, and other entities. Results are returned as JSON arrays of entity spans with character positions and confidence scores.

Basic usage

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::store::CachePolicy;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
use jammi_ai::model::{ModelSource, ModelTask};

let model = ModelSource::hf("dslim/bert-base-NER");
let (_results, _outcome) = session.infer(
    "patents",
    &model,
    ModelTask::Ner,
    &["abstract".to_string()],
    "id",
    CachePolicy::Bypass,
).await?;
Ok(()) }
}

Python

results = db.infer(
    source="patents",
    model="dslim/bert-base-NER",
    columns=["abstract"],
    task="ner",
    key="id",
)

Output schema

Column	Type	Description
`_row_id`	Utf8	Key column value
`_source`	Utf8	Source identifier
`_model`	Utf8	Model identifier
`_status`	Utf8	`"ok"` or `"error"`
`_error`	Utf8 (nullable)	Error message if failed
`_latency_ms`	Float32	Inference latency
`entities`	Utf8 (nullable)	JSON array of entity spans

Entity span format

Each entity in the JSON array has:

{
  "text": "Google",
  "label": "ORG",
  "start": 15,
  "end": 21,
  "confidence": 0.97
}

Field	Type	Description
`text`	string	The entity text extracted from the input
`label`	string	Entity type (PER, ORG, LOC, etc.) without B-/I- prefix
`start`	integer	Character start position (inclusive)
`end`	integer	Character end position (exclusive)
`confidence`	float	Average softmax confidence across entity tokens

Supported models

NER models must have id2label with BIO-tagged labels (e.g., B-PER, I-PER, O) in their config.json.

BERT family — loads classifier.weight + classifier.bias on top of the encoder:

dslim/bert-base-NER (English, 4 entity types)
dbmdz/bert-large-cased-finetuned-conll03-english

ModernBERT — same pattern, modern encoder architecture.

How it works

text → tokenize (with character offsets)
     → encoder forward → hidden states [batch, seq_len, hidden]
     → Linear(hidden, num_labels) per token → logits
     → softmax → argmax → BIO tag per token
     → merge consecutive B-/I- tags into entity spans
     → map character offsets back to original text

The BIO decoding handles:

B-TYPE: starts a new entity of that type
I-TYPE: continues the current entity (must match type)
O: outside any entity
Special tokens ([CLS], [SEP], padding) are automatically skipped
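
The merge step can be sketched as below. This is a simplified illustration under stated assumptions: `merge_bio` is a hypothetical helper, special tokens are assumed to be filtered out beforehand, and an I- tag that does not continue a matching entity is simply dropped (the behaviour of the engine in that case is not described above).

```python
def merge_bio(text, tokens):
    """Merge per-token BIO tags into entity spans.

    tokens: list of (tag, start, end, confidence), offsets are characters.
    """
    spans = []
    current = None  # the entity being built, if any
    # A trailing "O" sentinel flushes an entity that ends the sequence.
    for tag, start, end, confidence in list(tokens) + [("O", 0, 0, 0.0)]:
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current and current["label"] == label:
            current["end"] = end  # extend the current entity
            current["scores"].append(confidence)
            continue
        if current:
            scores = current.pop("scores")
            current["text"] = text[current["start"]:current["end"]]
            current["confidence"] = sum(scores) / len(scores)
            spans.append(current)
            current = None
        if prefix == "B":
            current = {"label": label, "start": start, "end": end,
                       "scores": [confidence]}
    return spans


text = "I work at Google in New York."
tokens = [
    ("O", 0, 1, 0.99), ("O", 2, 6, 0.99), ("O", 7, 9, 0.99),
    ("B-ORG", 10, 16, 0.97), ("O", 17, 19, 0.99),
    ("B-LOC", 20, 23, 0.90), ("I-LOC", 24, 28, 0.80),
]
entities = merge_bio(text, tokens)
```

The two LOC tokens merge into a single "New York" span whose confidence is the average of the two token scores.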

Conformal Prediction: Distribution-Free Coverage

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Conformal Prediction.

Conformal prediction wraps any existing predictor and turns a point output into a prediction set (classification) or interval (regression) carrying a finite-sample, distribution-free coverage guarantee. Given a held-out calibration set, the marginal coverage of the emitted sets is at least 1 − alpha under exchangeability — for any underlying model, any data distribution, and any sample size. No retraining: a calibration pass and an empirical quantile. Deterministic given the calibration set, which is the audit property.

The serving primitive lives in the open engine because a calibrated set is a serving output — it must work with no license. The operationalization of the guarantee — rolling coverage monitoring, coverage-SLA gating, and managed recalibration under drift — is a governed concern provided by the Jammi platform, built on this same primitive.

The one assumption

The guarantee holds when the calibration and serving data are exchangeable. Under distribution drift it degrades silently. Two levers correct for known structure:

Weighted conformal applies importance weights for a known covariate shift.
Mondrian conformal keeps a per-cohort quantile keyed on a group column, the principled approximation to conditional coverage (full per-input coverage is provably impossible distribution-free).

The primitive applies the weights and the grouping; detecting drift and choosing the cohorts is governance, not a serving output.

The three-way split

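Reusing test points to calibrate inflates the measured coverage, and reusing training points makes the calibration scores optimistically small, so the served sets under-cover. The calibration set must be disjoint from both the training set and the test/serving data. The calibration source is a distinct argument throughout the API.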

Classification: prediction sets

The classification scores read the per-class softmax mass the classifier already emits.

LAC — nonconformity 1 − p_y; the smallest sets at the nominal level, but non-adaptive.
APS (default) — the cumulative mass of classes ranked most- to least-probable up to the true class; set size adapts to input difficulty.
RAPS — APS plus a tail-rank penalty that shrinks sets on easy inputs.

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
fn ex() -> jammi_db::error::Result<()> {
use jammi_ai::predict::{ClassScore, ConformalModel};

// Held-out calibration: per-class probabilities + the realised class index.
let calibration: Vec<Vec<f64>> = vec![
    vec![0.7, 0.2, 0.1],
    vec![0.1, 0.8, 0.1],
    // ... one row per calibration example
];
let true_labels: Vec<usize> = vec![0, 1 /* ... */];

// Calibrate at 90% nominal coverage with the adaptive APS score.
let model = ConformalModel::classification(&calibration, &true_labels, ClassScore::Aps, 0.1)?;

// Serving: emit the prediction set for a new row of class probabilities.
let probabilities = vec![0.45, 0.4, 0.15];
let prediction_set = model.predict_set(&probabilities, None)?; // e.g. [0, 1]
let _ = prediction_set;
Ok(()) }
}

Python

sets = db.conformalize(
    calibration=[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],  # per-class probabilities
    true_labels=[0, 1],                               # realised class indices
    test=[[0.45, 0.4, 0.15]],                         # rows to predict
    alpha=0.1,
    score="aps",                                      # "lac" | "aps" | "raps"
)
# sets -> [[0, 1]]  (one list of admitted class indices per test row)
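
For intuition, the LAC and APS scores described above can be written out directly. This is a sketch of the non-randomized textbook definitions, not the engine's code: the helper names are hypothetical, the threshold of 0.9 is a placeholder for a calibrated quantile, and whether the engine randomizes the last admitted class is not stated here.

```python
def rank_classes(probs):
    """Class indices ordered from most to least probable."""
    return sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)


def lac_score(probs, true_class):
    """LAC nonconformity: one minus the probability of the true class."""
    return 1.0 - probs[true_class]


def aps_score(probs, true_class):
    """APS nonconformity: cumulative mass down to and including the true class."""
    total = 0.0
    for c in rank_classes(probs):
        total += probs[c]
        if c == true_class:
            return total
    raise ValueError("true_class is not a valid class index")


def aps_set(probs, threshold):
    """Admit every class whose APS score is at most the threshold."""
    admitted, total = [], 0.0
    for c in rank_classes(probs):
        total += probs[c]
        if total > threshold:
            break
        admitted.append(c)
    return admitted


probs = [0.45, 0.4, 0.15]
lac = lac_score(probs, true_class=1)
aps = aps_score(probs, true_class=1)
admitted = aps_set(probs, threshold=0.9)  # placeholder threshold
```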

Regression: prediction intervals

Absolute residual — nonconformity |y − ŷ|; a constant-width interval [ŷ − q̂, ŷ + q̂]. Distribution-free but uninformative under heteroscedasticity.
CQR (Conformalized Quantile Regression) — nonconformity max(q_lo − y, y − q_hi) over a predictor’s lower/upper quantile estimates; an adaptive-width interval whose width tracks local uncertainty.

#![allow(unused)]
fn main() {
extern crate jammi_ai;
extern crate jammi_db;
fn ex() -> jammi_db::error::Result<()> {
use jammi_ai::predict::{ConformalModel, IntervalScore};

// Absolute-residual conformal over held-out (prediction, observation) pairs.
let predictions = vec![1.0, 2.0, 3.0 /* ... */];
let observed = vec![1.2, 1.7, 3.1 /* ... */];
let model = ConformalModel::regression(
    &predictions,
    &[],   // lower quantiles (CQR only)
    &[],   // upper quantiles (CQR only)
    &observed,
    IntervalScore::AbsoluteResidual,
    0.1,
)?;

// Serving: a 90% interval around a new point estimate.
let (lower, upper) = model.predict_interval(2.5, 0.0, 0.0, None)?;
let _ = (lower, upper);
Ok(()) }
}

Python

# Constant-width absolute-residual intervals.
intervals = db.conformalize_interval(
    predictions=[1.0, 2.0, 3.0],   # calibration point estimates
    observed=[1.2, 1.7, 3.1],      # calibration targets
    test_predictions=[2.5],        # point estimates to bound
    alpha=0.1,
)
# intervals -> [(lower, upper)]

# Adaptive-width CQR intervals from quantile estimates.
intervals = db.conformalize_cqr(
    lower=[0.5, 1.5, 2.5],         # calibration lower-quantile estimates
    upper=[1.5, 2.5, 3.5],         # calibration upper-quantile estimates
    observed=[1.2, 1.7, 3.1],
    test_lower=[2.0],
    test_upper=[3.0],
    alpha=0.1,
)
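
The two interval constructions reduce to a few lines of arithmetic. The sketch below uses the calibration numbers from the CQR example above; `q_hat` is a placeholder for the calibrated threshold, and the function names are hypothetical.

```python
def residual_interval(prediction, q_hat):
    """Constant-width interval around a point estimate."""
    return prediction - q_hat, prediction + q_hat


def cqr_score(lower, upper, observed):
    """How far y falls outside [q_lo, q_hi]; negative when y is inside."""
    return max(lower - observed, observed - upper)


def cqr_interval(test_lower, test_upper, q_hat):
    """Widen (or shrink, if q_hat < 0) the predicted quantile band by q_hat."""
    return test_lower - q_hat, test_upper + q_hat


cal_lower = [0.5, 1.5, 2.5]
cal_upper = [1.5, 2.5, 3.5]
observed = [1.2, 1.7, 3.1]
scores = [cqr_score(lo, hi, y)
          for lo, hi, y in zip(cal_lower, cal_upper, observed)]

q_hat = 0.25  # placeholder; in practice the finite-sample quantile of scores
band = cqr_interval(2.0, 3.0, q_hat)
fixed = residual_interval(2.5, q_hat)
```

All three calibration scores are negative here because every observed value already lies inside its predicted band.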

The finite-sample quantile

The conformal threshold is the ⌈(n+1)(1 − alpha)⌉-th smallest calibration score, not the naive ⌈n(1 − alpha)⌉ order statistic. The (n + 1) correction is what makes the guarantee exact rather than merely asymptotic; the naive quantile leaves a ~1/n coverage gap and under-covers. When the calibration set is too small for the requested level — fewer than ⌈1/alpha⌉ − 1 points — the threshold is +∞: the honest, conservative answer is “every label”. A full set is a real signal that the input is hard or the base model is miscalibrated, not a bug.
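
A minimal sketch of that threshold rule, assuming 0 < alpha < 1 and a plain list of calibration scores (`conformal_threshold` is a hypothetical name, not the library's API):

```python
import math


def conformal_threshold(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score, or +inf."""
    n = len(scores)
    # The small epsilon guards against float round-off nudging an exact
    # integer just above itself before the ceiling is taken.
    rank = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    if rank > n:
        return math.inf  # too few calibration points for this level
    return sorted(scores)[rank - 1]


scores = [0.1 * i for i in range(1, 20)]  # 19 toy calibration scores
q_hat = conformal_threshold(scores, alpha=0.1)        # rank 18 of 19
too_small = conformal_threshold(scores[:8], alpha=0.1)  # rank 9 of 8
```

With alpha = 0.1 the threshold becomes finite from n = 9 onward, which matches the ⌈1/alpha⌉ − 1 bound stated above.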

The `conformal` evidence channel

Conformal outputs ride the evidence substrate exactly as vector and inference do — one channel, four declared columns, no new provenance machinery:

Column	Type	Classification	Regression
`prediction_set`	Utf8	JSON array of class ids	null
`lower`	Float64	null	interval lower bound
`upper`	Float64	null	interval upper bound
`alpha`	Float64	nominal level	nominal level

#![allow(unused)]
fn main() {
extern crate jammi_ai;
extern crate jammi_db;
async fn ex(catalog: &jammi_db::catalog::Catalog) -> jammi_db::error::Result<()> {
use jammi_ai::evidence::conformal::{channel_spec, contribution, ConformalOutput};

catalog.channels().register(&channel_spec()?).await?;

let _contrib = contribution(&[
    ConformalOutput::Set { classes: vec![0, 2], alpha: 0.1 },
    ConformalOutput::Interval { lower: -1.0, upper: 1.0, alpha: 0.1 },
])?;
// `_contrib` merges into result batches via `merge_channels`.
Ok(()) }
}

Verifying coverage

The realised coverage and mean set size of a labelled batch are pure functions in jammi-numerics — the same functions the platform’s coverage monitor calls on a rolling window:

#![allow(unused)]
fn main() {
extern crate jammi_numerics;
fn ex() -> Result<(), jammi_numerics::error::NumericsError> {
use jammi_numerics::calibration::{coverage, mean_set_size};

let hits = [true, true, false, true];     // did each set contain the true label?
let sizes = [2usize, 1, 3, 2];            // cardinality of each set
let realised = coverage(&hits)?;          // ~ 1 - alpha when calibrated
let efficiency = mean_set_size(&sizes)?;  // smaller is sharper
let _ = (realised, efficiency);
Ok(()) }
}

What lives in the platform, not here

This is the serving primitive only. The governed layer — a rolling realised-vs-nominal coverage monitor with drift detection and online adaptation, a coverage-SLA gate, and managed recalibration under shift — is provided by the Jammi platform. It consumes this primitive and the OSS coverage function; it is not part of the open engine.

Distributional Inference: Predict a Distribution, Not a Point

A ModelTask::Regression head returns a predictive distribution per row — a Gaussian (predicted_mean, predicted_std) or a set of quantiles — instead of a single number. Where conformal prediction wraps any predictor with a distribution-free interval, a regression head is trained to emit calibrated uncertainty directly, with proper-scoring objectives that make that uncertainty honest.

Two output forms, both standard:

Parametric Gaussian: the head predicts μ and a raw scale; serving maps the scale to a positive σ = floor + softplus(raw). Smooth, cheap, a closed-form density. The default.
Quantile: the head predicts a set of levels (e.g. 0.05, 0.5, 0.95) directly. Distribution-free in shape, robust to non-Gaussian outcomes, and the input to conformal CQR. The serving adapter sorts each row’s quantiles so they never cross.
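
Both serving-side maps are simple enough to sketch. The floor value below is a placeholder (the actual default is not given here), and `positive_std` and `non_crossing` are hypothetical names.

```python
import math


def positive_std(raw, floor=1e-3):
    """sigma = floor + softplus(raw), with softplus(x) = log(1 + exp(x))."""
    # Numerically stable form: max(x, 0) + log1p(exp(-|x|)).
    return floor + max(raw, 0.0) + math.log1p(math.exp(-abs(raw)))


def non_crossing(quantiles):
    """Sort one row's predicted quantile values so they are ascending."""
    return sorted(quantiles)


sigma = positive_std(0.0, floor=0.0)       # softplus(0) = ln 2
fixed_row = non_crossing([1.0, 0.8, 2.0])  # the 0.05 and 0.5 values crossed
```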

Choose the objective by your label

Your label	Task / objective
a continuous outcome + you want a density	`Regression`, β-NLL or CRPS (Gaussian)
a continuous outcome + you want robust intervals	`Regression`, pinball (quantile)
graded similarity scores	embedding fine-tune, cosine-MSE / CoSENT
ordered pairs / rankings	embedding fine-tune, MNRL / triplet

The four regression objectives are all proper scores: minimising them rewards a calibrated distribution, not merely an accurate mean. (MSE on the predicted mean is not proper for a distribution and is only a secondary point-accuracy diagnostic.)

β-NLL (default) — Seitzer’s variance-weighted Gaussian NLL. The plain joint μ,σ² NLL has a well-documented pathology: it down-weights high-error points by inflating their variance, starving the mean’s gradient and collapsing to overconfidence elsewhere. β-NLL re-weights each row’s NLL by a detached σ^{2β}, restoring the mean’s gradient and removing the collapse. β = 0.5 is the recommended default.
CRPS — the closed-form Gaussian continuous ranked probability score, the other collapse-resistant choice: strictly proper, in the outcome’s units, and far more stable under joint μ,σ² training than NLL.
Gaussian NLL — the classic mean-variance objective, provided for completeness and as the pathology baseline. Prefer β-NLL or CRPS.
Pinball — the quantile objective; trains each quantile to its level, with a non-crossing penalty that discourages crossing during training.
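
The per-row values of these objectives follow standard closed forms, sketched here with plain floats. The function names are hypothetical, the non-crossing penalty is left out, and because plain floats carry no gradient the "detached" weight in β-NLL appears only as an ordinary multiplier.

```python
import math


def gaussian_nll(mean, std, y):
    """Negative log-likelihood of y under N(mean, std^2)."""
    return 0.5 * math.log(2 * math.pi * std ** 2) + (y - mean) ** 2 / (2 * std ** 2)


def beta_nll(mean, std, y, beta=0.5):
    """NLL re-weighted by sigma^(2 * beta); the weight is detached in training."""
    return (std ** (2 * beta)) * gaussian_nll(mean, std, y)


def gaussian_crps(mean, std, y):
    """Closed-form CRPS of a Gaussian forecast, in the units of y."""
    z = (y - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return std * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))


def pinball(quantile, y, level):
    """Quantile (pinball) loss for one predicted quantile at the given level."""
    diff = y - quantile
    return max(level * diff, (level - 1) * diff)


crps_at_mean = gaussian_crps(0.0, 1.0, 0.0)
under = pinball(1.0, 2.0, level=0.9)  # prediction below y: penalised by 0.9
over = pinball(1.0, 0.0, level=0.9)   # prediction above y: penalised by 0.1
```

At β = 0 the weight is 1 and β-NLL reduces to the plain Gaussian NLL.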

The same CRPS / NLL math headlines the calibration eval — one source of truth for the score, used as both the training loss and the eval metric.

Calibrated, not merely accurate

A regression head is not done until its coverage is verified. Two models can share a mean (identical MSE) yet one be badly miscalibrated. The calibration eval is the gate: the central interval should cover at ≈ its nominal level, and the head’s proper score (CRPS/NLL) should beat a constant-variance baseline. Verify coverage; never ship on NLL alone.

Aleatoric, not epistemic

A parametric Gaussian head models aleatoric (irreducible data) noise only. It does not know what it has not seen: off-distribution it can be confidently wrong. For uncertainty about the unseen, reach for the rest of the spectrum:

distribution-free coverage with no model assumption → conformal prediction;
amortized epistemic posteriors → a future neural-process head.

Do not read this head’s σ as epistemic.

The uncertainty evidence channel

A served distribution rides the uncertainty evidence channel (predicted_mean, predicted_std, quantiles, context_ref) — the same additive substrate as vector, inference, and conformal. When the prediction was conditioned on an assembled context set, the channel records which rows informed it in context_ref — data-driven provenance applied to prediction. Register it like any custom channel (see Declare a Custom Provenance Channel); the distribution columns then merge into the result and are SQL-reachable.

Train an In-Context Predictor (Amortized, Adapts Without Retraining)

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Context-Conditioned Prediction.

An in-context predictor meta-learns to turn a context set — a target’s retrieved neighbours and their outcomes — into a predictive distribution, in one forward pass with no gradient update at inference. Trained once over many tasks, it adapts to a new target’s neighbourhood the way a prior-fitted network does: condition on the context, emit the posterior, move on. It is the learned-aggregation point of the uncertainty spectrum, above the cheaper distribution-free and parametric options.

Three curated architectures, selected by config (never authored as tensor ops):

CNP — a DeepSets encoder that mean-pools the context, then a decoder MLP. The baseline; the learned twin of fixed pooling.
Attentive CNP (attncnp) — attention pooling over the context, so the target query weights the neighbours that matter. The payoff over fixed pooling, and the member that widens its uncertainty when its context is thin or unfamiliar (epistemic uncertainty).
TNP — a transformer over the (context ∪ target) token set; the strongest member, the prior-fitted-network point.
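
The difference between fixed and attention pooling can be illustrated without any learned weights. This sketch assumes raw vectors serve directly as queries, keys, and values; the real architectures apply learned projections that are omitted here, and both function names are hypothetical.

```python
import math


def mean_pool(context):
    """Average the context vectors; every neighbour counts equally."""
    n = len(context)
    return [sum(column) / n for column in zip(*context)]


def attention_pool(query, keys, values):
    """Scaled dot-product attention of one target query over the context."""
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    dims = range(len(values[0]))
    pooled = [sum(w * v[d] for w, v in zip(weights, values)) for d in dims]
    return pooled, weights


keys = [[4.0, 0.0], [0.0, 4.0]]  # two context neighbours
values = [[1.0], [0.0]]          # their outcomes
flat = mean_pool(values)         # both neighbours weighted equally
pooled, weights = attention_pool([1.0, 0.0], keys, values)
```

The query aligns with the first key, so attention pooling leans toward the first neighbour's outcome while mean pooling stays at the midpoint.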

When to reach for it — the spectrum

The substrate offers three honest-uncertainty tools. Pick the cheapest one that covers your need:

Tool	Mechanism	Training	Reach for it when
Conformal	distribution-free coverage wrap	none (calibrate)	you need a guarantee over any model, audit-reproducible and deterministic
Distributional head	learned aleatoric distribution	fine-tune a head	continuous outcomes where a density or quantiles suffice
In-context predictor	meta-learned posterior over a context set	episodic meta-train	few-shot / adapt-per-target without retraining; you want epistemic uncertainty

The in-context predictor is the heaviest and most expressive. It does not replace the other two — an amortized posterior is sharp but not automatically calibrated, so it is wrapped by conformal for a coverage guarantee (below). Conformal remains the deterministic, audit-reproducible option; this predictor is what a continual, adapt-per-target setting reaches for.

Train

Training is episodic: each task (the distinct values of a task column — a cohort, a time window, a source partition) is split into a context set and held-out targets, and the target’s outcome is scored under a proper objective. Tasks — not points — are partitioned into train/test, so generalisation is measured on held-out tasks. The target is never in its own context (self-exclusion plus a same-task split), and a meta-dataset with too few tasks is rejected rather than meta-trained into memorisation.

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
extern crate jammi_encoders;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_ai::pipeline::context_predictor::{
    ContextArchitecture, ContextPredictorTrainConfig, GaussianObjective, PredictiveHead,
};
async fn ex(session: &Arc<InferenceSession>) -> jammi_db::error::Result<()> {
let spec = ContextPredictorTrainConfig {
    model_id: "patents-context-predictor".into(),
    architecture: ContextArchitecture::AttnCnp, // CNP | AttnCnp | TNP, by config
    key_column: "_row_id".into(),               // the per-row identity
    task_column: "cohort".into(),               // distinct values = the tasks
    value_column: "outcome".into(),             // the scalar y to regress
    context_k: 32,                              // retrieval / context size
    hidden_dim: 64,
    num_heads: 4,
    num_layers: 2,
    head: PredictiveHead::Gaussian {            // an S18 head + proper score
        objective: GaussianObjective::Crps,
    },
    epochs: 100,
    learning_rate: 0.005,
    grad_clip: 1.0,
    test_task_fraction: 0.2,                    // tasks held out for eval
    min_task_count: 4,                          // the meta-overfitting guard
    seed: 0,
};
// Training is a durable, lease-claimed job: `train_context_predictor` submits a
// queued job and returns a handle immediately; a worker claims it, re-samples
// the episodic meta-dataset from the spec, trains it, and registers the model.
let job = session.train_context_predictor("patents", &spec).await?;
job.wait().await?; // block until a worker drives the job to completion
let model_id = job.model_id(); // the spec's `model_id`, now registered
let _ = model_id;
Ok(()) }
}

The objective is one of the proper scores the distributional head uses — no new loss code. A PredictiveHead::Gaussian serves (mean, std); a PredictiveHead::Quantile serves a non-crossing set of quantile levels.
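
The task-level split and self-exclusion described in this section can be sketched as follows. The parameter names mirror the config fields above, but the helpers themselves are hypothetical, and using every other same-task row as context is a simplification of the per-task context/target split.

```python
import random


def split_tasks(task_ids, test_task_fraction=0.2, min_task_count=4, seed=0):
    """Partition distinct tasks (not rows) into train and held-out sets."""
    tasks = sorted(set(task_ids))
    if len(tasks) < min_task_count:
        raise ValueError(f"need at least {min_task_count} tasks, got {len(tasks)}")
    random.Random(seed).shuffle(tasks)
    n_test = max(1, round(len(tasks) * test_task_fraction))
    return tasks[n_test:], tasks[:n_test]


def episode(rows, target_key):
    """rows: (key, task, value). Context = same-task rows minus the target."""
    target = next(row for row in rows if row[0] == target_key)
    context = [row for row in rows
               if row[1] == target[1] and row[0] != target_key]
    return context, target


rows = [(f"r{i}", f"cohort-{i % 5}", float(i)) for i in range(20)]
train_tasks, test_tasks = split_tasks([task for _, task, _ in rows])
context, target = episode(rows, "r7")
```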

Predict — adapt to a new target, no retraining

Predicting assembles the target’s live context (the serving corpus, with the target excluded) and runs one in-context forward. There is no optimizer and no weight update — the adaptation lives entirely in the forward.

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_ai::pipeline::context_predictor::{ContextServeOptions, PredictedDistribution};
async fn ex(session: &Arc<InferenceSession>) -> jammi_db::error::Result<()> {
// Reload the trained predictor for inference, serving over a corpus.
// `ContextServeOptions::default()` is embedding-similarity (ANN) context with no
// serving split — pass an edge-bearing source to condition on declared edges.
let served = session
    .load_context_predictor(
        "patents-context-predictor",
        "patents",
        ContextServeOptions::default(),
    )
    .await?;

// One forward over the target's live context — no gradient update.
let dist = session
    .predict_with_context_predictor(&served, "US-7654321")
    .await?;
match dist {
    PredictedDistribution::Gaussian { mean, std } => {
        let _ = (mean, std);
    }
    PredictedDistribution::Quantile { levels } => {
        let _ = levels; // ascending (level, value) pairs
    }
}
Ok(()) }
}

The serving source need not be the training source — a predictor meta-trained on one corpus serves a target’s neighbourhood in another of the same shape (the inductive prior-fitted-network property). An optional split predicate scopes the serving context.

Wrap with conformal for a coverage guarantee

An amortized posterior is sharp but can be overconfident off its training tasks — its raw interval under-covers. Calibrate a conformal wrap on a held-out calibration set (tasks disjoint from training) and the served interval recovers its nominal coverage. A Gaussian head wraps with absolute-residual conformal over its mean; a quantile head wraps with CQR over its lower/upper quantiles.

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_ai::pipeline::context_predictor::{ConformalLevers, ServedContextPredictor};
async fn ex(
    session: &Arc<InferenceSession>,
    served: &ServedContextPredictor,
    held_out: &[(String, f64)], // (target_key, observed y), tasks disjoint from training
) -> jammi_db::error::Result<()> {
// Calibrate at 90% nominal coverage on the held-out set. `Marginal` is plain
// split-conformal; governance may instead supply a Mondrian cohort or weights.
let wrap = session
    .calibrate_context_predictor_conformal(served, held_out, 0.1, ConformalLevers::Marginal)
    .await?;

// Serving: turn a prediction into a coverage-guaranteed interval. The optional
// group is the test point's Mondrian cohort (`None` for a marginal wrap).
let dist = session
    .predict_with_context_predictor(served, "US-7654321")
    .await?;
let (lower, upper) = wrap.interval(&dist, None)?;
let _ = (lower, upper);
Ok(()) }
}
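
To make the calibration step concrete, here is a minimal, standalone Python sketch of the two scores named above: absolute-residual conformal for a Gaussian head and CQR for a quantile head. It is not Jammi's implementation. The function names and the toy numbers are illustrative assumptions, and it shows only the marginal (plain split-conformal) case.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile of the calibration scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    # 1-based rank ceil((n + 1) * (1 - alpha)) among the sorted scores.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(scores)[rank - 1]

def calibrate_absolute_residual(means, observed, alpha):
    """Gaussian head: score each calibration point by |y - mean|."""
    scores = [abs(y - m) for m, y in zip(means, observed)]
    return conformal_quantile(scores, alpha)

def calibrate_cqr(lowers, uppers, observed, alpha):
    """Quantile head (CQR): score by how far y falls outside [lower, upper]."""
    scores = [max(lo - y, y - hi) for lo, hi, y in zip(lowers, uppers, observed)]
    return conformal_quantile(scores, alpha)

# Calibration: nine held-out predictions and their observed outcomes (toy data).
means = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
observed = [1.1, 1.8, 3.3, 3.6, 5.5, 5.4, 7.7, 7.2, 9.9]
q = calibrate_absolute_residual(means, observed, alpha=0.1)

# Serving: widen a new predicted mean by the calibrated margin.
mean = 4.2
lower, upper = mean - q, mean + q
```

For a quantile head the served interval would be (lower - q, upper + q), with q taken from calibrate_cqr. A negative q is legitimate there: it tightens an interval that was over-covering.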

Epistemic uncertainty — and its honest caveat

The attentive members (attncnp, tnp) widen their predicted uncertainty when a target’s context is sparse or unfamiliar — the property a fixed distributional head lacks. This is primarily an attention property: a plain CNP’s mean-pool barely widens its σ as the context thins (it conditions on the context size rather than reweighting members), so reach for attncnp or tnp when epistemic widening matters. If you only need aleatoric noise on a continuous outcome, the cheaper distributional head is the right tool.

From Python

# Submits a durable training job and returns its handle; an embedded worker
# runs it. Block on `.wait()`, then read `.model_id` for the registered model.
job = db.train_context_predictor(
    "patents",
    key_column="_row_id",
    task_column="cohort",
    value_column="outcome",
    architecture="attncnp",   # "cnp" | "attncnp" | "tnp"
    output="gaussian",        # or "quantile" with levels=[0.1, 0.5, 0.9]
    objective="crps",         # "crps" | "nll" | "betanll"
    context_k=32,
)
job.wait()
model_id = job.model_id

dist = db.predict_with_context_predictor(
    model_id, source="patents", target_key="US-7654321"
)
# {"kind": "gaussian", "mean": ..., "std": ...}

Semantic Search

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Retrieval.

Perform ANN vector similarity search over embedding tables. Results include all original source columns, similarity scores, and evidence provenance.

Basic search

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::config::JammiConfig;
async fn ex(config: JammiConfig) -> jammi_db::error::Result<()> {
use std::sync::Arc;

let session = Arc::new(InferenceSession::new(config).await?);

// Encode a query
let query = session.encode_text_query(
    "sentence-transformers/all-MiniLM-L6-v2",
    "quantum computing applications",
).await?;

// Search — returns top 10 results
let results = session.search("patents", query, 10, None, None).await?
    .run().await?;
Ok(()) }
}

Python

query_vec = db.encode_query(model="sentence-transformers/all-MiniLM-L6-v2", query="quantum computing applications")

results = db.search("patents", query=query_vec, k=10)  # pyarrow.Table
print(results.to_pandas())

What search returns

Results are RecordBatch / pyarrow.Table with:

All original source columns (e.g., id, title, abstract, year)
_row_id — the source key
_source_id — which source the row came from
similarity — cosine similarity score (1.0 = identical, 0.0 = orthogonal)
retrieved_by — List<Utf8> provenance: which channels found this row
annotated_by — List<Utf8> provenance: which channels added evidence post-retrieval
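
As a reference for reading the similarity column, here is a small standalone sketch of cosine similarity in plain Python. It illustrates the metric only; the zero-vector handling is an assumption, not a statement about the engine.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)

same = cosine_similarity([1.0, 2.0, 2.0], [2.0, 4.0, 4.0])  # same direction
orth = cosine_similarity([1.0, 0.0], [0.0, 1.0])            # orthogonal
```

Scaling a vector does not change the score, which is why the first pair scores as identical even though the magnitudes differ.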

Refining a search

search carries the two knobs the bounded primitive owns directly: a SQL filter predicate over the hydrated results and a select column projection. In Python they are keyword arguments and search returns the table; in Rust they are methods on the fluent QueryBuilder (session.search(...) returns the builder, which also carries sort / limit / join / annotate and a .run()).

Filter and select

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &std::sync::Arc<InferenceSession>, query: Vec<f32>) -> jammi_db::error::Result<()> {
session.search("patents", query, 20, None, None).await?
    .filter("year > 2020")?
    .sort("similarity", true)?  // descending
    .limit(5)
    .select(&["_row_id".into(), "title".into(), "similarity".into()])?
    .run().await?;
Ok(()) }
}

Python

results = db.search(
    "patents", query=query_vec, k=20,
    filter="year > 2020",
    select=["_row_id", "title", "similarity"],
)  # pyarrow.Table

Compound query (join, annotate)

Joining other sources and running a model over the results is open composition, so in Python and over the wire it is SQL — db.sql(...), with the annotate(...) table function for inference. In Rust the same operations compose on the fluent builder:

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &std::sync::Arc<InferenceSession>, query: Vec<f32>) -> jammi_db::error::Result<()> {
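// Filter, sort, limit, and select are shown here; the builder's join and
// annotate steps chain onto the same builder before .run().
let results = session.search("patents", query, 100, None, None).await?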
    .filter("year > 2020")?
    .sort("similarity", true)?
    .limit(10)
    .select(&["title".into(), "similarity".into()])?
    .run().await?;
Ok(()) }
}

Python

results = db.sql("""
    SELECT title, vector
    FROM annotate('local:/models/all-MiniLM-L6-v2', 'text_embedding',
                  'patents.public.patents', 'id', 'abstract') AS a
    JOIN patents.public.patents AS p ON a._row_id = arrow_cast(p.id, 'Utf8')
    WHERE p.year > 2020
    LIMIT 10
""")

See Compound Retrieval and Inference over Flight SQL for the full compound surface — it runs the same SQL in-process or against a remote engine over Flight SQL.

ANN vs exact search

Search automatically selects the best path:

ANN (fast) — when sidecar index files (.usearch + .rowmap + .manifest.json) exist and load successfully
Exact (brute-force) — fallback when sidecar files are missing or corrupt

Both paths return the same result shape, so calling code is identical either way. Deleting sidecar files degrades performance but not correctness.
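
For intuition about what the exact fallback computes, the following standalone sketch scans every row and keeps the k best by cosine similarity. It is an illustration in plain Python, not the engine's code; the (row_id, vector) input shape and the tie-break by row id are assumptions.

```python
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.0 if norm == 0.0 else dot / norm

def exact_search(rows, query, k):
    """Brute force: score every (row_id, vector) and return the k best, best-first."""
    scored = [(row_id, _cosine(vector, query)) for row_id, vector in rows]
    # Highest similarity first; ties broken by row id so the order is stable.
    scored.sort(key=lambda hit: (-hit[1], hit[0]))
    return scored[:k]

rows = [("a", [1.0, 0.0]), ("b", [0.6, 0.8]), ("c", [0.0, 1.0])]
hits = exact_search(rows, [1.0, 0.0], k=2)
```

The scan is linear in the number of rows per query, which is the cost an ANN index avoids.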

Embedding table resolution

When multiple embedding tables exist for a source, search uses the most recently created “ready” table. The resolution order:

Explicit table name (if provided)
Latest ready embedding table for the source (by created_at)
Error if no embedding table exists
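
The resolution order above can be sketched as a small function. This is a hypothetical model in plain Python: the dictionary fields (name, source_id, status, created_at) are assumptions chosen to mirror the prose, not the catalog's actual schema.

```python
def resolve_embedding_table(tables, source_id, explicit_name=None):
    """Pick the embedding table a search should use."""
    # 1. An explicit table name wins.
    if explicit_name is not None:
        for table in tables:
            if table["name"] == explicit_name:
                return table
        raise LookupError(f"no embedding table named {explicit_name!r}")
    # 2. Otherwise take the latest ready table for the source.
    ready = [
        t for t in tables
        if t["source_id"] == source_id and t["status"] == "ready"
    ]
    # 3. Error if nothing qualifies.
    if not ready:
        raise LookupError(f"no ready embedding table for source {source_id!r}")
    return max(ready, key=lambda t: t["created_at"])

tables = [
    {"name": "emb_1", "source_id": "patents", "status": "ready", "created_at": "2024-01-01T00:00:00Z"},
    {"name": "emb_2", "source_id": "patents", "status": "ready", "created_at": "2024-03-01T00:00:00Z"},
    {"name": "emb_3", "source_id": "patents", "status": "building", "created_at": "2024-04-01T00:00:00Z"},
]
latest = resolve_embedding_table(tables, "patents")
pinned = resolve_embedding_table(tables, "patents", explicit_name="emb_1")
```

Note that the newest table (emb_3) is skipped because it is still building.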

Search over gRPC (edge runtimes)

EmbeddingService exposes Search on the typed gRPC surface, so a process that reaches the engine over gRPC-web — an edge function that cannot speak Flight SQL’s bidirectional HTTP/2 — can run the same similarity search it already uses for AddSource, GenerateAudioEmbeddings, and EncodeAudioQuery. It is the same engine capability on an additional transport, not a second search path.

A SearchRequest carries the source, a k, an optional SQL filter (predicate pushdown), and an optional select column list. The query is a oneof:

query_vector — a precomputed vector. The usual flow is encode-then-search: call EncodeAudioQuery (or any client-side encoder) to get the vector, then feed it back as the query.
row_key — query-by-example. The engine resolves that row’s stored vector internally and ranks by it (“rows like this row”). The vector never crosses the wire.

// encode-then-search
embedding = EncodeAudioQuery{ model_id, audio_bytes }.embedding
hits      = Search{ source_id, query_vector: { values: embedding }, k: 10 }.hits

// query-by-example (no re-encode round-trip; vector stays in the engine)
hits      = Search{ source_id, row_key: "clip_1", k: 10 }.hits

Each SearchHit carries the key (the matched row’s key-column value), the score (similarity), and a columns map. columns is empty unless select is non-empty, in which case it holds the requested columns stringified — the engine always projects the key and score alongside them so a hit is fully formed. Heavy clients that want Arrow batches keep using Flight SQL; Search returns lightweight structured rows so an edge bundle needs no Arrow reader.

Building a similarity graph

build_neighbor_graph materializes the k-nearest-neighbour graph of an existing embedding table: for every row it finds the k most similar rows within the same table and writes one directed edge per pair as a queryable edge relation.

When to reach for it — and when not to

Use search for the neighbours of specific rows. Use build_neighbor_graph only when you need the whole edge set at once.

If you want “rows like this row”, call search (or search_by_id). It loads the index once per query and hydrates results on demand — that is the per-query path, and build_neighbor_graph is not for it.

build_neighbor_graph exists for global-structure work, where you consume all edges as a durable artifact:

near-duplicate detection / semantic dedup,
clustering and connected components,
entity resolution,
generating training pairs for graph-aware fine-tuning.

For those, looping search over every row would reopen the index per row, pay an n× hydration round-trip, and leave you with n detached result sets instead of one catalogued, tenant-scoped, queryable table. The edge table this verb writes closes that gap — and only for the global case.

The edge relation

The result is an ordinary result_table you can query, join, and federate like any other. One row per directed edge:

Column	Type	Meaning
`src`	Utf8	source node — the source key
`dst`	Utf8	neighbour node — the source key
`rank`	Int32	`1` = nearest, … `k`
`similarity`	Float32	`1.0 - cosine_distance`

src and dst are the embedding table’s keys, so the edge table joins directly to your source data — no detour through the embedding table.

Approximate by default; exact on demand

The default driver is index-assisted (HNSW). It is fast (n · log n) but its output is:

approximate — HNSW recall is below 100%, so some true neighbours are missed, and
non-deterministic — two builds can differ in the long tail of weak edges.

For dedup and clustering this is exactly what you want. When you need reproducible, auditable edges, pass exact = true: a brute-force, deterministic, complete n² pass (gated by a row-count ceiling, so it refuses to run on very large tables).
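
To show what the exact pass produces, here is a standalone sketch of a brute-force kNN graph that emits one (src, dst, rank, similarity) row per directed edge, matching the edge relation's columns. It is an illustration only: the input shape and the tie-break are assumptions, and the row-count ceiling is not modelled.

```python
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.0 if norm == 0.0 else dot / norm

def exact_neighbor_edges(rows, k):
    """Compare every row with every other row (n squared) and keep each row's k best."""
    edges = []
    for src, src_vec in rows:
        scored = [
            (_cosine(src_vec, dst_vec), dst)
            for dst, dst_vec in rows
            if dst != src  # a row is never its own neighbour
        ]
        # Deterministic order: similarity descending, then key ascending.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for rank, (similarity, dst) in enumerate(scored[:k], start=1):
            edges.append((src, dst, rank, similarity))
    return edges

rows = [("a", [1.0, 0.0]), ("b", [0.9, 0.1]), ("c", [0.0, 1.0])]
edges = exact_neighbor_edges(rows, k=1)
```

Because every pair is scored and ties are broken on the key, two runs over the same rows yield the same edges, which is the property exact = true is for.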

Example: near-duplicate detection

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::config::JammiConfig;
use jammi_ai::pipeline::neighbor_graph::BuildNeighborGraph;
use jammi_db::store::CachePolicy;
async fn ex(config: JammiConfig, model_id: &str) -> jammi_db::error::Result<()> {
let session = Arc::new(InferenceSession::new(config).await?);

// Embed the corpus first (any embedding model).
session
    .generate_text_embeddings("patents", model_id, &["abstract".into()], "id", CachePolicy::Bypass)
    .await?;

// Materialize the kNN graph, keeping only strong, reciprocal edges.
let (edges, _outcome) = session
    .build_neighbor_graph(
        "patents",
        None, // resolve the latest embedding table
        &BuildNeighborGraph {
            k: 5,
            min_similarity: Some(0.9), // near-duplicate threshold
            mutual: true,              // both rows agree they are neighbours
            ..Default::default()
        },
        CachePolicy::Bypass,
    )
    .await?;

// The edge table is now an ordinary relation. Join it to the source on the
// key and list the strongest near-duplicate pairs first.
let dupes = session
    .sql(&format!(
        "SELECT e.src, e.dst, e.similarity, p.title \
         FROM \"jammi.{}\" e \
         JOIN patents.public.patents p ON p.id = e.src \
         ORDER BY e.similarity DESC",
        edges.table_name
    ))
    .await?;
Ok(()) }
}
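
The min_similarity and mutual options, and the step from edges to duplicate groups, can be pictured with a standalone sketch. It is plain Python over (src, dst, rank, similarity) tuples and rests on assumptions: the threshold is applied before the reciprocity check, and groups are connected components. The engine's internals may differ.

```python
def filter_edges(edges, min_similarity=None, mutual=False):
    """Keep strong edges and, optionally, only reciprocal ones."""
    kept = [e for e in edges if min_similarity is None or e[3] >= min_similarity]
    if mutual:
        pairs = {(src, dst) for src, dst, _, _ in kept}
        kept = [e for e in kept if (e[1], e[0]) in pairs]
    return kept

def duplicate_groups(edges):
    """Connected components over the surviving edges (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for src, dst, _, _ in edges:
        parent[find(src)] = find(dst)
    groups = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(group) for group in groups.values())

edges = [
    ("a", "b", 1, 0.97), ("b", "a", 1, 0.97),  # reciprocal and strong
    ("c", "b", 1, 0.95),                       # strong but one-sided
    ("d", "e", 1, 0.50), ("e", "d", 1, 0.50),  # reciprocal but weak
]
strong = filter_edges(edges, min_similarity=0.9, mutual=True)
groups = duplicate_groups(strong)
```

Only the a/b pair survives both filters, so it forms the single duplicate group.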

Example: graph traversal stays in SQL

build_neighbor_graph transports adjacency and weight — it never walks the graph. Two-hop expansion, paths, and reachability are plain SQL over the edge relation, on every transport:

-- Two-hop neighbours via a self-join on the edge table.
SELECT a.src AS origin, b.dst AS two_hops_away
FROM "jammi.<edge_table>" a
JOIN "jammi.<edge_table>" b ON b.src = a.dst
WHERE a.src = '<some_key>';

For deeper walks, use a WITH RECURSIVE CTE. There is no traversal operator and no graph DSL — the edge table is just a relation.
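
As a sketch of a deeper walk, the snippet below runs a depth-bounded WITH RECURSIVE query over a toy edge table. It uses Python's built-in sqlite3 purely so the example is self-contained; the table name, the parameter names, and the data are illustrative, and the SQL dialect of your engine may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, similarity REAL)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [("a", "b", 0.95), ("b", "c", 0.93), ("c", "d", 0.91), ("d", "a", 0.90)],
)

# Everything reachable from :origin within :max_depth hops. The depth bound
# also stops the walk from looping forever around the a -> b -> c -> d -> a cycle.
REACHABLE = """
WITH RECURSIVE walk(node, depth) AS (
    SELECT dst, 1 FROM edges WHERE src = :origin
    UNION
    SELECT e.dst, w.depth + 1
    FROM walk w JOIN edges e ON e.src = w.node
    WHERE w.depth < :max_depth
)
SELECT node, MIN(depth) AS hops FROM walk GROUP BY node ORDER BY hops, node
"""
rows = conn.execute(REACHABLE, {"origin": "a", "max_depth": 3}).fetchall()
```

MIN(depth) reports each node at its shortest distance when several paths reach it.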

Example: training-data prep for graph-aware fine-tuning

The edge table is the raw material for neighbour-contrastive embedding fine-tuning. Turn edges into (anchor, positive, negative) triplets — a neighbour is a positive, a non-neighbour a negative — and feed them to the existing triplet/contrastive fine-tune path. The walk policy, negative sampling, and objective are yours; Jammi supplies the edges and the loss.
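
One simple way to turn edges into triplets is sketched below. It is a hypothetical policy in plain Python, not something Jammi prescribes: negatives are sampled uniformly from non-neighbours with a fixed seed, and the function and parameter names are assumptions.

```python
import random

def edges_to_triplets(edges, all_keys, seed=0):
    """Build (anchor, positive, negative) triplets from (src, dst) edges."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
    triplets = []
    for anchor in sorted(neighbours):
        # A negative is any key that is neither the anchor nor one of its neighbours.
        candidates = [
            key for key in sorted(all_keys)
            if key != anchor and key not in neighbours[anchor]
        ]
        if not candidates:
            continue
        for positive in sorted(neighbours[anchor]):
            triplets.append((anchor, positive, rng.choice(candidates)))
    return triplets

edges = [("a", "b"), ("a", "c"), ("b", "a")]
triplets = edges_to_triplets(edges, all_keys=["a", "b", "c", "d", "e"])
```

Harder negatives (for example, rows that are close but outside the top k) are a common refinement; that choice stays with you.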

Propagate Embeddings over a Graph (Decoupled GNN)

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Graph Signal Processing.

propagate_embeddings is the forward pass of a graph convolution, run as a data-plane operation. For every row of an embedding table it replaces the row’s vector with an aggregate of its k-hop neighbourhood — ÂᵏX — and writes the result as a new, ordinary embedding table (searchable, joinable, re-graphable).

This is the propagate half of a decoupled GNN. SGC showed the nonlinearities between graph-conv layers are removable: precompute the propagated features, then learn a simple head on top. APPNP added the teleport restart that keeps deep propagation from collapsing. Neither needs autograd, an architecture, or message-passing code — ÂᵏX is a graph join plus a grouped vector average, and that is all this verb is.

It composes with anything that consumes an embedding table: search the propagated vectors, evaluate them, build a neighbour graph over them, or fine-tune a head on them (the SGC/APPNP order — propagate first, then fine-tune).

When propagation helps — measure homophily first

Smoothing helps only when neighbours share signal.

Averaging a node with its neighbours denoises it when the graph is homophilous — neighbours tend to be the same kind of thing (papers cite papers on the same topic; co-purchased items share a category; KG entities of one type link to one type). Then the propagated vectors cluster tighter and downstream search / classification improves.

On a heterophilous graph — neighbours tend to differ — propagation mixes in opposing signal and is beaten by ignoring the graph entirely. This is not a silent failure mode to discover in production: measure it first. The per-edge-type homophily diagnostic reports, for each edge type, how often its endpoints share a label. Propagate over the homophilous types; for genuinely heterophilous structure the answer is learned attention (a later spec), not fixed averaging.
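
The quantity being measured can be sketched in a few lines. This is a standalone illustration of an edge-homophily ratio per edge type, not the engine's diagnostic: the input shapes are assumptions, and edges with an unlabelled endpoint are simply skipped here.

```python
from collections import defaultdict

def edge_homophily(edges, labels):
    """Share of edges, per edge type, whose two endpoints carry the same label."""
    same = defaultdict(int)
    total = defaultdict(int)
    for src, dst, edge_type in edges:
        if src not in labels or dst not in labels:
            continue  # assumption: unlabelled endpoints do not count
        total[edge_type] += 1
        same[edge_type] += int(labels[src] == labels[dst])
    return {edge_type: same[edge_type] / total[edge_type] for edge_type in total}

labels = {"p1": "ml", "p2": "ml", "p3": "ml", "p4": "bio"}
edges = [
    ("p1", "p2", "cites"),
    ("p2", "p3", "cites"),
    ("p1", "p4", "same_venue"),
    ("p2", "p3", "same_venue"),
]
ratios = edge_homophily(edges, labels)
```

A ratio near 1.0 marks an edge type worth propagating over. What counts as too low depends on the number of classes, since random edges on a balanced c-class problem already agree about 1/c of the time.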

The default is over-smoothing-safe

Iterated averaging is exactly the operation that collapses every node into one indistinguishable point as the hop count grows (rank collapse). Three defaults keep that in check:

PageRank-decay weighting (DegreeNormalized + an α-teleport restart). Each hop re-mixes a share α of every node’s original embedding back in, so a node stays anchored to itself however deep you go (the APPNP fix). α defaults to 0.1.
Two hops by default, capped at three. Beyond that, more hops add collapse, not signal.
Self-loops (Ã = A + I). Every node aggregates over itself, so an isolated node propagates to its own embedding rather than vanishing, and the symmetric normalisation has no oscillating eigenmode.
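
The three defaults can be seen together in a small standalone sketch. It follows the APPNP-style recurrence X(t+1) = (1 - α) · Â · X(t) + α · X(0) with Â the symmetric normalisation of A + I. This is an assumption about the exact form, written in plain Python for illustration; it is not the engine's implementation.

```python
import math

def propagate(features, edges, hops=2, alpha=0.1):
    """Degree-normalised propagation with self-loops and an alpha-teleport."""
    if not 1 <= hops <= 3:
        raise ValueError("hops must be between 1 and 3")
    keys = sorted(features)
    # Self-loops: every node is its own neighbour (A + I).
    adjacency = {key: {key} for key in keys}
    for u, v in edges:  # edges are treated as undirected
        adjacency[u].add(v)
        adjacency[v].add(u)
    degree = {key: len(adjacency[key]) for key in keys}
    original = {key: [float(v) for v in features[key]] for key in keys}
    current = original
    for _ in range(hops):
        updated = {}
        for i in keys:
            dim = len(original[i])
            aggregate = [0.0] * dim
            for j in sorted(adjacency[i]):  # fixed order keeps the sum reproducible
                weight = 1.0 / math.sqrt(degree[i] * degree[j])
                for d in range(dim):
                    aggregate[d] += weight * current[j][d]
            # Teleport: mix a share alpha of the node's original embedding back in.
            updated[i] = [
                (1.0 - alpha) * aggregate[d] + alpha * original[i][d]
                for d in range(dim)
            ]
        current = updated
    return current

features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
smoothed = propagate(features, edges=[("a", "b")], hops=1)
```

Node c has no edges, so its only neighbour is itself and it keeps its own embedding. Nodes a and b move toward each other but stay distinguishable because of the teleport term.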

Weightings

Weighting	Aggregation	Use
`DegreeNormalized` (default)	symmetric `Â = D̃^{-1/2}(A+I)D̃^{-1/2}`, with the `α`-teleport	the safe default (SGC/APPNP)
`Uniform`	random-walk mean `D̃^{-1}Ã` (each node = mean of itself + neighbours)	unweighted graphs, simplest smoothing
`EdgeSimilarity`	edge-weighted mean `Σ(w·x)/Σw`	use the edge weight as fixed attention (e.g. an S9 similarity edge); negative weights clamp to zero
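
The EdgeSimilarity row can be read as the following one-hop aggregation. The sketch is standalone Python and makes two assumptions the table does not settle: how the node's own vector enters the sum (here it does not), and what happens when every weight clamps to zero (here the node keeps its own vector).

```python
def edge_weighted_mean(own, neighbours):
    """Weighted mean sum(w * x) / sum(w) over (weight, vector) neighbours."""
    dim = len(own)
    total = 0.0
    accumulated = [0.0] * dim
    for weight, vector in neighbours:
        w = max(weight, 0.0)  # negative weights clamp to zero
        total += w
        for d in range(dim):
            accumulated[d] += w * vector[d]
    if total == 0.0:
        return list(own)  # assumption: nothing to average, keep the original
    return [value / total for value in accumulated]

mixed = edge_weighted_mean(
    own=[0.0, 0.0],
    neighbours=[(0.75, [1.0, 0.0]), (0.25, [0.0, 1.0]), (-0.5, [9.0, 9.0])],
)
```

The third neighbour has a negative weight, so it contributes nothing to either the numerator or the denominator.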

Output: final block, or Jumping Knowledge

By default the output is the final propagated block X⁽ᴷ⁾, a d-dimensional embedding table in the input’s vector space.

PropagationOutput::JumpingKnowledge instead concatenates the per-hop blocks [X⁽⁰⁾ ‖ … ‖ X⁽ᴷ⁾], each L2-normalised before concat so the raw block does not dominate cosine search. This lets a downstream head pick the right receptive depth per node, but the output is (K+1)·d-dimensional and indexes in its own space — do not search it against the original d-dimensional vectors.
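
The concatenation is easy to picture with a standalone sketch for a single node. The helper names are illustrative, and leaving an all-zero block unnormalised is an assumption.

```python
import math

def l2_normalise(vector):
    norm = math.sqrt(sum(v * v for v in vector))
    return list(vector) if norm == 0.0 else [v / norm for v in vector]

def jumping_knowledge(per_hop_blocks):
    """Concatenate one node's per-hop vectors, each L2-normalised first."""
    output = []
    for block in per_hop_blocks:
        output.extend(l2_normalise(block))
    return output

# One node, d = 2, hops 0..2: the raw block has a much larger norm than the rest.
blocks = [[3.0, 4.0], [0.0, 2.0], [1.0, 0.0]]
jk = jumping_knowledge(blocks)
```

With K = 2 and d = 2 the output has (K + 1) · d = 6 dimensions, and each block contributes unit norm regardless of its original scale.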

Example: propagate over a citation graph

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::config::JammiConfig;
use jammi_ai::pipeline::graph_neighbourhood::{EdgeDirection, EdgeSourceRef};
use jammi_ai::pipeline::graph_propagation::{PropagateRequest, PropagationWeighting};
use jammi_db::store::CachePolicy;
async fn ex(config: JammiConfig, model_id: &str) -> jammi_db::error::Result<()> {
let session = Arc::new(InferenceSession::new(config).await?);

// Embed the documents first (any embedding model).
session
    .generate_text_embeddings("papers", model_id, &["abstract".into()], "id", CachePolicy::Bypass)
    .await?;

// Propagate over a declared citation edge source (src/dst are the paper ids,
// which are the embedding keys). Citations are treated as undirected for smoothing.
let (propagated, _outcome) = session
    .propagate_embeddings(
        &PropagateRequest::new(
            "papers",
            EdgeSourceRef::Registered {
                source_id: "citations".into(),
                src_column: "citing".into(),
                dst_column: "cited".into(),
                type_column: None,
                weight_column: None,
                as_of_column: None,
            },
        )
        .with_direction(EdgeDirection::Undirected)
        .with_weighting(PropagationWeighting::DegreeNormalized)
        .with_hops(2),
        CachePolicy::Bypass,
    )
    .await?;

// The result is an ordinary embedding table: search it, evaluate it, or graph
// it like any other.
let neighbours = session
    .sql(&format!(
        "SELECT _row_id FROM \"jammi.{}\" LIMIT 5",
        propagated.table_name
    ))
    .await?;
let _ = neighbours;
Ok(())
}
}

Propagating over an S9 similarity graph

You can also propagate over the similarity graph Jammi itself builds — pass its table name as the edge source:

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::config::JammiConfig;
use jammi_ai::pipeline::graph_neighbourhood::EdgeSourceRef;
use jammi_ai::pipeline::graph_propagation::PropagateRequest;
use jammi_db::store::CachePolicy;
async fn ex(config: JammiConfig, graph_table: &str) -> jammi_db::error::Result<()> {
let session = Arc::new(InferenceSession::new(config).await?);
let (_propagated, _outcome) = session
    .propagate_embeddings(
        &PropagateRequest::new(
            "items",
            EdgeSourceRef::NeighborGraph {
                table_name: graph_table.into(),
            },
        ),
        CachePolicy::Bypass,
    )
    .await?;
Ok(())
}
}

But note the caveat from graph-supervised fine-tuning: a similarity graph is k-NN under the base metric, so propagating over it mostly re-averages things the model already thinks are close. Declared edges (a citation network, a co-purchase log, a knowledge graph’s typed relations) carry structure the base metric does not already encode — that is where propagation adds signal.

Determinism

Propagation is deterministic: every fold, teleport, and weighted sum runs in f64 over a fixed (node, neighbour) order, so the output is byte-identical regardless of how many threads the engine runs. It is the reproducible point on the structure-aware spectrum — fixed averaging, no learned parameters.

Bounds

The edge set is loaded under a row ceiling (PropagateRequest::max_rows); a graph larger than that is refused loudly rather than risking an out-of-memory pass. Whole-graph propagation beyond memory (chunking by the join) is future work.

Hybrid Retrieval: Lexical (BM25) + Reciprocal-Rank Fusion

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Retrieval.

Dense vector search finds rows that mean the same thing as your query; lexical (BM25) search finds rows that contain the same words. Each misses what the other catches — dense search fumbles rare identifiers and exact phrases, lexical search misses paraphrase. Hybrid retrieval runs both and fuses their rankings, and it is the standard production recipe because it reliably beats either alone.

Jammi ships two pieces for this:

A lexical sidecar (LexicalIndex) — a tantivy BM25 inverted index that rides beside a result table’s Parquet object, the lexical peer of the USearch ANN sidecar.
Reciprocal-rank fusion (rrf_fuse) — merges any number of ranked lists by rank, not score.

Fusing by rank is the whole point: BM25 scores and cosine similarities live on incompatible scales, so averaging them is meaningless. RRF never looks at a raw score — it sums 1 / (k_rrf + rank) across the lists a row appears in, so the fused order depends only on where a row landed in each list. The default k_rrf is 60 (Cormack et al., SIGIR 2009; robust across 40–80).

Build a lexical index

A LexicalIndex is built over (row_id, text) pairs — the text is whatever text columns of the row you want searchable, joined by the caller. The analyzer is configurable; English (lowercase + Porter stemming) is the default, and Raw (lowercase, no stemming) is the escape hatch for text the English stemmer would mangle (codes, identifiers, non-English).

#![allow(unused)]
fn main() {
extern crate jammi_ai;
extern crate jammi_db;
fn ex() -> jammi_db::error::Result<()> {
use jammi_ai::index::{Analyzer, LexicalIndex};

let rows = vec![
    ("doc-1", "a method for reducing turbine blade vibration"),
    ("doc-2", "an apparatus for cooling turbine engine blades"),
    ("doc-3", "a recipe for baking sourdough bread"),
];

let lexical = LexicalIndex::build(rows, Analyzer::English)?;
let hits = lexical.search("turbine engine", 10)?;
for hit in &hits {
    println!("{} bm25={:.3} rank={}", hit.row_id, hit.bm25_score, hit.rank);
}
Ok(()) }
}

Each LexicalHit carries the row_id, its raw bm25_score, and its 0-based rank — the rank is what fusion consumes.
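
For readers who want to see what a BM25 score is made of, here is a minimal standalone scorer in plain Python. It is not tantivy and not LexicalIndex: it tokenises on whitespace with lowercasing only (closest in spirit to the Raw analyzer), and the k1 and b values are common textbook defaults assumed for illustration.

```python
import math
from collections import Counter

def bm25_search(rows, query, limit, k1=1.2, b=0.75):
    """Score (row_id, text) pairs against a query; return (row_id, score, rank)."""
    docs = [(row_id, text.lower().split()) for row_id, text in rows]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(tokens) for _, tokens in docs) / n_docs
    doc_freq = Counter()
    for _, tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    scored = []
    for row_id, tokens in docs:
        term_freq = Counter(tokens)
        score = 0.0
        for term in set(query.lower().split()):
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms weigh more (idf); long documents are damped (b).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        if score > 0.0:
            scored.append((row_id, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    # Ranks are 0-based, best first.
    return [(row_id, score, rank) for rank, (row_id, score) in enumerate(scored[:limit])]

rows = [
    ("doc-1", "a method for reducing turbine blade vibration"),
    ("doc-2", "an apparatus for cooling turbine engine blades"),
    ("doc-3", "a recipe for baking sourdough bread"),
]
hits = bm25_search(rows, "turbine engine", 10)
```

doc-2 matches both query terms and ranks first; doc-3 matches neither and is not returned at all.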

Lifecycle and scope

The lexical sidecar has the same lifecycle as the ANN sidecar: it is built (and rebuilt) with the table. Rebuilding an immutable result table produces a fresh sidecar; for a mutable-table source, re-ingesting the changed rows into a new index is the caller’s responsibility. Search applies no row-level filter. Isolation is table-level, exactly as on the ANN search path: resolve the table through the tenant-scoped catalog and hand the index only that table’s rows.

Fuse dense and lexical rankings

rrf_fuse takes a slice of ranked lists — each a best-first list of _row_ids — and returns one fused ranking. The dense list is the ANN search result; the lexical list is the LexicalIndex result. A third list (e.g. a graph-retrieval channel) fuses identically, with no special-casing.

#![allow(unused)]
fn main() {
extern crate jammi_ai;
use jammi_ai::query::{rrf_fuse, DEFAULT_K_RRF};

// Best-first row-id lists from each retriever.
let dense = vec!["doc-2", "doc-1", "doc-5"];   // ANN cosine order
let lexical = vec!["doc-1", "doc-2", "doc-9"]; // BM25 order

let fused = rrf_fuse(&[dense, lexical], DEFAULT_K_RRF);
for hit in &fused {
    println!("{} rrf={:.4}", hit.row_id, hit.rrf_score);
}
}

Rows that both retrievers surface rise to the top — cross-list agreement is exactly what RRF rewards. The output is fully deterministic: it is sorted by fused score descending, ties broken ascending by row_id, and it does not depend on the order you pass the lists in. A row repeated within a single list counts only once, at its best rank.

k_rrf is exposed, not forced. Larger values flatten the gap between adjacent ranks (a deep-but-agreed-upon row matters more); smaller values sharpen the reward for top-of-list placement. DEFAULT_K_RRF (60) is the recommended start.
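
The fusion rules described above fit in a few lines. The sketch below is a standalone Python illustration, not the rrf_fuse source. It assumes 1-based ranks as in the original RRF formulation (the engine's convention may differ), and it accumulates exact fractions so the result cannot depend on list order.

```python
from fractions import Fraction

def rrf_sketch(ranked_lists, k_rrf=60):
    """Fuse best-first row-id lists by summing 1 / (k_rrf + rank)."""
    scores = {}
    for ranked in ranked_lists:
        seen = set()
        for rank, row_id in enumerate(ranked, start=1):
            if row_id in seen:
                continue  # a repeated row counts once, at its best rank
            seen.add(row_id)
            scores[row_id] = scores.get(row_id, Fraction(0)) + Fraction(1, k_rrf + rank)
    # Fused score descending; ties broken ascending by row id.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(row_id, float(score)) for row_id, score in ordered]

dense = ["doc-2", "doc-1", "doc-5"]    # ANN cosine order
lexical = ["doc-1", "doc-2", "doc-9"]  # BM25 order
fused = rrf_sketch([dense, lexical])
```

Under these assumptions doc-1 and doc-2 tie (each is first in one list and second in the other) and the tie-break puts doc-1 first. On the effect of k_rrf: at 60, rank 1 contributes 1/61 and rank 10 contributes 1/70, a ratio of about 1.15; at 1 the same two ranks contribute 1/2 and 1/11, a ratio of 5.5.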

Record the evidence

BM25 contributions ride the built-in bm25 evidence channel, the lexical peer of vector’s similarity. It declares two columns — bm25_score (Float32) and bm25_rank (Int64) — and a contribution is supplied to merge_channels exactly as the vector channel’s is, so a fused result carries both its dense and its lexical provenance side by side. See Declare a Custom Provenance Channel for the contribution mechanics; bm25 needs no registration — it is seeded with the catalog.

Assemble a Context Set for Conditioned Prediction

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Retrieval.

A prediction is often best made conditioned on a neighbourhood: not “what is the label of this row” in the abstract, but “given the k most similar labelled rows, what is the label of this row.” That neighbourhood is a context set — C = {(xᵢ, yᵢ)} — and in a database it is not an abstraction to invent. It is a search joined to its labels.

assemble_context makes that first-class. It retrieves a target’s k nearest neighbours, pairs them with their outcome columns, and pools the neighbour vectors — permutation-invariantly — into one fixed-width context vector a predictor can condition on. It is the encode-and-aggregate half of a Neural Process; the decode half (a learned predictor over the representation) composes on top.

When this is the right tool

Reach for assemble_context when you want a reusable set representation of a target’s retrieved neighbourhood — a conditioning vector, a prototype/centroid, a bag-of-evidence summary. If you only need to aggregate one specific, already known set of rows once, that is a SQL GROUP BY you already have; this is the operator that turns any target’s retrieval into a representation, reproducibly, with the leakage guards a prediction context needs.

Assemble and encode

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_ai::pipeline::context_set::{ContextRequest, SetAggregator};
async fn ex(session: &std::sync::Arc<InferenceSession>, query: Vec<f32>) -> jammi_db::error::Result<()> {
let mut request = ContextRequest::new("patents", query, 10);
request.value_columns = vec!["category".into()];     // the labels carried per context row
request.aggregator = SetAggregator::Mean;             // mean | sum | max pooling

let context = session.assemble_context(&request).await?;

if let Some(vector) = &context.context_vector {
    // condition a predictor on `vector` + `context.context_size`
    let _ = (vector, context.context_size);
}
Ok(()) }
}

The result carries:

context_vector — the pooled ρ(Σ φ(xᵢ)) representation, or None for an empty context (no neighbour survived the guards below — treat as low-confidence / fall back to the prior, never as a one-element average).
context_size — the number of neighbours that entered the pool, carried separately so a decoder can use the count signal without it corrupting the pooled vector.
context_keys — the context members’ keys, in retrieval (descending similarity) order.
value_rows — the requested value_columns of each member, in the same order.

The leakage guards (on by default)

A target that retrieves itself as its own context trivially leaks the answer when a value column is the prediction target. So:

exclude_self defaults on. Set exclude_key to the target’s own row key (when the query vector belongs to a stored row) and that same-key neighbour is dropped before pooling; the retrieval over-fetches by one so a self-hit never shrinks the context below k.
split scopes the context to a train split. When the context feeds a training or evaluation target, pass e.g. split = Some("split = 'train'".into()) so the target’s own outcome stays held out. This is the same train/target separation the evaluation harness enforces.

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
use jammi_ai::pipeline::context_set::ContextRequest;
fn ex(query: Vec<f32>) {
let mut request = ContextRequest::new("patents", query, 10);
request.exclude_key = Some("US-1234567".into());   // drop the target's own row
request.split = Some("split = 'train'".into());     // context from the train split only
let _ = request;
}
}
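
The over-fetch-by-one behaviour of the self-exclusion guard can be sketched as follows. This is a standalone illustration with a stand-in search function; the names are hypothetical and the real retrieval path is the engine's.

```python
def gather_context(search_fn, k, exclude_key=None):
    """Return up to k (key, similarity) neighbours, never the target itself."""
    # Ask for one extra hit when a self-match may have to be dropped.
    fetch = k + 1 if exclude_key is not None else k
    hits = search_fn(fetch)
    kept = [(key, similarity) for key, similarity in hits if key != exclude_key]
    return kept[:k]

# Stand-in for a best-first similarity search over a stored corpus.
index = [("US-1", 1.00), ("US-2", 0.90), ("US-3", 0.80), ("US-4", 0.70)]

def search_fn(n):
    return index[:n]

guarded = gather_context(search_fn, k=2, exclude_key="US-1")
unguarded = gather_context(search_fn, k=2)
```

Without the extra fetch, dropping the self-hit would leave only one neighbour; with it, the context still has k members.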

Pooling: fixed, permutation-invariant, deterministic

The encoder pools through the engine’s vector-aggregation functions (vector_mean / vector_sum / vector_max) — the same element-wise aggregation the engine ships for grouped vector reduction. The pool is:

permutation-invariant — shuffling the context rows yields a byte-identical vector (the aggregate folds with a commutative, associative operator);
deterministic under exact retrieval — the pooled vector is reproducible across runs.

mean discards set size; sum encodes it; max is robust but lossy. None is universally right, which is why the aggregator is a knob and context_size is always carried alongside.
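
The three aggregators and the empty-context case can be illustrated with a standalone sketch. It is plain Python, not the engine's vector_mean / vector_sum / vector_max; math.fsum is used here because it returns the exactly rounded sum, which makes the result independent of row order.

```python
import math

def pool(vectors, aggregator="mean"):
    """Pool a set of equal-length vectors; returns (context_vector, context_size)."""
    if not vectors:
        return None, 0  # empty context: no vector, size zero
    columns = list(zip(*vectors))
    if aggregator == "sum":
        pooled = [math.fsum(column) for column in columns]
    elif aggregator == "mean":
        pooled = [math.fsum(column) / len(vectors) for column in columns]
    elif aggregator == "max":
        pooled = [max(column) for column in columns]
    else:
        raise ValueError(f"unknown aggregator: {aggregator}")
    return pooled, len(vectors)

vectors = [[0.1, 0.2], [0.3, 0.4], [1e16, -1e16], [0.7, 0.9]]
forward = pool(vectors, "mean")
shuffled = pool(list(reversed(vectors)), "mean")
```

Doubling the context with identical rows leaves the mean unchanged but doubles the sum, which is the sense in which mean discards set size and sum encodes it.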

This is fixed pooling — the DeepSets / Conditional-Neural-Process expressiveness ceiling. It cannot model which context element matters; learned attention pooling (the AttnCNP point on the spectrum) is a separate, downstream capability, not a silent extension of this one.

Materialise for batch workflows

For batch pipelines, pool every target once and land the results as a normal embedding-shaped result table — searchable and joinable like any other embedding table, with its own sidecar ANN index:

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_ai::pipeline::context_set::{ContextRequest, MaterializedContext};
use jammi_db::store::CachePolicy;
async fn ex(session: &std::sync::Arc<InferenceSession>, rows: Vec<(String, Vec<f32>)>) -> jammi_db::error::Result<()> {
// The recipe carries *how* every row was pooled — the source, candidate set,
// value columns, aggregator, self-exclusion, and split — and names the source.
let recipe = ContextRequest::new("patents", vec![0.0_f32; 32], 5);
let (table, _outcome) = session
    .materialize_context(
        MaterializedContext {
            rows: &rows,
            dimensions: 32,
            recipe: &recipe,
            // These targets are `patents` rows, so their keys are `patents.id`.
            key_column: Some("id"),
        },
        CachePolicy::Bypass,
    )
    .await?;
// `table` is a normal embedding result table: search it, join it, index it.
println!("materialised context table: {}", table.table_name);
Ok(()) }
}

Each target’s key becomes the table’s _row_id; the pooled context vector becomes its vector. A materialised context set is a first-class member of the same table family every embedding table belongs to. The recipe materialize_context takes is the batch’s shared assembly definition (the source comes from it); the per-target query vectors are the inputs the recipe ran over, which is why they are the rows, not part of the recipe.

key_column names the source column those target keys came from, and is recorded as the table’s provenance: it is what lets a reader join patents.id = <context table>._row_id to recover the source rows the targets correspond to. You declare it because the targets are yours (materialize_context receives only (key, vector) pairs and cannot see where the keys came from), but it does check the half it can: a column the source does not have is rejected rather than recorded. Pass None when the targets are free vectors that correspond to no stored row; the table is still searchable, but nothing joins it back to a source.

Condition a Prediction on Declared-Edge Context (Bring Your Own Graph)

assemble_context builds a target’s context set from its embedding-similar neighbours — search(target, k). That is the right neighbourhood when similarity is the relationship you want to condition on. But often the relationship that matters is one only your domain declares: the papers a paper cites, the products co-purchased with this one, the concepts an entity is an is-a of. Those edges carry structure an embedding metric does not reconstruct — and they are exactly the context a prediction is most defensible conditioned on.

A context set is a search. It is also a walk — and the walk you care about is the one only you can declare.

S16-G makes that first-class: a second context source for the same assemble_context. You register an edge relation, and a target’s context becomes its bounded, target-anchored declared-edge neighbourhood — pooled through the same permutation-invariant set encoder, under the same leakage and tenancy contracts, decoded by the same calibrated predictor. The engine transports adjacency; it never learns what an edge means.

When this is the right tool

Reach for declared-edge context when a target’s most informative neighbours are the ones your graph names, not the ones the base metric happens to place nearby: a citation classifier, a co-purchase recommender, a knowledge-graph-backed labeller, a transaction-graph scorer. Use ANN context when similarity is the signal; use Hybrid (below) when declared edges are the signal but the graph is sparse and you want similarity to densify it.

It is not a general graph-traversal verb. The gather is target-anchored and depth-/fan-out-bounded — never a free walk — which is what keeps it inside the tenant-scope guarantee.

The edge source

Any relation with two key columns is an edge source. Register it like any other source, then point the gather at it. The relation can be either:

a similarity graph you already built (neighbor_graph), or
an external edge table you register (two key columns, optionally a type and a weight column).

The edge endpoints are row keys: a neighbour joins to its stored vector and its outcome columns by key, exactly as an ANN neighbour does.

Assemble declared-edge context

Rust

#![allow(unused)]
fn main() {
extern crate jammi_ai;
extern crate jammi_db;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_ai::pipeline::context_set::{ContextRequest, ContextSource};
use jammi_ai::pipeline::graph_neighbourhood::{EdgeGather, EdgeSourceRef};

async fn demo(session: Arc<InferenceSession>, target_vector: Vec<f32>) -> jammi_db::error::Result<()> {
// A registered edge relation: a node's declared neighbours.
let gather = EdgeGather::new(EdgeSourceRef::Registered {
    source_id: "citations".into(),
    src_column: "from_id".into(),
    dst_column: "to_id".into(),
    type_column: None,
    weight_column: None,
    as_of_column: None,
});

let mut request = ContextRequest::new("papers", target_vector, 0);
request.source = ContextSource::Edges(gather);
// The target's own row key is the gather anchor (and is excluded from its
// own context — the leakage guard).
request.exclude_key = Some("paper-42".into());
request.value_columns = vec!["topic".into()];

let context = session.assemble_context(&request).await?;
println!("conditioned on {} declared-edge neighbours", context.context_size);
Ok(())
}
}

Python

out = db.predict_with_context_predictor(
    model_id,
    source="papers",
    target_key="paper-42",
    edge_source="citations",     # a registered edge relation
    edge_src_column="from_id",
    edge_dst_column="to_id",
    edge_hops=1,                 # bounded depth (default 1)
    edge_fanout=25,              # sample at most 25 neighbours per node per hop
)
# The prediction carries how its context was assembled and which rows it used:
assert out["source"] == "edges"
print(out["context_ref"])        # the declared-edge member keys

The bounds — and why they are caps, not knobs to crank

hops (default 1, hard-capped). Beyond ~2–3 hops, neighbour pooling suffers from Laplacian over-smoothing: the pooled vector washes out and loses signal. Depth is a precision/recall trade-off, not “more context is better.”
fanout (sample, don’t enumerate). A high-degree node’s neighbourhood is intractable to enumerate; fanout bounds the neighbours sampled per node per hop. The sample is seeded deterministically from the target, so a gather reproduces byte-identically. None is exact (enumerates every neighbour) and uses no randomness. When a neighbourhood is truncated, the truncation is reported, never silent.
edge_types / min_weight. Filter which edges the walk follows. Types are a filter, never learned aggregation — a consumer wanting learned multi-relational message passing runs it in a graph library and registers the resulting node embeddings back as a source.
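
The bounds are easier to reason about with a toy model. The following is a minimal sketch in plain Python, not the engine’s implementation: gather_neighbourhood, its arguments, and the SHA-256 seeding are all assumptions made for illustration. It shows a target-anchored walk that stops after hops levels, samples at most fanout neighbours per node with a seed derived from the target key, and reports truncation.

```python
import hashlib
import random

def gather_neighbourhood(edges, target, hops=1, fanout=None):
    """Bounded, target-anchored gather over (src, dst) edge pairs.

    Returns (member_keys, truncated). fanout=None enumerates every
    neighbour and draws no random numbers.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    # Seed from the target key so the same target always draws the same sample.
    seed = int.from_bytes(hashlib.sha256(target.encode()).digest()[:8], "big")
    rng = random.Random(seed)

    members, frontier, truncated = [], [target], False
    seen = {target}  # the target never enters its own context
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            neighbours = sorted(set(adjacency.get(node, [])) - seen)
            if fanout is not None and len(neighbours) > fanout:
                neighbours = sorted(rng.sample(neighbours, fanout))
                truncated = True  # surfaced to the caller, never dropped silently
            for key in neighbours:
                seen.add(key)
                members.append(key)
                next_frontier.append(key)
        frontier = next_frontier
    return members, truncated

edges = [("p42", "a"), ("p42", "b"), ("p42", "c"), ("a", "d"), ("a", "p42")]
one_hop = gather_neighbourhood(edges, "p42", hops=1)
two_hops = gather_neighbourhood(edges, "p42", hops=2)
sampled = gather_neighbourhood(edges, "p42", hops=1, fanout=2)
```

Calling the function twice with the same target and fanout returns the same sample, which is the reproducibility property the fanout bound relies on.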

Hybrid: declared edges ∪ similarity

When the graph is sparsely connected, union the declared-edge neighbours with the ANN neighbours and pool once — declared edges as the signal, similarity as the densifier:

#![allow(unused)]
fn main() {
extern crate jammi_ai;
use jammi_ai::pipeline::context_set::{ContextSource, HybridMerge};
use jammi_ai::pipeline::graph_neighbourhood::{EdgeGather, EdgeSourceRef};
fn demo(gather: EdgeGather) -> ContextSource {
ContextSource::Hybrid {
    ann_k: 10,
    edges: gather,
    merge: HybridMerge::Union,
}
}
}
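
As a rough picture of what a union merge does to the member set, here is a plain-Python sketch. The helper hybrid_union and its ordering rule (declared edges first, then similarity neighbours not already present) are assumptions for illustration; the source only states that the two neighbour sets are unioned and pooled once.

```python
def hybrid_union(edge_keys, ann_keys, exclude_key=None):
    """Union declared-edge and ANN neighbour keys, declared edges first."""
    merged, seen = [], {exclude_key}
    for key in list(edge_keys) + list(ann_keys):
        if key not in seen:
            seen.add(key)
            merged.append(key)
    return merged

# "p9" is both cited and similar, so it appears once; "p42" is the target itself.
members = hybrid_union(["p7", "p9"], ["p9", "p3", "p42"], exclude_key="p42")
```

Each member is pooled once regardless of how many sources named it, and the target stays out of its own context.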

Check homophily before you trust it

A declared edge can be heterophilous — cites may connect dissimilar items — and pooling over a heterophilous edge type degrades a prediction rather than helping it. Declared-edge context is an option on the spectrum, never unconditionally better than similarity. Before you rely on a type, read its homophily:

#![allow(unused)]
fn main() {
extern crate jammi_ai;
extern crate jammi_db;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_ai::pipeline::graph_neighbourhood::EdgeGather;
async fn demo(session: Arc<InferenceSession>, gather: &EdgeGather) -> jammi_db::error::Result<()> {
// Per-edge-type label-agreement over a labelled set: a type near the label's
// chance rate is heterophilous — pooling over it is unlikely to help.
let homophily = session
    .homophily_by_edge_type(gather, "papers", "id", "topic")
    .await?;
for (edge_type, agreement) in &homophily {
    println!("{edge_type}: {agreement:.2} label agreement");
}
Ok(())
}
}
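
To make the label-agreement number concrete, the sketch below computes it by hand in plain Python. It is an illustration under assumed inputs (a list of (src, dst, edge_type) tuples and a key-to-label dict), not the engine’s homophily_by_edge_type.

```python
from collections import defaultdict

def label_agreement_by_type(edges, labels):
    """Fraction of edges, per type, whose two endpoints share a label.

    Edges with an unlabelled endpoint are skipped.
    """
    agree, total = defaultdict(int), defaultdict(int)
    for src, dst, edge_type in edges:
        if src in labels and dst in labels:
            total[edge_type] += 1
            agree[edge_type] += labels[src] == labels[dst]
    return {edge_type: agree[edge_type] / total[edge_type] for edge_type in total}

labels = {"a": "ml", "b": "ml", "c": "bio", "d": "bio"}
edges = [
    ("a", "b", "cites"),       # same topic
    ("a", "c", "cites"),       # different topic
    ("c", "d", "same_venue"),  # same topic
    ("a", "e", "cites"),       # "e" is unlabelled, so this edge is skipped
]
agreement = label_agreement_by_type(edges, labels)
```

With two equally common topics the chance rate is about 0.5, so the cites type in this toy graph sits at chance while same_venue does not.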

The decoder also receives the target’s own (ego) features alongside the pooled neighbour vector, so it can down-weight an unhelpful neighbourhood rather than being forced to trust it.

Coverage over graph context

A graph-conditioned prediction is decoded and conformally wrapped exactly like an ANN-conditioned one, and it always serves. But graph correlation can break the exchangeability that marginal split-conformal coverage assumes — so the served prediction carries the assembly source fact and its member keys, and the coverage claim is attributed, never silently presented as a guarantee. Choosing whether to apply a group-conditional (Mondrian) or importance-weighted lever, and which cohort or weights to use, is a governance decision the serving layer applies but never makes. The engine surfaces the fact; governance chooses the lever.
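
For orientation, the two coverage ideas named here can be sketched in a few lines of plain Python. This is the textbook split-conformal threshold and its group-conditional (Mondrian) variant, not Jammi’s serving code; the cohort names and calibration scores are invented for the example.

```python
import math

def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]

def mondrian_quantiles(scores_by_group, alpha):
    """Group-conditional variant: one threshold per cohort."""
    return {group: conformal_quantile(scores, alpha)
            for group, scores in scores_by_group.items()}

# Hypothetical nonconformity scores (for example absolute residuals) per cohort.
calibration = {
    "dense_subgraph": [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9],
    "isolated": [0.05, 0.1, 0.1, 0.15, 0.2, 0.2, 0.25, 0.3, 0.35],
}
per_cohort = mondrian_quantiles(calibration, alpha=0.25)
pooled = conformal_quantile(
    [score for scores in calibration.values() for score in scores], alpha=0.25)
```

The pooled threshold lands between the two cohort thresholds, so it is too tight for one cohort and too loose for the other. That gap is what a group-conditional lever addresses, and choosing the cohorts remains the governance decision described above.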

Tenancy

An edge source is tenant-scoped like every other source. The gather runs inside the session’s tenant scope, so an edge whose endpoint belongs to another tenant is filtered before it is ever materialised — a declared edge cannot leak one tenant’s rows into another’s context.

Point-in-time joins: matching facts to the instant they were known

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Point-in-Time Correctness.

An as-of join matches each row of a spine relation to the at-most-one row of a facts relation that was valid as of the spine row’s instant, within the same group. It is the relational primitive for point-in-time correctness.

The problem it solves is leakage. If you assemble a table by joining each spine row to any fact in its group, you import facts stamped after the spine instant — information that was not yet known. A forward join imports the future. The as-of join takes only the fact valid at or before each instant, so every attached value reflects what was knowable then, and nothing later.

The engine exposes this as one verb, asof_join, over two registered relations. It carries only what every time-aware caller needs — an equality grouping, a temporal ordering key, a match direction, boundary inclusivity, an optional look-back tolerance, and a deterministic tie-break — and writes a result table that carries the same materialization manifest every other producer does.

The call

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::config::JammiConfig;
use jammi_ai::pipeline::asof::{
    AsofJoinSpecBuilder, AsofKey, Boundary, MatchDirection, TieBreak, Tolerance,
};
async fn ex(config: JammiConfig) -> jammi_db::error::Result<()> {
let session = Arc::new(InferenceSession::new(config).await?);

// `events` is the spine (its `t` column is the as-of instant); `facts` carries
// the values that were valid over time (its `vt` column is their validity time).
// `key` groups the match so a fact only matches an event in the same group.
let spec = AsofJoinSpecBuilder::new(
        AsofKey { by: vec!["key".into()], time: "t".into() },  // spine
        AsofKey { by: vec!["key".into()], time: "vt".into() }, // facts
    )
    .direction(MatchDirection::Backward)  // most recent at or before
    .boundary(Boundary::Inclusive)        // a fact stamped exactly at t matches
    .tolerance(Some(Tolerance::Duration(5_000_000))) // ignore facts >5s stale
    .tie_break(TieBreak::ByColumnDesc("seq".into())) // newest seq wins a tie
    .project(vec!["value".into()])        // attach this fact column
    .build();

let table = session.asof_join("events", "facts", &spec).await?;

// The result is an ordinary relation: every spine row, with the matched fact's
// `value` attached (null where nothing matched within the rules). Read it via SQL.
let _rows = session
    .sql(&format!(
        "SELECT t, value FROM \"{name}\".public.\"{name}\" ORDER BY t",
        name = table.table_name,
    ))
    .await?;
Ok(())
}
}

The spine is always fully preserved — an unmatched spine row keeps its columns and carries nulls for the fact columns. Dropping unmatched rows would silently shrink the result; a caller who wants inner semantics filters on a non-null fact column themselves.
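
The semantics can be checked against a small reference model. The sketch below is plain Python, written for illustration only: asof_join_backward is a hypothetical helper covering just the default case (backward, inclusive, no tolerance, no ties), and it says nothing about how the engine executes the join.

```python
from bisect import bisect_right

def asof_join_backward(spine, facts):
    """Attach to each (key, t) spine row the latest fact with vt <= t.

    facts rows are (key, vt, value). Every spine row is kept; an
    unmatched row carries None for the fact value.
    """
    by_key = {}
    for key, vt, value in sorted(facts, key=lambda fact: (fact[0], fact[1])):
        times, values = by_key.setdefault(key, ([], []))
        times.append(vt)
        values.append(value)

    out = []
    for key, t in spine:
        times, values = by_key.get(key, ([], []))
        i = bisect_right(times, t)  # number of facts with vt <= t
        out.append((key, t, values[i - 1] if i else None))
    return out

facts = [("a", 10, "v1"), ("a", 20, "v2"), ("b", 15, "w1")]
spine = [("a", 5), ("a", 10), ("a", 25), ("b", 14), ("c", 30)]
joined = asof_join_backward(spine, facts)
```

The spine row ("a", 5) stays in the output with None: the fact stamped at 10 was not yet known at 5, so attaching it would be exactly the leakage the join exists to prevent.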

The four pinned knobs

Each knob is a choice that engines tend to make subtly differently. Jammi pins all four explicitly on the spec and never infers them, and each one changes the result.

Direction

MatchDirection::Backward (the default) takes the most recent fact at or before the instant — the only leakage-safe choice when the spine instant is a point you must not see past. Forward takes the first fact at or after (e.g. “the next scheduled event after each reading”). Nearest takes the smallest absolute distance, resolving equidistant candidates toward the past, and requires a numeric temporal key.
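
The three directions differ only in which candidate they pick from a sorted list of fact times. The following plain-Python sketch illustrates that choice with an inclusive boundary; pick and its string arguments are hypothetical names, not part of the Jammi API.

```python
from bisect import bisect_left, bisect_right

def pick(times, t, direction):
    """Index of the matched time in sorted `times`, or None if nothing matches."""
    before = bisect_right(times, t) - 1  # last index with time <= t
    after = bisect_left(times, t)        # first index with time >= t
    back = before if before >= 0 else None
    fwd = after if after < len(times) else None
    if direction == "backward":
        return back
    if direction == "forward":
        return fwd
    # nearest: smallest absolute distance; an exact tie resolves toward the past
    if back is None or fwd is None:
        return back if fwd is None else fwd
    return back if t - times[back] <= times[fwd] - t else fwd

times = [10, 20, 30]
backward = pick(times, 24, "backward")     # index 1, the fact at 20
forward = pick(times, 24, "forward")       # index 2, the fact at 30
equidistant = pick(times, 25, "nearest")   # index 1, the tie goes to the past
```

Only the backward pick is guaranteed never to look past the spine instant, which is why it is the default.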

Boundary

Boundary::Inclusive (the default, <=) lets a fact stamped exactly at the instant match; Boundary::Exclusive (<) excludes it. Over identical inputs, the two differ exactly on the rows that have a fact coincident with the instant.

Tolerance

None (the default) looks back arbitrarily far. Some(Tolerance::Duration(µs)) for a temporal key, or Some(Tolerance::Steps(n)) for an integer key, discards a candidate farther than the limit — the spine row goes unmatched rather than matching a stale fact. The limit is measured relative to each spine instant.
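
Boundary and tolerance both act on the single backward candidate, which a short plain-Python sketch makes visible. match_backward is a hypothetical helper, and treating a candidate exactly at the limit as still within tolerance is an assumption read from “farther than the limit”.

```python
from bisect import bisect_left, bisect_right

def match_backward(times, t, inclusive=True, tolerance=None):
    """Latest time in sorted `times` at (or strictly before) t, within tolerance."""
    i = (bisect_right if inclusive else bisect_left)(times, t) - 1
    if i < 0:
        return None
    if tolerance is not None and t - times[i] > tolerance:
        return None  # too stale: the spine row goes unmatched
    return times[i]

fact_times = [1_000_000, 4_000_000]  # microseconds

coincident_inclusive = match_backward(fact_times, 4_000_000)
coincident_exclusive = match_backward(fact_times, 4_000_000, inclusive=False)
stale = match_backward(fact_times, 10_000_000, tolerance=5_000_000)
fresh = match_backward(fact_times, 9_000_000, tolerance=5_000_000)
```

The inclusive and exclusive calls disagree only because a fact coincides with the instant, and the stale call returns nothing rather than falling back to an older fact.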

Tie-break

When two facts share the matched instant within a group, the match is ambiguous. TieBreak::ByColumnDesc("seq") disambiguates by a secondary column (typically a transaction-time column), with the largest value winning. TieBreak::Error turns a true duplicate into a loud AsofError::AmbiguousMatch rather than a silent, non-deterministic pick. With a tie-break in force the output is bit-reproducible.
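
A toy version of the two policies, in plain Python, shows what each does with a genuine duplicate. The function resolve_tie, its string policy names, and the AmbiguousMatch class are stand-ins invented for this sketch.

```python
class AmbiguousMatch(Exception):
    """Stand-in for the AsofError::AmbiguousMatch described in the text."""

def resolve_tie(candidates, tie_break="error", column="seq"):
    """Pick one fact among candidates that all share the matched instant."""
    if len(candidates) == 1:
        return candidates[0]
    if tie_break == "error":
        raise AmbiguousMatch(f"{len(candidates)} facts share the matched instant")
    # by-column-desc: the largest value of the secondary column wins
    return max(candidates, key=lambda fact: fact[column])

tied = [
    {"vt": 20, "seq": 1, "value": "draft"},
    {"vt": 20, "seq": 2, "value": "final"},
]
winner = resolve_tie(tied, tie_break="by_column_desc")
```

With the error policy the same input raises instead of returning either row, so a duplicate cannot slip through as an arbitrary pick.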

The temporal key must be totally ordered

The temporal key on each side must be a totally-ordered Arrow type — any Timestamp(..), Date32/Date64, or a signed/unsigned integer. A float key is rejected: NaN has no total order, so “most recent at or before” would be undefined. The two sides’ temporal keys must share a type, and a null temporal value is never ordered — a null-time spine row is preserved with null facts, and a null-time fact is never a candidate.
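
The reason for rejecting floats can be seen directly in any IEEE 754 language. This short Python check is only a demonstration of the ordering problem, not Jammi code.

```python
nan = float("nan")

# Every ordered comparison involving NaN is false, so NaN is neither
# "at or before" nor "after" any instant.
at_or_before = nan <= 5.0
after = nan > 5.0

# Integers (and timestamps stored as integers) are totally ordered:
# exactly one of <, ==, > holds for any pair.
a, b = 3, 5
exactly_one = [a < b, a == b, a > b].count(True)
```

A key for which both at_or_before and after are false cannot be placed on either side of a spine instant, so no match direction has a defined answer for it.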

One verb, many shapes

The same asof_join assembles a leakage-free labelled set keyed on past instants, matches each transaction to the value in effect when it occurred, and pairs a measurement with the reading valid at the time it was taken. The engine provides the as-of relational primitive and the determinism contract; what a caller assembles on top of it is theirs.

Enrich Results with Joins and Annotations

Search results can be enriched by joining with other data sources and annotating with additional model inference.

There are two surfaces for this, and they are deliberately different:

search is the bounded, jammi-defined primitive: nearest-neighbor top-k with optional filter/select, returning a table directly. Same call, same shape, embedded or remote.
Compound query — open, caller-shaped composition (join / filter / select and model inference over the results) — rides SQL. In Rust the fluent QueryBuilder (returned by session.search(...)) builds the same plan in-process; in Python and over the wire the surface is db.sql(...), where the annotate(...) table function runs a model over a relation. Both descend through the one inference operator, so an in-process query and a Flight-SQL query run the same plan node.

The fluent Rust builder tracks every enrichment step in the evidence provenance columns (retrieved_by / annotated_by).

Join with another source

Join search results with a registered source to add context columns (e.g., company name, category labels):

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::source::{FileFormat, SourceConnection, SourceType};
async fn ex(session: &Arc<InferenceSession>, query: Vec<f32>) -> jammi_db::error::Result<()> {
session.add_source("assignees", SourceType::File, SourceConnection {
    url: Some("file:///data/assignees.csv".into()),
    format: Some(FileFormat::Csv),
    ..Default::default()
}).await?;

let results = session.search("patents", query, 10, None, None).await?
    .join("assignees", "assignee_id=id", None).await?  // left join by default
    .run().await?;
// Results now include company_name, country from assignees
Ok(()) }
}

Python

In Python the compound query is SQL. search returns a pyarrow.Table directly; to join, run SQL that the engine plans (in-process for the embed wheel, over Flight SQL for a remote engine — same SQL either way):

db.add_source("assignees", path="/data/assignees.csv", format="csv")

results = db.sql("""
    SELECT p.title, a.company_name, a.country
    FROM patents.public.patents AS p
    JOIN assignees.public.assignees AS a ON p.assignee_id = a.id
""")
# Results now include company_name, country from assignees

In Rust, the fluent builder’s on parameter is "left_col=right_col" and the optional join type is "inner" or "left" (default).

Annotate with model inference

Run a model over search results to add new columns:

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::ModelTask;
async fn ex(session: &Arc<InferenceSession>, query: Vec<f32>) -> jammi_db::error::Result<()> {
let results = session.search("patents", query, 10, None, None).await?
    .annotate(
        "sentence-transformers/all-MiniLM-L6-v2",
        ModelTask::TextEmbedding,
        &["abstract".to_string()],
    ).await?
    .run().await?;
Ok(()) }
}

Python

annotate(model, task, relation, key_column, content_column, …) is a SQL table function: it runs the model over the named relation’s columns and returns the inference output (_row_id keyed from key_column, plus the task’s columns — e.g. vector). Join it back to the source on _row_id to enrich:

results = db.sql("""
    SELECT p.title, a.vector
    FROM annotate('sentence-transformers/all-MiniLM-L6-v2', 'text_embedding',
                  'patents.public.patents', 'id', 'abstract') AS a
    JOIN patents.public.patents AS p ON a._row_id = arrow_cast(p.id, 'Utf8')
""")

The same SQL — the same annotate function — runs in-process (embed wheel) or over the Flight SQL lane (remote engine, jammi-client), so compound retrieval + inference is one round-trip.

Evidence provenance

Every search result carries provenance tracking that records how each row was found and enriched:

Scenario	`retrieved_by`	`annotated_by`
Plain search	`["vector"]`	`[]`
Search + annotate	`["vector"]`	`["inference"]`

These are List<Utf8> columns — each row has its own list of contributing channels.

Composing everything

All operations compose freely:

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::ModelTask;
async fn ex(session: &Arc<InferenceSession>, query: Vec<f32>) -> jammi_db::error::Result<()> {
let results = session.search("patents", query, 100, None, None).await?
    .join("assignees", "assignee_id=id", None).await?
    .annotate("sentence-transformers/all-MiniLM-L6-v2", ModelTask::TextEmbedding, &["abstract".into()]).await?
    .filter("country = 'US'")?
    .sort("similarity", true)?
    .limit(10)
    .select(&["title".into(), "company_name".into(), "similarity".into()])?
    .run().await?;
Ok(()) }
}

Python

The compound query is SQL, so everything composes as JOIN / WHERE / ORDER BY / LIMIT / projection over annotate(...) and registered sources:

results = db.sql("""
    SELECT p.title, a.company_name, ann.vector
    FROM annotate('sentence-transformers/all-MiniLM-L6-v2', 'text_embedding',
                  'patents.public.patents', 'id', 'abstract') AS ann
    JOIN patents.public.patents  AS p ON ann._row_id = arrow_cast(p.id, 'Utf8')
    JOIN assignees.public.assignees AS a ON p.assignee_id = a.id
    WHERE a.country = 'US'
    LIMIT 10
""")

Each surface builds a DataFusion execution plan under the hood. No data is processed until the result is read.

Compound Retrieval and Inference over Flight SQL

search is the bounded primitive — nearest-neighbor top-k, returning a table directly. Compound query — joining sources, filtering, and running a model over a relation — is open, caller-shaped composition, so it rides SQL. The same SQL runs in-process on the embedded engine and over the Flight SQL lane against a remote engine; the annotate(...) table function makes model inference available inside that SQL on both.

This is what lets a remote caller do search → join → annotate in one round-trip, with the model running inside the engine — no per-row RPC, no bespoke compound-search verb.

The `annotate` table function

annotate(model, task, relation, key_column, content_column [, content_column…])

It runs model (a local:<path>, an HF repo id, or a fine-tuned id) for task over the named relation’s content_column(s), and returns the inference output: the prefix _row_id / _source / _model / _status / _error / _latency_ms — with _row_id carried from key_column — followed by the task’s columns (e.g. a vector FixedSizeList for an embedding task). Join it back to the source on _row_id to place inference columns alongside source columns.

Remote: `jammi` over Flight SQL

import jammi

db = jammi.connect("grpc://engine.internal:8081")

# Compound retrieval + inference in one Flight SQL round-trip:
table = db.sql("""
    SELECT p.title, a.vector
    FROM annotate('sentence-transformers/all-MiniLM-L6-v2', 'text_embedding',
                  'patents.public.patents', 'id', 'abstract') AS a
    JOIN patents.public.patents AS p ON a._row_id = arrow_cast(p.id, 'Utf8')
    WHERE p.year >= 2020
""")
# table is a pyarrow.Table

db.sql carries the connection’s tenant scope (the same jammi-session-id the typed gRPC verbs use), so SQL reads observe the same tenant as db.search.

Embedded: the same SQL, in-process

The embed wheel runs the identical SQL against its in-process DataFusion engine — the annotate function is registered on the same context:

import jammi

db = jammi.connect("file:///var/lib/jammi")
table = db.sql("""
    SELECT a._row_id, a.vector
    FROM annotate('local:/models/all-MiniLM-L6-v2', 'text_embedding',
                  'patents.public.patents', 'id', 'abstract') AS a
""")

Productionising from the embed wheel to the remote client changes only the target (file:// → grpc://) — the import jammi and the sql call are unchanged.

In-process Rust: the fluent builder

In Rust, session.search(source_id, vec, k, embedding_table, oversample) returns a QueryBuilder that composes the same operations as a fluent chain (the annotate node it builds is the very plan node the SQL table function builds):

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::ModelTask;
async fn ex(session: &Arc<InferenceSession>, query: Vec<f32>) -> jammi_db::error::Result<()> {
let results = session.search("patents", query, 10, None, None).await?
    .annotate("local:/models/all-MiniLM-L6-v2", ModelTask::TextEmbedding, &["abstract".into()]).await?
    .run().await?;
Ok(()) }
}

Notes

A WHERE over the annotated output runs above the inference node: the table function declares inference as non-pushdown, since the model runs row-wise and a predicate on its output cannot be pushed below it. To shrink the input the model sees, filter the source instead (inside the relation a join scans).
The output schema is fixed at planning time; the embedding dimension is read by loading the model, which is then warm for execution.
Classification and NER ride the same prefix + task-column shape; pass their task string ('classification', 'ner') and the content column.

Declare a Custom Provenance Channel

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Provenance Channels.

Every row that flows through Jammi carries provenance — retrieved_by and annotated_by lists that record how the row was found and what was added after retrieval. Jammi ships two built-in channels — vector (declares similarity) and inference (declares inference_model, inference_task, inference_confidence) — but the catalog accepts any channel a consumer wants to register. Each channel declares the columns it contributes; the engine merges those columns into every result RecordBatch at query time.

This recipe walks through registering a third channel, scored_by, for a multi-stage retrieval pipeline where a federated reranker rescores the vector hits. The same shape applies to any non-built-in provenance signal: a citation graph, an attribution chain, a quality-grading pass.

Setup

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate arrow;
extern crate tokio;
use jammi_db::config::JammiConfig;
async fn ex(config: JammiConfig) -> jammi_db::error::Result<()> {
use std::sync::Arc;
use arrow::array::{ArrayRef, Float32Array, StringArray};
use jammi_ai::evidence::{merge_channels, ChannelContribution};
use jammi_ai::session::InferenceSession;
use jammi_db::catalog::channel_repo::{ChannelColumn, ChannelColumnType, ChannelSpec};
use jammi_db::ChannelId;
use jammi_db::source::{FileFormat, SourceConnection, SourceType};

let session = Arc::new(InferenceSession::new(config).await?);
session.add_source("patents", SourceType::File, SourceConnection {
    url: Some("file:///data/patents.parquet".into()),
    format: Some(FileFormat::Parquet),
    ..Default::default()
}).await?;
Ok(()) }
}

Python

import jammi

db = jammi.connect("file:///var/lib/jammi")
db.add_source("patents", path="/data/patents.parquet", format="parquet")

Declare the channel

Channel declarations are catalog rows. Each declared column has a name, an Arrow type, and an ordinal. The set is append-only — once ranker: Utf8 is declared, the engine refuses to redeclare it as Int32 or drop it.

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::catalog::channel_repo::{ChannelColumn, ChannelColumnType, ChannelSpec};
use jammi_db::ChannelId;
async fn ex(session: &Arc<InferenceSession>) -> jammi_db::error::Result<()> {
session.catalog().channels().register(&ChannelSpec {
    id: ChannelId::new("scored_by")?,
    priority: 3,
    columns: vec![
        ChannelColumn {
            name: "ranker".into(),
            data_type: ChannelColumnType::Utf8,
        },
        ChannelColumn {
            name: "rank_score".into(),
            data_type: ChannelColumnType::Float32,
        },
    ],
}).await?;
Ok(()) }
}

Python

db.register_channel(
    "scored_by",
    priority=3,
    columns=[("ranker", "Utf8"), ("rank_score", "Float32")],
)

priority controls the order columns appear in the merged output — vector (1) and inference (2) come first, then scored_by (3).
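
The ordering rule can be pictured with a small plain-Python sketch. It assumes columns are laid out by ascending channel priority and, within a channel, in declaration order; the dict layout and the helper merged_column_order are illustrative, and the built-in column names are the ones listed earlier on this page.

```python
channels = [
    {"id": "scored_by", "priority": 3, "columns": ["ranker", "rank_score"]},
    {"id": "vector", "priority": 1, "columns": ["similarity"]},
    {"id": "inference", "priority": 2,
     "columns": ["inference_model", "inference_task", "inference_confidence"]},
]

def merged_column_order(channels):
    """Channel columns by ascending priority, declaration order within a channel."""
    ordered = sorted(channels, key=lambda channel: channel["priority"])
    return [column for channel in ordered for column in channel["columns"]]

order = merged_column_order(channels)
```

Registration order does not matter here: scored_by is listed first but its columns still land last.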

To add more columns to an already-registered channel:

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::catalog::channel_repo::{ChannelColumn, ChannelColumnType};
use jammi_db::ChannelId;
async fn ex(session: &Arc<InferenceSession>) -> jammi_db::error::Result<()> {
session.catalog().channels().add_columns(
    &ChannelId::new("scored_by")?,
    &[ChannelColumn { name: "scored_at".into(), data_type: ChannelColumnType::Utf8 }],
).await?;
Ok(()) }
}

Python

db.add_channel_columns("scored_by", columns=[("scored_at", "Utf8")])

add_columns is append-only by construction. Trying to redeclare ranker with the same or a different type returns JammiError::ChannelCatalog(_).

Use the channel

Build a ChannelContribution for each batch your reranker produces. The arrays must align 1:1 with the channel’s declared columns (ranker first, rank_score second) and have the same length as the batch’s row count.

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate arrow;
extern crate tokio;
use std::sync::Arc;
use arrow::array::{ArrayRef, Float32Array, RecordBatch, StringArray};
use jammi_ai::evidence::{merge_channels, ChannelContribution};
use jammi_ai::session::InferenceSession;
use jammi_db::ChannelId;
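// Placeholder reranker: one score per row, so the array length matches the batch.
fn rerank_scores(batch: &RecordBatch) -> Vec<f32> { vec![0.0; batch.num_rows()] }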
async fn ex(session: &Arc<InferenceSession>, batches: Vec<RecordBatch>) -> jammi_db::error::Result<()> {
let scored_by = ChannelId::new("scored_by")?;
let vector = ChannelId::new("vector")?;

let mut contributions = Vec::with_capacity(batches.len());
for batch in &batches {
    let n = batch.num_rows();
    let ranker: ArrayRef = Arc::new(StringArray::from(vec!["bm25"; n]));
    let rank_score: ArrayRef = Arc::new(Float32Array::from(rerank_scores(batch)));
    contributions.push(vec![ChannelContribution {
        channel: scored_by.clone(),
        columns: vec![ranker, rank_score],
    }]);
}

let merged = merge_channels(
    session.catalog(),
    &batches,
    &[vector.clone(), scored_by.clone()],
    &[vector, scored_by],   // retrieved_by
    &[],                     // annotated_by
    &contributions,
).await?;
Ok(()) }
}
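
The alignment rule is worth checking before handing arrays to a merge. The sketch below is a plain-Python pre-flight check written for illustration; check_contribution is a hypothetical helper, and the engine’s own validation and error types may differ.

```python
def check_contribution(declared_columns, arrays, num_rows):
    """Raise ValueError unless arrays line up 1:1 with the declared columns."""
    if len(arrays) != len(declared_columns):
        raise ValueError(
            f"expected {len(declared_columns)} arrays, got {len(arrays)}")
    for name, array in zip(declared_columns, arrays):
        if len(array) != num_rows:
            raise ValueError(
                f"column '{name}' has {len(array)} values for {num_rows} rows")

declared = ["ranker", "rank_score"]
check_contribution(declared, [["bm25"] * 3, [0.9, 0.4, 0.1]], num_rows=3)
```

A reranker that returns fewer scores than the batch has rows fails this check immediately instead of producing a misaligned column.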

Verify

The merged output schema includes the declared columns. Rows where the channel did not supply a value carry NULL.

Rust

#![allow(unused)]
fn main() {
extern crate arrow;
use arrow::array::RecordBatch;
fn ex(merged: Vec<RecordBatch>) {
let schema = merged[0].schema();
assert!(schema.field_with_name("ranker").is_ok());
assert!(schema.field_with_name("rank_score").is_ok());

for batch in &merged {
    let ranker = batch.column_by_name("ranker").unwrap();
    println!("ranker column: {:?}", ranker);
}
}
}

Python

From the SQL surface, the declared columns show up in any query that touches the result table — Python sees them as plain Arrow columns:

table = db.sql(
    "SELECT _row_id, similarity, ranker, rank_score FROM results LIMIT 3"
)
for row in table.to_pylist():
    print(row["ranker"], row["rank_score"])

What you cannot do

The channel declaration is append-only. Once scored_by ships with ranker: Utf8, you cannot:

Redeclare ranker as Int32 — add_columns rejects with JammiError::ChannelCatalog(ChannelCatalogError::ColumnConflict { … }), whose message is "channel 'scored_by': column 'ranker' was declared Utf8, cannot redeclare as Int32". From Python, the same call raises RuntimeError carrying the identical message:
```
db.add_channel_columns("scored_by", columns=[("ranker", "Int32")])
# RuntimeError: channel 'scored_by': column 'ranker' was declared Utf8, cannot redeclare as Int32
```
Add a second column with the same name — add_columns rejects with JammiError::ChannelCatalog(ChannelCatalogError::ColumnAlreadyDeclared { … }), whose message is "channel 'scored_by': column 'ranker' already declared".
Drop ranker from the channel — there is no drop_column method by design.

If a column needs to change shape, declare a new column under a new name and migrate consumers. This preserves byte-for-byte readability of any backing table or downstream artifact that already references the original column.
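
The append-only contract is small enough to model in plain Python. The Channel class below is a toy written for illustration, not the catalog’s implementation; the two exception classes stand in for the error variants named above, the messages copy the ones quoted there, and validating a whole call before applying it is an assumption.

```python
class ColumnConflict(Exception):
    pass

class ColumnAlreadyDeclared(Exception):
    pass

class Channel:
    """Toy append-only column set for one channel."""

    def __init__(self, channel_id, columns):
        self.channel_id = channel_id
        self.columns = {}  # insertion order doubles as the ordinal
        self.add_columns(columns)

    def add_columns(self, columns):
        # Validate the whole call first so a rejected call changes nothing.
        pending = {}
        for name, data_type in columns:
            declared = self.columns.get(name, pending.get(name))
            if declared is None:
                pending[name] = data_type
            elif declared != data_type:
                raise ColumnConflict(
                    f"channel '{self.channel_id}': column '{name}' was declared "
                    f"{declared}, cannot redeclare as {data_type}")
            else:
                raise ColumnAlreadyDeclared(
                    f"channel '{self.channel_id}': column '{name}' already declared")
        self.columns.update(pending)
    # There is deliberately no drop or retype method: the set only grows.

scored_by = Channel("scored_by", [("ranker", "Utf8"), ("rank_score", "Float32")])
scored_by.add_columns([("scored_at", "Utf8")])
```

Because existing names and types never change, anything written against an earlier version of the channel keeps reading correctly after new columns are appended.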

Fine-Tune for Your Domain

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Fine-Tuning Methods.

Train LoRA adapters on your data to improve embedding quality for your domain. The base model stays frozen — only a small projection layer is trained and saved.
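
For readers new to LoRA, the standard formulation is a frozen weight plus a scaled low-rank update. The plain-Python sketch below shows that arithmetic on tiny matrices; it illustrates the general technique only, and which layers Jammi attaches adapters to is not something this sketch claims to know.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha):
    """h = W x + (alpha / r) * B (A x), with W frozen and only A, B trained."""
    rank = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / rank) * u for b, u in zip(base, update)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 base weight
A = [[0.5, 0.5]]              # rank-1 down-projection (1x2)
B = [[0.0], [0.0]]            # up-projection (2x1), zero at initialisation
x = [2.0, 4.0]

untrained = lora_forward(W, A, B, x, alpha=2.0)
trained = lora_forward(W, A, [[1.0], [0.0]], x, alpha=2.0)
```

With B at zero the adapter is a no-op, so training starts from the base model’s behaviour, and only the small A and B matrices need to be saved. The rank and alpha here correspond to the lora_rank and lora_alpha settings.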

Prepare training data

Create contrastive pairs with a similarity score:

text_a,text_b,score
"quantum error correction","superconducting qubit stabilization",0.88
"quantum error correction","medieval poetry analysis",0.08

High scores mean similar; low scores mean dissimilar.
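
If the pairs live in application code rather than a file, the CSV can be produced with the standard library. This is a minimal sketch: write_pairs is a hypothetical helper, and the check that scores fall in [0, 1] is an assumption based on the sample rows, not a documented requirement.

```python
import csv
import io

pairs = [
    ("quantum error correction", "superconducting qubit stabilization", 0.88),
    ("quantum error correction", "medieval poetry analysis", 0.08),
]

def write_pairs(handle, pairs):
    """Write (text_a, text_b, score) rows, rejecting scores outside [0, 1]."""
    writer = csv.writer(handle, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow(["text_a", "text_b", "score"])
    for text_a, text_b, score in pairs:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score {score} outside [0, 1]")
        writer.writerow([text_a, text_b, score])

buffer = io.StringIO()
write_pairs(buffer, pairs)
```

To write a real file, pass a handle opened with open(path, "w", newline="") instead of the in-memory buffer.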

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::source::{FileFormat, SourceConnection, SourceType};
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
session.add_source("training", SourceType::File, SourceConnection {
    url: Some("file:///data/training_pairs.csv".into()),
    format: Some(FileFormat::Csv),
    ..Default::default()
}).await?;
Ok(()) }
}

Python

db.add_source("training", path="/data/training_pairs.csv", format="csv")

Start a fine-tuning job

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
use jammi_ai::fine_tune::FineTuneMethod;
use jammi_db::ModelTask;

let job = session.fine_tune(
    "training",
    "sentence-transformers/all-MiniLM-L6-v2",
    &["text_a".into(), "text_b".into(), "score".into()],
    FineTuneMethod::Lora,
    ModelTask::TextEmbedding,
    None,  // default config
).await?;

println!("Job: {}", job.job_id);
job.wait().await?;
println!("Model: {}", job.model_id());
Ok(()) }
}

Python

job = db.fine_tune(
    source="training",
    base_model="sentence-transformers/all-MiniLM-L6-v2",
    columns=["text_a", "text_b", "score"],
    method="lora",
    task="embedding",
)

job.wait()
print(f"Model: {job.model_id}")

Custom configuration

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_ai::fine_tune::{FineTuneMethod, LrSchedule};
async fn ex(session: &InferenceSession, model: &str, columns: Vec<String>) -> jammi_db::error::Result<()> {
use jammi_ai::fine_tune::FineTuneConfig;
use jammi_db::ModelTask;

let config = FineTuneConfig {
    lora_rank: 4,
    learning_rate: 5e-4,
    epochs: 5,
    batch_size: 4,
    warmup_steps: 10,
    lr_schedule: LrSchedule::CosineDecay,
    early_stopping_patience: 2,
    validation_fraction: 0.2,
    gradient_accumulation_steps: 4,  // effective batch = 4 x 4 = 16
    ..Default::default()
};

let job = session.fine_tune(
    "training", model, &columns, FineTuneMethod::Lora, ModelTask::TextEmbedding, Some(config),
).await?;
Ok(()) }
}

Configuration reference

Field	Default	Description
`lora_rank`	8	Low-rank dimension
`lora_alpha`	16.0	Scaling factor
`lora_dropout`	0.05	Dropout probability
`learning_rate`	2e-4	Base learning rate
`epochs`	3	Training epochs
`batch_size`	8	Micro-batch size
`max_seq_length`	512	Max tokens per text
`gradient_accumulation_steps`	1	Micro-batches accumulated before each optimizer update
`validation_fraction`	0.1	Holdout fraction for early stopping
`early_stopping_patience`	3	Epochs without improvement before stopping
`warmup_steps`	100	Linear warmup from 0 to base LR
`lr_schedule`	CosineDecay	Decay after warmup: Constant, CosineDecay, LinearDecay
`embedding_loss`	auto	CoSent (pairs+scores), Triplet, MultipleNegativesRanking
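
The interaction between warmup_steps, lr_schedule, and gradient_accumulation_steps can be sketched in plain Python. This is an illustration of the schedule shapes named in the table, not Jammi's implementation: the function names, the total_steps parameter, and the assumption that the decay reaches zero at the final step are all hypothetical.

```python
import math

def learning_rate(step, base_lr=2e-4, warmup_steps=100, total_steps=1000, schedule="cosine"):
    """Learning rate at a 0-based optimizer step: linear warmup, then the chosen decay.

    Assumption: the decay ends at zero when step == total_steps.
    """
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps  # linear warmup from 0 to base_lr
    if schedule == "constant":
        return base_lr
    # Fraction of the post-warmup phase that has elapsed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    if schedule == "linear":
        return base_lr * (1.0 - progress)
    if schedule == "cosine":
        return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    raise ValueError(f"unknown schedule: {schedule}")

def effective_batch_size(batch_size, gradient_accumulation_steps):
    """Examples contributing to one optimizer update."""
    return batch_size * gradient_accumulation_steps
```

With batch_size 4 and gradient_accumulation_steps 4, as in the custom configuration above, effective_batch_size returns 16.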

Use the fine-tuned model

The fine-tuned model is automatically registered and can be used anywhere a model ID is accepted:

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_ai::fine_tune::training_job::TrainingJob;
use jammi_db::store::CachePolicy;
async fn ex(session: &InferenceSession, job: &TrainingJob) -> jammi_db::error::Result<()> {
let model_id = job.model_id();

let embedding = session.encode_text_query(model_id, "quantum computing").await?;
println!("query embedding has {} dims", embedding.len());
session.generate_text_embeddings("patents", model_id, &["abstract".into()], "id", CachePolicy::Bypass).await?;
Ok(()) }
}

Python

model_id = job.model_id

query_vec = db.encode_query(model=model_id, query="quantum computing")
db.generate_embeddings(source="patents", model=model_id, columns=["abstract"], key="id", modality="text")

How it works

text -> encoder (frozen) -> base embedding -> LoRA projection (trained) -> output

The base encoder model (BERT, ModernBERT, etc.) is loaded and frozen
A LoRA projection layer (identity + low-rank A/B matrices) is added after pooling
For each batch: text is encoded, projected through LoRA, and loss is computed
Only the A/B matrices receive gradients
The adapter is saved as adapter.safetensors in the artifact directory
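
The projection step can be sketched in pure Python as an identity path plus a scaled low-rank update. This is a minimal sketch under common LoRA conventions that the text above does not confirm for Jammi: B is zero-initialised (so the layer starts as the identity), the update is scaled by alpha / rank, and the class and method names are hypothetical.

```python
import random

def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class LoraProjection:
    """Identity + low-rank update: y = x + (alpha / rank) * B(A(x))."""

    def __init__(self, dim, rank=8, alpha=16.0, seed=0):
        rng = random.Random(seed)
        # A (rank x dim) starts small and random; B (dim x rank) starts at zero,
        # so the untrained layer passes the base embedding through unchanged.
        self.a = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(dim)]
        self.scale = alpha / rank

    def forward(self, x):
        low = matvec(self.a, x)      # dim -> rank
        delta = matvec(self.b, low)  # rank -> dim
        return [xi + self.scale * di for xi, di in zip(x, delta)]

    def trainable_parameters(self):
        """Only A and B are trained; the encoder itself contributes nothing here."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])
```

For a 384-dimensional embedding at rank 8, the sketch has 2 x 8 x 384 = 6,144 trainable values, which is why the saved adapter stays small.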

Encoder-adapters fine-tuning (PEFT-style adapter injection)

The default flow above trains a single low-rank projection head sitting outside the frozen encoder. For higher capacity at the same parameter budget, Jammi also supports encoder adapters — LoRA injected into named linear layers inside the encoder stack, matching the PEFT convention.

Switch to encoder adapters by populating target_modules on FineTuneConfig:

#![allow(unused)]
fn main() {
extern crate jammi_ai;
use jammi_ai::fine_tune::FineTuneConfig;
fn make() -> FineTuneConfig {
let config = FineTuneConfig {
    lora_rank: 8,
    lora_alpha: 16.0,
    // Inject LoRA into BERT's attention query and value projections.
    target_modules: vec!["query".to_string(), "value".to_string()],
    ..Default::default()
};
config }
}

job = db.fine_tune(
    source="training",
    base_model="sentence-transformers/all-MiniLM-L6-v2",
    columns=["text_a", "text_b", "score"],
    method="lora",
    task="text_embedding",
    target_modules=["query", "value"],
)

Target-module conventions

Pick target_modules per the architecture you’re fine-tuning:

Architecture	Common target_modules
BERT / RoBERTa / CamemBERT / XLM-RoBERTa	`["query", "value"]` (recommended) or `["query", "key", "value", "dense"]`
DistilBERT	`["q_lin", "v_lin"]` or `["q_lin", "k_lin", "v_lin", "out_lin"]`
ModernBERT	`["Wqkv", "Wo"]` (fused QKV + output)
Any encoder	`["all-linear"]` — every linear layer gets an adapter (largest capacity)

Names match the trailing module-name segment in the HuggingFace weight layout. Suffix matching is the rule, so "query" matches "attention.self.query".
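
The suffix rule can be illustrated with a few lines of Python. This is a sketch, assuming the match is made on whole dot-separated segments; the helper names are hypothetical, the module names follow the usual HuggingFace BERT layout, and the special value "all-linear" is not handled here.

```python
def matches_target(module_name, target):
    """True when `target` equals the trailing dot-separated segment(s) of `module_name`."""
    return module_name == target or module_name.endswith("." + target)

def select_modules(module_names, target_modules):
    """Keep the modules that any target selects, preserving input order."""
    return [
        name for name in module_names
        if any(matches_target(name, target) for target in target_modules)
    ]

# Linear layers of one BERT encoder layer in the HuggingFace naming scheme.
BERT_LAYER_0 = [
    "encoder.layer.0.attention.self.query",
    "encoder.layer.0.attention.self.key",
    "encoder.layer.0.attention.self.value",
    "encoder.layer.0.attention.output.dense",
    "encoder.layer.0.intermediate.dense",
    "encoder.layer.0.output.dense",
]
```

Note that under this rule "dense" selects three modules per layer (the attention output and both feed-forward projections), which is why the four-module BERT setting adds noticeably more adapters than `["query", "value"]`.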

Layer ranges and per-module ranks

Two optional refinements:

layers_to_transform — restrict injection to specific 0-based layer indices. None (default) applies to every layer.
rank_pattern — override lora_rank for individual modules. Keys are substring matches against the module name; values are the override rank.

#![allow(unused)]
fn main() {
extern crate jammi_ai;
use jammi_ai::fine_tune::FineTuneConfig;
fn make() -> FineTuneConfig {
let mut rank_pattern = std::collections::HashMap::new();
rank_pattern.insert("query".to_string(), 16);  // higher capacity on Q
rank_pattern.insert("value".to_string(), 4);   // lower on V

let config = FineTuneConfig {
    lora_rank: 8,                                     // default rank
    target_modules: vec!["query".into(), "value".into()],
    layers_to_transform: Some(vec![6, 7, 8, 9, 10, 11]), // top half of a 12-layer encoder
    rank_pattern,
    ..Default::default()
};
config }
}
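
How the two refinements combine for a single module can be sketched as follows. This is an illustration only: the helper names are hypothetical, the layer index is parsed from a HuggingFace-style `layer.N.` segment, and the precedence when several rank_pattern keys match (first inserted key wins) is an assumption rather than documented behaviour.

```python
import re

def layer_index(module_name):
    """Extract N from a '...layer.N....' module name, or None if absent."""
    match = re.search(r"\blayer\.(\d+)\.", module_name)
    return int(match.group(1)) if match else None

def resolve_rank(module_name, lora_rank, rank_pattern, layers_to_transform=None):
    """Return the LoRA rank for a module, or None when its layer is excluded."""
    if layers_to_transform is not None and layer_index(module_name) not in layers_to_transform:
        return None
    for key, rank in rank_pattern.items():
        if key in module_name:  # substring match, as described above
            return rank
    return lora_rank

def lora_parameter_count(rank, d_in, d_out):
    """A is rank x d_in and B is d_out x rank."""
    return rank * (d_in + d_out)
```

With the configuration above, a layer-6 query projection resolves to rank 16, a layer-6 value projection to rank 4, and anything in layers 0 to 5 is left untouched.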

On-disk artifact

Every fine-tuned model writes adapter.safetensors plus an adapter_config.json whose adapter_type tag discriminates between the two adapter shapes Jammi produces.

Encoder-adapters example:

{
  "adapter_type": "encoder_adapters",
  "model_type": "bert",
  "lora_rank": 8,
  "lora_alpha": 16.0,
  "use_rslora": false,
  "target_modules": ["query", "value"],
  "layers_to_transform": [6, 7, 8, 9, 10, 11],
  "rank_pattern": {"query": 16, "value": 4},
  "backbone_dtype": "f32"
}

Projection-head example:

{
  "adapter_type": "projection_head",
  "lora_rank": 8,
  "lora_alpha": 16.0,
  "head_layers": ["projection"]
}

The Candle inference backend reads adapter_config.json on model load and dispatches on adapter_type: encoder_adapters rebuilds the encoder with frozen backbone weights plus the LoRA A/B from adapter.safetensors; projection_head loads the saved projection weights as a LoraLinear applied after pooling.
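
The dispatch on adapter_type can be mirrored in Python when you want to inspect an artifact outside the engine. This is a sketch that reads only the fields shown in the two examples above; the function name and the shape of the returned summary are hypothetical, and it does not load any weights.

```python
import json

def describe_adapter(config_text):
    """Summarise an adapter_config.json payload by its adapter_type tag."""
    config = json.loads(config_text)
    adapter_type = config.get("adapter_type")
    if adapter_type == "encoder_adapters":
        return {
            "kind": "encoder_adapters",
            "where": "inside the encoder stack",
            "modules": config["target_modules"],
            "layers": config.get("layers_to_transform"),  # None means every layer
            "rank": config["lora_rank"],
        }
    if adapter_type == "projection_head":
        return {
            "kind": "projection_head",
            "where": "after pooling",
            "modules": config["head_layers"],
            "layers": None,
            "rank": config["lora_rank"],
        }
    raise ValueError(f"unknown adapter_type: {adapter_type!r}")
```

Failing loudly on an unknown tag is a deliberate choice in the sketch: silently treating an unrecognised artifact as a projection head would produce embeddings from the wrong model shape.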

When to use each

Projection head — fastest training, smallest artifact, lowest memory. The default when target_modules is empty. Best for adapting embedding direction without changing per-token attention behaviour.
Encoder adapters — higher representational ceiling per adapter parameter; required if the task needs to reshape attention behaviour (e.g. a domain where the base attention pattern mismatches the query distribution). Costs a slightly slower forward pass since the LoRA path runs per layer.

Training safety

Divergence detection: if loss is NaN or >100 for 3 consecutive batches, the job fails with a clear error
Early stopping: training stops when validation loss has not improved for early_stopping_patience epochs, and the best checkpoint's weights are restored
Checkpoints: saved at ~10% intervals for crash recovery
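
The first two safeguards reduce to small state machines. The sketch below shows the logic as described (NaN or loss above 100 for 3 consecutive batches; no validation improvement for `patience` epochs); the class names are hypothetical and the trainer's real bookkeeping may differ.

```python
import math

class DivergenceGuard:
    """Signals failure after `max_consecutive` bad batches in a row."""

    def __init__(self, threshold=100.0, max_consecutive=3):
        self.threshold = threshold
        self.max_consecutive = max_consecutive
        self.bad_streak = 0

    def update(self, loss):
        """Record one batch loss; return True when training should fail."""
        bad = math.isnan(loss) or loss > self.threshold
        self.bad_streak = self.bad_streak + 1 if bad else 0
        return self.bad_streak >= self.max_consecutive

class EarlyStopping:
    """Tracks the best validation loss and stops after `patience` stale epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_loss = math.inf
        self.best_epoch = None  # the checkpoint to restore on stop
        self.stale_epochs = 0

    def update(self, epoch, validation_loss):
        """Record one epoch; return True when training should stop."""
        if validation_loss < self.best_loss:
            self.best_loss = validation_loss
            self.best_epoch = epoch
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

A single good batch resets the divergence streak, so an isolated loss spike does not fail the job.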

Fine-Tune from a Graph (Graph-Supervised)

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Representation Learning on Graphs.

Fine-tune embeddings so that graph-neighbours are close in embedding space. This is node2vec / DeepWalk realised as Jammi config: it samples a graph into contrastive (anchor, positive, [hard_negative]) pairs and feeds them through the existing fine-tune trainer. It introduces no GNN, no message passing, and no new loss. It is a new training-data shape (TrainingFormat::Graph) that drives the same in-batch-negative (MNRL) / triplet objective as Fine-Tune for Your Domain.

Use it when your supervision is a graph rather than hand-built pairs: a hierarchy, a crosswalk, a citation network, a set of coder-confirmed matches, or the neighbour graph Jammi itself builds.

The load-bearing caveat: where the signal comes from

Declared edges teach; similarity edges echo.

If you train on S9-similarity edges (the neighbour graph, which is k-NN under the base embedding metric), the walk-positives are mostly “things the model already thinks are close” — so fine-tuning largely re-learns the base metric. That is a degenerate feedback loop with little new signal.

Genuine gain comes from declared / external edges — structure the base metric does not already encode:

a hierarchy (parent/child categories),
a crosswalk (version-A code ↔ version-B code),
a citation / reference network,
coder-confirmed pairs.

Tag your edges with their provenance. Similarity edges are an acceptable weak bootstrap (e.g. to expand a sparse declared graph), but never the sole supervision. The sampler tracks provenance and can report whether any declared edge is present.

Prepare the graph

Two sources: node text (what the encoder embeds) and edges.

nodes.csv — every node must be text-bearing (the encoder needs text; pure-vector nodes are out of scope here):

id,text
c01,"acute myocardial infarction, initial"
c02,"acute myocardial infarction, subsequent"
c03,"benign essential hypertension"

edges.csv — directed edges; endpoints join to id:

src,dst
c01,c02
c02,c01

Python

db.add_source("nodes", path="/data/nodes.csv", format="csv")
db.add_source("edges", path="/data/edges.csv", format="csv")

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
use jammi_ai::session::InferenceSession;
use jammi_db::source::{FileFormat, SourceConnection, SourceType};
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
for name in ["nodes", "edges"] {
    session.add_source(name, SourceType::File, SourceConnection {
        url: Some(format!("file:///data/{name}.csv")),
        format: Some(FileFormat::Csv),
        ..Default::default()
    }).await?;
}
Ok(()) }
}
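
Before registering the two files, it can be worth checking that every edge endpoint joins to a node id. The following is a standalone sketch using only the standard library; the function name is hypothetical and the column names are the ones used in the sample files above.

```python
import csv
import io

def dangling_endpoints(nodes_csv, edges_csv):
    """Return (column, value) for every edge endpoint that has no matching node id."""
    node_ids = {row["id"] for row in csv.DictReader(io.StringIO(nodes_csv))}
    missing = []
    for row in csv.DictReader(io.StringIO(edges_csv)):
        for column in ("src", "dst"):
            if row[column] not in node_ids:
                missing.append((column, row[column]))
    return missing

NODES = '''id,text
c01,"acute myocardial infarction, initial"
c02,"acute myocardial infarction, subsequent"
c03,"benign essential hypertension"
'''

EDGES = '''src,dst
c01,c02
c02,c01
'''
```

An empty result means the graph is closed over the node table; anything else names the endpoints to fix before training.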

Run the graph fine-tune

The sampler runs biased random walks (node2vec) over the edges: from each node it walks walk_length (L) steps, biased by the return parameter p and the in-out parameter q, and treats co-walked nodes as positives. L > 1 captures higher-order / community structure — L = 1 is the degenerate 1-hop case. Negatives are in-batch (every other pair’s positive) plus structure-mined hard negatives drawn from outside the anchor’s exclude_hops-hop neighbourhood (the false-negative guard — a node inside that radius is likely a missing edge, i.e. a true positive).
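
To make the walk concrete, here is a minimal node2vec-style biased walk in pure Python. It follows the standard node2vec weighting (1/p to step back, 1 to stay adjacent to the previous node, 1/q to move outward); the function names are hypothetical, and pairing the walk's start node with every other visited node is an assumption about how positives are formed, not a description of Jammi's sampler.

```python
import random

def build_adjacency(edges):
    """Directed adjacency lists from (src, dst) pairs."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])
    return adjacency

def biased_walk(adjacency, start, walk_length, return_p=1.0, in_out_q=1.0, rng=None):
    """Walk up to `walk_length` steps from `start` with node2vec biasing."""
    rng = rng or random.Random(0)
    walk, previous = [start], None
    while len(walk) - 1 < walk_length:
        current = walk[-1]
        neighbours = adjacency.get(current, [])
        if not neighbours:
            break  # dead end: the walk stops early
        weights = []
        for candidate in neighbours:
            if previous is None:
                weights.append(1.0)             # first step is unbiased
            elif candidate == previous:
                weights.append(1.0 / return_p)  # backtrack
            elif candidate in adjacency.get(previous, ()):
                weights.append(1.0)             # stay one hop from the previous node
            else:
                weights.append(1.0 / in_out_q)  # move outward
        chosen = rng.choices(neighbours, weights=weights, k=1)[0]
        previous = current
        walk.append(chosen)
    return walk

def walk_pairs(walk):
    """(anchor, positive) pairs: the start node with each distinct co-walked node."""
    anchor = walk[0]
    return [(anchor, node) for node in dict.fromkeys(walk[1:]) if node != anchor]
```

With walk_length 1 the only positive is a direct neighbour; longer walks reach nodes that share a community with the anchor without sharing an edge.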

Python

job = db.fine_tune_graph(
    node_source="nodes", id_column="id", text_column="text",
    edge_source="edges", src_column="src", dst_column="dst",
    base_model="local:/models/tiny_bert",
    edge_provenance="declared",   # "declared" teaches; "similarity" echoes
    walk_length=4, walks_per_node=2, return_p=1.0, in_out_q=1.0,
    graph_hard_negatives=1, exclude_hops=1, min_negatives=1,
    embedding_loss="mnrl",        # in-batch negatives (default); or "triplet"
    epochs=3, batch_size=8,
)
job.wait()

Rust

#![allow(unused)]
fn main() {
extern crate jammi_ai;
extern crate jammi_db;
use jammi_ai::session::InferenceSession;
use jammi_ai::fine_tune::FineTuneConfig;
use jammi_ai::fine_tune::graph_sampler::{EdgeProvenance, GraphFineTuneSources, GraphSampleConfig};
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
let sources = GraphFineTuneSources {
    node_source: "nodes".into(), id_column: "id".into(), text_column: "text".into(),
    edge_source: "edges".into(), src_column: "src".into(), dst_column: "dst".into(),
    // Declared edges carry signal the base metric does not already encode.
    provenance: EdgeProvenance::Declared,
};
let sample = GraphSampleConfig {
    walk_length: 4, walks_per_node: 2, return_p: 1.0, in_out_q: 1.0,
    hard_negatives: 1, exclude_hops: 1, min_negatives: 1, seed: 0,
};
let job = session
    .fine_tune_graph(&sources, "local:/models/tiny_bert", sample, Some(FineTuneConfig::default()))
    .await?;
job.wait().await?;
Ok(()) }
}

The output is a fine-tuned model; regenerate embeddings with it and they encode the graph’s structure (build_neighbor_graph, search, and propagation all benefit).

Tuning knobs

Knob	Effect
`walk_length` (`L`)	How far from the anchor a positive can be. `1` = 1-hop only; `>1` = community structure.
`return_p` (`p`)	Large `p` discourages backtracking.
`in_out_q` (`q`)	`q < 1` explores outward (DFS-like); `q > 1` stays local (BFS-like).
`graph_hard_negatives`	Structure-mined hard negatives per pair. `0` = in-batch only.
`exclude_hops`	Hops of the anchor’s neighbourhood excluded from its negatives (false-negative guard).
`min_negatives`	Minimum negative pool — guards against contrastive collapse on a tiny graph.
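
The effect of exclude_hops and min_negatives can be sketched with a breadth-first search. This is an illustration under assumptions: the function names are hypothetical, negatives are drawn uniformly from outside the excluded radius (the real sampler's notion of "hard" may be more selective), and raising an error on a too-small pool is one possible reading of the min_negatives guard.

```python
import random
from collections import deque

def within_hops(adjacency, anchor, hops):
    """All nodes reachable from `anchor` in at most `hops` steps, anchor included."""
    seen = {anchor}
    frontier = deque([(anchor, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen

def sample_negatives(adjacency, anchor, count, exclude_hops=1, min_negatives=1, rng=None):
    """Draw up to `count` negatives from outside the anchor's excluded neighbourhood."""
    rng = rng or random.Random(0)
    excluded = within_hops(adjacency, anchor, exclude_hops)
    pool = sorted(node for node in adjacency if node not in excluded)
    if len(pool) < min_negatives:
        raise ValueError(
            f"only {len(pool)} candidate negatives for {anchor!r}; need {min_negatives}"
        )
    return rng.sample(pool, min(count, len(pool)))
```

Raising exclude_hops shrinks the pool, so on a small graph the two knobs pull against each other: a wider false-negative guard leaves fewer nodes to contrast against.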

Compose with propagation

Both graph fine-tune and embedding propagation encode homophily; stacking them naively double-counts the same smoothing. The recommended order is propagate first, then fine-tune the head (the SGC/APPNP decoupling) — not two independent smoothing passes.

Did it work? The circularity check

To confirm declared edges actually helped (and that you did not just re-learn the base metric), evaluate on a held-out golden set — see Did Structure Help? A Graph-ML Evaluation Recipe:

Build two supervision graphs over the same nodes — one from declared edges, one from S9-similarity edges.
fine_tune_graph each; hold out a golden relevance set.
eval_embeddings the base model vs each fine-tune, with a paired significance test.
Expect the declared-edge model to beat the base significantly, and the similarity-edge model’s gain to be near-zero — the degenerate feedback loop, measured rather than assumed.

Evaluate and Compare Models

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Evaluation & Provenance Channels.

Measure embedding quality and classification accuracy against golden datasets. Results are recorded in the catalog for tracking over time.

Prepare a golden dataset

A golden dataset is any registered source with the right columns. No special format is required.

Retrieval golden set

query_id,query_text,relevant_id
q1,quantum computing applications,1
q1,quantum computing applications,4
q2,machine learning for science,2

Column	Type	Required
`query_id`	Utf8	yes
`query_text`	Utf8	yes
`relevant_id`	Utf8 or Int	yes
`relevance_grade`	Int32	no (default: 1 = binary)

db.add_source("golden", path="/data/golden_relevance.csv", format="csv")

Evaluate embedding quality

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
let report = session.eval_embeddings(
    "patents",
    None,                                // use latest embedding table
    "golden.public.golden_relevance",    // golden dataset
    10,                                  // k for recall@k, precision@k
    &std::collections::HashMap::new(),   // no cohort tags
).await?;

println!("recall@10:    {}", report.aggregate.recall_at_k);
println!("precision@10: {}", report.aggregate.precision_at_k);
println!("MRR:          {}", report.aggregate.mrr);
println!("nDCG:         {}", report.aggregate.ndcg);
Ok(()) }
}

Python

metrics = db.eval_embeddings(
    source="patents",
    golden_source="golden.public.golden_relevance",
    k=10,
)

agg = metrics["aggregate"]
print(f"recall@10:    {agg['recall_at_k']:.3f}")
print(f"precision@10: {agg['precision_at_k']:.3f}")
print(f"MRR:          {agg['mrr']:.3f}")
print(f"nDCG:         {agg['ndcg']:.3f}")

Per-query drill-down

The report also carries a per_query array — one record per golden-set query, in golden order. This is what sample-based statistical rules (Welch’s t, Mann-Whitney U) consume at gate time.

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
let report = session.eval_embeddings("patents", None, "golden.public.golden_relevance", 10, &std::collections::HashMap::new()).await?;
for record in &report.per_query {
    println!("{}: recall={:.3} ndcg={:.3}",
        record.query_id, record.metrics.recall, record.metrics.ndcg);
}
Ok(()) }
}

for record in metrics["per_query"]:
    m = record["metrics"]
    print(f"{record['query_id']}: recall={m['recall']:.3f} ndcg={m['ndcg']:.3f}")

Retrieval metrics

Metric	What it measures
`recall_at_k`	Fraction of relevant docs found in top-k
`precision_at_k`	Fraction of top-k that are relevant
`mrr`	Mean reciprocal rank: the reciprocal rank of the first relevant result, averaged over queries
`ndcg`	Normalized discounted cumulative gain (uses graded relevance if provided)

All metrics are in [0, 1]. Higher is better.
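
For reference, the four metrics can be computed for a single query as below. These are the textbook definitions, not a copy of Jammi's code: the function names are hypothetical, nDCG uses a linear gain with a log2 discount, and the reciprocal rank is not truncated at k. Any of these details may differ in the engine.

```python
import math

def recall_at_k(ranked_ids, grades, k):
    """Fraction of the relevant ids that appear in the top k."""
    if not grades:
        return 0.0
    hits = sum(1 for doc in ranked_ids[:k] if doc in grades)
    return hits / len(grades)

def precision_at_k(ranked_ids, grades, k):
    """Fraction of the top k that are relevant."""
    hits = sum(1 for doc in ranked_ids[:k] if doc in grades)
    return hits / k

def reciprocal_rank(ranked_ids, grades):
    """1 / rank of the first relevant id, or 0 when none is retrieved."""
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in grades:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, grades, k):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    dcg = sum(
        grades.get(doc, 0) / math.log2(rank + 1)
        for rank, doc in enumerate(ranked_ids[:k], start=1)
    )
    ideal = sorted(grades.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Query q1 from the golden set above: ids 1 and 4 are relevant (binary grade 1).
Q1_GRADES = {"1": 1, "4": 1}
Q1_RANKING = ["4", "7", "1", "9"]
```

The aggregate values in the report are means of these per-query numbers over the golden set.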

Compare models (A/B)

Compare a base model against a fine-tuned model:

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession, base_table: String, finetuned_table: String) -> jammi_db::error::Result<()> {
let comparison = session.eval_compare(
    &[base_table.clone(), finetuned_table.clone()],
    "patents",
    "golden.public.golden_relevance",
    10,
).await?;

// The first entry is the baseline (`delta: None`); every subsequent entry
// carries a delta against it.
for entry in comparison.per_table.iter().skip(1) {
    let delta = entry.delta.as_ref().expect("non-baseline entries carry a delta");
    println!(
        "{}: recall@10 delta {:+.3} ({:+.1}%)",
        entry.table_name,
        delta.recall_at_k.absolute,
        delta.recall_at_k.relative * 100.0,
    );
}
Ok(()) }
}

Python

comparison = db.eval_compare(
    embedding_tables=[base_table, finetuned_table],
    source="patents",
    golden_source="golden.public.golden_relevance",
    k=10,
)
# `per_table[0]` is the baseline (`delta` is None); subsequent entries
# carry a `delta` dict keyed by metric name (recall_at_k, precision_at_k,
# mrr, ndcg) with `absolute` and `relative` sub-keys.
for entry in comparison["per_table"][1:]:
    d = entry["delta"]["recall_at_k"]
    print(f"{entry['table_name']}: recall@10 delta {d['absolute']:+.3f} ({d['relative']*100:+.1f}%)")

The first table is the baseline. Deltas (absolute and relative) are computed for all subsequent tables.

Evaluate classification

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
use jammi_ai::eval::{EvalTask, InferenceAggregate};

let report = session.eval_inference(
    "facebook/bart-large-mnli",
    "test_data",
    &["text".into()],
    EvalTask::Classification,
    "golden.public.labels",
    "category",
).await?;

match &report.aggregate {
    InferenceAggregate::Classification(c) => {
        println!("Accuracy: {}", c.accuracy);
        println!("Macro F1: {}", c.f1);
    }
    InferenceAggregate::Ner(n) => {
        println!("NER F1: {}", n.f1);
    }
}
println!("per_record predictions: {}", report.per_record.len());
Ok(()) }
}

Python

metrics = db.eval_inference(
    model="facebook/bart-large-mnli",
    source="test_data",
    columns=["text"],
    task="classification",
    golden_source="golden.public.labels",
    label_column="category",
)

# `aggregate` is tagged by `task`; for classification it carries
# `accuracy`, `f1`, and `per_class`.
agg = metrics["aggregate"]
print(f"Accuracy: {agg['accuracy']:.3f}")
print(f"Macro F1: {agg['f1']:.3f}")
# `per_record` is one entry per aligned predicted/gold pair.
print(f"per_record predictions: {len(metrics['per_record'])}")
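
As a cross-check on the reported numbers, accuracy and macro F1 can be recomputed from aligned gold and predicted labels. This is a sketch with hypothetical function names; it averages F1 over every label that appears in either list, which may not match how the engine picks the class set.

```python
def accuracy(gold, predicted):
    """Share of aligned pairs where the prediction equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels seen in gold or predicted."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)
```

Macro averaging weights every class equally, so a rare class that the model always misses pulls the score down much more than it pulls accuracy down.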

Eval runs in the catalog

Every evaluation is recorded automatically:

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
let runs = session.catalog().list_eval_runs().await?;
for run in &runs {
    println!("{}: {} on {} (k={:?})", run.eval_run_id, run.eval_type, run.golden_source, run.k);
}
Ok(()) }
}

Schema validation

Golden datasets are validated before evaluation starts. Missing or wrong-type columns produce clear errors:

Eval error: Golden dataset missing required column 'query_text'
Eval error: Golden dataset column 'query_id' has type Boolean, expected Utf8

Integer ID columns (Int32, Int64) are accepted where Utf8 is expected.
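
The same check can be run ahead of time on a column-name-to-type mapping. This sketch covers only the three required retrieval columns and mirrors the error messages shown above; the function name, the REQUIRED table, and the use of plain type-name strings are hypothetical conveniences.

```python
# Accepted Arrow type names per required column; integer ids are allowed
# wherever an id column expects Utf8.
REQUIRED = {
    "query_id": ("Utf8", "Int32", "Int64"),
    "query_text": ("Utf8",),
    "relevant_id": ("Utf8", "Int32", "Int64"),
}

def validate_golden_schema(schema):
    """Return a list of error messages for a {column: type_name} mapping."""
    errors = []
    for column, accepted in REQUIRED.items():
        if column not in schema:
            errors.append(f"Eval error: Golden dataset missing required column '{column}'")
        elif schema[column] not in accepted:
            errors.append(
                f"Eval error: Golden dataset column '{column}' has type "
                f"{schema[column]}, expected {accepted[0]}"
            )
    return errors
```

An empty list means the golden set has the columns the retrieval eval needs.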

Did Structure Help? A Graph-ML Evaluation Recipe

When you produce a structure-aware embedding table — a fine-tuned model, a propagated table, or any treatment that folds graph context into the representation — the only question that matters is whether it beats plain text embeddings on a held-out retrieval task. This recipe is the discipline around eval_compare that turns a delta into a defensible conclusion.

Nothing here is new engine surface. eval_compare already computes recall@k / precision@k / MRR / nDCG per table, the per-metric delta against a baseline, and — paired by query_id over the per-query records — a distribution-free significance result for each metric delta. The recipe is the protocol: a clean split, the judgment-matched metric, multiple seeds for trained treatments, and cohort slicing.

The four steps

Baseline. Produce a plain text-embedding table with generate_text_embeddings.
Treatment. Produce each structure-aware table (a fine-tuned model’s embeddings, a propagated table, etc.) over the same source rows.
Compare. Run eval_compare with the baseline table first. Read the per-metric delta and its paired significance.
Conclude and slice. Declare a win only when the judgment-matched metric improves with a significant paired test; then slice by cohort to see where structure helped.

1 & 2 — produce baseline and treatment tables

The baseline is a plain text-embedding table over your corpus. Each treatment table must be built over the same rows so the comparison is apples-to-apples.

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::store::CachePolicy;
async fn ex(session: &InferenceSession, baseline_model: &str, treatment_model: &str) -> jammi_db::error::Result<()> {
// Baseline: plain text embeddings.
let (baseline, _) = session
    .generate_text_embeddings("patents", baseline_model, &["abstract".into()], "id", CachePolicy::Bypass)
    .await?;

// Treatment: embeddings from a structure-aware model (e.g. a fine-tuned
// checkpoint), over the same source and key column.
let (treatment, _) = session
    .generate_text_embeddings("patents", treatment_model, &["abstract".into()], "id", CachePolicy::Bypass)
    .await?;

let baseline_table = baseline.table_name;
let treatment_table = treatment.table_name;
let _ = (baseline_table, treatment_table);
Ok(()) }
}

baseline = db.generate_embeddings(
    source="patents", model=baseline_model,
    columns=["abstract"], key="id", modality="text",
)
treatment = db.generate_embeddings(
    source="patents", model=treatment_model,
    columns=["abstract"], key="id", modality="text",
)

Leakage contract (read before you build anything). The graph and any graph-supervised training must use the train split only. The golden set — the (query, judgments) pairs you evaluate against — is held out and must never feed graph construction or training-pair selection. A structure-aware representation that has seen the eval rows will “win” by memorizing them, and the verdict is worthless. Split first, then build.

3 — compare on a held-out golden set

Run eval_compare with the baseline table first; every subsequent table carries its delta against that baseline. The golden set is a registered source of (query_id, query_text, relevant_id[, relevance_grade]) rows — see Evaluate and Compare Models for its schema.

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession, baseline_table: String, treatment_table: String) -> jammi_db::error::Result<()> {
let comparison = session
    .eval_compare(
        &[baseline_table, treatment_table], // baseline FIRST
        "patents",
        "golden.public.golden_relevance",   // held-out golden set
        10,                                 // k for recall@k / precision@k
    )
    .await?;

for entry in comparison.per_table.iter().skip(1) {
    let delta = entry.delta.as_ref().expect("non-baseline entries carry a delta");
    println!(
        "{}: nDCG {:+.3} ({:+.1}%)",
        entry.table_name,
        delta.ndcg.absolute,
        delta.ndcg.relative * 100.0,
    );

    // The paired significance of each metric delta. `None` only when the two
    // runs share no `query_id` (nothing to pair).
    if let Some(sig) = delta.significance.as_ref() {
        let s = &sig.ndcg;
        println!(
            "  nDCG p={:.4}  95% CI [{:+.3}, {:+.3}]",
            s.p_value, s.ci_lower, s.ci_upper,
        );
    }
    let _ = entry;
}
Ok(()) }
}

comparison = db.eval_compare(
    embedding_tables=[baseline_table, treatment_table],  # baseline FIRST
    source="patents",
    golden_source="golden.public.golden_relevance",
    k=10,
)
for entry in comparison["per_table"][1:]:
    delta = entry["delta"]
    d = delta["ndcg"]
    print(f"{entry['table_name']}: nDCG {d['absolute']:+.3f} ({d['relative']*100:+.1f}%)")
    sig = delta.get("significance")
    if sig is not None:
        s = sig["ndcg"]
        print(f"  nDCG p={s['p_value']:.4f}  95% CI [{s['ci_lower']:+.3f}, {s['ci_upper']:+.3f}]")

Reading the significance

For each metric, eval_compare attaches a MetricSignificance carrying a p_value and a [ci_lower, ci_upper] interval:

ci_lower / ci_upper are a percentile bootstrap confidence interval on the mean paired difference (treatment − baseline), at the 95% level. A CI that lies entirely above zero is the resampling analogue of “the delta is real, not noise.” A CI that brackets zero means you cannot distinguish the treatment from the baseline on this metric.
p_value is the two-tailed Mann–Whitney U p-value comparing the baseline and treatment per-query distributions — distribution-free, and robust to the bounded, tie-heavy shape retrieval metrics have. Smaller is stronger evidence.

Both are deterministic: the bootstrap runs under a pinned seed and a fixed iteration count, so the same inputs always yield the same interval. Two identical runs collapse to a [0, 0] CI with p ≈ 1.

A delta of +0.02 is a headline, not a conclusion. Report it only as +0.02, p=0.003, CI [+0.008, +0.031] — the delta with its significance.
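
The interval described above can be reproduced from two sets of per-query scores. This is a minimal sketch of a percentile bootstrap on the mean paired difference with a pinned seed; the function name, the default of 2000 iterations, and the seed value are hypothetical and are not the engine's settings.

```python
import random

def paired_bootstrap_ci(baseline, treatment, iterations=2000, level=0.95, seed=0):
    """Percentile bootstrap CI on mean(treatment - baseline), paired by query id.

    `baseline` and `treatment` map query_id -> metric value. Returns None when
    the two runs share no query id.
    """
    shared = sorted(set(baseline) & set(treatment))
    if not shared:
        return None
    diffs = [treatment[q] - baseline[q] for q in shared]
    n = len(diffs)
    rng = random.Random(seed)  # pinned seed: same inputs, same interval
    means = []
    for _ in range(iterations):
        resample = [rng.choice(diffs) for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    tail = (1.0 - level) / 2.0
    return {
        "mean_delta": sum(diffs) / n,
        "ci_lower": means[int(tail * (iterations - 1))],
        "ci_upper": means[int((1.0 - tail) * (iterations - 1))],
    }
```

Resampling the paired differences, rather than each run independently, keeps per-query difficulty from inflating the interval: a hard query is hard for both tables.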

Discipline contracts

These are the contracts a “did structure help?” claim must satisfy. State each one explicitly when you report a result.

Strict held-out split (the leakage contract)

The graph and any graph-supervised training use the train split only; the golden set is held out and never feeds construction. This is the single most important contract — restated here because it is the one that silently invalidates a result. If you cannot point to the split boundary, you do not have a clean number.

Judgment-matched metric

Pick the metric from the judgment type, not by habit:

Judgments	Metric	Why
Graded (`relevance_grade` > 1)	nDCG	Discounted cumulative gain uses the grade; recall would discard the ranking signal.
Binary (relevant / not)	recall@k, MRR	No grade to exploit; presence and first-hit rank are the right targets.

eval_compare always computes all four metrics, but read the one that matches your golden set. Using recall on graded judgments throws away signal you paid a human to produce.

≥3 seeds for trained treatments

A trained treatment (a fine-tuned checkpoint, or any treatment whose construction samples) varies by seed. One lucky seed can fake a win. Run the treatment under ≥3 seeds, compare each against the same baseline, and report the mean ± standard deviation of the delta plus the significance across seeds, not a single run.

A deterministic treatment (e.g. a pure propagation with no sampling) does not vary by seed, so it needs only the leakage and significance discipline — a quiet advantage worth stating when it applies.
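
Summarising the per-seed deltas takes only the standard library. The sketch below assumes you have already collected one delta per seed from separate eval_compare runs; the function name and the choice of the sample standard deviation as the spread measure are assumptions.

```python
import statistics

def summarise_seed_deltas(deltas_by_seed):
    """Mean and spread of a metric delta across seeds; refuses fewer than 3 seeds."""
    if len(deltas_by_seed) < 3:
        raise ValueError("need at least 3 seeds to report a trained treatment")
    values = list(deltas_by_seed.values())
    return {
        "n_seeds": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }
```

Reporting min and max alongside the mean makes a single lucky seed visible at a glance.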

Cohort slicing

The most useful output is where structure helped — by source family, period, or any segment you care about. Tag each query with cohort labels at eval time, then group the persisted per-query records.

Cohort tags are supplied per query to eval_embeddings (the per-table entry point); eval_compare itself does not surface cohort tagging, so to slice a comparison you run each table through eval_embeddings with the same cohort map. Every per-query record — its metrics and its cohort tags — is persisted to _jammi_eval_per_query, keyed by the run’s eval_run_id, and read back with eval_per_query.

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::collections::{BTreeMap, HashMap};
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession, treatment_table: &str) -> jammi_db::error::Result<()> {
// Tag each query with its cohort(s) at eval time.
let mut cohorts: HashMap<String, BTreeMap<String, String>> = HashMap::new();
cohorts.insert(
    "q1".into(),
    BTreeMap::from([("family".into(), "A".into())]),
);
// ... one entry per query_id ...

let report = session
    .eval_embeddings(
        "patents",
        Some(treatment_table),
        "golden.public.golden_relevance",
        10,
        &cohorts,
    )
    .await?;

// Read the persisted per-query rows back by run id; each row carries its
// metrics and cohort tags as JSON, ready to group by cohort.
let rows = session.eval_per_query(&report.eval_run_id).await?;
for row in &rows {
    println!("{}: cohorts={} metrics={}", row.query_id, row.cohorts_json, row.metrics_json);
}
Ok(()) }
}

cohorts = {"q1": {"family": "A"}, "q2": {"family": "B"}}  # one entry per query_id
report = db.eval_embeddings(
    source="patents",
    embedding_table=treatment_table,
    golden_source="golden.public.golden_relevance",
    k=10,
    cohorts=cohorts,
)
rows = db.eval_per_query(report["eval_run_id"])
# Group `rows` by their cohort tags and aggregate per-cohort metrics.

Then group by cohort and report per-cohort n and a confidence interval — a 12-query cohort has a wide CI, and a swing inside it is not a finding. Small cohorts go noisy; report n so a reader does not over-read them.
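
The grouping step can be sketched as follows. The row shape is an assumption: each row is taken to be a dict with cohorts_json and metrics_json string fields, following the field names in the Rust example, and the Python binding may return something different. The interval is a normal-approximation half-width, which is itself unreliable for very small cohorts; the function name is hypothetical.

```python
import json
import math
import statistics
from collections import defaultdict

def per_cohort_summary(rows, cohort_key, metric):
    """Group per-query rows by one cohort tag and summarise one metric."""
    groups = defaultdict(list)
    for row in rows:
        cohorts = json.loads(row["cohorts_json"])
        metrics = json.loads(row["metrics_json"])
        if cohort_key in cohorts:
            groups[cohorts[cohort_key]].append(metrics[metric])
    summary = {}
    for cohort, values in sorted(groups.items()):
        n = len(values)
        # Normal-approximation 95% half-width; undefined for a single query.
        half_width = (
            1.96 * statistics.stdev(values) / math.sqrt(n) if n > 1 else float("nan")
        )
        summary[cohort] = {"n": n, "mean": statistics.fmean(values), "ci_half_width": half_width}
    return summary
```

Always print n next to the mean: a cohort with n = 1 has no interval at all, which the sketch reports as NaN rather than hiding.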

What this recipe does not cover

New metrics. Recall / precision / MRR / nDCG suffice; this recipe does not add others.
Online drift monitoring. This is an offline held-out harness, not a production-drift monitor.
The golden set itself. Constructing relevance judgments is the expensive human step — budget for it. The eval is cheap; the labels are not.

Evaluate Uncertainty and Calibration

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Calibration & Uncertainty.

eval_embeddings and eval_inference answer “is the prediction accurate?”. eval_calibration answers the orthogonal question — “does the prediction know what it doesn’t know?”. The two are independent: a model can be accurate and badly calibrated, or perfectly calibrated and useless. When a predictor emits a distribution or an interval, a point-accuracy metric cannot tell you whether that uncertainty is honest. This harness can.

What it reports

Every calibration eval reports three things together — reporting any one alone is a trap:

A proper score (the headline). CRPS (continuous ranked probability score) and NLL (negative log-likelihood). Strictly proper scores are uniquely minimised in expectation by the true distribution, so they reward calibration and sharpness jointly, which makes them the only safe headline metric.
A calibration diagnostic. The adaptive, debiased PIT-calibration error: under calibration the probability-integral-transform of the outcomes is uniform, and this scores its departure from uniform. It is a diagnostic, never the verdict: reported alone, it admits the degenerate marginal predictor (a model that predicts the same global-average distribution for every record is perfectly calibrated and worthless).
Sharpness and coverage. The mean width of the nominal 90% interval and how often it actually contains the outcome. Sharper is better only at fixed coverage.
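
To make the three quantities concrete, here is a minimal sketch for a Gaussian predictor, using the closed-form CRPS of a Normal distribution. It is an illustration of the definitions, not the harness's implementation: the function names are hypothetical, and the adaptive, debiased PIT-calibration error itself is not reproduced (only the raw PIT values it is built from).

```python
import math

Z90 = 1.6448536269514722  # standard-normal quantile at 0.95 (central 90% interval)

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_scores(mean, sd, outcome):
    """Per-record proper scores and diagnostics for a Normal(mean, sd) prediction."""
    z = (outcome - mean) / sd
    return {
        # Closed-form CRPS of a Normal predictive distribution.
        "crps": sd * (z * (2.0 * norm_cdf(z) - 1.0)
                      + 2.0 * norm_pdf(z) - 1.0 / math.sqrt(math.pi)),
        # Negative log-likelihood of the outcome under the prediction.
        "nll": 0.5 * math.log(2.0 * math.pi * sd * sd) + 0.5 * z * z,
        # Probability integral transform: uniform on [0, 1] under calibration.
        "pit": norm_cdf(z),
        # Width of the nominal 90% interval and whether it contains the outcome.
        "width90": 2.0 * Z90 * sd,
        "covered90": abs(z) <= Z90,
    }

# (mean, sd, outcome) rows, as in a Gaussian golden set.
rows = [(4.2, 0.5, 4.0), (1.1, 0.9, 2.3), (7.8, 0.3, 7.7)]
scores = [gaussian_scores(*row) for row in rows]
aggregate = {
    "crps": sum(s["crps"] for s in scores) / len(scores),
    "nll": sum(s["nll"] for s in scores) / len(scores),
    "sharpness": sum(s["width90"] for s in scores) / len(scores),
    "coverage": sum(s["covered90"] for s in scores) / len(scores),
}
print(aggregate)
```

Widening every `sd` would push coverage up while also inflating sharpness and CRPS, which is why the three are read together.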

The held-out, three-way-split contract

Calibration is measured on a held-out test set that is disjoint from both the training data and any calibration set used to fit the predictor. Re-using calibration points to also test inflates coverage — it is the single most common conformal/calibration bug. The harness measures exactly the predictions you give it; the split discipline is yours.
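
One way to keep the three sets disjoint is to derive the split from the record id alone. The sketch below is an assumption about how you might do that on your side (the harness does not split for you); `assign_split` and the 60/20/20 fractions are hypothetical.

```python
import hashlib

def assign_split(record_id,
                 fractions=(("train", 0.6), ("calibration", 0.2), ("test", 0.2))):
    """Map a record id to exactly one split, deterministically."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # stable value in [0, 1)
    cumulative = 0.0
    for name, fraction in fractions:
        cumulative += fraction
        if u < cumulative:
            return name
    return fractions[-1][0]  # guard against floating-point rounding

record_ids = [f"r{i}" for i in range(1000)]
splits = {"train": set(), "calibration": set(), "test": set()}
for rid in record_ids:
    splits[assign_split(rid)].add(rid)
print({name: len(ids) for name, ids in splits.items()})
```

Because the assignment depends only on the id, a record can never sit in both the calibration set and the test set, and re-running the split never reshuffles it.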

Prepare a calibration golden set

A calibration golden set pairs a held-out predictive distribution with its realised outcome. Two predictor shapes are supported, each reading different columns.

Parametric (Gaussian) predictor

For a predictor that emits a predictive Normal(mean, sd) per record:

record_id,mean,sd,outcome
r1,4.2,0.5,4.0
r2,1.1,0.9,2.3
r3,7.8,0.3,7.7

Column	Type	Required
`record_id`	Utf8	yes
`mean`	Float / Int	yes
`sd`	Float / Int (positive)	yes
`outcome`	Float / Int	yes

Ensemble (Sample) predictor

For a predictor that emits an ensemble of predictive draws per record, store the draws as a JSON array in a draws column:

record_id,draws,outcome
r1,"[3.9, 4.1, 4.3, 4.0]",4.0
r2,"[0.8, 1.4, 1.0, 2.1]",2.3

Column	Type	Required
`record_id`	Utf8	yes
`draws`	Utf8 (JSON array of numbers)	yes
`outcome`	Float / Int	yes
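
For intuition about what an ensemble row is scored on, the sketch below parses the `draws` column and computes a sample-based CRPS with the standard energy-form estimator. Which estimator the harness uses internally is not stated here, so treat the formula choice and the name `sample_crps` as assumptions.

```python
import csv
import io
import json

def sample_crps(draws, outcome):
    """Energy-form CRPS estimate: E|X - y| - 0.5 * E|X - X'| over the draws."""
    m = len(draws)
    spread_to_outcome = sum(abs(x - outcome) for x in draws) / m
    spread_within = sum(abs(a - b) for a in draws for b in draws) / (2 * m * m)
    return spread_to_outcome - spread_within

GOLDEN = '''record_id,draws,outcome
r1,"[3.9, 4.1, 4.3, 4.0]",4.0
r2,"[0.8, 1.4, 1.0, 2.1]",2.3
'''

per_record = {}
for row in csv.DictReader(io.StringIO(GOLDEN)):
    draws = json.loads(row["draws"])  # the draws column is a JSON array of numbers
    per_record[row["record_id"]] = sample_crps(draws, float(row["outcome"]))
print(per_record)
```

The second record scores worse because every draw sits below the realised outcome, even though the ensemble is fairly wide.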

Register the golden set as a source; the eval then reads it as calib.public.calibration_holdout:

db.add_source("calib", path="/data/calibration_holdout.csv", format="csv")

Run the eval

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
use jammi_ai::session::InferenceSession;
use jammi_ai::eval::EvalCalibrationShape;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
let report = session.eval_calibration(
    "patents",                          // source under test
    "calib.public.calibration_holdout", // held-out predictions + outcomes
    EvalCalibrationShape::Gaussian,     // or ::Sample for an ensemble
    &std::collections::HashMap::new(),  // no cohort tags
).await?;

// The proper score is the headline; the diagnostics explain it.
println!("CRPS (headline):  {}", report.aggregate.crps);
println!("NLL:              {}", report.aggregate.nll);
println!("PIT-calibration:  {}", report.aggregate.adaptive_ece);
println!("sharpness (90%):  {}", report.aggregate.sharpness);
println!("coverage (90%):   {}", report.aggregate.coverage);
Ok(())
}
}

Python

report = db.eval_calibration(
    "patents",
    "calib.public.calibration_holdout",
    shape="gaussian",   # or "sample"
)
print("CRPS:", report["aggregate"]["crps"])
print("coverage:", report["aggregate"]["coverage"])

Slice by cohort

Marginal coverage hides conditional miscoverage: a predictor can hit 90% coverage globally while systematically under-covering a subgroup. Tag records with opaque cohort segments — keyed by record_id — and the report slices coverage and CRPS per cohort, each with its sample size n and a bootstrap confidence interval on the proper score, so a small cohort is not over-read.

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
use std::collections::{BTreeMap, HashMap};
use jammi_ai::session::InferenceSession;
use jammi_ai::eval::EvalCalibrationShape;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
let mut cohorts: HashMap<String, BTreeMap<String, String>> = HashMap::new();
cohorts.insert(
    "r1".to_string(),
    BTreeMap::from([("region".to_string(), "emea".to_string())]),
);

let report = session
    .eval_calibration(
        "patents",
        "calib.public.calibration_holdout",
        EvalCalibrationShape::Gaussian,
        &cohorts,
    )
    .await?;

for cohort in &report.per_cohort {
    println!(
        "{}={}: n={} coverage={} crps={}",
        cohort.key, cohort.value, cohort.n, cohort.coverage, cohort.crps
    );
}
Ok(())
}
}

Compare two predictors with a p-value

The per-record scores are persisted to _jammi_eval_per_query keyed by the run id, exactly like the embedding eval. Pairing the per-record CRPS of two runs by record_id and running the same distribution-free paired significance test the retrieval comparison uses turns “B is better-calibrated than A” into a CRPS delta with a confidence interval and a p-value — not a vibe.
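
As an illustration of the pairing step, the sketch below runs a paired sign-flip permutation test on per-record CRPS. The source does not name the exact distribution-free test the engine uses, so the sign-flip test is a stand-in; the dictionaries of per-record scores and the function name are hypothetical.

```python
import random
from statistics import mean

def paired_sign_flip_test(scores_a, scores_b, n_perm=10000, seed=0):
    """Two-sided paired permutation test on mean(B - A), paired by record id."""
    record_ids = sorted(set(scores_a) & set(scores_b))
    diffs = [scores_b[r] - scores_a[r] for r in record_ids]
    observed = mean(diffs)
    rng = random.Random(seed)  # pinned seed keeps the p-value reproducible
    hits = 0
    for _ in range(n_perm):
        # Under the null, each paired difference is equally likely to flip sign.
        flipped = mean(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= abs(observed) - 1e-12:
            hits += 1
    return {
        "n": len(record_ids),
        "delta": observed,
        "p_value": (hits + 1) / (n_perm + 1),
    }

# Hypothetical per-record CRPS for two runs (lower is better).
run_a = {f"r{i}": 0.50 + 0.01 * i for i in range(12)}
run_b = {f"r{i}": 0.40 + 0.012 * i for i in range(12)}
result = paired_sign_flip_test(run_a, run_b)
print(result)
```

A negative `delta` means run B has the lower (better) CRPS; the p-value says how often a difference that large would appear if the two runs were interchangeable.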

Determinism

Given the same inputs the report is bit-for-bit reproducible: every scoring function is deterministic and the only randomness — the cohort confidence-interval bootstrap — runs under a pinned seed.

Register a Mutable Companion Table

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Feature Store.

A mutable companion table lives in the same backend database as the Jammi catalog (SQLite by default, Postgres in shared deployments), supports transactional INSERT / UPDATE / DELETE through DataFusion DML, and federates with Parquet result tables and external sources in one SQL surface. Reach for it when a tenant needs a relation it can edit row by row — a feature-store slowly-changing dimension table, a per-user state table, a config-driven lookup — that still has to participate in the same JOINs as your immutable result tables.

The primitive carries only what every consumer needs: a schema, a primary key, optional tenant scope, optional secondary indexes, optional ordering column. No history semantics, no lifecycle vocabulary, no audit columns.

Goal

This recipe walks through registering one mutable companion table for a neutral third-party use case (a feature-store team called Polaris Features maintaining slowly-changing dimensions for their recommender) and shows the equivalent Rust / Python / CLI surface.

Setup

Assumes a working JammiSession. The session opens the catalog at the configured artifact directory; nothing else is needed.

Define the schema

Polaris keeps one row per (item_id, valid_from, valid_to) interval:

#![allow(unused)]
fn main() {
extern crate arrow_schema;
fn make() {
use std::sync::Arc;
use arrow_schema::{DataType, Field, Schema};

let schema = Arc::new(Schema::new(vec![
    Field::new("item_id",      DataType::Utf8,    false),
    Field::new("price_tier",   DataType::Utf8,    false),
    Field::new("availability", DataType::Utf8,    false),
    Field::new("valid_from",   DataType::Int64,   false),  // epoch milliseconds
    Field::new("valid_to",     DataType::Int64,   true),   // epoch milliseconds; NULL = open
]));
}
}

The catalog encoder accepts the closed primitive subset enforced by every MutableBackend impl — Boolean, the integer family, Float32 / Float64, Utf8, Binary. Wider types (e.g. Timestamp, Decimal) round-trip via their natural numeric encoding (Int64 epoch milliseconds, scaled Int64) so the schema stays narrow and the rule stays one-line at the boundary.
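
The numeric encodings mentioned above can be sketched on the client side as follows. The helper names are hypothetical and the decimal scale is whatever you choose per column; the engine only ever sees the resulting Int64 values.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_ms(ts):
    """Encode a timezone-aware datetime as Int64 epoch milliseconds."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return (ts - EPOCH) // timedelta(milliseconds=1)

def from_epoch_ms(ms):
    return EPOCH + timedelta(milliseconds=ms)

def to_scaled_int(value, scale):
    """Encode a Decimal as an Int64 with a fixed number of fractional digits."""
    scaled = value.scaleb(scale)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} has more than {scale} fractional digits")
    return int(scaled)

def from_scaled_int(number, scale):
    return Decimal(number).scaleb(-scale)

valid_from = to_epoch_ms(datetime(2026, 4, 1, tzinfo=timezone.utc))
price = to_scaled_int(Decimal("19.99"), scale=2)
print(valid_from, price)
```

Rejecting a value with too many fractional digits, rather than rounding it, keeps the round trip lossless.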

The engine reserves tenant_id and any column whose name starts with _ — the schema builder rejects them at build time per ADR-00. (The tenant_id column is always present on the storage table; the engine appends it implicitly.)

Build the definition

MutableTableDefinitionBuilder chains the field validations:

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate arrow_schema;
use std::sync::Arc;
use arrow_schema::Schema;
use jammi_db::store::mutable::definition::{
    MutableIndexDef, MutableTableDefinitionBuilder, MutableTableId,
};

fn make(schema: Arc<Schema>) -> jammi_db::store::mutable::definition::MutableTableDefinition {
let def = MutableTableDefinitionBuilder::new(
        MutableTableId::new("item_dimensions").unwrap(),
        schema,
    )
    .primary_key(vec!["item_id".into(), "valid_from".into()])
    .index(MutableIndexDef {
        name: "idx_item_dim_active".into(),
        columns: vec!["item_id".into(), "valid_to".into()],
        unique: false,
    })
    .build()
    .unwrap();
def
}
}

The primary key must be a non-empty subset of the schema; secondary indexes are optional but persisted on the storage table so the backend can use them for WHERE clauses.
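
The build-time rules described in this recipe (reserved names, non-empty primary key, key columns drawn from the schema) can be pre-checked before a definition ever reaches the builder. The function below is a hypothetical client-side helper, not part of the Jammi API, and it covers only the rules stated above.

```python
def validate_definition(columns, primary_key):
    """Return a list of rule violations; an empty list means the shape looks valid."""
    errors = []
    for name in columns:
        # tenant_id and leading-underscore names are reserved by the engine.
        if name == "tenant_id" or name.startswith("_"):
            errors.append(f"reserved column name: {name}")
    if not primary_key:
        errors.append("primary key must not be empty")
    for name in primary_key:
        if name not in columns:
            errors.append(f"primary key column not in schema: {name}")
    return errors

columns = ["item_id", "price_tier", "availability", "valid_from", "valid_to"]
print(validate_definition(columns, ["item_id", "valid_from"]))
print(validate_definition(columns + ["_audit"], ["sku"]))
```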

Register

The registration is atomic: catalog row + storage CREATE TABLE + every secondary CREATE INDEX commit together. If any step fails, nothing lands.

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate tokio;
use jammi_db::store::mutable::definition::MutableTableDefinition;
use jammi_db::session::JammiSession;
async fn ex(session: &JammiSession, def: MutableTableDefinition) -> jammi_db::error::Result<()> {
let id = session.create_mutable_table(def).await?;
// The table is now queryable as `mutable.public.item_dimensions` in the
// same SQL surface that federates result tables and external sources.
Ok(())
}
}

Python

import pyarrow as pa
import jammi

db = jammi.connect("file:///var/lib/jammi")
# The Python wrapper exposes mutable-table registration through the
# `create_mutable_table` accessor (see `jammi.mutable`). The recipe below
# is illustrative; consult the API reference for the binding shape your
# version ships.

CLI

The jammi CLI exposes mutable-table registration through the lower-level sources surface for now; programmatic clients should use the Rust or Python APIs.

Verify

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate tokio;
async fn ex(session: &jammi_db::session::JammiSession) -> jammi_db::error::Result<()> {
let zero_rows = session
    .sql("SELECT * FROM mutable.public.item_dimensions LIMIT 0")
    .await?;
assert_eq!(zero_rows[0].schema().fields().len(), 5);
Ok(())
}
}

The query returns a zero-row batch with the declared schema — confirmation that the table is registered and DataFusion can route mutable.public.<id> correctly.

Federation tease

The mutable table now JOINs with your existing result tables and sources:

SELECT  d.item_id, d.price_tier, e.embedding
FROM    mutable.public.item_dimensions d
JOIN    itemembs.public.item_embeddings e ON e.item_id = d.item_id
WHERE   d.valid_to IS NULL
  AND   d.price_tier = 'premium'
LIMIT 10;

See the Run Transactional Updates on a Mutable Table recipe for INSERT / UPDATE / DELETE round-trips and the SCD Type 2 close-and-open pattern.

Run Transactional Updates on a Mutable Table

Once a mutable companion table is registered (see Register a Mutable Companion Table), you update its rows with the same SQL surface that runs your read queries. Every INSERT / UPDATE / DELETE lands in one backend transaction — either every row commits or none does — and federates with your immutable result tables in subsequent SELECTs.

Goal

Walk through the three DML verbs against the item_dimensions table from the previous recipe, then demonstrate the Slowly-Changing Dimension Type 2 close-and-open pattern Polaris uses to record a price-tier change.

Insert

INSERT INTO mutable.public.item_dimensions
    (item_id, price_tier, availability, valid_from)
VALUES
    ('sku-1842', 'standard', 'in_stock',     1775001600000),  -- 2026-04-01T00:00:00Z as epoch ms
    ('sku-2901', 'premium',  'in_stock',     1775001600000),
    ('sku-3457', 'standard', 'out_of_stock', 1775001600000);

The RecordBatch returned by session.sql(...) carries a single-row UInt64 column called count per DataFusion’s TableProvider::insert_into contract. Three rows landed; the JOIN-against-result-tables query from the previous recipe now returns three rows.

Update

UPDATE mutable.public.item_dimensions
   SET availability = 'low_stock'
 WHERE item_id = 'sku-2901';

Predicate columns that participate in an index pushdown become a backend WHERE clause; the rest filter above the scan node. The update commits in one transaction; if the predicate matches zero rows, the call succeeds with rows_affected = 0.

Delete

DELETE FROM mutable.public.item_dimensions
 WHERE item_id = 'sku-3457';

DELETE follows the same shape. Row-level cascades are the backend’s job (the foreign-key declarations on the storage table, whether SQLite or Postgres); the engine does not model cascades above the backend.

SCD Type 2 — close-and-open

Polaris records a price-tier change by closing the active row’s valid_to and inserting a new row with the new tier. Both statements must land atomically; today the supported pattern is to issue them as a single multi-statement SQL string through session.sql, which DataFusion plans as one DML batch under one transaction:

-- Single sql() call so both statements land in one transaction.
-- valid_from / valid_to are Int64 epoch milliseconds; 1778846400000 = 2026-05-15T12:00:00Z.
UPDATE mutable.public.item_dimensions
   SET valid_to = 1778846400000
 WHERE item_id = 'sku-1842' AND valid_to IS NULL;

INSERT INTO mutable.public.item_dimensions
    (item_id, price_tier, availability, valid_from)
VALUES
    ('sku-1842', 'premium', 'in_stock', 1778846400000);

A future JammiSession::transaction(|tx| async { … }) API will make multi-statement DML atomicity explicit; today the multi-statement SQL string is the supported surface.
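
To see why the two statements must share one transaction, here is a self-contained sketch of the close-and-open pattern against a plain in-memory SQLite table. It uses Python's standard `sqlite3` module as a stand-in for the backend, not the Jammi session API; the table mirrors the recipe's schema and `change_tier` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE item_dimensions (
           item_id TEXT NOT NULL, price_tier TEXT NOT NULL,
           availability TEXT NOT NULL,
           valid_from INTEGER NOT NULL, valid_to INTEGER,
           PRIMARY KEY (item_id, valid_from))"""
)
conn.execute(
    "INSERT INTO item_dimensions VALUES "
    "('sku-1842', 'standard', 'in_stock', 1775001600000, NULL)"
)
conn.commit()

def change_tier(conn, item_id, new_tier, at_ms):
    """Close the open interval and open a new one, atomically."""
    with conn:  # commits on success, rolls back if any statement raises
        row = conn.execute(
            "SELECT availability FROM item_dimensions "
            "WHERE item_id = ? AND valid_to IS NULL",
            (item_id,),
        ).fetchone()
        if row is None:
            raise LookupError(f"no open row for {item_id}")
        conn.execute(
            "UPDATE item_dimensions SET valid_to = ? "
            "WHERE item_id = ? AND valid_to IS NULL",
            (at_ms, item_id),
        )
        conn.execute(
            "INSERT INTO item_dimensions VALUES (?, ?, ?, ?, NULL)",
            (item_id, new_tier, row[0], at_ms),
        )

change_tier(conn, "sku-1842", "premium", 1778846400000)
history = conn.execute(
    "SELECT price_tier, valid_from, valid_to FROM item_dimensions ORDER BY valid_from"
).fetchall()
print(history)
```

If the INSERT fails (for example on a primary-key collision), the preceding UPDATE is rolled back with it, so the item is never left without an open row.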

Federation join

The mutable table now joins with the embedding table to surface recommender candidates filtered by current tier:

SELECT  d.item_id, d.price_tier, e.embedding
  FROM  mutable.public.item_dimensions d
  JOIN  itemembs.public.item_embeddings e ON e.item_id = d.item_id
 WHERE  d.valid_to IS NULL
   AND  d.price_tier = 'premium'
 LIMIT 10;

The federation is the engine’s existing FederationOptimizerRule work — no special integration needed; mutable tables register under the same SessionContext as your Parquet result tables and external sources.

Crash recovery

If the process dies mid-write, no partial commit is visible on restart. SQLite’s WAL mode and Postgres’s MVCC each guarantee that an open transaction either commits as a whole or is rolled back on connection loss. The engine inherits that guarantee through the CatalogBackend::transaction closure shape: when the closure returns Err(_), the backend rolls back; when the process is killed mid-execution, the backend rolls back the in-flight transaction.

Direct-access append + replay (Phase 4 trigger streams)

Two lower-level methods bypass DataFusion’s planner for high-throughput event paths:

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate arrow;
extern crate tokio;
async fn ex(
    session: &jammi_db::session::JammiSession,
    batch: arrow::array::RecordBatch,
) -> jammi_db::error::Result<()> {
use jammi_db::store::mutable::definition::MutableTableId;
use jammi_db::catalog::backend::TxOptions;

let id = MutableTableId::new("events").unwrap();
let registry = session.mutable_tables_arc();
let backend = session.catalog().backend_arc();

// Direct INSERT via insert_batch — caller owns the transaction.
backend
    .transaction(TxOptions::default(), move |tx| {
        let registry = registry.clone();
        let id = id.clone();
        let batch = batch.clone();
        Box::pin(async move {
            registry
                .insert_batch(tx, &id, &batch)
                .await
                .map_err(|e| jammi_db::BackendError::Execution(e.to_string()))?;
            Ok::<(), jammi_db::BackendError>(())
        })
    })
    .await?;
Ok(())
}
}

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate futures;
extern crate tokio;
use futures::StreamExt;
async fn ex(
    session: &jammi_db::session::JammiSession,
) -> jammi_db::error::Result<()> {
use jammi_db::store::mutable::definition::MutableTableId;

let id = MutableTableId::new("events").unwrap();
// Stream rows where the registered `order_column` value > 100.
let mut stream = session
    .mutable_tables()
    .scan_after(&id, 100)
    .await
    .map_err(|e| jammi_db::error::JammiError::Catalog(e.to_string()))?;
while let Some(batch) = stream.next().await {
    let _batch = batch
        .map_err(|e| jammi_db::error::JammiError::Catalog(e.to_string()))?;
    // …
}
Ok(())
}
}

These are the surface Phase 4’s trigger broker uses to publish events into a backing table and replay subscribers; general consumers should prefer the SQL surface.

Publish Events to a Topic

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Change Data Capture.

A trigger-stream topic is a catalog-registered Arrow schema plus a backing mutable table. Publishers append RecordBatches; subscribers filter and receive them. The engine owns the offset counter and the durable event log; the broker (in-memory by default, NATS JetStream in clustered deployments) fans live deliveries out to attached subscribers.

Reach for the trigger stream when a tenant needs event semantics — a CDC pipeline, a feature-store update bus, a job-completion notification fan-out — that has to coexist with the SQL surface the rest of the platform already uses. Every published event lands as a row in the topic’s backing mutable table; that table is queryable with the same Flight SQL surface as any other mutable companion table, so ad-hoc analytics on the event log come for free.

Goal

Walk through registering one topic for a neutral third-party use case (a small CDC pipeline pulling Postgres change events into a downstream search index) and publishing a batch of events from Rust.

Setup

Assumes a JammiSession whose JammiConfig.trigger_broker is left at its default — the embedded InMemoryBroker. Production deployments swap in JetStreamBroker via configuration; the publisher API does not change.

Define the topic schema

#![allow(unused)]
fn main() {
extern crate arrow_schema;
fn make() {
use std::sync::Arc;
use arrow_schema::{DataType, Field, Schema};

let schema = Arc::new(Schema::new(vec![
    Field::new("op",         DataType::Utf8,  false),
    Field::new("ts_ms",      DataType::Int64, false),
    Field::new("key",        DataType::Utf8,  false),
    Field::new("after",      DataType::Utf8,  true),
]));
}
}

The schema is the contract every published batch must satisfy. The engine reserves the _offset, _row_idx, and _produced_at column names (all leading-underscore names are reserved); user schemas must not include them.

Register the topic

Topic registration is a typed lifecycle verb, not a SQL statement: build a TopicDefinition and register it. The Session::register_topic surface (and the gRPC CatalogService.RegisterTopic verb it rides) does this in one call.

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate arrow_schema;
use std::collections::BTreeMap;
use std::sync::Arc;
use arrow_schema::SchemaRef;
use jammi_db::trigger::{TopicDefinition, TopicId};

fn make(schema: SchemaRef) -> TopicDefinition {
let topic = TopicDefinition {
    id: TopicId::new(),
    name: "cdc.orders".into(),
    schema,
    tenant: None,                          // None = global; Some(t) scopes to t
    broker_metadata: BTreeMap::new(),      // driver-specific opts (e.g. retention)
};
topic
}
}

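The CLI exposes the same shape via jammi trigger register --name … --schema …. The Python binding registers the topic in one call, taking the name, the schema, and the broker metadata: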

import jammi
import pyarrow as pa

db = jammi.connect("file:///var/lib/jammi")
db.register_topic(
    "cdc.orders",
    schema=pa.schema([
        ("op", pa.string()),
        ("ts_ms", pa.int64()),
        ("key", pa.string()),
        ("after", pa.string()),
    ]),
    broker_metadata={"retention_seconds": "604800"},
)

The id is a UUIDv7 minted at construction — time-ordered so the catalog index keeps insert locality. The name is opaque to the engine beyond catalog lookup; pick a hierarchical namespace that suits your platform (e.g. cdc.orders, feature_store.user_features).

Registration is atomic: the topics row, the backing mutable table, and any broker-side state commit together; nothing lands on failure.

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate tokio;
use std::sync::Arc;
use jammi_db::trigger::{TopicDefinition, TriggerBroker};
async fn ex(
    topic_repo: &jammi_db::catalog::topic_repo::TopicRepo,
    broker: Arc<dyn TriggerBroker>,
    topic: &TopicDefinition,
) -> Result<(), jammi_db::trigger::TriggerError> {
broker.register_topic(topic).await?;
topic_repo.register_topic(topic).await?;
Ok(())
}
}

Publish a batch

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate arrow;
extern crate arrow_schema;
extern crate tokio;
use std::sync::Arc;
use arrow::array::{Int64Array, RecordBatch, StringArray};
use arrow_schema::SchemaRef;
use jammi_db::trigger::{Publisher, TopicDefinition};
use jammi_db::TenantId;
async fn ex(
    publisher: &Publisher,
    topic: &TopicDefinition,
    schema: SchemaRef,
    tenant: Option<TenantId>,
) -> Result<(), jammi_db::trigger::TriggerError> {
let batch = RecordBatch::try_new(
    schema,
    vec![
        Arc::new(StringArray::from(vec!["c", "u", "d"])),
        Arc::new(Int64Array::from(vec![1700_000_000_000, 1700_000_000_100, 1700_000_000_200])),
        Arc::new(StringArray::from(vec!["order-1", "order-2", "order-3"])),
        Arc::new(StringArray::from(vec![Some("{...}"), Some("{...}"), None])),
    ],
)
.unwrap();
let offset = publisher.publish_scoped(topic, tenant, batch).await?;
println!("published offset = {}", offset.value());
Ok(())
}
}

publish_scoped tags every row’s tenant_id column from the explicit tenant: Option<TenantId> argument — no silent dependency on session state at publish time. Pass None for global topics; pass the session’s current tenant (session.tenant()) for tenant-scoped publishes.

Python equivalent — publish_topic accepts a pyarrow.Table via the Arrow C Stream Interface so the conversion is zero-copy:

import pyarrow as pa

table = pa.table({
    "op":    ["c", "u", "d"],
    "ts_ms": [1700_000_000_000, 1700_000_000_100, 1700_000_000_200],
    "key":   ["order-1", "order-2", "order-3"],
    "after": ["{...}", "{...}", None],
})
offset = db.publish_topic("cdc.orders", batch=table)
print(f"published offset = {offset}")

publish_scoped validates the batch schema against the topic schema before opening a transaction. A mismatch returns BatchSchemaMismatch and nothing lands in the backing table. If the topic is tenant-pinned (TopicDefinition::tenant = Some(t)) and the tenant argument doesn’t match, the publish is rejected up front with PublishTenantMismatch.

What just happened

The Publisher minted the next monotonic offset for the topic (seeded lazily from MAX(_offset) on the backing table the first time the topic is touched).
The augmented batch — user columns plus _offset, _row_idx, and _produced_at — was inserted into the topic’s backing mutable table inside one CatalogBackend::transaction. On commit the offset advances; on rollback it is reused for the next attempt so no gaps appear in the log.
The broker received the batch for best-effort fan-out to any live subscribers. A broker fan-out failure after commit is logged but does not fail the publish — subscribers replay missed offsets from the backing table on next reconnect.
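
The three steps above can be modelled in a few lines. The class below is an in-memory toy, not the engine's Publisher: the names are hypothetical, and the choice to start an empty log at offset 0 is an assumption. It shows lazy seeding from the maximum stored offset, batch augmentation, and offset reuse after a rollback.

```python
import time

class SketchPublisher:
    """Toy model of offset minting over a list that stands in for the backing table."""

    def __init__(self, backing_rows):
        self.log = backing_rows
        self.next_offset = None  # seeded lazily on first publish

    def publish(self, rows, fail=False):
        if self.next_offset is None:
            # Seed from MAX(_offset); an empty log starts at 0 (assumption).
            self.next_offset = max((r["_offset"] for r in self.log), default=-1) + 1
        offset = self.next_offset
        produced_at = time.time_ns() // 1000  # UTC microseconds, one value per offset
        augmented = [
            {"_offset": offset, "_row_idx": idx, "_produced_at": produced_at, **row}
            for idx, row in enumerate(rows)
        ]
        if fail:
            # Simulated rollback: nothing is appended and the offset is not advanced.
            raise RuntimeError("simulated transaction rollback")
        self.log.extend(augmented)
        self.next_offset = offset + 1  # advance only after the commit
        return offset

publisher = SketchPublisher([])
first = publisher.publish([{"op": "c", "key": "order-1"}, {"op": "u", "key": "order-2"}])
try:
    publisher.publish([{"op": "d", "key": "order-3"}], fail=True)
except RuntimeError:
    pass
second = publisher.publish([{"op": "d", "key": "order-3"}])
print(first, second, len(publisher.log))
```

All rows of one publish share an offset and are told apart by `_row_idx`, which is what makes `(_offset, _row_idx)` usable as a composite key.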

Subscribe to a Topic

Goal

Open a subscription on the cdc.orders topic with a predicate that matches only deletes, and consume the stream from Rust.

Setup

Assumes the topic was registered (see Publish Events to a Topic) and you have a Subscriber constructed against the same broker the publisher uses.

Open the subscription

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate arrow;
extern crate arrow_schema;
extern crate datafusion;
extern crate futures;
extern crate tokio;
use std::sync::Arc;
use arrow_schema::SchemaRef;
use datafusion::execution::context::SessionContext;
use futures::StreamExt;
use jammi_db::trigger::{Predicate, Subscriber, TopicDefinition};
async fn ex(
    subscriber: &Subscriber,
    session: &SessionContext,
    topic: &TopicDefinition,
) -> Result<(), jammi_db::trigger::TriggerError> {
let predicate = Predicate::from_sql(session, Arc::clone(&topic.schema), "op = 'd'")?;

let mut stream = subscriber
    .subscribe(topic, predicate, None /* from_offset: None = live tail */)
    .await?;

while let Some(delivered) = stream.next().await {
    let batch = delivered?;
    handle_deletes(batch.batch);
}
Ok(())
}
fn handle_deletes(_: arrow::array::RecordBatch) {}
}

from_offset = None starts the subscription at the broker’s live tail (no replay). Some(0) starts from the earliest retained event; the engine joins backing-table replay with the live broker stream so the client sees one continuous sequence of DeliveredBatch.

Predicate dialect

Predicates are a subset of DataFusion SQL. The whitelist:

Supported	Rejected
Column references (`col`)	Subqueries (`SELECT …`)
Literal scalars (`1`, `'foo'`, `true`)	Aggregates (`SUM`, `COUNT`, …)
Comparison ops (`=`, `<`, `>`, `<=`, `>=`, `!=`)	Window functions
Boolean ops (`AND`, `OR`, `NOT`)	Joins
`IS NULL`, `IS NOT NULL`	`CASE WHEN`
`IN (literal, literal, …)`	Functions outside the whitelist
`LIKE`, `BETWEEN`
Whitelisted string functions

The string-function whitelist is lower, upper, length, starts_with, ends_with. Anything outside this list returns PredicateUnsupported at subscribe time — the stream never opens. An unparseable predicate returns PredicateParse for the same reason.
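
A client can catch the most common rejection (a function outside the whitelist) before calling subscribe. The check below is a naive lexical sketch under stated assumptions: it is not the engine's validation, which works on the parsed expression, and it does not detect subqueries, joins, or CASE. The names are hypothetical.

```python
import re

ALLOWED_FUNCTIONS = {"lower", "upper", "length", "starts_with", "ends_with"}
# Keywords that may be followed by "(" without being a function call.
KEYWORDS = {"and", "or", "not", "in", "like", "between"}

STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
CALL = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*\(")

def disallowed_functions(predicate):
    """Return the sorted names of called functions that are not whitelisted."""
    # Blank out string literals so text such as 'sum(x)' is not read as a call.
    without_literals = STRING_LITERAL.sub("''", predicate)
    names = {m.group(1).lower() for m in CALL.finditer(without_literals)}
    return sorted(names - KEYWORDS - ALLOWED_FUNCTIONS)

for predicate in [
    "op = 'd'",
    "lower(op) = 'd' AND key IN ('order-1', 'order-2')",
    "md5(key) = 'x' OR SUM(ts_ms) > 1",
]:
    print(predicate, "->", disallowed_functions(predicate))
```

An empty result only means no unknown function name was spotted; the engine's PredicateUnsupported and PredicateParse errors remain the authority.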

Reconnection and replay

If your consumer disconnects and reconnects, pass the last-seen offset as from_offset to resume without missing events:

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate chrono;
extern crate tokio;
use chrono::Utc;
use std::sync::Arc;
use jammi_db::trigger::{Offset, Predicate, Subscriber, TopicDefinition};
async fn ex(
    subscriber: &Subscriber,
    topic: &TopicDefinition,
    last_seen: u64,
) -> Result<(), jammi_db::trigger::TriggerError> {
let resume_from = Offset::new(last_seen + 1, Utc::now());
let _stream = subscriber
    .subscribe(topic, Predicate::match_all(), Some(resume_from))
    .await?;
Ok(())
}
}

The engine reads the backing table for offsets >= resume_from, then attaches the broker live stream starting strictly above the last replayed offset — the two halves never deliver the same offset twice.
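
The handover between replay and the live stream can be pictured as a generator that remembers the last offset it delivered. This is a sketch of the idea only: `replay_then_live` is a hypothetical function, and both inputs are assumed to be iterables of (offset, batch) pairs in ascending offset order.

```python
def replay_then_live(backing_log, live, resume_from):
    """Yield each (offset, batch) at most once: replay first, then the live tail."""
    last_delivered = resume_from - 1
    # Phase 1: replay durable events at or after the resume point.
    for offset, batch in backing_log:
        if offset >= resume_from:
            yield offset, batch
            last_delivered = offset
    # Phase 2: attach the live stream strictly above the last replayed offset.
    for offset, batch in live:
        if offset > last_delivered:
            yield offset, batch
            last_delivered = offset

backing_log = [(i, f"batch-{i}") for i in range(6)]  # offsets 0..5 already durable
live = [(i, f"batch-{i}") for i in range(4, 9)]      # live tail overlaps at 4 and 5
delivered = [offset for offset, _ in replay_then_live(backing_log, live, resume_from=3)]
print(delivered)
```

The overlap at offsets 4 and 5 is dropped from the live half, so the consumer sees one gap-free, duplicate-free sequence.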

Backpressure

A slow consumer slows the producer; events are not dropped. The chain runs as follows: the broker tail backs up onto a bounded mpsc::channel, the channel’s send() future awaits, the broker poll loop pauses, and publishers awaiting the broker fan-out experience matching backpressure. The backing table, the authoritative log, is still written without delay, so a consumer that disconnects under load can always catch up via replay.
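
The same bounded-channel behaviour can be reproduced with a bounded `asyncio.Queue`, used here as an analogy for the Rust mpsc channel rather than as Jammi code. The function name and the capacity of 4 are arbitrary choices for the sketch.

```python
import asyncio

async def run_pipeline(n_events=20, capacity=4):
    queue = asyncio.Queue(maxsize=capacity)  # bounded, like the broker's channel
    received = []
    max_depth = 0

    async def producer():
        for event in range(n_events):
            await queue.put(event)  # suspends while the queue is full
        await queue.put(None)       # sentinel: no more events

    async def consumer():
        nonlocal max_depth
        while True:
            max_depth = max(max_depth, queue.qsize())
            event = await queue.get()
            if event is None:
                return
            received.append(event)
            await asyncio.sleep(0.001)  # a deliberately slow consumer

    await asyncio.gather(producer(), consumer())
    return received, max_depth

received, max_depth = asyncio.run(run_pipeline())
print(len(received), max_depth)
```

The producer is paced by the consumer: every event arrives in order, and the queue never holds more than its capacity.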

Replay Events from the Backing Table

Every topic’s event log is a Phase-2 mutable companion table named __topic_<topic_id>. The double-underscore prefix is reserved for engine-controlled tables; consumers do not register tables under that namespace. Flight SQL queries against the backing table compose with the same federation surface the rest of Jammi exposes — joins with result tables, predicate pushdown, aggregates over event history.

Reach for direct replay when a tenant needs ad-hoc analytics on the event log that would be awkward through the subscribe surface — counting events per key, computing per-day rollups, joining the event stream against a Parquet result table.

Goal

Run an ad-hoc query that returns the count of op = 'd' events per hour over the durable log for cdc.orders.

Backing table naming

Every registered topic has a backing table whose name is __topic_<topic_id> where <topic_id> is the hyphenated lowercase TopicId::Display. To find the name, query topics:

SELECT topic_id, name, backing_table FROM topics WHERE name = 'cdc.orders';

Schema

The backing table’s columns are the topic’s user schema with three engine-controlled columns prepended:

Column	Type	Purpose
`_offset`	`BIGINT NOT NULL`	Monotonic offset; stable across rows of one publish.
`_row_idx`	`BIGINT NOT NULL`	Position within a publish, for the composite PK.
`_produced_at`	`BIGINT NOT NULL` (UTC microseconds)	Publisher-side timestamp, single value per offset.
…user cols…	per `TopicDefinition.schema`	Payload columns.
`tenant_id`	`TEXT` (nullable, added by Phase 2)	Tenant scope per ADR-00.

The primary key is (_offset, _row_idx); _offset is the order column so scan_after and ORDER BY _offset agree.

Query

SELECT
    DATE_TRUNC('hour', to_timestamp_micros(_produced_at))   AS hour,
    COUNT(*)                                                 AS deletes
FROM    mutable.public.__topic_019088da_1234_7890_abcd_ef1234567890
WHERE   op = 'd'
GROUP BY hour
ORDER BY hour;

Substitute your topic’s backing_table (looked up from the topics catalog row) for the literal name in the example. The query runs through Flight SQL like any other federated query — predicate pushdown applies, joins compose, aggregates run.

Tenant scoping

The backing table carries the tenant_id column added by the Phase-2 mutable backend. Sessions bound to a tenant see only rows whose tenant_id matches or is NULL, per Phase 3’s predicate-injection analyzer rule — the same guard that scopes the rest of the catalog.
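
The visibility rule stated above (a tenant-bound session sees rows tagged with its own tenant or with NULL) amounts to the following filter. This is a plain-Python restatement for illustration; the engine applies the rule as an injected predicate, and the function name and row shape are hypothetical.

```python
def visible_rows(rows, session_tenant):
    """Rows a tenant-bound session may see: its own rows plus global (NULL) rows."""
    return [
        row for row in rows
        if row["tenant_id"] is None or row["tenant_id"] == session_tenant
    ]

ALICE = "018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9a"
BOB = "018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9b"
events = [
    {"_offset": 0, "op": "c", "tenant_id": ALICE},
    {"_offset": 1, "op": "u", "tenant_id": BOB},
    {"_offset": 2, "op": "d", "tenant_id": None},  # global event
]
print([row["_offset"] for row in visible_rows(events, ALICE)])
```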

Scope a Session to a Tenant

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Tenancy.

When more than one logical tenant shares a Jammi engine — a SaaS feature store serving two ML teams, a research workbench shared across three labs, a notebook product hosting one project per student — every catalog read and write needs to belong to the right tenant. Jammi’s session-scoped tenant binding does this without the caller having to spell a WHERE tenant_id = … clause on every query.

Goal

After this recipe you can:

Bind a tenant to a session in Rust, Python, and on the CLI.
Verify that two sessions on the same process see disjoint rows.
Bind a tenant on a remote client via the gRPC CatalogService so subsequent Flight SQL queries from the same connection observe the tenant.

Setup

Every example below assumes a configured JammiConfig (defaults are fine for the recipe). The tenant identifier is a UUID v4 or v7 string — the engine refuses the nil UUID (00000000-…) at the TenantId newtype boundary.
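
A client can reject obviously bad identifiers before they reach the engine. The sketch below uses Python's standard `uuid` module; the function name is hypothetical, and rejecting versions other than 4 and 7 is an assumption drawn from the sentence above (the only rejection the engine is stated to enforce is the nil UUID).

```python
import uuid

def parse_tenant_id(text):
    """Parse a tenant id string, refusing malformed input and the nil UUID."""
    value = uuid.UUID(text)  # raises ValueError on malformed input
    if value.int == 0:
        raise ValueError("the nil UUID is not a valid tenant id")
    if value.version not in (4, 7):
        # Assumption: only v4 and v7 identifiers are expected.
        raise ValueError(f"unsupported UUID version: {value.version}")
    return value

alice = parse_tenant_id("018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9a")
print(alice, alice.version)
```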

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate tokio;
use std::str::FromStr;
use jammi_db::TenantId;
use jammi_db::session::JammiSession;
use jammi_db::config::JammiConfig;

async fn ex() -> jammi_db::error::Result<()> {
let config = JammiConfig::default();
let alice = TenantId::from_str("018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9a")?;

let session = JammiSession::new(config).await?.with_tenant(alice);
// Every catalog read and write on `session` now scopes to Alice.
Ok(())
}
}

with_tenant is a builder that consumes self and returns Self, so it chains naturally. If you hold a session behind Arc, use bind_tenant(&t) to update the binding in place — the session shares one TenantBinding across all references.

Python

import jammi

db = jammi.connect("file:///tmp/jammi")
db.set_tenant("018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9a")

# Subsequent calls observe Alice's tenant scope.
db.add_source("inbox", path="/data/alice/inbox.parquet", format="parquet")
db.sql("SELECT * FROM inbox.public.inbox")

set_tenant is a sticky setter — it mutates the connection in place and stays in effect until the next set_tenant. Pass an empty string to clear: db.set_tenant("").

For a binding scoped to a single block — the prior tenant restored on exit, and nesting handled — use tenant_scope as a context manager:

with db.tenant_scope("018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9a"):
    # Reads here observe Alice's tenant scope.
    db.sql("SELECT * FROM inbox.public.inbox")
# Prior scope restored here.

The same surface is available on a remote connection (jammi.RemoteDatabase), where the prior tenant is captured client-side and rebound on exit.
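
The restore-on-exit and nesting behaviour of tenant_scope can be illustrated with a small stand-in class. This is not the jammi connection object: `SketchConnection` is hypothetical and only models the tenant binding, including the empty-string-clears rule of set_tenant.

```python
from contextlib import contextmanager

class SketchConnection:
    """Stand-in that models only the tenant binding of a connection."""

    def __init__(self):
        self.tenant = None

    def set_tenant(self, tenant_id):
        # Sticky setter; an empty string clears the binding.
        self.tenant = tenant_id or None

    @contextmanager
    def tenant_scope(self, tenant_id):
        prior = self.tenant          # capture whatever was bound before
        self.set_tenant(tenant_id)
        try:
            yield self
        finally:
            self.tenant = prior      # restore even if the block raised

ALICE = "018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9a"
BOB = "018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9b"

db = SketchConnection()
seen = []
with db.tenant_scope(ALICE):
    seen.append(db.tenant)
    with db.tenant_scope(BOB):      # nested scope
        seen.append(db.tenant)
    seen.append(db.tenant)          # back to Alice
seen.append(db.tenant)              # back to unbound
print(seen)
```

Capturing the prior value inside each scope is what makes nesting work without any global stack.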

CLI

The --tenant flag is global; it applies to every subcommand.

jammi --tenant 018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9a sources list
jammi --tenant 018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9b models list

Remote clients (gRPC + Flight SQL)

A programmatic client (Python, Go, Java) binds the tenant once per connection via the jammi.v1.catalog.CatalogService.SetTenant RPC. The server records the tenant against the jammi-session-id request metadata header. Every Flight SQL query the same connection issues afterwards inherits the binding through the same resolver, the engine-default SessionIdTenantResolver, which is applied by the single async tenant-binding layer (TenantResolverLayer) that fronts both the CatalogService and the Flight SQL provider. Browser clients reach the same CatalogService over HTTP/1.1 via the gRPC-Web shim (application/grpc-web+proto): there is no separate REST surface, and the jammi-session-id header semantics are the same.

import grpc
from jammi.v1 import catalog_pb2, catalog_pb2_grpc

channel = grpc.insecure_channel("jammi.example.com:50051")
metadata = [("jammi-session-id", "my-client-uuid")]

client = catalog_pb2_grpc.CatalogServiceStub(channel)
client.SetTenant(
    catalog_pb2.SetTenantRequest(
        tenant=catalog_pb2.Tenant(id="018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9a")
    ),
    metadata=metadata,
)
# Subsequent Flight SQL queries on the same channel + jammi-session-id
# observe Alice's tenant scope.

This flow assumes a trusted network. The jammi-session-id header is a client-minted, opaque transport correlation id — it identifies a connection, not a principal. The server does not authenticate it: anyone who presents another session’s id assumes that session’s tenant. SetTenant writes a tenant the caller asserts; nothing verifies the caller is entitled to it. That is the right trade-off when every client is inside your trust boundary (a private VPC, a sidecar mesh, a single-process notebook), and the wrong one the moment an untrusted caller can reach the port. Do not treat jammi-session-id as an authentication or authorization boundary.
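
To make the risk concrete, here is a toy model of an unauthenticated session-id to tenant map. It is a sketch of the trust model only: the bindings dict and both functions are hypothetical and do not reflect how the server stores its bindings.

```python
# Toy model of a session-id -> tenant map with no authentication.
bindings = {}


def set_tenant(session_id, tenant):
    # The caller's assertion is stored as-is; nothing checks entitlement.
    bindings[session_id] = tenant


def resolve(session_id):
    # None means "no tenant bound" for this session id.
    return bindings.get(session_id)


set_tenant("my-client-uuid", "alice")

# A different caller that learns or guesses the id gets the same scope.
assumed = resolve("my-client-uuid")

# And any caller can assert any tenant for a session id it mints itself.
set_tenant("attacker-session", "alice")
```

Both paths end with the caller acting as alice, which is why the header is only safe when every client that can reach the port is already trusted.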

Bring your own auth

Jammi authenticates nothing on its own — it is a substrate, and identity is a consumer’s vocabulary. To put a tenant boundary in front of untrusted callers, you supply the authentication and authorization yourself by implementing a TenantResolver and passing it to GrpcChain.tenant_resolver when you assemble the chain via assemble_grpc_chain. One resolver, plugged in once, binds every engine gRPC verb AND the Flight SQL db.sql lane — the same async tenant-binding layer (TenantResolverLayer) applies the resolved scope to both transports, so there is nothing separate to wire up for Flight.

Authenticate the principal. In resolve, read the caller’s credential — a bearer token, a session cookie your gateway exchanges, a service-to-service token — and verify it. A missing or invalid credential returns Err(Status::unauthenticated(..)) here, before any handler runs.
Authorize the tenant from the verified claim. Derive the tenant from the verified claim — never from a header the caller controls. This is where your policy lives: which tenant this principal may act as. Return Ok(TenantScope::Tenant(t)).
The engine binds it. The async TenantResolverLayer maps the resolved scope onto the SessionTenant request extension every verb handler reads, and the Flight SQL provider (TenantBoundProvider) binds the same scope for db.sql — you write only resolve.

Because resolve runs in front of every handler, the tenant the engine acts on is the one the credential proves, not one the caller asserts. The jammi-session-id header plays no part in this path. Reject, don’t default: an authenticating resolver returns either TenantScope::Tenant or Err, and never TenantScope::Global. Returning Global on a failed check runs the request unscoped, which for a catalog that carries tenant_id IS NULL rows is a global read, so a rejected caller must fail the request instead. TenantScope::Global is the explicit unscoped choice that the engine-default resolver (SessionIdTenantResolver) returns when no tenant is bound; it is never a value a rejection falls through to.

use tonic::{Status, metadata::MetadataMap};
use jammi_db::TenantId;
use jammi_server::grpc::session::{TenantResolver, TenantScope};

/// A consumer's authenticating resolver. `verify_credential` is the
/// consumer's own identity logic — it authenticates the caller and returns the
/// tenant the verified claim authorizes, or `None` to reject the request.
struct AuthResolver;

#[tonic::async_trait]
impl TenantResolver for AuthResolver {
    async fn resolve(&self, metadata: &MetadataMap) -> Result<TenantScope, Status> {
        // 1. Authenticate: pull the credential the caller presented.
        let credential = metadata
            .get("authorization")
            .and_then(|v| v.to_str().ok())
            .ok_or_else(|| Status::unauthenticated("missing credential"))?;

        // 2. Authorize: derive the tenant from the *verified* claim. A failed
        //    check rejects the request — it never falls through to an unscoped
        //    read that could surface another tenant's rows.
        let tenant: TenantId = verify_credential(credential)
            .ok_or_else(|| Status::unauthenticated("invalid credential"))?;

        // 3. Bind: return the resolved scope. The engine's tenant-binding layer
        //    applies it to every gRPC verb and to Flight SQL.
        Ok(TenantScope::Tenant(tenant))
    }
}
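
// Placeholder so the example compiles: replace it with your own verification.
// As written it rejects every credential by returning `None`.
fn verify_credential(_c: &str) -> Option<TenantId> { None }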

Plug it in at assembly time, in place of the engine default:

use std::sync::Arc;
use jammi_server::runtime::{assemble_grpc_chain, GrpcChain};

let chain = GrpcChain {
    // .. addr, flight_ctx, flight_binding, store, trigger, engine, tiers, metrics ..
    tenant_resolver: Arc::new(AuthResolver),
    ..chain_defaults
};
let assembled = assemble_grpc_chain(chain)?;

The seam types are TenantResolver (the trait you implement), TenantScope (Tenant/Global), and SessionTenant (the per-request binding every verb reads, which the engine sets for you). This one resolver replaces the engine-default SessionIdTenantResolver for the whole chain — the gRPC verbs and the Flight db.sql lane both read the scope it resolves, closing the cross-transport gap where a boundary authenticated the gRPC plane but Flight still bound from the unauthenticated jammi-session-id header.
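
The verify_credential step is entirely the consumer's code, so its shape depends on your identity system. As one illustration, the sketch below derives the tenant from an HMAC-signed token. Everything here is an assumption made for the example: the tenant.mac token format, the SECRET constant, and the sign helper are hypothetical, and a real deployment would more likely verify a JWT or call its identity provider.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical key; load it from a secret store in practice


def sign(tenant_id: str) -> str:
    """Issue a toy token of the form '<tenant>.<hex mac>'."""
    mac = hmac.new(SECRET, tenant_id.encode(), hashlib.sha256).hexdigest()
    return f"{tenant_id}.{mac}"


def verify_credential(token: str):
    """Return the tenant the token proves, or None to reject."""
    tenant_id, sep, mac = token.rpartition(".")
    if not sep or not tenant_id:
        return None  # malformed token: reject
    expected = hmac.new(SECRET, tenant_id.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; a mismatch rejects and never falls back
    # to an unscoped result.
    if hmac.compare_digest(mac.encode(), expected.encode()):
        return tenant_id
    return None


token = sign("018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9a")
tampered = token[:-1] + ("0" if token[-1] != "0" else "1")
```

The tenant comes out of the verified token rather than from a separate header, which mirrors the "authorize from the verified claim" step above.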

Disjoint views — what to expect

Two sessions on the same process, bound to different tenants, will:

Read each other as invisible: list_sources() returns the calling tenant’s sources plus any globally-scoped (tenant_id IS NULL) sources.
Write into different lanes: a register_source from Alice produces a row tagged tenant_id = alice; Bob’s list_sources does not see it.
Share globally-scoped rows: an unscoped (tenant_id IS NULL) registration — typically a public reference dataset — is visible to every tenant.

The engine enforces the binding at four layers (the SPEC-03 defence-in-depth discipline):

Read-side predicate injection — TenantScopeAnalyzerRule injects tenant_id = $current OR tenant_id IS NULL on every TableScan whose schema declares the column.
Result-table resolution gate — a Jammi-owned result table is wholly owned by one tenant (or GLOBAL), so its Parquet carries no tenant_id column for the analyzer to filter on. The tenant-gating result-table schema provider instead gates resolution on the catalog owner: over every lane that names jammi.{table} (Flight db.sql, gRPC sql, search), a correctly-bound tenant resolves only its own and GLOBAL result tables, and a peer’s private table resolves not-found (and is absent from the schema’s table enumeration) — the same (tenant_id = $current OR tenant_id IS NULL) visibility the catalog read API applies.
Write-side guard — every catalog register_* and the mutable-table sink calls Transaction::assert_tenant_matches before INSERT.
Storage-side filter — catalog repo reads also pass the predicate to the backend SQL layer, so the wrong tenant’s rows never leave the database.

A buggy caller that constructs a row with the wrong tenant_id gets BackendError::TenantMismatch from the guard layer.
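
The visibility rule and the write-side guard can be tried out against an in-memory SQLite table. This is a sketch of the predicate's semantics only: the sources table, list_sources, and register_source here are toy stand-ins, and PermissionError stands in for BackendError::TenantMismatch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (name TEXT, tenant_id TEXT)")
conn.executemany(
    "INSERT INTO sources VALUES (?, ?)",
    [("inbox", "alice"), ("ledger", "bob"), ("reference", None)],
)


def list_sources(current):
    # Same shape as the injected predicate: own rows plus global rows.
    rows = conn.execute(
        "SELECT name FROM sources "
        "WHERE tenant_id = ? OR tenant_id IS NULL ORDER BY name",
        (current,),
    )
    return [name for (name,) in rows]


def register_source(current, name, row_tenant):
    # Toy write-side guard: refuse a row tagged for a different tenant.
    if row_tenant != current:
        raise PermissionError("tenant mismatch")
    conn.execute("INSERT INTO sources VALUES (?, ?)", (name, row_tenant))
```

With current set to None the equality arm can never match, so an unscoped caller sees only the globally-scoped row.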

When the binding doesn’t apply

External federated sources without a tenant_id column — Jammi’s analyzer rule has no column to inject against, so those sources show every row to every tenant unless the source declaration registers a tenant_column override. Catalog tables and mutable companion tables always carry the column.
Cross-tenant WHERE clauses the caller writes by hand: a query that contains WHERE tenant_id = 'other-tenant' runs against the injected predicate plus the user’s clause. The analyzer rule does not remove user-written predicates; it only adds its own, so such a query still returns nothing outside the caller’s scope.
Single-tenant deployments — bind nothing and every row is global; no predicate is injected beyond tenant_id IS NULL.
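
The hand-written WHERE case is easiest to see by combining the two predicates explicitly. The sketch below assumes the injected predicate is ANDed with the caller's clause, which is how a filter added on the scan would behave; the notes table and the run helper are hypothetical and exist only for this demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT, tenant_id TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [("a1", "alice"), ("b1", "bob"), ("g1", None)],
)

INJECTED = "(tenant_id = :current OR tenant_id IS NULL)"


def run(user_where, current):
    # Toy only: the caller's clause is kept and the injected one is added.
    sql = (
        f"SELECT body FROM notes WHERE {INJECTED} AND ({user_where}) "
        "ORDER BY body"
    )
    return [body for (body,) in conn.execute(sql, {"current": current})]


peeking = run("tenant_id = 'bob'", "alice")  # Alice asks for Bob's rows
own = run("tenant_id = 'alice'", "alice")    # Alice asks for her own rows
```

Alice's attempt to name Bob's tenant is not rewritten or rejected; it simply matches no row that also passes the injected predicate.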

Scope a Federated Source by Tenant

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Tenancy.

The session-scoped tenant binding (multi-tenant.md) relies on every table the engine reads carrying a tenant_id column. That works for mutable companion tables and Parquet result tables Jammi produced itself, both of which emit the column by ADR-00. But a federated source (a remote Postgres warehouse, an S3 Parquet lake, a CSV from someone else’s pipeline) usually doesn’t. It may carry a customer_id, an organization, a workspace column, or no tenant discriminator at all.

This recipe shows how to tell Jammi which column on a federated source plays the role of the tenant discriminator, so the predicate-injection analyzer rule scopes scans against that column instead of looking for the engine’s built-in tenant_id name.

Goal

After this recipe you can:

Register a federated source whose tenant discriminator is named differently from tenant_id.
Tell the analyzer rule which column to use.
Verify two tenants get disjoint rows from the same physical source.
Recognise what set_source_tenant_column does not do.

Setup

The recipe assumes you have a Parquet file (or any other federated source) whose schema includes a column that already carries the tenant identifier — for example a customer_id column populated with the UUID of the customer who owns each row. The column’s value must be the same canonical hyphenated lowercase form TenantId::Display emits; the analyzer rule does a string comparison after coercing the column to Utf8.
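
Because the comparison is on strings, upstream values written in upper case or without hyphens will not match. If you control the pipeline that writes the column, one way to normalise values is Python's standard uuid module; the canonical_tenant helper name is hypothetical, and this is a preprocessing sketch rather than anything Jammi does for you.

```python
import uuid


def canonical_tenant(value: str) -> str:
    # uuid.UUID accepts upper-case, brace-wrapped, and unhyphenated input;
    # str() always renders the lowercase hyphenated form.
    return str(uuid.UUID(value))


upper = canonical_tenant("018F5A0E-C4C8-7E10-9C4F-3B6F7C5A8E9A")
compact = canonical_tenant("018f5a0ec4c87e109c4f3b6f7c5a8e9a")
```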

Register a federated source

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate tokio;
use jammi_db::session::JammiSession;
use jammi_db::source::{FileFormat, SourceConnection, SourceType};

async fn ex(session: &JammiSession) -> jammi_db::error::Result<()> {
session
    .add_source(
        "notes",
        SourceType::File,
        SourceConnection {
            url: Some("file:///data/notes.parquet".into()),
            format: Some(FileFormat::Parquet),
            ..Default::default()
        },
    )
    .await?;
Ok(())
}
}

Python

db.add_source("notes", path="/data/notes.parquet", format="parquet")

Declare its tenant column

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
use jammi_db::session::JammiSession;
fn ex(session: &JammiSession) {
session.set_source_tenant_column("notes", Some("customer_id".into()));
}
}

set_source_tenant_column registers the override on the session’s SourceTenantColumns map. The next time the analyzer rule sees a scan against notes.public.notes, it discovers the override and injects WHERE CAST(customer_id AS Utf8) = $current_tenant OR CAST(customer_id AS Utf8) IS NULL (or IS NULL only when the session is unscoped).

The Python and CLI surfaces do not expose this method today; it lives on the Rust JammiSession only. If you embed Jammi as a library this is the right hook; if you reach Jammi over Flight SQL / gRPC, the source registration and tenant-column declaration happen on the server side before the server starts accepting client connections.

A schema column named tenant_id always wins. Call set_source_tenant_column only when your federated source carries a tenant discriminator under a different name; sources without any tenant column remain globally visible.

Verify the predicate

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate tokio;
use std::str::FromStr;
use jammi_db::TenantId;
use jammi_db::session::JammiSession;
use jammi_db::config::JammiConfig;
async fn ex() -> jammi_db::error::Result<()> {
let alice = TenantId::from_str("018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9a")?;
let bob = TenantId::from_str("018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9b")?;

let session_a = JammiSession::new(JammiConfig::default()).await?.with_tenant(alice);
session_a.set_source_tenant_column("notes", Some("customer_id".into()));

let session_b = JammiSession::new(JammiConfig::default()).await?.with_tenant(bob);
session_b.set_source_tenant_column("notes", Some("customer_id".into()));

let count_a = session_a.sql("SELECT COUNT(*) FROM notes.public.notes").await?;
let count_b = session_b.sql("SELECT COUNT(*) FROM notes.public.notes").await?;

// Each session counts only its own rows: `count_a` and `count_b` cover
// disjoint subsets of the on-disk file.
Ok(())
}
}

For a file with 10 rows split 6 (customer_id = alice) + 4 (customer_id = bob), the two sessions get 6 and 4 respectively.
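
The 6 + 4 split can be reproduced outside the engine to check the predicate's arithmetic. This sketch uses SQLite as a stand-in, so the cast target is spelled TEXT rather than Utf8, and the notes table and scoped_count helper are hypothetical.

```python
import sqlite3

ALICE = "018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9a"
BOB = "018f5a0e-c4c8-7e10-9c4f-3b6f7c5a8e9b"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER, customer_id TEXT)")
rows = [(i, ALICE) for i in range(6)] + [(i, BOB) for i in range(6, 10)]
conn.executemany("INSERT INTO notes VALUES (?, ?)", rows)


def scoped_count(tenant, column="customer_id"):
    # Mirrors the injected predicate, with the override column substituted
    # for the built-in tenant_id name.
    sql = (
        "SELECT COUNT(*) FROM notes "
        f"WHERE CAST({column} AS TEXT) = ? OR CAST({column} AS TEXT) IS NULL"
    )
    return conn.execute(sql, (tenant,)).fetchone()[0]


count_a = scoped_count(ALICE)
count_b = scoped_count(BOB)
```

The two counts add up to the file's row count only because this toy file has no rows with a NULL customer_id; such rows would be counted for both tenants.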

What you cannot do

You cannot point set_source_tenant_column at a column that doesn’t exist on the source. The analyzer rule emits a column reference that DataFusion later fails to resolve at execution time, surfacing as a DataFusionError::SchemaError. The override is a trust contract — the engine does not validate the column’s presence at registration time.
You cannot mix tenant_id and a non-tenant_id column on the same source. When the source’s schema already declares tenant_id, the built-in column wins and the override is ignored.
You cannot safely remove the discriminator at runtime once tenants are actively querying. Calling set_source_tenant_column("notes", None) drops the override, and subsequent queries on notes.public.notes will not be tenant-scoped at all.

If the federated source you are wrapping carries no tenant discriminator, two options are open: (1) re-shape upstream so each tenant lands in its own table, registered as a separate source, or (2) accept that the source is globally visible to every session and gate access at a higher layer (Flight SQL session interceptor, gRPC auth middleware). The engine itself does not authenticate; ADR-00 § Engine does not invent tenants applies.

The Materialization Contract: Verifiable Result-Table Identity

Measured companion: for the long-form, executed-and-measured Python treatment, see The Cookbook → Incremental Recompute.

Every result table Jammi publishes carries a verifiable identity: a sidecar attestation that lets a later reader assert “this artifact is the output of definition D over input-state S” — without trusting a name, a path, or an out-of-band convention. This recipe is the operator’s view of that contract: what the attestation contains, how it is written, how to check a table against it, and how recovery reconciles it after a crash. Everything below describes the system as it ships today.

What a materialization manifest is

A result table is published as an immutable Parquet object (plus, for embedding tables, an ANN-index sidecar bundle). Alongside it the engine writes a separate .materialization.json sidecar — for every result table, not only embedding tables — carrying an in-toto-shaped attestation that binds three things to the artifact’s content digest:

a definition hash of how the table was produced;
the as-of anchors of every input the producer read; and
the producing-run identity and instant.

The on-disk shape is MaterializationManifest:

Field	Meaning
`artifact`	The in-toto subject — SHA-256 over the Parquet object’s bytes. The thing a verifier matches by digest.
`definition_hash`	SHA-256 of how the table was produced — the descriptor plus the environment (see below).
`input_anchors`	The immutable state pointer of each input, in producer order.
`produced_by`	The producing-run id — provenance, never the reproducibility anchor.
`produced_at`	The producing instant, RFC3339 — provenance, never the anchor.
`engine_version`	The engine semantic version that produced the artifact.
`manifest_version`	The manifest format version, so a future format change is a typed error rather than a silent misparse.

The artifact digest deliberately covers the Parquet data, never the ANN index sidecar: the index is a derived accelerator reconstructible from the data, so a verdict attests the data-of-record, not the search structure. The two sidecars never collide — .manifest.json describes the ANN accelerator; .materialization.json attests the Parquet data.
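
An independent reader can recompute the same kind of digest with nothing but a SHA-256 implementation. The sketch below hashes a file's bytes in chunks; the artifact_digest name is hypothetical, the demo file is a placeholder rather than real Parquet, and how the manifest encodes the digest (for example hex versus another encoding) is not specified here, so compare encodings before matching.

```python
import hashlib
import os
import tempfile


def artifact_digest(path, chunk_size=1 << 20):
    """SHA-256 over a file's bytes, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file standing in for the Parquet object.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "table.parquet")
    with open(path, "wb") as f:
        f.write(b"not really parquet")
    demo_digest = artifact_digest(path)
```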

The two halves of identity: descriptor and environment

The definition_hash is not a hash of a logical plan. Result-table producers in this engine are hand-built physical pipelines, so there is no single plan to canonicalise. Instead the hash folds two typed, deterministically-serialisable values:

ProducingDescriptor — how the table was computed: the verb plus its typed parameters. Each producer fills in exactly one variant from its own parameters — Inference, Embedding, NeighborGraph, GraphPropagation, or ContextSet. A stable, sorted-key JSON encoding yields canonical bytes.
MaterializationEnv — the output-affecting environment that is not part of the description itself: the engine semantic version, the compute device (Cpu / Cuda { ordinal } / Metal { ordinal }), and the identity + backend kind of every model the producer invoked.

The device is part of the environment for a concrete reason: a model produces different float outputs on CPU versus an accelerator while carrying the same model identity, so a hash that omitted the device would yield a false “match” when only the device changed. The two halves are length-prefixed and domain-separated before hashing, so a descriptor field can never alias an environment field. Two runs of the same producer, with the same parameters, in the same environment, hash identically; any output-affecting change to the producer, its parameters, or the environment changes the hash.

The input anchors are recorded but are deliberately not part of the definition hash: the definition is how a table is produced, the anchors are over what. A consumer that wants a combined “code + data” identity composes the two itself.
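
The general technique (canonical JSON, then a domain-separated, length-prefixed fold into SHA-256) can be sketched as follows. This illustrates the idea and is not the engine's byte layout: the domain tags, the 8-byte big-endian length prefix, and every field name in the sample descriptor and environment are assumptions made for the example.

```python
import hashlib
import json


def canonical(value) -> bytes:
    # Sorted keys and fixed separators give the same bytes for equal values.
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode()


def definition_hash(descriptor: dict, env: dict) -> str:
    h = hashlib.sha256()
    parts = ((b"descriptor", canonical(descriptor)), (b"env", canonical(env)))
    for tag, part in parts:
        # A domain tag plus a length prefix keeps the halves from aliasing.
        h.update(tag)
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


# Hypothetical values; the real descriptor and environment are typed structs.
desc = {"verb": "Embedding", "params": {"columns": ["title"], "key": "id"}}
reordered = {"params": {"key": "id", "columns": ["title"]}, "verb": "Embedding"}
cpu = {"engine_version": "0.0.0", "device": "Cpu", "models": ["m"]}
cuda = dict(cpu, device={"Cuda": {"ordinal": 0}})
```

Key order does not affect the result, while changing only the device does, which is the property the paragraph on the compute device relies on.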

`finalize_with_manifest` is the sole building→ready transition

There is no manifest-free finalize. ResultStore::finalize_with_manifest is the single building → ready path every producer goes through, so no table reaches ready without an attestation. It performs the steps in a crash-safe order:

read the Parquet bytes and compute the artifact digest;
compute the manifest from the descriptor, environment, and resolved inputs;
write the .materialization.json sidecar;
register the table and flip the catalog row building → ready (recording the definition_hash and the input anchors as summary columns).

Because the bytes and the sidecar are durable before the status flip, a crash in the window leaves a building row — never a queryable ready table missing its manifest. Every producer that materialises an embedding table — graph propagation and context-set pooling — routes through this funnel too: ResultStore::materialize_embedding_table writes the table and then calls finalize_with_manifest with the producer’s Materialization (descriptor, environment, and resolved input anchors).
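
The ordering argument can be checked with a toy version of the funnel. This is a sketch under stated assumptions: the finalize function, the dict used as a catalog, the one-field manifest, and the file naming are hypothetical, and the simulated crash is just an exception raised before the status flip.

```python
import hashlib
import json
import os
import tempfile


def finalize(root, table, catalog, crash_before_flip=False):
    parquet = os.path.join(root, f"{table}.parquet")
    with open(parquet, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()  # 1. artifact digest
    manifest = {"artifact": digest}  # 2. toy manifest; real fields omitted
    sidecar = os.path.join(root, f"{table}.materialization.json")
    with open(sidecar, "w") as f:  # 3. sidecar is durable before the flip
        json.dump(manifest, f)
        f.flush()
        os.fsync(f.fileno())
    if crash_before_flip:
        raise RuntimeError("simulated crash")
    catalog[table] = "ready"  # 4. the status flip happens last


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "t.parquet"), "wb") as f:
        f.write(b"rows")
    catalog = {"t": "building"}
    try:
        finalize(root, "t", catalog, crash_before_flip=True)
    except RuntimeError:
        pass
    status_after_crash = catalog["t"]
    sidecar_written = os.path.exists(
        os.path.join(root, "t.materialization.json")
    )
    finalize(root, "t", catalog)  # a clean run completes the transition
    status_after_clean_run = catalog["t"]
```

Because the flip is the last step, the interrupted run leaves a building row with its sidecar already on disk, never a ready row without one.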

How to verify a table

verify_materialization is the read-only verb that recomputes a ready table’s artifact digest and checks it — and, optionally, an expected definition hash — against the table’s manifest. It returns a MatchVerdict; it never acts on one. What a reader does with a mismatch (refuse, alarm, fall back) is the reader’s policy, not the engine’s.

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::store::manifest::{DefinitionHash, MatchVerdict};
async fn ex(session: &Arc<InferenceSession>, table: &str) -> jammi_db::error::Result<()> {
let record = session
    .catalog()
    .get_result_table(table)
    .await?
    .expect("result table exists");

// No expectation: just assert the bytes still match the attestation.
let verdict = session
    .result_store()
    .verify_materialization(&record, None)
    .await?;

match verdict {
    MatchVerdict::Match => { /* artifact is the attested output */ }
    MatchVerdict::MatchWithUnpinnedInputs { unpinned } => {
        // Verified, but at least one input was anchored only to a read
        // instant, so reproducibility cannot be fully asserted. Honest,
        // not silent — downgrade confidence accordingly.
        let _ = unpinned;
    }
    MatchVerdict::Mismatch { expected, found } => {
        // The served artifact is not the output of the expected definition.
        let _ = (expected, found);
    }
    MatchVerdict::MissingManifest => {
        // No sidecar — a pre-contract table. A truthful unknown, never a
        // fabricated match.
    }
}

// Pin an expected definition hash to assert *which* definition produced it.
let expected = DefinitionHash("…".into());
let _ = session
    .result_store()
    .verify_materialization(&record, Some(&expected))
    .await?;
Ok(()) }
}

Python

verify_materialization takes the table name and an optional expected definition hash, and returns the verdict as a dict tagged by verdict:

verdict = db.verify_materialization("results__text_embedding__…")

if verdict["verdict"] == "match":
    pass  # artifact is the attested output
elif verdict["verdict"] == "match_with_unpinned_inputs":
    unpinned = verdict["unpinned"]      # sources anchored only to an instant
elif verdict["verdict"] == "mismatch":
    expected, found = verdict["expected"], verdict["found"]
elif verdict["verdict"] == "missing_manifest":
    pass  # pre-contract table — a truthful unknown

# Pin the definition you expect produced the table:
db.verify_materialization("results__…", expected_definition="<hex definition hash>")

Reading the verdict

Verdict	Meaning
`Match`	The recomputed artifact digest equals the manifest’s, and (if supplied) the expected definition hash equals the manifest’s. The artifact is the output of the expected definition.
`MatchWithUnpinnedInputs { unpinned }`	The artifact verifies, but at least one input was anchored only to a read instant (an external source with no version surface), so reproducibility cannot be fully asserted. The named sources are honest about not being reproducibly pinned.
`Mismatch { expected, found }`	The digest or the definition hash differs — the served artifact is not the output of the expected definition. Both sides are returned for the caller.
`MissingManifest`	No manifest sidecar exists — a table created before the contract landed. A truthful “unknown”, never a fabricated match.

An anchor is “pinned” when it points at an immutable id: a result table’s content digest, a mutable companion table’s monotonic version, or an external source’s as-of/version value (an Iceberg snapshot id, a Delta version, an LSN, a watermark). It is “unpinned” only when the source exposes no version surface, in which case the anchor is the read instant and the verdict says so.
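
The verdict table reads naturally as a small decision function. The sketch below returns dicts tagged the same way as the Python verdicts shown earlier, but the manifest layout, the kind field on anchors, and the order of the checks are assumptions for illustration, not the engine's logic.

```python
def verify(manifest, recomputed_digest, expected_definition=None):
    if manifest is None:
        return {"verdict": "missing_manifest"}
    if recomputed_digest != manifest["artifact"]:
        return {
            "verdict": "mismatch",
            "expected": manifest["artifact"],
            "found": recomputed_digest,
        }
    found_def = manifest["definition_hash"]
    if expected_definition is not None and expected_definition != found_def:
        return {
            "verdict": "mismatch",
            "expected": expected_definition,
            "found": found_def,
        }
    # Hypothetical anchor shape: "read_instant" marks an unpinned input.
    unpinned = [
        a["source"]
        for a in manifest["input_anchors"]
        if a["kind"] == "read_instant"
    ]
    if unpinned:
        return {"verdict": "match_with_unpinned_inputs", "unpinned": unpinned}
    return {"verdict": "match"}


pinned = {
    "artifact": "aa",
    "definition_hash": "dd",
    "input_anchors": [{"source": "lake", "kind": "snapshot"}],
}
loose = dict(pinned, input_anchors=[{"source": "csv", "kind": "read_instant"}])
```

The function only reports; deciding whether to refuse, alarm, or fall back on a mismatch stays with the caller, as the contract intends.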

How recovery reconciles manifest sidecars

ResultStore::recover() runs at startup and restores the crash-consistency invariant of the catalog↔storage boundary across every tenant (it runs under an admin scope). For the materialization contract it enforces two rules:

A building orphan with valid Parquet but no manifest is reaped. The write was torn in the window between the Parquet landing and the manifest being written — before the building → ready flip. The contract forbids promoting a table without an attestation, and the producing descriptor cannot be reconstructed after the fact, so the row is driven to failed and its bytes reaped — never promoted manifest-less. (A building orphan that does have its manifest is promoted, backfilling the summary columns the live path records.)
A ready table whose manifest has since vanished is reaped. A post-contract row — one whose catalog definition_hash is set, so it was promoted under the contract — whose .materialization.json is now absent is a corruption: the attestation a verifier would read is gone. Such a row is driven to failed and its bytes reaped, rather than left queryable with a silently missing manifest.

Pre-contract tables report honestly rather than being penalised. A row created before migration 021 carries definition_hash IS NULL in the catalog and legitimately has no sidecar; recovery leaves it untouched, and verify_materialization returns MissingManifest for it — a truthful unknown. This is the distinction the contract draws: a bug (post-contract, no sidecar) is reaped; a legitimate historical table is preserved.
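
The recovery rules above reduce to a decision over three facts about a row. The function below is a sketch of that decision only: recovery_action and its string results are hypothetical, it assumes a building row's Parquet is valid, and it ignores everything else recover() does.

```python
def recovery_action(status, has_manifest, definition_hash_set):
    if status == "building":
        # Torn write: promote only when the attestation reached storage.
        return "promote" if has_manifest else "reap"
    if status == "ready":
        if has_manifest:
            return "keep"
        # No sidecar: a post-contract row is corrupt, while a pre-contract
        # row (definition_hash IS NULL) is a legitimate historical table.
        return "reap" if definition_hash_set else "keep"
    return "keep"
```

The definition_hash column is what separates the two sidecar-less cases: set means the row was promoted under the contract and has lost its manifest, unset means it predates the contract.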

Why this identity matters

The materialization manifest gives every result table a content-addressed, verifiable identity that is independent of its name or path. That identity is the nucleus a future freshness-and-caching layer builds on: once a table’s output is bound to the definition that produced it and the as-of state of its inputs, an incremental-recompute layer can decide whether a cached artifact is still valid by comparing definition hashes and input anchors — rather than re-running the producer blind. The contract ships that identity and the verify primitive today; it ships no policy. What a reader does with a verdict, and when a downstream layer chooses to recompute, are decisions left to the consumer.

Connect to PostgreSQL / MySQL

Jammi federates external databases alongside local files. Register a database as a source and query it with the same SQL interface; a single query can join local files and database tables.

PostgreSQL

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
use jammi_db::source::{SourceConnection, SourceType};

session.add_source("pg_data", SourceType::Postgres, SourceConnection {
    url: Some("postgresql://user:pass@localhost:5432/mydb".into()),
    ..Default::default()
}).await?;

let results = session.sql(
    "SELECT id, title FROM pg_data.public.articles WHERE published = true LIMIT 10"
).await?;
Ok(()) }
}

MySQL

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::source::{SourceConnection, SourceType};
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
session.add_source("mysql_data", SourceType::Mysql, SourceConnection {
    url: Some("mysql://user:pass@localhost:3306/mydb".into()),
    ..Default::default()
}).await?;
Ok(()) }
}

Cross-source joins

Once registered, external databases are queryable with the same three-part naming convention and can be joined with local files:

SELECT p.title, a.author_name
FROM local_data.public.papers p
JOIN pg_data.public.authors a ON p.author_id = a.id
WHERE a.institution = 'MIT'

Generate embeddings from external sources

External databases work as sources for embedding generation:

# Note: external databases must be registered through the Rust API,
# which exposes the typed SourceType::Postgres / SourceType::Mysql variants.
# The Python `add_source(url=…, format=…)` surface is for file-shaped sources.
# `pg_articles` below stands for a source registered that way beforehand.

db.generate_embeddings(
    source="pg_articles",
    model="sentence-transformers/all-MiniLM-L6-v2",
    columns=["title", "abstract"],
    key="id",
    modality="text",
)

Feature flags

External source support requires feature flags when building from source:

Source	Feature flag
PostgreSQL	`postgres`
MySQL	`mysql`

These are enabled by default in published crates and pre-built binaries.

Supported source types

Type	Description	Status
File (`file://`)	Parquet, CSV, JSON on local disk	Always available
File (`s3://` / `gs://` / `azure://`)	Same formats over cloud object stores	Feature-gated — see Cloud Storage
PostgreSQL	Any PostgreSQL-compatible database	Available
MySQL	MySQL / MariaDB	Available
SQLite	SQLite databases	Not supported (rusqlite version conflict)

Store Sources and Results in Cloud Object Storage

Jammi treats local disk, S3, GCS, Azure Blob, and Cloudflare R2 as interchangeable backends. Any place the engine accepts a local file path it also accepts a storage URL — file://, s3://, gs://, azure://, or r2:// — including registered file-shaped sources and the result-table Parquet that embedding and inference jobs write.

Build with the cloud features you need

The default build ships only file:// and the in-memory test driver. Cloud schemes are opt-in per provider so a deployment that only uses S3 does not pull in the GCS and Azure SDK chains:

Feature	Schemes it enables
`storage-s3`	`s3://` (AWS S3 and S3-compatible: MinIO, LocalStack)
`storage-gcs`	`gs://`
`storage-azure`	`azure://`, `abfss://`
`storage-r2`	`r2://` (Cloudflare R2 — the S3 driver with R2’s endpoint + region derived)
`storage-cloud`	All four (umbrella)

[dependencies]
jammi-db = { version = "0.5", features = ["storage-s3", "storage-gcs"] }

Live integration tests sit behind the matching live-s3-tests, live-gcs-tests, and live-azure-tests features, so the hermetic cargo test lane never reaches the network.
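
When auditing which features a deployment needs, the feature table can be turned into a lookup over the storage URLs in your configuration. This is a convenience sketch, not part of Jammi: required_feature and FEATURE_FOR_SCHEME are hypothetical names, and the mapping is transcribed from the table above.

```python
from urllib.parse import urlsplit

FEATURE_FOR_SCHEME = {
    "file": None,  # shipped in the default build
    "s3": "storage-s3",
    "gs": "storage-gcs",
    "azure": "storage-azure",
    "abfss": "storage-azure",
    "r2": "storage-r2",
}


def required_feature(url):
    scheme = urlsplit(url).scheme
    if scheme not in FEATURE_FOR_SCHEME:
        raise ValueError(f"unsupported storage scheme: {scheme!r}")
    return FEATURE_FOR_SCHEME[scheme]


urls = [
    "file:///var/lib/jammi",
    "s3://benchmarks/snapshots/2026/papers.parquet",
    "gs://archives/2026/jan.parquet",
]
# Collect the distinct features this set of URLs would need.
needed = sorted({f for f in map(required_feature, urls) if f is not None})
```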

Register an S3-backed source

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> Result<(), Box<dyn std::error::Error>> {
use jammi_db::source::{FileFormat, SourceConnection, SourceType};
use jammi_db::storage::{CloudConfig, S3Config, StorageUrl};

let url = StorageUrl::parse("s3://benchmarks/snapshots/2026/papers.parquet")?;

let conn = SourceConnection {
    url: Some(url.to_string()),
    format: Some(FileFormat::Parquet),
    cloud: Some(CloudConfig::S3(S3Config {
        region: Some("us-east-1".into()),
        ..Default::default()
    })),
    ..Default::default()
};

session.add_source("papers", SourceType::File, conn).await?;

let rows = session
    .sql("SELECT id, title FROM papers.public.papers LIMIT 10")
    .await?;
Ok(()) }
}

If the cloud field is None and the URL is a cloud scheme, the driver falls back to the SDK’s ambient credential chain — env vars, instance profile, IRSA, ADC, Managed Identity.

Python

import jammi

db = jammi.connect("file:///var/lib/jammi")
db.add_source("papers", url="s3://benchmarks/snapshots/2026/papers.parquet", format="parquet")
db.sql("SELECT id, title FROM papers.public.papers LIMIT 10")

The Python binding accepts the same URL forms as the Rust API; per-source cloud credentials are read from the process environment.

CLI

jammi sources add papers \
    --url s3://benchmarks/snapshots/2026/papers.parquet \
    --format parquet

GCS and Azure

The pattern is identical — only the URL prefix and the CloudConfig variant change:

#![allow(unused)]
fn main() {
extern crate jammi_db;
use jammi_db::source::{FileFormat, SourceConnection};
fn make() -> SourceConnection {
use jammi_db::storage::{CloudConfig, GcsConfig};

let conn = SourceConnection {
    url: Some("gs://archives/2026/jan.parquet".into()),
    format: Some(FileFormat::Parquet),
    cloud: Some(CloudConfig::Gcs(GcsConfig {
        service_account_path: Some("/etc/jammi/sa.json".into()),
        ..Default::default()
    })),
    ..Default::default()
};
conn }
}

#![allow(unused)]
fn main() {
extern crate jammi_db;
use jammi_db::source::{FileFormat, SourceConnection};
fn make() -> Result<SourceConnection, Box<dyn std::error::Error>> {
use jammi_db::storage::{AzureConfig, CloudConfig};

let conn = SourceConnection {
    url: Some("azure://snapshots/model_outputs.parquet".into()),
    format: Some(FileFormat::Parquet),
    cloud: Some(CloudConfig::Azure(AzureConfig {
        account_name: Some("mystorage".into()),
        sas_token: Some(std::env::var("AZURE_SAS_TOKEN")?),
        ..Default::default()
    })),
    ..Default::default()
};
Ok(conn) }
}

Cloudflare R2

R2 speaks the S3 API, so it rides the same driver — but r2:// is a first-class scheme so you supply only the R2-shaped inputs and the engine derives the two quirks R2 imposes: the account-scoped endpoint https://<account_id>.r2.cloudflarestorage.com and region = "auto". Mint an S3-style access key pair in the R2 dashboard (or via the API) and give Jammi the account id:

#![allow(unused)]
fn main() {
extern crate jammi_db;
use jammi_db::source::{FileFormat, SourceConnection};
fn make() -> Result<SourceConnection, Box<dyn std::error::Error>> {
use jammi_db::storage::{CloudConfig, R2Config};

let conn = SourceConnection {
    url: Some("r2://archives/snapshots/2026.parquet".into()),
    format: Some(FileFormat::Parquet),
    cloud: Some(CloudConfig::R2(R2Config {
        account_id: Some(std::env::var("R2_ACCOUNT_ID")?),
        access_key_id: Some(std::env::var("R2_ACCESS_KEY_ID")?),
        secret_access_key: Some(std::env::var("R2_SECRET_ACCESS_KEY")?),
        ..Default::default()
    })),
    ..Default::default()
};
Ok(conn) }
}

Set endpoint instead of account_id to point at an R2 custom domain. Result tables and their sidecar ANN indexes persist to r2:// exactly as to any other cloud backend.
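
The derivation described above can be illustrated with a short Python sketch. It is not the engine's code: the helper name r2_defaults and its return shape are hypothetical, and keeping region = "auto" for a custom-domain endpoint is an assumption. Only the endpoint pattern and the "auto" region come from the text.

```python
from typing import Optional


def r2_defaults(account_id: Optional[str] = None,
                endpoint: Optional[str] = None) -> dict:
    """Hypothetical helper: derive the S3-driver settings that R2 needs."""
    if not account_id and not endpoint:
        raise ValueError("either account_id or endpoint is required")
    return {
        # An explicit endpoint (for example a custom domain) wins over the
        # account-scoped endpoint derived from the account id.
        "endpoint": endpoint or f"https://{account_id}.r2.cloudflarestorage.com",
        # Assumption: the region stays "auto" in both cases.
        "region": "auto",
    }
```

Calling r2_defaults("abc123def456") yields the account-scoped endpoint for that account, which is what lets the TOML and Rust examples omit both the endpoint and the region.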

Persist result tables to the cloud

ResultStore accepts a StorageUrl root, so embedding and inference outputs land in the same bucket as the source data:

#![allow(unused)]
fn main() {
extern crate jammi_db;
use std::sync::Arc;
use jammi_db::catalog::Catalog;
fn ex(catalog: Arc<Catalog>) -> jammi_db::error::Result<()> {
use jammi_db::config::AnnIndexConfig;
use jammi_db::storage::{StorageRegistry, StorageUrl};
use jammi_db::store::ResultStore;
use std::sync::Arc;

let root = StorageUrl::parse("s3://benchmarks/jammi_db")?;
let registry = StorageRegistry::new();
// `AnnIndexConfig` tunes the HNSW sidecar index every embedding table carries;
// the default reproduces the index backend's built-in defaults. The last
// argument is the LOCAL cache directory each remote ANN segment is materialised
// into before USearch opens it (a `file://` root loads its segments in place,
// so it is unused there).
let cache_root = std::path::PathBuf::from("/var/lib/jammi/index_cache");
let result_store = Arc::new(ResultStore::with_root(
    root,
    registry,
    catalog,
    AnnIndexConfig::default(),
    cache_root,
)?);
Ok(()) }
}

Every result table the session creates writes its Parquet and sidecar ANN index to that prefix; delete_table_files and the crash-recovery pass operate against the same backend.

Config-driven result storage

A deployment usually does not hand-build the ResultStore. Instead it sets a [storage] section in the config file and lets the session do it. result_root is the storage URL that result tables are rooted at. cloud carries the driver credentials and serves as the default cloud config the session passes to every driver it builds, both for the result root and for cloud data sources whose add_source call carries no inline credentials.

[storage]
result_root = "r2://jammi-results/prod"

[storage.cloud]
kind = "r2"
account_id = "abc123def456"
# access_key_id / secret_access_key are read from the environment — see below.

Both fields are optional. With result_root unset, result tables stay on local disk under {artifact_dir}/jammi_db/. The catalog backend is independent of this setting (configure it under [catalog]); [storage] governs only result-table and source object storage.

The kind selects the driver and the remaining keys mirror the matching CloudConfig variant:

# AWS S3 (region in TOML, secrets from env)
[storage.cloud]
kind = "s3"
region = "us-east-1"

# Google Cloud Storage
[storage.cloud]
kind = "gcs"
service_account_path = "/etc/jammi/sa.json"

# Azure Blob
[storage.cloud]
kind = "azure"
account_name = "mystorage"

Credentials come from the environment

Secrets are deploy secrets, not config-file values. The S3 and R2 drivers build on object_store’s AmazonS3Builder::from_env(), which reads:

Env var	Used for
`AWS_ACCESS_KEY_ID`	S3 / R2 access key id
`AWS_SECRET_ACCESS_KEY`	S3 / R2 secret access key
`AWS_SESSION_TOKEN`	optional STS session token (S3)
`AWS_ENDPOINT`	optional S3 endpoint override
`AWS_REGION`	optional S3 region

GCS reads GOOGLE_APPLICATION_CREDENTIALS (or Workload Identity); Azure reads the standard AZURE_* chain. So the R2 example above needs only account_id in the TOML — AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the container’s environment supply the rest. Any field you do set in [storage.cloud] overrides the value the env chain produced. A half-set credential pair (an access_key_id with no secret_access_key, or vice-versa) is rejected at config-load time rather than on the first request.
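
The precedence and the half-set check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the engine's loader: resolve_s3_credentials is a hypothetical name, the check is assumed to apply to the fields written in [storage.cloud], and only the environment variable names come from the table above.

```python
import os
from typing import Mapping, Optional

# Config key -> environment variable, as listed in the table above.
ENV_NAMES = {
    "access_key_id": "AWS_ACCESS_KEY_ID",
    "secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "region": "AWS_REGION",
    "endpoint": "AWS_ENDPOINT",
}


def resolve_s3_credentials(cloud: Mapping[str, str],
                           env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical sketch: env chain first, explicit config fields on top."""
    env = os.environ if env is None else env

    # Reject a half-set key pair up front rather than on the first request.
    has_id = cloud.get("access_key_id") is not None
    has_secret = cloud.get("secret_access_key") is not None
    if has_id != has_secret:
        raise ValueError("access_key_id and secret_access_key must be set together")

    resolved = {key: env.get(var) for key, var in ENV_NAMES.items()}
    # Any field set in the config overrides the value the env chain produced.
    for key in ENV_NAMES:
        if cloud.get(key) is not None:
            resolved[key] = cloud[key]
    return resolved
```

With AWS_REGION=us-west-2 in the environment and region = "us-east-1" in the TOML, the resolved region is us-east-1, while the key pair still comes from the environment.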

How the layout maps onto buckets

For a result table named papers__text_embedding__bge-m3__20260520T120000Z_abc12345, the engine writes the Parquet plus one ANN segment bundle per index segment. A table built in a single embedding pass has one segment, seg0:

s3://benchmarks/jammi_db/papers__text_embedding__bge-m3__….parquet
s3://benchmarks/jammi_db/papers__text_embedding__bge-m3__…__seg0.usearch
s3://benchmarks/jammi_db/papers__text_embedding__bge-m3__…__seg0.rowmap
s3://benchmarks/jammi_db/papers__text_embedding__bge-m3__…__seg0.manifest.json

A table’s ANN index is a set of segments: appending a batch of new rows writes a new bundle (…__seg1.*, …__seg2.*, …) beside the existing ones and leaves them untouched, and a search merges every segment. A quantized-precision segment adds a …__seg{N}.rawf32 rescore companion; a Binary one adds a …__seg{N}.threshold companion too.

The segment layout is the same on every backend; the only difference is the driver under the hood. USearch’s path-based FFI is bridged through a tempfile for cloud schemes so its save / load calls work unchanged, and each remote segment is materialised once into a content-addressed local cache before it is opened.
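
As a rough aid for reasoning about what a table occupies in a bucket, the naming scheme above can be expressed as a small Python function. The function name and its flags are hypothetical, and applying one precision setting to every segment is a simplification. The file suffixes are the ones listed above.

```python
from typing import List


def segment_objects(root: str, table: str, segments: int = 1,
                    quantized: bool = False, binary: bool = False) -> List[str]:
    """Hypothetical helper: list the objects written for one result table."""
    base = f"{root.rstrip('/')}/{table}"
    keys = [f"{base}.parquet"]
    for n in range(segments):
        seg = f"{base}__seg{n}"
        # Every segment carries the index, the row map and a manifest.
        keys += [f"{seg}.usearch", f"{seg}.rowmap", f"{seg}.manifest.json"]
        # Quantized segments (Binary included) add a rescore companion.
        if quantized or binary:
            keys.append(f"{seg}.rawf32")
        # Binary segments add a threshold companion as well.
        if binary:
            keys.append(f"{seg}.threshold")
    return keys
```

A table with two full-precision segments therefore maps to seven objects: one Parquet file plus three files per segment.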

Deploy as a Server

Jammi can run as an Arrow Flight SQL server, making all registered sources and embedding tables queryable from any Arrow-compatible client. Use this when multiple services, BI tools, or non-Rust/Python consumers need to query Jammi’s data.

The workflow

The server is a read path. Deploy the server, then set up your data through it — with the jammi CLI (a strict gRPC client) or the library — so other systems can query it:

# 1. Start the server
jammi-server

# 2. Register sources against the running server with the CLI
jammi --target grpc://127.0.0.1:8081 \
  sources add patents --url /data/patents.parquet --format parquet

# 3. Generate embeddings (library or Python — not available over Flight SQL)
python3 -c '
import jammi
db = jammi.connect("file:///var/lib/jammi")
db.generate_embeddings(source="patents", model="sentence-transformers/all-MiniLM-L6-v2", columns=["abstract"], key="id", modality="text")
'

Connecting with Arrow Flight SQL

Python (pyarrow)

from pyarrow.flight import FlightClient, FlightDescriptor

client = FlightClient("grpc://localhost:8081")

# Run a SQL query
info = client.get_flight_info(
    FlightDescriptor.for_command(b"SELECT id, title, year FROM patents.public.patents WHERE year > 2020")
)
reader = client.do_get(info.endpoints[0].ticket)
table = reader.read_all()
print(table.to_pandas())

Query embedding tables

Embedding tables are registered in DataFusion and queryable via SQL:

# List all embedding tables
info = client.get_flight_info(
    FlightDescriptor.for_command(b"SELECT table_name FROM information_schema.tables WHERE table_schema = 'jammi'")
)

# Query vectors directly
info = client.get_flight_info(
    FlightDescriptor.for_command(b"SELECT _row_id, _model_id FROM \"jammi.patents__embedding__all-MiniLM-L6-v2__20260325\" LIMIT 10")
)

JDBC

Flight SQL is compatible with JDBC drivers that support the Arrow Flight SQL protocol, enabling access from Java applications, BI tools (Superset, DBeaver, Tableau), and SQL editors.

Server configuration

[server]
flight_listen = "0.0.0.0:8081"
preload_models = ["sentence-transformers/all-MiniLM-L6-v2"]

[logging]
level = "info"
format = "json"    # structured logging for production

Preloading models

Models listed in preload_models are downloaded and loaded into memory at startup. This ensures the session is warm before the server accepts connections.

[server]
preload_models = [
    "sentence-transformers/all-MiniLM-L6-v2",
    "BAAI/bge-small-en-v1.5",
]

Service tiers

One server binary scales to many deployment shapes by mounting only the gRPC service tiers a deployment needs — no per-shape rebuild. The core tier is always mounted: CatalogService (the control plane — tenant binding, the GetServerInfo handshake, and source / model / channel / mutable-table / topic administration), EmbeddingService, InferenceService, AuditService, and the Flight SQL surface. Three optional tiers are runtime-selectable via [server] services:

Tier	Service	Role
`train`	`TrainingService`	model training (fine-tune, graph fine-tune, context predictor)
`event`	`TriggerService`	topic / publish / subscribe streams
`eval`	`EvalService`	per-query evaluation arrays

[server]
services = "all"             # all-in-one: every tier compiled in (the default)
# services = ["event"]       # serve + event box
# services = ["train"]       # serve + training box
# services = []              # serve-only: core tier only

A deployment advertises exactly the tiers it mounted over the wire, so a client can negotiate capability before calling a verb:

info = db.get_server_info()
# {"version": "...", "features": [...], "storage_backends": [...],
#  "services": ["core", "eval", "event", "train"]}
if "train" in info["services"]:
    db.fine_tune(...)

Reaching a verb whose tier was not mounted returns a truthful Unimplemented (“not enabled on this deployment”) rather than a misleading success — the service-mount analog of the client’s build-by-capability connect(target).

Runtime config vs. compile features. The train tier additionally requires the train compile feature (on by default). A --no-default-features serve-only build carries no training surface at all; requesting train in config on such a build is a startup error, not a silent drop. The event and eval tiers always compile and are gated at runtime only. Override the selection with JAMMI_SERVER__SERVICES (all, or a comma-separated token list — empty for serve-only).
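
The selection rules above (the all token, a comma-separated list, empty for serve-only, and the startup error for train on a build without the feature) can be sketched as a parser. This is a minimal Python illustration with hypothetical names, not the server's own parsing code.

```python
from typing import List

OPTIONAL_TIERS = {"train", "event", "eval"}


def parse_services(value: str, train_compiled: bool = True) -> List[str]:
    """Hypothetical parser for a JAMMI_SERVER__SERVICES-style value."""
    value = value.strip()
    if value == "all":
        # "all" means every tier that is compiled in.
        tiers = set(OPTIONAL_TIERS)
        if not train_compiled:
            tiers.discard("train")
    else:
        tiers = {tok.strip() for tok in value.split(",") if tok.strip()}
        unknown = tiers - OPTIONAL_TIERS
        if unknown:
            raise ValueError(f"unknown service tier(s): {sorted(unknown)}")
        if "train" in tiers and not train_compiled:
            # Requesting train without the compile feature is an error,
            # never a silent drop.
            raise ValueError("train tier requested but not compiled in")
    # The core tier is always mounted.
    return sorted(tiers | {"core"})
```

An empty string yields only the core tier, which corresponds to the serve-only shape.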

GPU configuration

For GPU-accelerated inference in production:

[gpu]
device = 0            # CUDA device index
memory_limit = "auto"
memory_fraction = 0.9
require_gpu = false   # set to true to fail fast when the GPU is unavailable instead of falling back to CPU

[inference]
batch_size = 64
max_loaded_models = 3

Set gpu.device = -1 for CPU-only deployment. On a GPU build, an unavailable device degrades to CPU with a warning by default; set gpu.require_gpu = true to fail fast instead.

Environment variable overrides

Every config field can be overridden with environment variables, useful for containerized deployments:

JAMMI_SERVER__FLIGHT_LISTEN=0.0.0.0:9081 \
JAMMI_GPU__DEVICE=-1 \
JAMMI_LOGGING__FORMAT=json \
jammi-server
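
The variable names in this example follow a visible pattern: a JAMMI_ prefix, then the section and field joined by a double underscore. The Python sketch below spells that mapping out. It is inferred from the examples in this guide rather than taken from the config loader, and both function names are hypothetical.

```python
from typing import Dict, Mapping, Optional, Tuple

PREFIX = "JAMMI_"


def env_to_config_path(name: str) -> Optional[Tuple[str, ...]]:
    """Map JAMMI_SERVER__FLIGHT_LISTEN to ("server", "flight_listen")."""
    if not name.startswith(PREFIX):
        return None
    parts = tuple(p.lower() for p in name[len(PREFIX):].split("__"))
    return parts if all(parts) else None


def apply_overrides(config: Dict, env: Mapping[str, str]) -> Dict:
    """Write every JAMMI_* variable into the nested config dict."""
    for name, value in env.items():
        path = env_to_config_path(name)
        if path is None:
            continue
        node = config
        for key in path[:-1]:
            node = node.setdefault(key, {})
        # Values stay strings here; type coercion is out of scope.
        node[path[-1]] = value
    return config
```

Under this reading, JAMMI_GPU__DEVICE=-1 lands on gpu.device and unrelated variables such as PATH are ignored.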

Health, readiness, and metrics

The server exposes three HTTP side-channel endpoints on port 8080:

curl http://localhost:8080/healthz
# {"status":"ok","version":"0.8.0"}

curl http://localhost:8080/readyz
# {"status":"ready"}

curl http://localhost:8080/metrics
# jammi_grpc_requests_total 0
# jammi_flight_queries_total 0
# jammi_eval_invocations_total 0
# jammi_search_latency_seconds_bucket{...} 0

/healthz is a liveness probe — a 200 means the process is running. /readyz is a readiness probe — 200 means the catalog backend responded; 503 means it didn’t and traffic should be drained from this instance. Point your load balancer at /readyz.
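
The difference between the two probes comes down to what an orchestrator should do with each failure. A minimal Python sketch of that decision, with a hypothetical function name and action labels:

```python
from typing import Optional


def probe_action(healthz_status: Optional[int],
                 readyz_status: Optional[int]) -> str:
    """Decide what to do with an instance from its two probe results.

    A status of None stands for "no response at all".
    """
    if healthz_status != 200:
        # Liveness failed: the process itself is not serving.
        return "restart"
    if readyz_status != 200:
        # Alive, but the catalog did not answer: stop routing traffic.
        return "drain"
    return "serve"
```

A 503 from /readyz alongside a 200 from /healthz therefore maps to "drain", not "restart".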

/metrics exposes a small, substrate-level set of Prometheus counters (gRPC requests, Flight SQL queries, eval invocations) plus a search-latency histogram.

What the server can and cannot do

Operation	Available over Flight SQL?	Available over typed gRPC?
SQL queries on source tables	Yes	— (use Flight SQL)
SQL queries on embedding tables	Yes	— (use Flight SQL)
Joins, aggregations, filters	Yes	— (use Flight SQL)
Generate embeddings	No — use library or Python package	Yes — `EmbeddingService.GenerateEmbeddings`
Semantic vector search	No — use library or Python package	Yes — `EmbeddingService.Search`
Inference	No — use library or Python package	Yes — `InferenceService.Infer`
Fine-tuning (and graph / context-predictor training)	No — use library or Python package	Yes — `TrainingService.StartTraining` (train tier)
Context-predictor prediction	No — use library or Python package	Yes — `InferenceService.Predict`
Evaluation	No — use library or Python package	Yes — `EvalService` (eval tier)

The Flight SQL surface is a query interface (read path); the ML operations are not SQL, so they ride the typed gRPC surface instead. Set up your data and run training/inference through the Rust library, the jammi-ai / jammi-client Python package, or — for a remote engine — those same verbs over gRPC, then query the results over Flight SQL. The CLI is a strict gRPC client that registers sources and drives the admin surfaces against a running server; it carries no ML verbs and does not run the engine in-process.

The typed gRPC surface is what an edge runtime speaks (it has no HTTP/2 client for Flight SQL’s bidirectional streaming). EmbeddingService serves AddSource, GenerateEmbeddings, EncodeQuery, and Search over plain gRPC — and, since tonic-web is mounted, over gRPC-web — so an edge function running the engine as a sidecar can ingest, encode, and search without the library. Search accepts a precomputed vector or an existing row_key (query-by-example, with the vector resolved inside the engine); see Semantic Search. With the train tier mounted, TrainingService serves all three training kinds over gRPC and InferenceService.Predict serves a trained context predictor — so a client can offload training and prediction to a GPU server with the same verb surface the embedded engine exposes.

Graceful shutdown

The server drains active connections on SIGTERM / Ctrl+C before exiting. In-flight queries complete; long-running operations started via the library are unaffected.

Security posture: trusted-network

The server performs no authentication. There is no token, bearer, or credential check on the gRPC or Flight SQL surfaces, and SetTenant binds whatever tenant a caller asks for — so any client that can reach the endpoint can claim any tenant and invoke any mounted verb. Tenant scope is an isolation boundary within a trusted caller’s traffic (it keeps one tenant’s rows out of another’s results), not an access-control boundary against an untrusted one. This holds for both surfaces equally: the jammi admin CLI (control plane) and the SDK data plane.

Treat the server as trusted-network: run it where only trusted clients can reach it, and supply access control with your own infrastructure —

a private network / VPC with the gRPC + health ports (8081 / 8080) closed to the public internet;
network policy, security groups, or a firewall restricting who may connect;
or an authenticating reverse proxy / gateway in front (e.g. mTLS, or a proxy that validates identity and injects the tenant), terminating untrusted traffic before it reaches the engine.

Do not expose the endpoint directly to an untrusted network. The engine does not invent or verify identities — that is a deployment concern layered above it (ADR-00, Engine does not invent tenants).

Deploying as a container

The OSS server ships as two public Docker images on GHCR:

ghcr.io/f-inverse/jammi-ai-server — CPU, built from a distroless base.
ghcr.io/f-inverse/jammi-ai-server-cu12 — CUDA, for GPU-accelerated inference (see GPU serving).

Both run as the nonroot user (uid 65532), expose the same 8080 / 8081 ports the local binary listens on, and share the same tag scheme (:latest, :vX.Y.Z, :vX.Y). The image entrypoint is jammi-server, so docker run <image> brings up the server with zero config — a local SQLite catalog, the in-memory broker, and every service tier, no TOML required. The jammi admin CLI also ships in the image for running verbs against the server. The examples below use the CPU image.

# Turnkey: zero config, no TOML.
docker run --rm \
  -p 8080:8080 -p 8081:8081 \
  -v jammi_data:/var/lib/jammi \
  ghcr.io/f-inverse/jammi-ai-server:latest

To supply your own config, pass --config to the jammi-server entrypoint:

docker run --rm \
  -p 8080:8080 -p 8081:8081 \
  -v jammi_data:/var/lib/jammi \
  -v $(pwd)/jammi.toml:/etc/jammi/jammi.toml:ro \
  ghcr.io/f-inverse/jammi-ai-server:latest --config /etc/jammi/jammi.toml

A minimal compose file lives in the workspace at examples/docker-compose/oss-server.yml:

cd examples/docker-compose
docker compose -f oss-server.yml up

Persistence

/var/lib/jammi holds the catalog DB, model weights, and indices. Zero-config jammi-server writes its SQLite catalog there (the image sets JAMMI_ARTIFACT_DIR=/var/lib/jammi). On the jammi-ai-server-cu12 image the same volume also holds the CUDA PTX-JIT compute cache at /var/lib/jammi/.nv-cache (see GPU serving) — mounting the volume is what makes that cache durable across container restarts. The Dockerfile declares /var/lib/jammi as a VOLUME owned by uid 65532 — a named Docker volume or no mount at all just works; a bind mount must have the host directory writable by uid 65532:

# Bind mount on the host.
sudo chown -R 65532:65532 /opt/jammi/data
docker run -v /opt/jammi/data:/var/lib/jammi ...

A named Docker volume (the compose default) sidesteps that step because Docker provisions ownership for the container’s user automatically.

Configuration

The image needs no config: it boots zero-config. To override defaults, bind-mount a TOML file and point the jammi-server entrypoint at it with a command: entry in the compose file. The [gpu], [server], and services knobs documented above (and JAMMI_* env overrides) all apply:

# oss-server.yml
services:
  jammi-server:
    image: ghcr.io/f-inverse/jammi-ai-server:latest
    command: ["--config", "/etc/jammi/jammi.toml"]
    volumes:
      - ./jammi.toml:/etc/jammi/jammi.toml:ro
      - jammi_data:/var/lib/jammi
    ports:
      - "8080:8080"
      - "8081:8081"

Or skip the TOML entirely and tune via environment variables:

services:
  jammi-server:
    image: ghcr.io/f-inverse/jammi-ai-server:latest
    environment:
      JAMMI_SERVER__SERVICES: "event"      # serve + event tier only
      JAMMI_LOGGING__FORMAT: "json"
    volumes:
      - jammi_data:/var/lib/jammi
    ports:
      - "8080:8080"
      - "8081:8081"

GPU serving

The jammi-ai-server-cu12 image builds with candle’s CUDA backend on an NVIDIA CUDA 12.6 runtime base, so libcudart and the rest of the CUDA runtime libraries are present in the image. It carries the same turnkey jammi CLI as the CPU image. Run it on a host with the NVIDIA Container Toolkit and pass --gpus all:

# Turnkey: zero config, GPU inference.
docker run --rm --gpus all \
  -p 8080:8080 -p 8081:8081 \
  -v jammi_data:/var/lib/jammi \
  ghcr.io/f-inverse/jammi-ai-server-cu12:latest

With no TOML the server selects GPU device 0 by default. To override the device or any other knob, pass --config to the jammi-server entrypoint, exactly as with the CPU image:

docker run --rm --gpus all \
  -p 8080:8080 -p 8081:8081 \
  -v jammi_data:/var/lib/jammi \
  -v $(pwd)/jammi.toml:/etc/jammi/jammi.toml:ro \
  ghcr.io/f-inverse/jammi-ai-server-cu12:latest --config /etc/jammi/jammi.toml

Set gpu.device = 0 in jammi.toml (or JAMMI_GPU__DEVICE=0) to select the CUDA device; see GPU configuration. The image is compiled for compute capability 8.0 (Ampere) and runs on 8.0 and every newer datacenter GPU — A10/A6000 (8.6), L40S (8.9), H100 (9.0) — via PTX forward-compatibility. Turing GPUs (e.g. Tesla T4, 7.5) are not supported.

Minimum NVIDIA driver: the image is built against the CUDA 12.6 toolkit and ships single-architecture PTX, so on any GPU newer than 8.0 the driver JIT-compiles that PTX at first model load. This requires a driver new enough for the CUDA 12.6 runtime — Linux: r560 or later (≥ 560.28.03). An older driver (for example 550.x, which tops out at the CUDA 12.4 PTX ISA) can reject the image’s newer PTX at load with CUDA_ERROR_UNSUPPORTED_PTX_VERSION / CUDA_ERROR_INVALID_PTX, even on a supported architecture. nvidia-smi reports the installed driver and its max CUDA version.
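
A pre-flight check for the driver floor can be scripted. The Python sketch below compares a driver version string against the 560.28.03 minimum stated above. The function names are hypothetical, and the comparison assumes dotted numeric versions.

```python
from typing import Tuple

# Minimum Linux driver stated above: r560, i.e. 560.28.03 or later.
MIN_DRIVER = (560, 28, 3)


def parse_driver_version(text: str) -> Tuple[int, ...]:
    """Turn "560.28.03" into (560, 28, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_image(text: str) -> bool:
    """True when the driver is new enough for the CUDA 12.6 image."""
    version = parse_driver_version(text)
    # Pad short versions such as "565.57" so tuple comparison is fair.
    version = version + (0,) * (3 - len(version))
    return version >= MIN_DRIVER
```

The input string can be taken from nvidia-smi --query-gpu=driver_version --format=csv,noheader on the host.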

JIT cache persistence: the image sets CUDA_CACHE_PATH=/var/lib/jammi/.nv-cache, so the driver’s compiled PTX→SASS cache lands on the /var/lib/jammi volume rather than the container’s ephemeral filesystem. With the volume mounted, the JIT cost above is paid once — a subsequent cold start on the same host reuses the cached SASS instead of re-JIT-ing every model load. Without the volume mounted, each container restart starts with an empty cache and re-pays the JIT. CUDA_CACHE_MAXSIZE (bytes) caps the cache size if the default cap is too small for the set of models you serve.

The CPU image ignores GPU config and runs inference on the CPU.

Building from source

The Dockerfile lives at the workspace root and uses BuildKit cache mounts for the cargo registry and target directory:

# CPU image (default).
DOCKER_BUILDKIT=1 docker build -t jammi-ai-server:dev -f Dockerfile .

# CUDA image — selected by the RUNTIME_VARIANT build-arg.
DOCKER_BUILDKIT=1 docker build -t jammi-ai-server-cu12:dev \
  --build-arg RUNTIME_VARIANT=runtime-cuda -f Dockerfile .

Cold builds take ~30 minutes (the workspace is large); warm builds with cache hits land at ~3 minutes. The CUDA build additionally compiles candle’s CUDA kernels, so its cold build is longer.

Monitor Inference

Attach an observer to inspect every output batch during inference. Use this for logging, metrics collection, quality checks, or progress tracking.

Attach an observer

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate arrow;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::config::JammiConfig;
async fn ex(config: JammiConfig) -> jammi_db::error::Result<()> {
use jammi_ai::inference::observer::InferenceObserver;
use std::sync::Arc;

struct MetricsCollector;

impl InferenceObserver for MetricsCollector {
    fn on_batch(
        &self,
        batch: &arrow::record_batch::RecordBatch,
        model_id: &str,
        latency: std::time::Duration,
    ) {
        println!(
            "Batch: {} rows from {model_id} in {latency:?}",
            batch.num_rows()
        );
    }
}

let session = InferenceSession::with_observer(
    config,
    Some(Arc::new(MetricsCollector) as Arc<dyn InferenceObserver>),
).await?;
Ok(()) }
}

The observer is called once per output batch. When no observer is attached, the overhead is a single Option branch — effectively zero.

Use cases

Progress logging

#![allow(unused)]
fn main() {
extern crate jammi_ai;
extern crate arrow;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Duration;
use arrow::record_batch::RecordBatch;
use jammi_ai::inference::observer::InferenceObserver;
struct ProgressLogger { total: AtomicUsize }

impl InferenceObserver for ProgressLogger {
    fn on_batch(&self, batch: &RecordBatch, _model_id: &str, _latency: Duration) {
        let count = self.total.fetch_add(batch.num_rows(), Ordering::Relaxed) + batch.num_rows();
        eprintln!("Processed {count} rows...");
    }
}
}

Quality checks

#![allow(unused)]
fn main() {
extern crate jammi_ai;
extern crate arrow;
use std::time::Duration;
use arrow::array::StringArray;
use arrow::record_batch::RecordBatch;
use jammi_ai::inference::observer::InferenceObserver;
struct QualityChecker;

impl InferenceObserver for QualityChecker {
    fn on_batch(&self, batch: &RecordBatch, model_id: &str, _latency: Duration) {
        // Skip batches without a string `_status` column instead of panicking
        // inside the inference pipeline.
        let Some(status) = batch
            .column_by_name("_status")
            .and_then(|col| col.as_any().downcast_ref::<StringArray>())
        else {
            return;
        };

        // Warn when more than half of the rows in the batch failed.
        let errors = status.iter().filter(|s| *s == Some("error")).count();
        if errors > batch.num_rows() / 2 {
            eprintln!("WARNING: {model_id} batch has {errors}/{} errors", batch.num_rows());
        }
    }
}
}

Latency tracking

#![allow(unused)]
fn main() {
extern crate jammi_ai;
extern crate arrow;
use std::time::Duration;
use arrow::record_batch::RecordBatch;
use jammi_ai::inference::observer::InferenceObserver;
struct LatencyTracker { slow_threshold: Duration }

impl InferenceObserver for LatencyTracker {
    fn on_batch(&self, batch: &RecordBatch, model_id: &str, latency: Duration) {
        if latency > self.slow_threshold {
            // Clamp the divisor to 1: `Duration / 0` panics on an empty batch.
            let rows = batch.num_rows().max(1) as u32;
            eprintln!(
                "SLOW: {model_id} took {latency:?} for {} rows ({:?}/row)",
                batch.num_rows(),
                latency / rows,
            );
        }
    }
}
}

Pipeline architecture

Source (Parquet/CSV/DB)
    |
    v  DataFusion scan
    |
InferenceExec operator
    |-- Loads model (or cache hit)
    |-- Bounded channel (capacity=2, backpressure)
    |-- InferenceRunner (async task)
    |     |-- Reads input batches
    |     |-- Extracts text from content columns
    |     |-- Tokenizes with model's tokenizer
    |     |-- BERT forward pass
    |     |-- Mean pooling + L2 normalization
    |     |-- Constructs prefix + vector columns
    |     |-- ** Observer called here **
    |     '-- Sends to output channel
    |
    v  RecordBatch stream
    |
Results
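
The "Mean pooling + L2 normalization" step in the diagram can be sketched in plain Python. This is an illustration, not the engine's implementation: using an attention mask to skip padding tokens is an assumption based on how sentence-embedding models are commonly pooled, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def mean_pool_l2(token_vectors: Sequence[Sequence[float]],
                 attention_mask: Sequence[int]) -> List[float]:
    """Average the unmasked token vectors, then scale to unit length."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        return [0.0] * dim
    # Mean pooling: one value per dimension across the kept tokens.
    pooled = [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
    # L2 normalization: divide by the Euclidean norm.
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled] if norm > 0 else pooled
```

For two kept tokens [3, 0] and [3, 8], the pooled vector is [3, 4] with norm 5, so the output is [0.6, 0.8].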

Model caching

Models are loaded once and cached with LRU eviction:

First load: downloads from HF Hub (or reads from local path), loads weights into memory
Subsequent calls: cache hit, returns immediately
Ref counting: model stays in memory while any inference is running
Eviction: when the LRU limit is reached, the least-recently-used model with no active references is evicted
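
The four rules above fit in a few lines of Python. The sketch below is single-threaded and leaves out single-flight loading, so it illustrates the eviction policy rather than the engine's cache. The class and method names are hypothetical, and loading past the limit when every model is in use is an assumption.

```python
from collections import OrderedDict
from contextlib import contextmanager


class ModelCache:
    """LRU cache whose entries cannot be evicted while in use."""

    def __init__(self, max_loaded, loader):
        self.max_loaded = max_loaded
        self.loader = loader
        self._entries = OrderedDict()  # model_id -> [model, ref_count]

    @contextmanager
    def acquire(self, model_id):
        entry = self._entries.get(model_id)
        if entry is None:
            self._make_room()
            entry = [self.loader(model_id), 0]  # first load
            self._entries[model_id] = entry
        self._entries.move_to_end(model_id)  # mark as most recently used
        entry[1] += 1
        try:
            yield entry[0]
        finally:
            entry[1] -= 1  # guard dropped: the model becomes evictable

    def _make_room(self):
        while len(self._entries) >= self.max_loaded:
            # Oldest entry first; skip anything with active references.
            victim = next(
                (mid for mid, (_, refs) in self._entries.items() if refs == 0),
                None,
            )
            if victim is None:
                break  # assumption: load anyway when everything is in use
            del self._entries[victim]

    def loaded(self):
        return list(self._entries)
```

With max_loaded = 2, loading a third model evicts the least-recently-used idle model, and a model held inside a with block is never chosen.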

Operability

How to run a Jammi server in production: what it exposes for observability, how it shuts down cleanly, the resource limits it enforces, and how it behaves when a dependency fails. Everything below describes the system as it ships today.

Observability surface

The server exposes three HTTP side-channel endpoints, independent of the gRPC and Flight SQL data paths:

Endpoint	Meaning	Status
`/healthz`	Liveness — dependency-free `200` with the build version. The process is up and serving.	`200`
`/readyz`	Readiness — pings the catalog backend the session is bound to. Use it for load-balancer admission.	`200` ready / `503` not ready
`/metrics`	Prometheus text-format snapshot of the substrate metric registry.	`200`

/healthz answers without touching any dependency, so an orchestrator uses it to decide whether to restart the container. /readyz goes one step further and pings the catalog — a transient catalog outage returns 503 so the load balancer removes the instance from rotation rather than restarting it.

$ curl -s localhost:8080/healthz
{"status":"ok","version":"0.29.0"}

$ curl -s localhost:8080/readyz          # catalog reachable
{"status":"ready"}

$ curl -s localhost:8080/readyz          # catalog unreachable → 503
{"status":"not_ready","detail":"catalog ping failed: connection refused"}

Metrics

/metrics emits four substrate-level metrics that the gRPC services and Flight SQL layer feed:

Metric	Type	Incremented by
`jammi_grpc_requests_total`	counter	Any `/jammi.v1.*` gRPC request.
`jammi_flight_queries_total`	counter	A Flight SQL `DoGet` query.
`jammi_eval_invocations_total`	counter	An `EvalService/*` RPC.
`jammi_search_latency_seconds`	histogram	End-to-end `EmbeddingService/Search` request latency.

# HELP jammi_grpc_requests_total Total number of gRPC requests served across all jammi.v1 services.
# TYPE jammi_grpc_requests_total counter
jammi_grpc_requests_total 1432
# HELP jammi_search_latency_seconds Vector-search request latency, in seconds.
# TYPE jammi_search_latency_seconds histogram
jammi_search_latency_seconds_bucket{le="0.05"} 311
jammi_search_latency_seconds_bucket{le="0.1"} 402
jammi_search_latency_seconds_sum 27.41
jammi_search_latency_seconds_count 418
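
A scrape like the one above is easy to post-process. As a sketch, the Python below parses the text format into a dict and derives the mean search latency from the histogram's _sum and _count series. It handles only the simple "name value" lines shown here (no trailing timestamps), and both function names are hypothetical.

```python
from typing import Dict, Optional


def parse_samples(text: str) -> Dict[str, float]:
    """Parse 'name{labels} value' lines, skipping comments and blanks."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def mean_search_latency(samples: Dict[str, float]) -> Optional[float]:
    """Mean latency in seconds, or None before the first search."""
    count = samples.get("jammi_search_latency_seconds_count", 0.0)
    if count == 0:
        return None
    return samples["jammi_search_latency_seconds_sum"] / count
```

For the sample above this is 27.41 / 418, roughly 66 ms per search.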

Tracing

The server installs a global tracing subscriber. Spans carry the correlation fields that let you follow a request across the gRPC surface and the worker fleet:

gRPC handler spans carry tenant_id — recorded once the handler has resolved the request’s tenant scope.
run_claimed_job (the worker dispatching a claimed job) carries worker_id, job_id, and tenant_id.
run_spec (the training run inside a claimed job) carries job_id and worker_id.

Logs are emitted as structured records, JSON or human-readable text per logging.format (LogFormat). The filter comes from logging.level, with RUST_LOG as an optional override. Output always goes to stdout — a server runs non-interactively by design — and ANSI colour is enabled only when stdout is a terminal.

{"timestamp":"2026-06-15T04:17:33.114Z","level":"INFO","fields":{"message":"job completed"},"target":"jammi_ai::fine_tune::worker","span":{"job_id":"job-7af3","worker_id":"worker-2","tenant_id":"acme","name":"run_claimed_job"}}

Graceful shutdown

run_with_shutdown drives both the HTTP side-channel and the gRPC surface and drains them in parallel: the call returns once both have stopped accepting new connections and finished serving in-flight requests. The standalone binary wires both SIGINT (Ctrl+C) and SIGTERM, so docker stop — which sends SIGTERM — triggers the same clean drain as an interactive Ctrl+C.

Backpressure and resource limits

The engine enforces these limits. They are the only ones it enforces — there is no configured gRPC message-size cap and no in-memory worker-queue-depth bound; the work queue is durable, not buffered (see below).

Worker timing

The training worker drives its loop on three intervals, defaulting to 30 s lease / 10 s heartbeat / 1 s idle-poll:

Lease (30 s default) — how long a claimed job is exclusively owned before it becomes reclaimable.
Heartbeat (10 s default) — renews the lease well inside the window.
Idle-poll (1 s default) — how often an idle worker checks for new work; reclaim runs on each idle tick, so a dead worker’s job is recovered within roughly one poll plus one lease.

The config layer enforces the invariant heartbeat × 2 < lease (and rejects a zero heartbeat or zero idle-poll). This guarantees a live worker renews at least twice per lease, so a single missed beat still leaves one in-window renewal that lands strictly before expiry — never coincident with it, which would race an idle-polling worker’s reclaim. Bad values are rejected at config time, never silently clamped.
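
The invariant can be written down directly. The Python sketch below mirrors the rules stated above (non-zero heartbeat and idle-poll, heartbeat × 2 < lease) and returns the rough worst-case recovery time of one poll plus one lease. The function name and error messages are hypothetical.

```python
def validate_worker_timing(lease_s: float = 30.0,
                           heartbeat_s: float = 10.0,
                           idle_poll_s: float = 1.0) -> float:
    """Reject bad intervals; return the approximate reclaim bound in seconds."""
    if heartbeat_s <= 0:
        raise ValueError("heartbeat must be greater than zero")
    if idle_poll_s <= 0:
        raise ValueError("idle-poll must be greater than zero")
    # Strict inequality: two renewals must land strictly inside one lease.
    if not heartbeat_s * 2 < lease_s:
        raise ValueError("heartbeat x 2 must be strictly less than lease")
    # A dead worker's job is recovered within roughly one poll plus one lease.
    return idle_poll_s + lease_s
```

The defaults pass (10 × 2 < 30) and give a bound of about 31 s, while a 20 s lease with a 10 s heartbeat is rejected because 20 is not strictly less than 20.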

Job attempts cap

A job is attempted at most 3 times. After the third attempt the expired-lease reclaim path fails the job for good rather than re-queueing it indefinitely.
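
The reclaim decision can be sketched as a pure function over a job record. This is an in-memory Python illustration only: the real queue is a Postgres table, and every field name here except claimed_by is hypothetical, as is the reading that the counter tracks claims made so far.

```python
MAX_ATTEMPTS = 3  # cap stated above


def reclaim_if_expired(job: dict, now: float) -> dict:
    """Return the job's next state when an idle worker inspects it."""
    if job["status"] != "claimed" or now < job["lease_expires_at"]:
        return job  # still owned, or not claimed at all
    if job["attempts"] >= MAX_ATTEMPTS:
        # Out of attempts: fail for good instead of re-queueing.
        return {**job, "status": "failed", "claimed_by": None}
    # Lease expired with attempts left: make the job claimable again.
    return {**job, "status": "queued", "claimed_by": None}
```

A job whose lease expired after its first attempt goes back to "queued"; the same job after its third attempt becomes "failed".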

GPU admission — a memory budget

GPU admission is a memory budget, not a max-concurrent-job count. The scheduler admits work against a budget of total_gpu_memory × (1 − headroom_fraction): a reservation is admitted by a compare-and-swap against the reserved total, and released via RAII when the permit drops. Many small jobs can run concurrently while one large job is admitted only when its memory fits the remaining budget.
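
To make the budget arithmetic concrete, here is a small Python sketch of memory-budget admission. It swaps the compare-and-swap for a lock and uses a context manager in place of an RAII permit, so it shows the accounting rather than the engine's scheduler. All names are hypothetical.

```python
import threading
from contextlib import contextmanager


class GpuBudget:
    """Admit reservations against total_memory * (1 - headroom_fraction)."""

    def __init__(self, total_bytes: int, headroom_fraction: float):
        self.budget = int(total_bytes * (1 - headroom_fraction))
        self.reserved = 0
        self._lock = threading.Lock()  # stands in for the CAS loop

    def try_reserve(self, nbytes: int) -> bool:
        with self._lock:
            if self.reserved + nbytes > self.budget:
                return False
            self.reserved += nbytes
            return True

    def release(self, nbytes: int) -> None:
        with self._lock:
            self.reserved -= nbytes

    @contextmanager
    def permit(self, nbytes: int):
        """Hold a reservation for the duration of a with block."""
        if not self.try_reserve(nbytes):
            raise MemoryError("reservation does not fit the remaining budget")
        try:
            yield
        finally:
            self.release(nbytes)  # released even if the job fails
```

With 1000 units of memory and a headroom fraction of 0.25 the budget is 750: three jobs of 200 fit at once, while a job of 700 is admitted only after enough of them have released their permits.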

Work queue

The work queue is the durable Postgres training_jobs table, drained with a SELECT … FOR UPDATE SKIP LOCKED claim so concurrent workers each lock a distinct row. It is bounded by the lease plus the attempts cap, not by an in-memory buffer — there is no in-process queue-depth limit to overflow, and a worker crash leaves the row claimable again after the lease expires.

Failure-mode matrix

Failure	Observed behavior	Recovery mechanism	Signal (metric/log)	Proving test
Storage dies mid-publish	No half-written committed artifact. The crashed worker’s per-attempt prefix is orphaned because its finalize CAS never ran.	Winner-only commit: each attempt writes a unique `{job}/{worker}/{attempt}` prefix; the served `artifact_path` is written solely by the finalize CAS, so the committed pointer roots under the winner’s prefix.	Final `training_jobs` row’s `artifact_path` resolves to the winner’s prefix; reload returns the winner’s bytes.	Proven by `tests/distributed/artifact_crash_window.rs`.
Worker dies	The claimed job is reclaimed by a different worker after the lease expires and completes exactly once.	Lease expiry + idle-tick reclaim; the `FOR UPDATE SKIP LOCKED` claim guarantees a single new owner.	The finalized row’s `claimed_by` is a different worker id; reclaim runs each idle tick (worker log).	Proven by `tests/distributed/kill9_reclaim.rs` (plus `exactly_one_claim.rs` for the N-worker claim race and `cross_tenant_isolation.rs` for tenant scope).
GPU dies	—	Memory-budget admission releases the permit via RAII on the failing path, but in-flight GPU-fault recovery is not yet validated end-to-end.	—	Honest gap: not yet proven (1.0-deferred). The distributed lane is CPU-only, so no chaos test exercises a GPU fault.
Broker dies	The trigger stream is a separate subsystem from the training worker fleet — claim and lease are pure Postgres, with no broker coupling — so a broker outage does not stall training. A broker fan-out failure is best-effort: the publisher has already committed the augmented event (with its engine `_offset`) to the durable backing table before fanning out, so the event is never lost — subscribers replay it from the backing table on reconnect.	At-least-once + replay-completeness: the backing table is the authoritative log; a subscriber attaches at an engine `_offset`, replays `[from..last_replayed]` from the table, then joins the live broker tail with overlap and dedups by engine `_offset` so no committed offset is ever skipped across the replay/live seam. The seam is keyed on the engine `_offset` alone — never on a broker-native sequence (JetStream’s stream sequence is an independent counter that skews permanently after any post-commit fan-out failure).	Replayed offsets are contiguous from `from_offset`; the live tail resumes with no gap (broker integration test + in-mem property test).	At-least-once + replay-completeness PROVEN. In-memory + crash-mid-publish: `jammi-db/tests/it/trigger.rs` (`crash_mid_publish_replays_committed_offsets_with_no_loss`, `live_tail_resumes_with_no_loss_after_post_commit_fan_out_failure`, `at_least_once_no_skip_property_over_randomized_states`). Live JetStream consumer-recreate resume: `jammi-db/tests/it/trigger_jetstream.rs` (`consumer_recreate_resumes_engine_offsets_with_no_loss`, gated `live-broker-tests`). Exactly-once is NOT provided by either backend — dedup downstream by the `(_offset, _row_idx)` composite key. At-least-once is bounded by the backing log’s durability (see below): process-crash-durable always; full on Postgres; on SQLite a host power-loss can lose the last committed backing-table row(s) since the previous checkpoint.
Crash mid-publish of a result table	No half-written result table is ever queryable. On restart every table left `building` is reconciled to exactly one terminal state — `ready` if its Parquet is a fully-valid closed file (promoted with the true footer row count, and the ANN sidecar rebuilt from the Parquet so an embedding table self-heals), `failed` otherwise (missing or torn bytes, which are then reaped).	Crash-consistent eventual reconciliation: the bytes are written first, then a single catalog row flips `building → ready`; the startup sweep classifies each `building` orphan against the bytes on disk. The sweep runs cross-tenant (admin scan), so every tenant’s orphan is reconciled and each keeps its own `tenant_id`.	`Recovery: …` `WARN` logs name each reconciled table and its disposition; no row remains `building`.	Proven by `jammi-db/tests/it/recovery.rs` — each torn state (missing bytes, truncated Parquet, valid-but-unfinalized Parquet, finalize-ordering window, ready-but-missing-bytes, two tenants’ orphans) is constructed directly, then the real `recover()` + `load_existing_tables` asserts invariants I1–I6.
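
The replay/live seam in the "Broker dies" row comes down to deduplicating on a key while preserving order. The sketch below is illustrative only: `Event`, `dedup_by_key`, and `join_replay_and_live` are hypothetical names, and the event payload is a placeholder.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Keep the first occurrence of each key, preserving input order.
pub fn dedup_by_key<T, K, I, F>(items: I, key: F) -> Vec<T>
where
    I: IntoIterator<Item = T>,
    K: Eq + Hash,
    F: Fn(&T) -> K,
{
    let mut seen = HashSet::new();
    items.into_iter().filter(|item| seen.insert(key(item))).collect()
}

/// Hypothetical event carrying the engine-assigned `_offset`.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub offset: u64,
    pub payload: &'static str,
}

/// Replay from the backing table, then join the live tail with overlap.
/// The seam is keyed on the engine offset alone, never a broker sequence.
pub fn join_replay_and_live(replayed: Vec<Event>, live: Vec<Event>) -> Vec<Event> {
    dedup_by_key(replayed.into_iter().chain(live), |e| e.offset)
}
```

A downstream consumer that needs effectively-once processing can reuse the same helper with the `(_offset, _row_idx)` composite as the key, for example `dedup_by_key(rows, |r| (r.0, r.1))` over `(offset, row_idx)` tuples.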

Catalog durability under crash vs. power loss

Result-table crash-consistency reconciles whatever the catalog durably retained against the bytes on disk, so the catalog’s own durability setting bounds the guarantee:

Process crash (the engine dies, the host survives): both backends replay their write-ahead log on restart, so a building → ready flip (or a building insert that recovery later reconciles) that committed before the crash is still present after it. No committed catalog state is lost.
Host power loss: Postgres commits synchronously (fsync per commit by default), so a committed transaction survives. SQLite runs synchronous=NORMAL under WAL — it fsyncs at checkpoint, not on every commit — so a power loss can lose the last committed transaction(s) since the previous checkpoint. A row that was not durably retained simply isn’t seen by recovery; the bytes it would have pointed at are reaped as an orphan on a later sweep. This is a property of the catalog backend’s durability configuration, not of the reconciliation.

The trigger stream’s at-least-once guarantee inherits exactly this bound, because its durable log is the same kind of backing table written through the same backend transaction. Under a process crash the at-least-once guarantee is unconditional — the committed backing-table rows replay on reconnect. Under a host power loss the durable log itself is power-loss-bounded: on Postgres a committed publish survives; on SQLite (synchronous=NORMAL under WAL) the last committed backing-table row(s) since the previous checkpoint can be lost, and an offset whose row was not durably retained will not replay. At-least-once is therefore full on Postgres and power-loss-bounded on SQLite — never weaker than the backing log’s own durability.

Configuration

Jammi loads configuration from three sources, in priority order:

Config file (TOML) — explicit path, $JAMMI_CONFIG env var, ./jammi.toml, or ~/.config/jammi/config.toml
Environment variables — JAMMI_GPU__DEVICE=0, JAMMI_INFERENCE__BATCH_SIZE=64
Defaults — sensible defaults for all fields

use std::path::Path;
use jammi_db::config::JammiConfig;

fn main() -> jammi_db::error::Result<()> {
    // Load with defaults
    let config = JammiConfig::load(None)?;

    // Load from a specific file
    let config = JammiConfig::load(Some(Path::new("/path/to/jammi.toml")))?;
    Ok(())
}

Full reference

# Where Jammi stores artifacts (catalog DB, model cache, embeddings)
# Default: platform-specific data directory (~/.local/share/jammi on Linux)
artifact_dir = "/path/to/artifacts"

[engine]
# Number of DataFusion execution threads. Default: number of CPUs.
execution_threads = 8
# Memory limit for the query engine. Default: "75%".
memory_limit = "75%"
# Maximum rows per DataFusion batch. Default: 8192.
batch_size = 8192

[gpu]
# GPU device index. -1 for CPU only. Default: 0.
device = -1
# GPU memory limit. Default: "auto".
memory_limit = "auto"
# Fraction of GPU memory Jammi may use. Default: 0.9.
memory_fraction = 0.9
# Fail fast if the requested GPU is unavailable instead of falling back to CPU.
# Default: false (degrade to CPU with a warning).
require_gpu = false
# Default inference compute precision: "f32" or "f16". A model may override
# this with its own "compute_precision" in config.json; the per-model value
# wins. "bf16" is a valid value for fine-tune's frozen-backbone dtype but is
# rejected at inference load time (not yet supported). Default: "f32".
compute_precision = "f32"

[inference]
# Default backend selection strategy. Default: "auto".
default_backend = "auto"
# Maximum rows per inference batch. Default: 32.
batch_size = 32
# Timeout for batch accumulation in server mode (seconds). Default: 300.
batch_timeout_secs = 300
# Maximum models kept loaded simultaneously. 0 = unlimited. Default: 0.
max_loaded_models = 0

[inference.http]
# HTTP request timeout (seconds). Default: 60.
timeout_secs = 60
# Custom headers for HTTP model endpoints.
[inference.http.headers]
# Authorization = "Bearer sk-..."

[embedding]
# Distance metric for vector indices. Default: "cosine".
default_distance_metric = "cosine"
# Index type for vector storage. Default: "ivf_hnsw_sq".
default_index_type = "ivf_hnsw_sq"
# Rows between embedding index checkpoints. Default: 1000.
checkpoint_interval = 1000

[fine_tuning]
# LoRA rank for fine-tuning. Default: 8.
default_lora_rank = 8
# Learning rate. Default: 0.0002.
default_learning_rate = 0.0002
# Training epochs. Default: 3.
default_epochs = 3
# Training batch size. Default: 8.
default_batch_size = 8
# Checkpoint every N fraction of training. Default: 0.1.
checkpoint_fraction = 0.1

[cache]
# Enable ANN query cache. Default: true.
ann_cache_enabled = true
# Max cached ANN queries. Default: 10000.
ann_cache_max_entries = 10000
# Enable embedding cache. Default: true.
embedding_cache_enabled = true
# Embedding cache size. Default: "1GB".
embedding_cache_size = "1GB"

[server]
# Health probe listen address. Default: "0.0.0.0:8080".
health_listen = "0.0.0.0:8080"
# Arrow Flight SQL listen address. Default: "0.0.0.0:8081".
flight_listen = "0.0.0.0:8081"
# Models to preload on server start. Default: [].
preload_models = ["sentence-transformers/all-MiniLM-L6-v2"]

[logging]
# Log level: "trace", "debug", "info", "warn", "error". Default: "info".
level = "info"
# Log format: "text" or "json". Default: "text".
format = "text"

Environment variable overrides

Every config field can be overridden with an environment variable using the pattern JAMMI_<SECTION>__<FIELD>:

Variable	Overrides
`JAMMI_ARTIFACT_DIR`	`artifact_dir`
`JAMMI_ENGINE__BATCH_SIZE`	`engine.batch_size`
`JAMMI_GPU__DEVICE`	`gpu.device`
`JAMMI_GPU__REQUIRE_GPU`	`gpu.require_gpu`
`JAMMI_INFERENCE__BATCH_SIZE`	`inference.batch_size`
`JAMMI_LOGGING__LEVEL`	`logging.level`

Note the double underscore (__) separating section and field.
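
The naming pattern can be read as a small mapping function. This is a sketch of the convention as the table describes it, not Jammi's loader: `env_var_to_field` is a hypothetical helper, and lowercasing the segments is an assumption drawn from the examples in the table.

```rust
/// Map `JAMMI_<SECTION>__<FIELD>` to a dotted config path.
/// A name with no double underscore maps to a top-level field.
pub fn env_var_to_field(var: &str) -> Option<String> {
    let rest = var.strip_prefix("JAMMI_")?;
    if rest.is_empty() {
        return None;
    }
    let path = rest
        .split("__") // the double underscore separates section from field
        .map(|part| part.to_ascii_lowercase())
        .collect::<Vec<_>>()
        .join(".");
    Some(path)
}
```

Single underscores stay inside a segment, which is why `JAMMI_GPU__REQUIRE_GPU` resolves to `gpu.require_gpu` rather than splitting on every underscore.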

Catalog Backend and Trigger Broker


Jammi’s catalog (models, sources, eval runs, mutable companion tables) and trigger broker (provenance channels, evidence streams) are selected through two fields on JammiConfig: catalog and broker. The dev-laptop default is SQLite + an in-process broker; production deployments swap one or both for Postgres + JetStream.

TOML schema

The catalog stanza is a tagged enum keyed by kind:

[catalog]
kind = "sqlite"
# path = "/var/lib/jammi/catalog.db"   # optional; defaults to {artifact_dir}/catalog.db

[catalog]
kind = "postgres"
url = "postgres://user:pass@host:5432/jammi"
pool_size = 16
max_lifetime_secs = 1800

The broker stanza follows the same shape:

[broker]
kind = "in_memory"

[broker]
kind = "jet_stream"
url = "nats://nats.svc:4222"
retention_seconds = 604800
credentials_path = "/var/run/secrets/nats.creds"

broker.kind = "jet_stream" requires the jetstream-broker cargo feature on jammi-db; selecting it without the feature returns JammiError::Config rather than panicking at session construction time.

Environment variable interpolation

JammiConfig::load substitutes ${NAME} patterns from the process environment before TOML parsing. The rules:

${NAME} is replaced by the value of std::env::var("NAME").
A missing variable is an error. The loader never silently substitutes an empty string — that is a common source of “deployed config has an empty Postgres URL” outages.
$$ escapes a literal $.
A bare $ not followed by $ or { is preserved verbatim, so passwords containing a single $ slip through unchanged.
An unterminated ${ returns JammiError::Config.
Interpolation is one-pass and not recursive: ${X}’s value is not re-scanned.
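
The rules above fit in a single left-to-right scan. The following is a minimal sketch of those rules, not the code behind JammiConfig::load: `interpolate` and `InterpolateError` are hypothetical names, and the environment is passed in as a lookup closure so the behaviour can be exercised without touching the process environment.

```rust
#[derive(Debug, PartialEq)]
pub enum InterpolateError {
    /// `${NAME}` referenced a variable the lookup does not know.
    MissingVar(String),
    /// A `${` with no closing `}`.
    Unterminated,
}

pub fn interpolate<F>(input: &str, lookup: F) -> Result<String, InterpolateError>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        if let Some(tail) = after.strip_prefix('$') {
            // `$$` escapes a literal `$`.
            out.push('$');
            rest = tail;
        } else if let Some(tail) = after.strip_prefix('{') {
            let end = tail.find('}').ok_or(InterpolateError::Unterminated)?;
            let name = &tail[..end];
            let value = lookup(name)
                .ok_or_else(|| InterpolateError::MissingVar(name.to_string()))?;
            // One pass: the substituted value is appended, never re-scanned.
            out.push_str(&value);
            rest = &tail[end + 1..];
        } else {
            // A bare `$` is preserved verbatim.
            out.push('$');
            rest = after;
        }
    }
    out.push_str(rest);
    Ok(out)
}
```

Against the real environment the lookup would be `|name| std::env::var(name).ok()`. Returning an error for a missing variable, rather than an empty string, is what keeps an unset `POSTGRES_URL` from turning into an empty connection string.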

Combined with the tagged-enum shape:

artifact_dir = "/var/lib/jammi"

[catalog]
kind = "postgres"
url = "${POSTGRES_URL}"
pool_size = 16
max_lifetime_secs = 1800

[broker]
kind = "jet_stream"
url = "nats://${NATS_HOST}:4222"
retention_seconds = 604800
credentials_path = "/var/run/secrets/nats.creds"

A working copy of this file ships at crates/jammi-db/examples/sample-postgres.toml.

SQLite vs Postgres trade-offs

Concern	SQLite	Postgres
Operational footprint	One file under `artifact_dir`. No daemon.	Externally-managed Postgres cluster.
Concurrent writers	One; WAL mode lets many readers run alongside one writer.	Many.
Multi-process deployment	Single-process only — sharing the file across `jammi-server` replicas corrupts WAL.	Multi-replica safe.
Failure recovery	File restore from backup.	Standard Postgres point-in-time-recovery.
Pool tuning	None — opens one pool of 8 connections.	`pool_size` + `max_lifetime_secs` honour `sqlx::PgPool` knobs.

For laptop / single-tenant deployments, SQLite is the right answer; the trade-off table tilts to Postgres the moment a second jammi-server replica enters the picture.

In-memory vs JetStream broker

Concern	InMemory	JetStream
Persistence	In-process only; lost on restart.	NATS server retains streams per `retention_seconds`.
Cross-process delivery	None — a publish in process A is invisible to a subscriber in process B.	All subscribers (any process, any host) see every published batch within the retention window.
Auth	None.	Anonymous or NATS `.creds` file via `credentials_path`.
Operational footprint	None.	One NATS server (or cluster).

In-memory is fine for tests, local development, and single-process server deployments where every consumer lives in the same jammi-server process. JetStream is required for any deployment that wants replay across restarts or fan-out across multiple jammi-server replicas.

Health probe

CatalogBackend::ping runs SELECT 1 against the underlying pool and classifies pool failures as BackendError::Unavailable. The /readyz endpoint on jammi-server (when wired) reaches this via session.catalog().ping().await. The primitive is cheap — microseconds against a warm pool — and never opens a transaction.

Format Stability

Jammi persists several on-disk formats. Each one is stamped with a version the writer records and the reader checks, so a file written by a newer build — or by a backend whose serialized layout changed — is rejected as a typed error rather than silently misparsed into wrong data. This page is the operator’s reference for every persisted format the engine owns, what its stability stamp is, and how a reader reacts to a stamp it cannot honour.

The single principle: a reader never guesses. When a stamp is unreadable the load fails loud with a typed error; the upgrade path is to re-emit the artifact from its definition. There is no back-compat reader, no silent downgrade, no default-to-version-1. (Re-emitting is cheap and exact: a result table is the deterministic output of its producing definition over its pinned input anchors — see The Materialization Contract.)

The per-format table

Format	On-disk file	Stability stamp	Reject semantics on load
Materialization manifest	`.materialization.json`	`manifest_version` (`u32`)	Reject-newer — `found > MANIFEST_VERSION` → `ManifestError::UnsupportedManifestVersion`
ANN row map	`.rowmap`	leading `u32` version header	Reject-newer — `found > ROWMAP_VERSION` → `JammiError::IncompatibleFormat { artifact: "rowmap", .. }`
ANN sidecar manifest	`.manifest.json`	`version` (`u32`)	Reject-newer — `found > ANN_MANIFEST_VERSION` → `JammiError::IncompatibleFormat { artifact: "ann-manifest", .. }`
ANN binary threshold companion	`.threshold`	none embedded — required whenever the sidecar manifest’s `scalar_kind` is `Binary`, confirmed by the manifest’s `binary_threshold_kind` field	Fail-loud, not versioned — a missing `binary_threshold_kind`, a missing file, or a byte length not matching `dimensions` `f32`s → `JammiError::Other`
USearch ANN graph	`.usearch`	`backend_version` stamped in the sidecar `.manifest.json`	Strict — any mismatch with the linked USearch → `JammiError::IncompatibleFormat { artifact: "usearch-index", .. }`
Lexical (BM25) index	tantivy index dir	tantivy’s own format tag	Library-loud — tantivy’s `Index::open` fails with `IncompatibleIndex`, surfaced as `JammiError::Lexical`
Result-table data	`.parquet`	none embedded — its format-of-record version is the `.materialization.json` `manifest_version`	Schema-shape checked at read via `JammiError::Schema`; byte integrity caught by `verify_materialization`

Two distinct kinds of stamp appear above, and the difference is deliberate:

Reject-newer for formats that carry a compatibility ordering. An older or equal version is readable by construction (the layout only grew); only a newer version carries a layout this build does not know. This is the materialization manifest’s idiom (MaterializationManifest::from_json_bytes), and the .rowmap and ANN .manifest.json follow it.
Strict for the USearch backend_version, because the USearch serialized graph format carries no compatibility ordering between releases. A version that differs at all may mis-deserialise the graph and return wrong neighbours, so any inequality is incompatible — there is no “older is fine” here.
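
The two idioms differ only in the comparison they apply. A minimal sketch, assuming a local error struct shaped like the `IncompatibleFormat` variant named in the table (the struct and both function names here are hypothetical, not the engine's definitions):

```rust
/// Hypothetical stand-in for the typed rejection described above.
#[derive(Debug, PartialEq)]
pub struct IncompatibleFormat {
    pub artifact: &'static str,
    pub found: String,
    pub supported: String,
}

/// Reject-newer: an older or equal version is readable, a newer one is not.
pub fn check_reject_newer(
    artifact: &'static str,
    found: u32,
    supported: u32,
) -> Result<(), IncompatibleFormat> {
    if found > supported {
        return Err(IncompatibleFormat {
            artifact,
            found: found.to_string(),
            supported: supported.to_string(),
        });
    }
    Ok(())
}

/// Strict: any difference at all is incompatible, older included.
pub fn check_strict(
    artifact: &'static str,
    found: &str,
    linked: &str,
) -> Result<(), IncompatibleFormat> {
    if found != linked {
        return Err(IncompatibleFormat {
            artifact,
            found: found.to_string(),
            supported: linked.to_string(),
        });
    }
    Ok(())
}
```

The version strings passed to `check_strict` are opaque here: the check is plain equality, so it makes no assumption about how the backend numbers its releases.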

Materialization manifest — reject-newer

.materialization.json carries manifest_version (MANIFEST_VERSION). The reader rejects a newer version as the typed ManifestError::UnsupportedManifestVersion { found, supported }. This is the gold idiom every other stamped format is modeled on; the full contract is in The Materialization Contract. Its error lives in its own domain (ManifestError) and is intentionally not folded into the shared IncompatibleFormat variant — it carries the manifest-specific recovery semantics the contract describes.

ANN row map (`.rowmap`) — reject-newer

The .rowmap is the engine-owned mapping from a USearch internal id to the Jammi _row_id string. It is a small binary file: a leading u32 version header, then length-prefixed UTF-8 entries. On load the reader checks the header and rejects a version greater than ROWMAP_VERSION as JammiError::IncompatibleFormat { artifact: "rowmap", found, supported }.
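
A reader for a file of this shape can be sketched as follows. The byte-level details are assumptions made for illustration only: little-endian integers, a `u32` length prefix per entry, and entries running to end of file. The source states only "a leading u32 version header, then length-prefixed UTF-8 entries", so the real layout may differ. `RowMapError`, `parse_rowmap`, and `encode_rowmap` are hypothetical names.

```rust
/// Current version per the API Stability table; the layout below is assumed.
pub const ROWMAP_VERSION: u32 = 1;

#[derive(Debug, PartialEq)]
pub enum RowMapError {
    Truncated,
    IncompatibleFormat { found: u32, supported: u32 },
    InvalidUtf8,
}

fn read_u32(bytes: &[u8], at: usize) -> Result<u32, RowMapError> {
    let s = bytes.get(at..at + 4).ok_or(RowMapError::Truncated)?;
    Ok(u32::from_le_bytes([s[0], s[1], s[2], s[3]])) // assumed little-endian
}

/// Check the version header first, then decode the entries.
pub fn parse_rowmap(bytes: &[u8]) -> Result<Vec<String>, RowMapError> {
    let version = read_u32(bytes, 0)?;
    if version > ROWMAP_VERSION {
        return Err(RowMapError::IncompatibleFormat {
            found: version,
            supported: ROWMAP_VERSION,
        });
    }
    let mut pos = 4;
    let mut ids = Vec::new();
    while pos < bytes.len() {
        let len = read_u32(bytes, pos)? as usize; // assumed u32 length prefix
        pos += 4;
        let raw = bytes.get(pos..pos + len).ok_or(RowMapError::Truncated)?;
        let id = String::from_utf8(raw.to_vec()).map_err(|_| RowMapError::InvalidUtf8)?;
        ids.push(id);
        pos += len;
    }
    Ok(ids)
}

/// Writer for the same assumed layout, used to build test input.
pub fn encode_rowmap(version: u32, ids: &[&str]) -> Vec<u8> {
    let mut out = version.to_le_bytes().to_vec();
    for id in ids {
        out.extend_from_slice(&(id.len() as u32).to_le_bytes());
        out.extend_from_slice(id.as_bytes());
    }
    out
}
```

The version check runs before any entry is decoded, so a newer file is rejected on its header alone and its body is never interpreted under the wrong layout.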

ANN sidecar manifest (`.manifest.json`) — reject-newer + strict backend

The ANN sidecar’s .manifest.json records the index metadata: version, dimensions, backend, backend_version, scalar_kind, count, the file names, and the creation instant. On load it is deserialised as a typed struct (mirroring MaterializationManifest::from_json_bytes), never by field-by-key serde_json::Value lookups. The determinants of a safe load — version, dimensions, backend_version, and scalar_kind — are all required: a manifest missing any of them is a hard decode error, never silently defaulted to a guess. A fifth field, binary_threshold_kind, is conditionally required: legitimately absent for every non-Binary scalar_kind, but its absence on a Binary manifest with at least one row is itself a hard error — a torn or pre-threshold-fix bundle, not a legitimate empty state.

Two checks run on the deserialised manifest:

version, reject-newer. A version greater than ANN_MANIFEST_VERSION is JammiError::IncompatibleFormat { artifact: "ann-manifest", .. }.
backend_version, strict. The stamped USearch version is compared for exact equality against the linked jammi_db::index::backend_version(). Any mismatch is JammiError::IncompatibleFormat { artifact: "usearch-index", .. }.

A Binary scalar_kind additionally loads the .threshold companion beside the bundle — the per-dimension threshold τ the sidecar’s sign-packing was fit against — validated against dimensions and the manifest’s binary_threshold_kind rather than a stamp of its own (see the per-format table above).
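
The length check on the `.threshold` companion can be sketched in a few lines. This assumes the file is a bare run of little-endian `f32` values, one per dimension; the byte order is an assumption, and `parse_threshold` is a hypothetical helper that returns a plain `String` error rather than the engine's `JammiError::Other`.

```rust
/// Validate that `bytes` holds exactly `dimensions` f32 values, then decode them.
pub fn parse_threshold(bytes: &[u8], dimensions: usize) -> Result<Vec<f32>, String> {
    let expected = dimensions
        .checked_mul(4)
        .ok_or("dimensions overflow when computing expected byte length")?;
    if bytes.len() != expected {
        return Err(format!(
            "threshold file is {} bytes, expected {} ({} f32 values)",
            bytes.len(),
            expected,
            dimensions
        ));
    }
    Ok(bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]])) // assumed little-endian
        .collect())
}
```

Because the file carries no stamp of its own, the length comparison against the manifest's `dimensions` is the only structural check available, which is why a mismatch fails the load instead of truncating or padding.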

USearch ANN graph (`.usearch`) — strict backend version

The .usearch file is USearch’s own serialized HNSW graph. Its serialized header carries only the major version and gives no cross-release compatibility guarantee, so a USearch upgrade can change the on-disk layout in a way that deserialises into a structurally-valid-but-wrong graph — returning incorrect nearest neighbours with no error. To close that silent-corruption path, the engine stamps the full linked USearch version (backend_version) into the sidecar .manifest.json at save and strict-compares it on load. The graph itself is never trusted across a backend version change; the only safe action is to re-emit the embedding table (which rebuilds the sidecar).

Lexical (BM25) index — library-loud

The lexical retrieval sidecar is a tantivy index directory. Tantivy stamps its own format version and refuses to open an index written by an incompatible release: Index::open returns IncompatibleIndex, which the engine surfaces as JammiError::Lexical. The engine adds no stamp of its own here — the library is already loud, so a second stamp would be redundant machinery. The recovery is the same: re-emit (re-index) the table.

Result-table Parquet — no embedded stamp, by design

The result-table Parquet object carries no embedded format version. Its format-of-record version is the manifest_version of the .materialization.json sidecar written beside it: the manifest is the artifact’s identity, and the Parquet bytes are its subject. A reader does not need a second, in-band version because:

Shape safety is enforced at read time by the typed Arrow downcast. Every vector read goes through store::vectors::extend_with_fixed_size_list_f32, the single place in the engine that downcasts a vector column to FixedSizeList<Float32>; a missing column, a wrong Arrow type, or a non-Float32 inner type is a typed JammiError::Schema, not a panic. Schema shape is checked from the data itself, so it needs no stamp.
Byte integrity is content-addressed. The Parquet object is immutable and identified by its ArtifactDigest (SHA-256 over the bytes) recorded in the manifest, so any out-of-band byte change is caught by verify_materialization recomputing the digest — see The Materialization Contract.

The manifest-bypass Parquet read paths

Three engine read paths open a result-table Parquet object directly, without first reading the .materialization.json manifest:

Session::read_vectors — streams the whole vector column of an embedding table into one Vec<f32> per row.
Session::read_vector_by_key — extracts a single row’s vector by its _row_id (the resolver behind search_by_id’s query-by-example path).
store::register_parquet_table — registers a Parquet URL as a DataFusion table under jammi.{name} for SQL scans.

These paths do not consult a format stamp, and that is correct: their safety rests entirely on the typed JammiError::Schema downcast in store::vectors::extend_with_fixed_size_list_f32, which validates the on-disk Arrow schema shape directly from the data. A Parquet object whose schema does not match what the read expects produces a typed Schema error, regardless of how it was written. Out-of-band byte tampering on these immutable, content-addressed objects is the verify_materialization digest check’s concern, not a per-read stamp’s.

Upgrade path: re-emit

For every stamped format above, the recovery from an incompatible stamp is the same — re-emit the artifact from its definition. The engine ships no back-compat reader and no in-place migrator: an ANN sidecar is rebuilt by re-running the embedding producer, a tantivy index by re-indexing, a result table by re-running its producing definition over its input anchors. Because a result table is the deterministic output of a producing definition over pinned inputs, re-emission is exact, not lossy. The typed rejection is the signal to re-emit; it is never something to paper over with a default.

API Stability

Jammi exposes a deliberate, frozen public surface. This page is the operator’s reference for what is stable, what semver promise covers it, and how the freeze is enforced — not as prose anyone can let drift, but as a CI guard that reds the moment a stable surface changes shape.

The single principle: a stable surface does not change under you without a major. A verb is not renamed, an rpc is not dropped, a wire package is not removed, and a persisted-format version is not reinterpreted, across any release that does not bump the major version. The surfaces below are the ones that promise holds for; everything not listed here is internal and may move.

The frozen stable surfaces

Three surfaces are frozen. Each is machine-checked against a committed baseline, so the freeze is enforceable rather than aspirational (see Enforcement).

1. The verb set — the call surface

The public verb vocabulary a caller invokes — identical name-for-name and signature-for-signature across the embedded (jammi.EmbeddedBackend) and remote (jammi.RemoteDatabase) transports. It is pinned, set-by-set, in crates/jammi-python/tests/test_conformance.py; those sets are the frozen verb list:

Verb set (conformance constant)	Verbs
`_REMOTE_VERBS`	`add_source`, `generate_embeddings`, `encode_query`, `search`, `sql`, `list_sources`, `describe_source`, `set_tenant`, `tenant_scope`, `tenant`, `get_server_info`
`_TRAINING_VERBS`	`fine_tune`, `fine_tune_graph`, `train_context_predictor`, `predict_with_context_predictor`
`_INFERENCE_VERBS`	`infer`
`_PIPELINE_VERBS`	`build_neighbor_graph`, `propagate_embeddings`, `asof_join`, `assemble_context`, `recompute`, `verify_materialization`, `staleness`, `derives_from`
`_EVAL_VERBS`	`eval_embeddings`, `eval_per_query`, `eval_inference`, `eval_compare`, `eval_calibration`
`_CHANNEL_VERBS`	`register_channel`, `add_channel_columns`, `list_channels`
`_NUMERIC_VERBS`	`conformalize`, `conformalize_interval`, `conformalize_cqr`, `rrf_fuse`
`_MUTABLE_TOPIC_VERBS`	`create_mutable_table`, `drop_mutable_table`, `list_mutable_tables`, `register_topic`, `drop_topic`, `list_topics`, `publish_topic`, `subscribe_collect`
`_LIFECYCLE_VERBS`	`list_models`, `describe_model`, `delete_model`
`_SEARCH_VERBS`	`search` (pinned separately for the `embedding_table=` selector)

The conformance suite is the enforced annotation: removing or renaming a verb, or changing its signature on either transport, reds the suite. Jammi does not carry a per-pub-item #[stable] rustdoc attribute — Rust has no such attribute, and history-bearing version markers in rustdoc are explicitly disallowed — so the conformance sets carry the freeze that a #[stable] pass would carry elsewhere.

2. The wire contract — `package jammi.v1.*`

The gRPC/Flight SQL wire surface is the ten jammi.v1.* proto packages (nine served by the OSS engine; jammi.v1.lifecycle is a contract-only surface — defined in the wire descriptor so the candle-free client can call a platform server that implements it, but answered by no OSS handler):

Package	Surface
`jammi.v1.audit`	provenance / audit log rpcs
`jammi.v1.catalog`	sources, models, channels, tenant, server-info, mutable tables, topics
`jammi.v1.embedding`	embedding generation, query encode, search
`jammi.v1.error`	the typed wire-error message (no rpcs)
`jammi.v1.eval`	the evaluation rpcs
`jammi.v1.inference`	bulk inference + predict
`jammi.v1.lifecycle`	license apply / bootstrap / status / login — contract-only, answered by a platform server (the OSS engine returns `UNIMPLEMENTED`)
`jammi.v1.pipeline`	graph / context / as-of / recompute / materialization rpcs
`jammi.v1.training`	training submit + status
`jammi.v1.trigger`	topic publish + subscribe

The contract is the full set of (Service, Method) rpc paths these packages define. It is decoded from the compiled FILE_DESCRIPTOR_SET, the authoritative machine-readable description of the frozen wire surface, not taken from a hand-maintained list. The descriptor may exceed what a given build mounts: jammi.v1.lifecycle is defined there yet served by no OSS handler. The v1 in the package path is the wire-stability stamp: a breaking change to a message or an rpc shape requires a jammi.v2.* package, not an in-place edit of v1.

3. The persisted-format versions

The on-disk format-of-record versions, each a writer-stamped, reader-checked version with reject-newer (or strict) semantics — the full contract is on the Format Stability page:

Format	Stamp	Current version
Materialization manifest (`.materialization.json`)	`MANIFEST_VERSION`	`3`
ANN row map (`.rowmap`)	`ROWMAP_VERSION`	`1`
ANN sidecar manifest (`.manifest.json`)	`ANN_MANIFEST_VERSION`	`3`
Catalog schema	append-only migration ledger	through `023`

The catalog migration ledger is append-only: a migration is never edited or removed once shipped, only a new numbered migration is appended. The other three stamps follow the reject-newer idiom — a newer stamp than this build knows is a typed rejection, never a silent misparse.

The semver commitment

This release is the terminal 0.x engineering bar: the three surfaces above are frozen, and a breaking change to any of them — a renamed/removed verb, a dropped rpc, a removed jammi.v1.* package, an incompatible reinterpretation of a persisted-format version — does not ship without a major version bump. New additive surface (a new verb, a new rpc, a new appended migration) may land in a minor; it does not break a caller written against the frozen set, because it only grows the surface.

Concretely:

A new verb is added to a conformance set in the same PR that adds the verb, on both transports — additive, minor-compatible.
A removed or renamed verb is a breaking change — major only.
A new rpc is a new (Service, Method) path appended to the wire baseline — additive. A removed/renamed rpc, or a removed jammi.v1.* package, is breaking — major only, or a jammi.v2.* package for a message-shape break.
A persisted-format version is bumped only when the layout changes; the reject-newer guard then makes an old reader fail loud rather than misparse, and the recovery is to re-emit (see Format Stability).

Experimental surfaces

There are none. Every public verb in the conformance sets, every jammi.v1.* rpc, and every persisted-format version above is frozen — none is marked provisional or experimental, and none ships behind an “unstable” flag. A surface that is not yet ready to freeze does not appear on the public client at all; it stays internal until it is ready to enter the frozen set. The freeze is total across the published surface, which is what the terminal-0.x bar requires.

Enforcement: the freeze-guard

The freeze is a CI guard, not a promise in prose. Two checks run on every PR:

The wire contract + manifest version are pinned in a Rust integration test (crates/jammi-server/tests/it, the api_freeze module). It decodes FILE_DESCRIPTOR_SET into the live (Service, Method) rpc set and the live jammi.v1.* package set, and asserts they equal a committed frozen baseline; it also asserts MANIFEST_VERSION equals its frozen value. The test derives the live surface from the compiled descriptor — the same source the server actually serves — so a divergence between the served surface and the baseline cannot hide.
The verb set is pinned in the conformance suite (crates/jammi-python/tests/test_conformance.py), which asserts every verb in every set is callable with an identical signature across both transports.

Removing or renaming a stable rpc reds the Rust guard: the live (Service, Method) set decoded from the descriptor no longer equals the committed baseline, and the assertion fails naming the rpc that disappeared (or the one that appeared without a baseline update). Removing or renaming a verb reds the conformance suite the same way. The freeze has teeth because the baseline is a committed artifact a change must explicitly and visibly edit — and editing it to drop a stable surface is exactly the breaking change the semver commitment forbids outside a major.
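
The comparison at the heart of the guard is a set difference in both directions. The sketch below is illustrative only and is not the `api_freeze` module: `Rpc`, `SurfaceDiff`, and `diff_surface` are hypothetical names, and any service or method strings fed to it are placeholders.

```rust
use std::collections::BTreeSet;

/// A (Service, Method) rpc path.
pub type Rpc = (String, String);

#[derive(Debug, PartialEq)]
pub struct SurfaceDiff {
    /// In the committed baseline but missing from the live descriptor.
    pub removed: Vec<Rpc>,
    /// In the live descriptor but absent from the committed baseline.
    pub added: Vec<Rpc>,
}

pub fn diff_surface(baseline: &BTreeSet<Rpc>, live: &BTreeSet<Rpc>) -> SurfaceDiff {
    SurfaceDiff {
        removed: baseline.difference(live).cloned().collect(),
        added: live.difference(baseline).cloned().collect(),
    }
}

impl SurfaceDiff {
    /// The guard passes only when live and baseline are equal.
    pub fn is_frozen(&self) -> bool {
        self.removed.is_empty() && self.added.is_empty()
    }

    /// A removal (or a rename, which shows up as removal plus addition)
    /// is the breaking case; a pure addition only needs a baseline update.
    pub fn is_breaking(&self) -> bool {
        !self.removed.is_empty()
    }
}
```

Reporting the two directions separately is what lets a failing assertion name the rpc that disappeared, or the one that appeared without a baseline update.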

Security Posture

Jammi is an engine on a trusted network, not a security boundary. This page is the published threat model: precisely what the engine defends, what it explicitly does not, the trusted-network assumption every deployment inherits, and the consumer’s responsibilities for the boundary the engine deliberately does not own. Every “defends” line below traces to a real test or code path; every “does not” line is the honest absence of a guarantee the engine never claims.

The single principle, stated once and not softened: Jammi authenticates nothing. Identity, authorization, and the network perimeter are a consumer’s vocabulary; the engine ships the seam a consumer plugs them into, never the policy. A deployment that exposes the engine’s port to an untrusted caller has removed the boundary the engine assumes is there.

What the engine defends

Each line names the mechanism and the test or code path that proves it.

Defence	Mechanism	Traces to
Format-version reject-newer	A persisted artifact stamped with a version newer than this build knows is a typed rejection, never a silent misparse into wrong data	`manifest.rs` (`UnsupportedManifestVersion`, the `read`-path guard) + test `newer_manifest_version_is_rejected`; `sidecar.rs` (`IncompatibleFormat`) + test `newer_rowmap_version_is_rejected`
Tenant-scope filtering on every catalog query	The read-side analyzer injects `tenant_id = $current OR tenant_id IS NULL` on every scan; every `register_*` and the mutable-table sink calls `assert_tenant_matches` before INSERT; the backend SQL layer also carries the predicate. A Jammi-owned result table carries no `tenant_id` column (it is wholly owned by one tenant, or GLOBAL), so a tenant-gating result-table schema provider gates resolution on the catalog owner instead. Over every lane (Flight `db.sql`, gRPC `sql`, search) a correctly-bound tenant resolves only its own and GLOBAL result tables; a peer’s private table resolves not-found	`tenant_scope.rs` (`TenantScopeAnalyzerRule`) + `store::result_schema` (`ResultTableSchemaProvider`) + the catalog repos’ `assert_tenant_matches`; proven across the verb surface by `tenant_isolation_oracle.rs::every_case_isolation_holds` (every wire rpc covered, asserted by `every_rpc_is_covered`)
Typed error surfaces	Failures are typed variants with stable wire status, not opaque strings: a wrong on-disk shape is `JammiError::Schema`, a stale tenant write is `BackendError::TenantMismatch`, an incompatible format is `JammiError::IncompatibleFormat`	the typed-error definitions per crate; the Format Stability reject paths; the tenant write-guard
The BYO-auth resolver seam (covers every transport)	Tenant binding is uniformly resolver-driven: a consumer composing the engine via `assemble_grpc_chain` supplies its own `TenantResolver` (async: request metadata → `TenantScope`), which the single async tenant-binding tower layer applies to every engine gRPC verb AND the Flight SQL `db.sql` lane (`TenantBoundProvider` drives the same resolver). One authenticating resolver, plugged in once, authenticates both transports — closing the cross-transport gap where the gRPC plane was authenticated but Flight bound from the unauthenticated `jammi-session-id` header (#220, now closed). The seam is the same one downstreams compose with; the BYO-auth seam and the composability seam are one seam. The engine still authenticates nothing on its own — the default resolver (`SessionIdTenantResolver`) binds the tenant a caller asserts via `jammi-session-id`; supplying an authenticating resolver is the consumer’s job	the seam proof `composability_seam.rs::resolver_seam_scopes_both_transports_and_rejects_missing_credential` (gRPC + Flight isolation and `UNAUTHENTICATED`-on-missing across both transports) and the mirror `grpc_byo_auth.rs::resolver_seam_binds_the_engine_and_rejects_missing_credential`; the lower-level custom-`Interceptor`-in-front pattern (for fronting your own single service) is pinned by the four guarantees below
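
The reject-newer row in the table above can be sketched as a read-path guard that returns a typed error before any parsing happens. This is a minimal illustration under stated assumptions, not the engine's code: `SUPPORTED_MANIFEST_VERSION`, `ManifestError`, and `check_manifest_version` are hypothetical names, and only the variant name `UnsupportedManifestVersion` is borrowed from the table.

```rust
use std::fmt;

/// Highest manifest format version this (hypothetical) build understands.
const SUPPORTED_MANIFEST_VERSION: u32 = 3;

/// Hypothetical stand-in for the engine's typed error. Only the variant
/// name comes from the page; the fields are an assumption.
#[derive(Debug, PartialEq, Eq)]
enum ManifestError {
    UnsupportedManifestVersion { found: u32, supported: u32 },
}

impl fmt::Display for ManifestError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ManifestError::UnsupportedManifestVersion { found, supported } => write!(
                f,
                "manifest version {found} is newer than supported version {supported}"
            ),
        }
    }
}

impl std::error::Error for ManifestError {}

/// Read-path guard: a version newer than this build knows is rejected
/// with a typed error instead of being parsed as if it were understood.
fn check_manifest_version(found: u32) -> Result<u32, ManifestError> {
    if found > SUPPORTED_MANIFEST_VERSION {
        return Err(ManifestError::UnsupportedManifestVersion {
            found,
            supported: SUPPORTED_MANIFEST_VERSION,
        });
    }
    Ok(found)
}
```

A caller would run `check_manifest_version(stamped_version)?` before decoding the artifact body, so an artifact written by a newer build surfaces as a typed variant a caller can match on rather than as wrong data.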

The BYO-auth seam’s contract is pinned by grpc_byo_auth.rs as a worked example. The resolver-seam tests above prove it through the engine’s own composability seam across both transports; the custom-interceptor-in-front form (a consumer fronting its own single service) additionally pins four guarantees, each its own test:

Missing credential → unauthenticated. No token fails the request before any handler runs, so the caller reads nothing — it does not fall through to an unscoped read (missing_credential_is_rejected_not_run_unscoped).
Forged claim → unauthenticated. A token whose signature does not match its tenant claim is rejected; forging the tenant buys nothing because a valid signature must cover the claim (invalid_credential_is_rejected).
A rejected caller does not fall through. The interceptor fails the request rather than binding None — there is no path by which an unauthenticated caller silently reads another tenant’s rows (the same test proves a valid token for the very tenant the forgery claimed does resolve, so the rejection was the signature, not a tenant blocklist).
Per-tenant isolation through the seam. Two callers presenting valid tokens for two distinct tenants each see only their own tenant’s sources, end to end through the authenticating interceptor (two_authenticated_tenants_see_isolated_sources).

What the engine explicitly does NOT defend

These are honest absences. The engine never claims them; a deployment that needs them supplies them above the engine.

It authenticates nothing. There is no built-in credential check on any verb. The default SessionIdTenantResolver reads the jammi-session-id header and binds the tenant the caller asserts (or the explicit Global/unscoped scope when none is bound) — it verifies nothing about who the caller is. A deployment that needs authentication supplies its own TenantResolver at the seam.
It ships no authz / RBAC / SSO. There is no role model, no permission check, no policy engine, no identity-provider integration. Authorization is a consumer’s vocabulary and lives above the seam.
jammi-session-id is a correlation id, NOT a credential or principal. It is a client-minted, opaque transport correlation id identifying a connection, not a person. Anyone who presents another session’s id assumes that session’s tenant. It is never an authentication or authorization boundary.
No TLS / secrets / IAM. Transport encryption, secret management, key rotation, and cloud IAM are the consumer’s runtime, not the engine’s — the same line the Design Philosophy draws around load balancing, ingress, and orchestration.

The trusted-network assumption

Every Jammi deployment that uses the default SessionIdTenantResolver assumes a trusted network: a private VPC, a sidecar mesh, or a single-process embedding where every caller is already inside the trust boundary. On that network, binding the tenant a caller asserts via jammi-session-id is the right, low-friction trade-off. The moment an untrusted caller can reach the port, that trade-off is wrong — and closing it is the consumer’s job, via an authenticating TenantResolver at the seam, not a flag the engine flips.

Tenant scope is an organizational mechanism, not an access-control boundary

This distinction is load-bearing. Tenant-scope filtering (above) is an organizational mechanism: it keeps one tenant’s catalog rows from appearing in another tenant’s correctly-bound reads, so a multi-tenant deployment stays tidy and a buggy caller that writes the wrong tenant_id is refused by assert_tenant_matches. It is not an access-control boundary: it does not decide which tenant a caller is entitled to act as. Nothing in the engine prevents an unauthenticated caller from asserting any tenant it likes via jammi-session-id and reading that tenant’s rows. Access control — proving a caller may act as the tenant it claims — is exactly what the BYO-auth seam adds in front of the scope mechanism. Treating tenant scope as if it were authorization is the misuse this page exists to forestall.
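
The distinction can be made concrete with a small sketch of the scope predicate, `tenant_id = $current OR tenant_id IS NULL`, applied to an in-memory catalog. Everything here is illustrative: `CatalogRow`, `visible_to`, and `sample_catalog` are hypothetical names, and the real filter is injected by the analyzer rule rather than written by hand.

```rust
/// Hypothetical catalog row; `tenant_id: None` models a GLOBAL (NULL) row.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CatalogRow {
    name: &'static str,
    tenant_id: Option<&'static str>,
}

/// Mirrors the predicate `tenant_id = $current OR tenant_id IS NULL`.
/// Note what it does not do: it never asks whether the caller is
/// entitled to be `current`. It only filters by whatever is bound.
fn visible_to<'a>(rows: &'a [CatalogRow], current: &str) -> Vec<&'a CatalogRow> {
    rows.iter()
        .filter(|row| match row.tenant_id {
            Some(owner) => owner == current,
            None => true,
        })
        .collect()
}

/// A three-row catalog: one private row per tenant plus one GLOBAL row.
fn sample_catalog() -> Vec<CatalogRow> {
    vec![
        CatalogRow { name: "a_docs", tenant_id: Some("tenant-a") },
        CatalogRow { name: "b_docs", tenant_id: Some("tenant-b") },
        CatalogRow { name: "shared_lookup", tenant_id: None },
    ]
}
```

Calling `visible_to(&rows, "tenant-a")` yields `a_docs` and `shared_lookup`, and calling it with `"tenant-b"` yields `b_docs` and `shared_lookup`. The filter is equally happy with either argument, which is the point: under the default resolver the argument is whatever the caller asserted, so the predicate organizes rows but authorizes nobody.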

The consumer’s responsibilities

To put a real tenant boundary in front of untrusted callers, a consumer supplies the authentication and authorization the engine deliberately omits by implementing a TenantResolver and passing it to assemble_grpc_chain — one plug that binds every engine gRPC verb and the Flight db.sql lane through the single tenant-binding mechanism (the tenant recipe and the worked example in grpc_byo_auth.rs):

Authenticate the principal. In resolve, read and verify the caller’s credential (a bearer token, an exchanged session cookie, a service-to-service token). A missing or invalid credential returns Err(Status::unauthenticated) here, before any handler runs.
Authorize the tenant from the verified claim. Derive the tenant from the verified claim — never from a header the caller controls. This is where the consumer’s policy lives: which tenant this principal may act as. Return Ok(TenantScope::Tenant(t)).
The engine binds it. The async tenant-binding layer maps the resolved scope onto the SessionTenant request extension every verb handler resolves, and TenantBoundProvider binds it for Flight — the consumer writes only resolve.

Because resolve runs in front of every handler, the tenant the engine acts on is the one the credential proves, not one the caller asserts. Reject, don’t default: an authenticating resolver returns Tenant or Err and NEVER TenantScope::Global. Returning Global (or, in the lower-level interceptor-in-front form, binding None) on a failed check runs the request unscoped, which for a tenant_id IS NULL-bearing catalog is a global read, so a rejected caller must fail the request. TenantScope::Global is the explicit unscoped choice the default (OSS-cooperative) resolver returns when no tenant is bound; it is never a value a rejection falls through to. Breaking this rule is the defect that grpc_byo_auth.rs’s missing_credential_is_rejected_not_run_unscoped and the seam mirror resolver_seam_binds_the_engine_and_rejects_missing_credential guard against.
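
The three steps and the reject-don't-default rule can be sketched with local stand-in types. This is a hedged, synchronous illustration, not the engine's API: the real `TenantResolver` is async and takes request metadata, and its exact signature is not shown on this page. Here `TenantScope` and `Unauthenticated` are local stand-ins (the latter for `Status::unauthenticated`), and `TokenVerifier` is a hypothetical verifier whose lookup table replaces real signature verification.

```rust
use std::collections::HashMap;

/// Local stand-in for the engine's `TenantScope`.
#[derive(Debug, PartialEq, Eq)]
enum TenantScope {
    Tenant(String),
    /// Present only to show what an authenticating resolver must never return.
    #[allow(dead_code)]
    Global,
}

/// Local stand-in for `Status::unauthenticated`.
#[derive(Debug, PartialEq, Eq)]
struct Unauthenticated(&'static str);

/// Hypothetical verifier: a token-to-tenant table stands in for verifying
/// a signed credential. A real deployment would check a signature here.
struct TokenVerifier {
    tenants_by_token: HashMap<String, String>,
}

impl TokenVerifier {
    /// Authenticate the principal, then derive the tenant from the verified
    /// credential only. Every failure path returns `Err`; none returns `Global`.
    fn resolve(&self, metadata: &HashMap<String, String>) -> Result<TenantScope, Unauthenticated> {
        let header = metadata
            .get("authorization")
            .ok_or(Unauthenticated("missing credential"))?;
        let token = header
            .strip_prefix("Bearer ")
            .ok_or(Unauthenticated("malformed credential"))?;
        let tenant = self
            .tenants_by_token
            .get(token)
            .ok_or(Unauthenticated("invalid credential"))?;
        // The caller-controlled `jammi-session-id` header is never consulted.
        Ok(TenantScope::Tenant(tenant.clone()))
    }
}
```

In this sketch a request carrying `Bearer token-a` resolves to the tenant the table maps that token to, even if the same request also carries a `jammi-session-id` naming another tenant. A request with no `authorization` entry, a header without the `Bearer ` prefix, or an unknown token returns `Err`, so nothing downstream ever sees an unscoped binding.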

Dependency-advisory posture

The engine’s dependency tree is gated in CI by cargo deny against the RustSec advisory database, plus a license allowlist and the source/ban guards that formalize the engine’s one-way dependency direction (no proprietary or non-crates.io crate in the OSS closure). The advisory lane runs the live RustSec DB on every PR; a documented exception in deny.toml records any advisory the release knowingly carries, with a written rationale, rather than destabilizing the freeze with a risky bump. The config is deny.toml at the repo root.

Performance SLOs

Jammi’s performance contract is throughput and coverage, gated against committed baselines — not latency. Each scale-relevant engine verb commits a measured rate (or, for the recall tier, a portable recall fraction) on a named reference box, and a regression gate fails when a fresh run falls more than a fixed fraction below it. This page is the operator’s reference for every gated target: the verb, the named scale it is measured at, the committed baseline, the relative-drop threshold, and the box the baseline was emitted on.

How the gate works

A measured rate must not fall more than the relative-drop threshold below its committed baseline. The threshold derives an absolute floor from the baseline — floor = baseline · (1 − threshold) — and the gate is a >= against that floor (measured >= floor), never an equality and never a bit-compare. The single threshold is 30% (DEFAULT_REGRESSION_THRESHOLD), defined once in the harness. It is generous on purpose: the load-bearing failure this gate exists to catch is a structural regression — an algorithm that went quadratic, a lock that serialized a parallel path, a dropped fast path — which collapses throughput by far more than a third. A tighter threshold would trade that real signal for false alarms on runner noise.

The gate fails closed: a non-finite or non-positive baseline cannot anchor a relative gate, so it fails (it never vacuously passes against a meaningless baseline). Each *-scale bench subcommand maps its verdict to its process exit code — a regression exits non-zero — which is what the CI lanes assert.
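
The gate logic described above is small enough to sketch in full. This is a minimal illustration, not the harness source: the constant name `DEFAULT_REGRESSION_THRESHOLD` and its 30% value come from this page, while `Verdict`, `gate`, and `exit_code` are hypothetical names and the `f64` representation is an assumption.

```rust
/// The single relative-drop threshold (30%), as stated on this page.
const DEFAULT_REGRESSION_THRESHOLD: f64 = 0.30;

/// Hypothetical verdict type for one gated measurement.
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass { floor: f64 },
    Regression { floor: f64 },
    /// The baseline cannot anchor a relative gate, so the gate fails closed.
    InvalidBaseline,
}

/// floor = baseline * (1 - threshold); the gate is `measured >= floor`.
fn gate(measured: f64, baseline: f64, threshold: f64) -> Verdict {
    // Fail closed: a non-finite or non-positive baseline never passes.
    if !baseline.is_finite() || baseline <= 0.0 {
        return Verdict::InvalidBaseline;
    }
    let floor = baseline * (1.0 - threshold);
    // A NaN measurement compares false here, so it is also a regression.
    if measured >= floor {
        Verdict::Pass { floor }
    } else {
        Verdict::Regression { floor }
    }
}

/// Maps the verdict to a process exit code: only a pass exits zero.
fn exit_code(verdict: &Verdict) -> i32 {
    match verdict {
        Verdict::Pass { .. } => 0,
        Verdict::Regression { .. } | Verdict::InvalidBaseline => 1,
    }
}
```

With a committed baseline of 180.0 pairs/s, the derived floor is about 126 pairs/s, so a fresh run at 130 pairs/s passes and one at 120 pairs/s is a regression. A bench binary built on this sketch would end with `std::process::exit(exit_code(&verdict))`, which is the behaviour a CI lane can assert on.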

Where the gate runs

Lane	Trigger	Blocking?	Purpose
`ci.yml` (workspace tests)	every PR	yes	Gates a property of the mechanism: `committed_baseline_gates_with_teeth` proves the committed baseline is a well-formed, generously-thresholded gate that can fail. It does not re-measure the rate on the contended PR runner.
`perf.yml`	nightly `schedule:` + `workflow_dispatch`	no (early-warning)	Runs every `*-scale` tier’s measured-rate gate on a real box, so a structural regression surfaces between releases. Non-blocking because the 30% band was sized for a same-box manual emit, not a contended shared runner — a required per-PR rate gate would flap and rot.
`crates.yml` (`perf-gate`)	`v*` release tag	yes	The authoritative same-box-ish gate: `publish` depends on it, so a structural perf regression on the release tag blocks the crates.io publish and the GitHub release.

The gated targets

Each row is one gated verb at one named scale. The rates are same-box throughputs; the recall row is a portable fraction. Every committed number is a real, re-derivable fold — a rebuild-* bench subcommand reproduces it on the emit box.

Verb	Bench tier	Named scale	Committed baseline	Threshold	Gated quantity
`fine_tune`	`train-scale`	1 536 in-batch-negative pairs, one GradCache backward + AdamW step, `Device::Cpu`	180.0 pairs/s	30% rel. drop	throughput (pairs/s)
`fine_tune_graph`	`graph-train-scale`	8 communities × 64 nodes, biased-walk sampler (walk length 4, 4 walks/node)	6 418.1 pairs/s	30% rel. drop	sampled-pairs/s throughput (+ a portable determinism digest)
`train_context_predictor`	`context-predictor-scale`	CNP over 8 tasks × 18 rows, 30 epochs	21.29 episode-steps/s	30% rel. drop	meta-training throughput (+ a portable predict digest)
`generate_embeddings`	`model-inference-scale`	16 rows over a tiny 32-dim 1-layer BERT bundle, `Device::Cpu`	333.6 rows/s	30% rel. drop	coarse serving throughput (+ a portable embed digest)
`infer` (classification)	`model-inference-scale`	16 rows over a tiny 32-dim 1-layer ModernBERT classifier bundle, `Device::Cpu`	207.0 rows/s	30% rel. drop	coarse serving throughput (+ a portable infer digest)
`search` + `build_neighbor_graph`	`arxiv`	2 000-row corpus slice, 100 held-out 768-dim queries (frozen sidecar)	recall@{1,10,100} = {1.0, 1.0, 0.997}	floor = committed recall − 0.04 (absolute margin)	portable recall fraction (not a rate); gate is `measured >= floor` on the fresh run

The reference box

The committed rate baselines were emitted on this box, in the release profile, with RAYON_NUM_THREADS=1:

Property	Value
Logical CPUs	8
Total RAM	31 720 MiB (~31 GiB)
Profile	`release`
Engine version when committed	`0.30.0`

A baseline is refreshed by hand (via the tier’s rebuild-* subcommand) when the emit box changes; the version-stamped report lets a downstream gate reject a cross-version comparison.

The same-box caveat

A committed rate is not a portable floor. Stated verbatim from the gate’s own definition:

A rate (throughput, QPS, pairs/s) is not portable the way the recall fraction is — it is a property of the box that produced it, so a committed rate baseline is a same-box reference, refreshed by hand when the emit box changes, not a number a different machine can re-derive.

What stays portable is the shape of the gate (a measured rate must not fall more than a fixed fraction below the committed baseline) and the determinism digests and the recall fraction, which any box re-derives bit-for-bit. So the rate rows above are meaningful only against the reference box; do not read them as a throughput your hardware must hit. The release-tag gate is the authoritative reading because it runs on a same-box-ish runner; the nightly lane is early-warning, not a portable promise.

Why no latency SLOs

The contract is throughput and coverage, not latency. A latency SLO on a shared CI runner flaps — tail latency on a contended box is dominated by co-tenant load, not by the engine’s code path — so a latency gate would either flap (set tight) or never bite (set loose), exactly the failure mode the relative-drop rate threshold is designed around. Latency is therefore out of scope here. The representative full-scale serving numbers (the GPU-model rates that latency would ride on) are captured off-box in the cookbook’s A/B split, not gated in CI.
