Runnable Recipes

Every recipe under cookbook/recipes/ ships as a runnable example.py next to a markdown README and is wired into CI via tests/cookbook_smoke.py — a broken recipe blocks the merge. These recipes are the OSS source of truth; this page mirrors each README below.

For the long-form, measured companion, see The Cookbook: a Theory↔Computation book in which one recipe corresponds to one equation in the graph-signal-processing monograph and one line of the GNN canon, executed against committed goldens. The two are complementary: these How-To Guides are the short, dual-language (Rust + Python), compile-tested “how do I call this verb” reference; The Cookbook is the long-form, Python, executed-and-measured narrative.

The recipes shipped at MVP:

Recipe	Demonstrates
`mutable_tables`	Create/insert/select/drop on a mutable companion table
`trigger_streams`	Publish + subscribe on a topic via the in-process broker
`eval_embeddings`	recall@k, precision@k, MRR, nDCG against a golden set
`image_search`	Image-to-image search with PatentCLIP + Recall@K / MRR eval
`eval_inference`	Accuracy + macro F1 against gold labels
`eval_inference_ner`	Entity-level precision / recall / F1 against gold spans
`fine_tune`	LoRA fine-tune end-to-end
`flight_sql`	Query a remote `jammi-server` over Arrow Flight SQL
`audio_search`	Audio-to-audio search with a CLAP encoder
`search_audit`	Per-query provenance audit of a search
`session_lifecycle`	Ephemeral session storage with scoped cleanup

Mutable tables

End-to-end create / insert / select / drop on a Jammi mutable table — the OSS primitive for state that needs to live alongside read-only result tables.

When to use this pattern. You need a writable table that sits in the same SQL catalog as your registered sources and embedding tables — for caching enriched rows, holding cursor state, recording user feedback, or any “small table I want to UPDATE / DELETE / INSERT from SQL” workload — without standing up an external Postgres.

What `example.py` does

Connects to a temporary artifact dir
Creates a notes mutable table with an int64 primary key + utf8 body column
Inserts three rows through DataFusion DML (INSERT INTO ...)
Verifies count and ordering via SELECT
Drops the table, then asserts a SELECT after the drop raises
Demonstrates the idempotent drop_mutable_table(..., if_exists=True)

API surface exercised

Database.create_mutable_table(name, *, schema, primary_key, ...)
Database.sql("INSERT INTO mutable.public.<name> ...")
Database.sql("SELECT ... FROM mutable.public.<name>")
Database.drop_mutable_table(name, *, if_exists=False)

The DataFusion namespace for mutable tables is always mutable.public.<name> — distinct from registered sources, which live under <source>.public.<source>.

Run it

python cookbook/recipes/mutable_tables/example.py

Exits 0 on success, prints mutable_tables: OK on the last line.

Trigger streams

End-to-end publish + subscribe on a Jammi topic, plus the registration and listing surface. Uses the embedded in-process broker — no NATS or external broker needed.

When to use this pattern. You need a low-friction event bus inside your application — for fan-out to downstream consumers, fan-in from batch jobs, or replay-from-offset semantics — without bringing up Kafka or NATS in dev/test. The same surface scales out to NATS JetStream by flipping a config flag at deploy time.

What `example.py` does

Connects to a temporary artifact dir
Registers a topic events.demo with a typed schema and broker metadata
Confirms list_topics() returns the new topic
Publishes a 3-row batch through publish_topic — captures the broker-assigned offset
Subscribes with from_offset=0 and round-trips the same rows back
Drops the topic, confirms it’s gone from list_topics()
Demonstrates idempotent drop_topic(..., if_exists=True) and strict-mode failure when dropping a missing topic

API surface exercised

Database.register_topic(name, *, schema, broker_metadata=None)
Database.list_topics()
Database.publish_topic(name, *, batch) — returns the assigned offset
Database.subscribe_collect(name, *, from_offset, max_batches)
Database.drop_topic(name, *, if_exists=False)

The subscribe_collect path drives the replay-from-backing-table flow when from_offset=0; the live-tail flow is exercised in the broker integration suite.
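The publish-then-replay semantics can be pictured with a toy in-memory topic. This is a sketch only: `ToyTopic` and its methods are hypothetical stand-ins, not Jammi's broker, and it assumes an offset identifies a whole batch and starts at 0.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Hypothetical in-memory topic: an append-only log of batches."""
    name: str
    log: list = field(default_factory=list)

    def publish(self, batch):
        # Assumption: the assigned offset is the batch's position in the log.
        self.log.append(list(batch))
        return len(self.log) - 1

    def collect(self, from_offset, max_batches):
        # Replay stored batches starting at from_offset, at most max_batches of them.
        return self.log[from_offset:from_offset + max_batches]


topic = ToyTopic("events.demo")
offset = topic.publish([{"id": 1}, {"id": 2}, {"id": 3}])
replayed = topic.collect(from_offset=0, max_batches=1)
print(offset, replayed)
```

Replaying from offset 0 returns the batch that was just published, which is the round trip the recipe asserts.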

Run it

python cookbook/recipes/trigger_streams/example.py

Exits 0 on success, prints trigger_streams: OK on the last line.

Evaluate retrieval quality

Measure recall@k, precision@k, MRR, and nDCG of an embedding index against a golden relevance set.

When to use this pattern. You have a corpus and a small set of (query, expected document) judgments, and you need a number that tells you “is my new encoder better than the one I shipped last month?” The same loop powers nightly regression dashboards and A/B model comparison.

What `example.py` does

Connects to a temporary artifact dir
Registers the tiny corpus as a Parquet source
Builds 32-dim embeddings over the content column with the local tiny_bert fixture
Reads cookbook/fixtures/tiny_golden.json, expands it into the (query_id, query_text, relevant_id) CSV shape eval_embeddings consumes, and registers it as a golden source
Calls db.eval_embeddings(source="corpus", golden_source="golden.public.golden", k=5)
Asserts each aggregate metric is in [0.0, 1.0] and the per-query records carry their golden-set query_id

API surface exercised

Database.generate_embeddings(*, source, model, columns, key, modality="text")
Database.eval_embeddings(*, source, golden_source, model=None, k=10)

The returned dict carries aggregate (mean across queries — recall_at_k, precision_at_k, mrr, ndcg) and per_query (one entry per query with query_id and a metrics sub-dict of the same four names, un-averaged).
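To make the four metric names concrete, here is a minimal pure-Python sketch of the textbook definitions, using binary relevance. It is an illustration under that assumption, not the engine's implementation; `retrieval_metrics` and `aggregate` are hypothetical helper names, and Jammi's exact conventions (for example how nDCG is normalized) may differ.

```python
import math


def retrieval_metrics(ranked_ids, relevant_ids, k):
    """Score one query: ranked_ids is the result order, relevant_ids the golden set."""
    top_k = ranked_ids[:k]
    relevant = set(relevant_ids)
    hits = [1 if doc_id in relevant else 0 for doc_id in top_k]

    recall = sum(hits) / len(relevant) if relevant else 0.0
    precision = sum(hits) / k
    # Reciprocal rank of the first relevant hit (0.0 when none is in the top k).
    mrr = next((1.0 / (rank + 1) for rank, hit in enumerate(hits) if hit), 0.0)
    # Binary-gain nDCG: DCG over the top k divided by the ideal DCG.
    dcg = sum(hit / math.log2(rank + 2) for rank, hit in enumerate(hits))
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    ndcg = dcg / ideal if ideal else 0.0
    return {"recall_at_k": recall, "precision_at_k": precision, "mrr": mrr, "ndcg": ndcg}


def aggregate(per_query):
    """Mean of each metric across queries."""
    names = ["recall_at_k", "precision_at_k", "mrr", "ndcg"]
    return {n: sum(q["metrics"][n] for q in per_query) / len(per_query) for n in names}


per_query = [
    {"query_id": "q1", "metrics": retrieval_metrics(["1", "7", "3"], ["1", "3"], k=3)},
    {"query_id": "q2", "metrics": retrieval_metrics(["5", "2", "9"], ["2"], k=3)},
]
agg = aggregate(per_query)
print(agg)
```

The `per_query` list mirrors the shape described above: one entry per query with its `query_id` and an un-averaged `metrics` sub-dict.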

Golden source shape

eval_embeddings requires a registered source with these columns:

column	type	example
`query_id`	utf8	`q1`
`query_text`	utf8	`quantum computing applications`
`relevant_id`	utf8	`1` (matches `corpus.id` as a string)

Image queries are supported via a query_image (binary) column instead of query_text; cross-modal eval is out of scope for this recipe.
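The expansion from a JSON judgment file into the three-column shape can be sketched as follows. The JSON layout here (a list of objects with a `relevant_ids` array) is an assumption made for illustration; the real tiny_golden.json may be laid out differently, and `expand_golden` is a hypothetical helper.

```python
import csv
import io
import json

# Hypothetical golden JSON; the second query is invented sample data.
golden_json = json.dumps([
    {"query_id": "q1", "query_text": "quantum computing applications", "relevant_ids": [1, 4]},
    {"query_id": "q2", "query_text": "protein folding", "relevant_ids": [2]},
])


def expand_golden(raw_json):
    """Emit one CSV row per (query, relevant document) judgment."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["query_id", "query_text", "relevant_id"])
    for entry in json.loads(raw_json):
        for doc_id in entry["relevant_ids"]:
            # relevant_id is utf8, so integer corpus ids are written as strings.
            writer.writerow([entry["query_id"], entry["query_text"], str(doc_id)])
    return out.getvalue()


golden_csv = expand_golden(golden_json)
print(golden_csv)
```

A query with several relevant documents becomes several rows that share the same `query_id`.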

Run it

python cookbook/recipes/eval_embeddings/example.py

Exits 0 on success, prints the metrics dict + eval_embeddings: OK.

Image search

Run image-to-image semantic search over a corpus with an OpenCLIP-format vision model, then measure retrieval quality.

When to use this pattern. You have a corpus of images (figures, drawings, photos) and want to find the ones most similar to a query image — and a number that tells you how good the retrieval is. This is the image counterpart of the text eval_embeddings recipe.

Flow

Load a small image corpus (inline image bytes in a Parquet source)
Generate L2-normalized vision embeddings over the image column
Search the index with an encoded image query (cosine ANN)
Eval retrieval quality (Recall@K / MRR) against a held-out golden set
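The search step relies on a simple property: for L2-normalized vectors, cosine similarity reduces to a dot product. The sketch below shows that with an exact brute-force scan over toy 3-dimensional vectors. It is not the engine's ANN index, and `l2_normalize` and `top_k_cosine` are hypothetical helpers.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_cosine(index, query, k):
    """Exact cosine search; for unit vectors the dot product is the cosine."""
    q = l2_normalize(query)
    scored = [(sum(a * b for a, b in zip(vec, q)), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:k]]


# Toy 3-dim "embeddings"; real vectors come from the vision encoder.
index = {
    "img_circle_0": l2_normalize([1.0, 0.1, 0.0]),
    "img_square_0": l2_normalize([0.0, 1.0, 0.2]),
    "img_circle_1": l2_normalize([0.9, 0.2, 0.1]),
}
results = top_k_cosine(index, query=[1.0, 0.0, 0.0], k=2)
print(results)
```

An approximate index trades some of this exactness for speed on large corpora; on a 20-image fixture the two would be expected to agree.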

Model

The example uses PatentCLIP as the reference model because federal patent-figure search is the use case driving this recipe:

JAMMI_IMAGE_MODEL=patentclip/PatentCLIP_Vit_B \
    python cookbook/recipes/image_search/example.py

patentclip/PatentCLIP_Vit_B is pulled from the Hugging Face Hub on first use and produces 512-dim L2-normalized embeddings. Any OpenCLIP-format model works the same way — OpenAI CLIP, LAION CLIP-ViT-B-32-*, EVA-CLIP, etc. — the encoder is auto-detected from the model’s open_clip_config.json.

By default (no env var) the recipe runs against the hermetic cookbook/fixtures/tiny_open_clip fixture, so it runs offline in CI in a few seconds. That fixture has random weights, so its retrieval numbers are meaningless; it exercises the full pipeline, not model quality. Use PatentCLIP (or any real model) for real numbers.

What `example.py` does

Connects to a temporary artifact dir
Reads the 20 committed 224×224 PNGs under cookbook/fixtures/tiny_image_corpus/ into a Parquet corpus source (image_id, image bytes)
db.generate_embeddings(source="corpus", model=MODEL, columns=["image"], key="image_id", modality="image")
db.encode_query(model=MODEL, query=png_bytes, modality="image") → db.search("corpus", query=vec, k=5) (returns a pyarrow.Table)
Builds the image-query golden source from tiny_image_golden.json and calls db.eval_embeddings(source="corpus", golden_source="golden.public.golden", k=5)
Prints the aggregate Recall@K / precision@K / MRR / nDCG and the per-query records. It reports the metrics; it does not assert a quality bar.

Stepwise scripts

example.py runs all four phases in one process (this is the version wired into tests/cookbook_smoke.py). The numbered scripts decompose the same flow and share a persistent workdir, so run them in order:

python cookbook/recipes/image_search/01-load-corpus.py
python cookbook/recipes/image_search/02-generate-embeddings.py
python cookbook/recipes/image_search/03-search.py
python cookbook/recipes/image_search/04-eval.py

API surface exercised

Database.generate_embeddings(*, source, model, columns, key, modality="image")
Database.encode_query(*, model, query, modality="image") → list[float]
Database.search(source, *, query, k, filter=None, select=None) → pyarrow.Table
Database.eval_embeddings(*, source, golden_source, model=None, k=10)

Input schema

column	type	notes
`image_id`	utf8	per-row key
`image`	binary	raw PNG/JPEG/TIFF bytes (decoded by the encoder)

Preprocessing (pad-to-square, no center crop, normalization, L2-normalized output) is handled inside the encoder per the model’s preprocess_cfg.

Golden source shape (image mode)

eval_embeddings switches to image-query mode when the golden source carries a query_image (binary) column instead of query_text:

column	type	example
`query_id`	utf8	`q_circle`
`query_image`	binary	raw PNG bytes of the query image
`relevant_id`	utf8	`img_circle_0` (matches `image_id`)

Fixtures

cookbook/fixtures/tiny_image_corpus/ — 20 synthetic 224×224 PNGs in 5 shape families (circle / triangle / square / hexagon / grating), 4 per family, plus a held-out query image per family under queries/. Rendered programmatically by cookbook/fixtures/generate.py — no real patent imagery (licensing).
cookbook/fixtures/tiny_image_golden.json — per-query → expected corpus IDs (same shape family).
cookbook/fixtures/tiny_open_clip/ — tiny offline OpenCLIP fixture used as the default CI model.

Run it

python cookbook/recipes/image_search/example.py

Exits 0 on success, prints the top-K and the metrics dict + image_search: OK.

Evaluate inference (classification)

Run a classifier over a registered source and score its predictions against gold labels.

When to use this pattern. You have a labelled holdout set and you want a single number — accuracy, macro F1, per-class F1 — to compare two classifiers, or to track drift over time on the same classifier.

What `example.py` does

Connects to a temporary artifact dir
Registers the tiny corpus as corpus (parquet)
Registers tiny_labels.csv as golden (csv) — (id, label) rows
Runs db.eval_inference with the local tiny_modernbert_classifier fixture against the content column
Prints the returned aggregate accuracy, macro f1, per-class metrics, and the count of per-record predictions
Asserts every reported rate is in [0.0, 1.0]

API surface exercised

Database.eval_inference(*, model, source, columns, task, golden_source, label_column)

The returned dict carries aggregate (tagged by "task" — currently "classification") with accuracy, f1, and per_class, plus per_record (one entry per aligned {record_id, predicted, gold}).

The task argument is the string form of the inference task — "classification" here. For NER, see ../eval_inference_ner/.

Golden source shape

eval_inference requires a registered source with these columns:

column	type	example
`id`	utf8	`"1"`
`<label_column>`	utf8	`physics`

label_column is the kwarg you pass at call time — label in this recipe. Every id in the golden source must resolve to a row in the input source; rows without a gold label are silently dropped from the metric.
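The alignment rule and the two headline numbers can be sketched in plain Python. This follows the standard definitions and assumes macro F1 averages over every label that appears in either the predictions or the gold set; `classification_metrics` is a hypothetical helper, not the library's code.

```python
def classification_metrics(predictions, gold):
    """predictions and gold map record id -> label; unlabelled rows are skipped."""
    per_record = [
        {"record_id": rid, "predicted": pred, "gold": gold[rid]}
        for rid, pred in predictions.items()
        if rid in gold  # rows without a gold label do not count
    ]
    labels = sorted({r["gold"] for r in per_record} | {r["predicted"] for r in per_record})
    per_class = {}
    for label in labels:
        tp = sum(r["predicted"] == label and r["gold"] == label for r in per_record)
        fp = sum(r["predicted"] == label and r["gold"] != label for r in per_record)
        fn = sum(r["predicted"] != label and r["gold"] == label for r in per_record)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(r["predicted"] == r["gold"] for r in per_record) / len(per_record)
    macro_f1 = sum(c["f1"] for c in per_class.values()) / len(per_class)
    return {"accuracy": accuracy, "f1": macro_f1, "per_class": per_class}, per_record


predictions = {"1": "physics", "2": "biology", "3": "physics", "4": "biology"}
gold = {"1": "physics", "2": "physics", "3": "physics"}  # id "4" has no gold label
aggregate, per_record = classification_metrics(predictions, gold)
print(aggregate)
```

Record "4" is predicted but carries no gold label, so it never reaches `per_record` and does not affect either metric.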

Run it

python cookbook/recipes/eval_inference/example.py

Exits 0 on success, prints the metrics dict + eval_inference: OK.

Evaluate inference (NER)

Run a token-classification model over a registered source and score its predicted entity spans against gold spans.

When to use this pattern. You have a labelled NER holdout set (one gold span per row) and you want strict entity-level precision, recall, and F1 — both overall and per entity type — to compare two NER models or to track regressions on the same one.

What `example.py` does

Connects to a temporary artifact dir
Registers tiny_ner_corpus.parquet as corpus (parquet)
Registers tiny_ner_gold.csv as golden (csv) — one row per gold entity span: (id, label, start, end)
Runs db.eval_inference with the local tiny_modernbert_ner fixture against the text column, task="ner"
Prints the returned aggregate precision, recall, f1, the per-type breakdown, and the count of per-record predictions
Asserts every reported rate is in [0.0, 1.0]

API surface exercised

Database.eval_inference(*, model, source, columns, task, golden_source, label_column)

The returned dict carries aggregate (tagged by "task" — "ner" for this recipe) with precision, recall, f1, and per_type (one breakdown per entity type the model emitted or the gold set carried), plus per_record (one entry per aligned {record_id, predicted, gold} where predicted and gold are entity-span lists, each tagged "task": "ner").

The task argument is the string form of the inference task — "ner" here. For classification, see ../eval_inference/.

Golden source shape

eval_inference with task="ner" requires a registered source with these columns — one row per entity span (multiple spans on the same id accumulate into one per-row gold set):

column	type	example
`id`	utf8	`"1"`
`<label_column>`	utf8	`PER`
`start`	i64	`0`
`end`	i64	`13`

label_column is the kwarg you pass at call time — label in this recipe. start is inclusive, end is exclusive, both byte offsets into the source row’s text column. The label set must match the shipped model’s id2label minus the B-/I- prefixes — tiny_modernbert_ner knows PER and ORG only.

Rows in the source without a matching gold id are silently dropped from the metric (same alignment rule the classification recipe uses).
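Strict entity-level scoring with byte offsets can be sketched as below. A predicted span counts only if its label, start, and end all match a gold span. The overall figures here are micro-averaged over span counts, which is an assumption about how the aggregate is formed; `prf` and `ner_metrics` are hypothetical helpers.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def ner_metrics(predicted, gold):
    """predicted and gold map row id -> set of (label, start, end) spans."""
    counts = {}  # label -> [tp, fp, fn]
    for row_id, gold_spans in gold.items():  # rows without gold are skipped
        pred_spans = predicted.get(row_id, set())
        for span in pred_spans | gold_spans:
            tally = counts.setdefault(span[0], [0, 0, 0])
            if span in pred_spans and span in gold_spans:
                tally[0] += 1  # strict match: same label, start, and end
            elif span in pred_spans:
                tally[1] += 1  # predicted but not in gold
            else:
                tally[2] += 1  # in gold but missed
    totals = [sum(t[i] for t in counts.values()) for i in range(3)]
    result = prf(*totals)
    result["per_type"] = {label: prf(*t) for label, t in counts.items()}
    return result


text = "Zo\u00eb Smith joined Acme"
# Offsets are byte offsets: the two-byte "\u00eb" shifts "Acme" from char 17 to byte 18.
acme_start = text.encode("utf-8").find(b"Acme")
gold = {"1": {("PER", 0, 10), ("ORG", acme_start, acme_start + 4)}}
predicted = {
    "1": {("PER", 0, 10), ("ORG", text.find("Acme"), 22)},  # char offset: wrong start
    "2": {("PER", 0, 3)},  # no gold for row "2", so it is ignored
}
scores = ner_metrics(predicted, gold)
print(scores)
```

The ORG prediction misses by one byte because it used a character offset, so under strict matching it counts as both a false positive and a false negative.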

Run it

python cookbook/recipes/eval_inference_ner/example.py

Exits 0 on success, prints the metrics dict + eval_inference (ner): OK.

Fine-tune an encoder

Run a LoRA fine-tune on top of an existing text encoder, poll the job to completion, and use the resulting checkpoint to encode a query.

When to use this pattern. Your domain (legal contracts, medical abstracts, patent claims, internal product docs) doesn’t match the distribution the base encoder was trained on, and you have a few hundred to a few thousand labelled or contrastive pairs. LoRA gets you ~80% of the lift of a full fine-tune at a fraction of the cost; the resulting adapter is small enough to ship as an attachment to the base model rather than a re-distributed full checkpoint.

What `example.py` does

Connects to a temporary artifact dir
Registers tiny_pairs.csv (30 contrastive pairs) as training
Calls db.fine_tune(...) with the local tiny_bert base, a small LoRA rank, and one epoch — kept fast for CI
Waits for terminal status via job.wait()
Asserts the resulting model_id starts with jammi:fine-tuned:
Encodes a query through the fine-tuned model to confirm it loads

API surface exercised

Database.fine_tune(*, source, base_model, columns, method, task=..., ...)
TrainingJob.wait()
TrainingJob.job_id, TrainingJob.model_id
Database.encode_query(*, model, query, modality="text")

The full keyword list on fine_tune covers LoRA rank/alpha/dropout, learning rate, epochs, batch size, max sequence length, validation fraction, early-stopping patience/metric, warmup, gradient accumulation, backbone dtype, weight decay, and gradient clipping — the recipe uses the defaults for everything except rank and epochs.

Performance note

This recipe is excluded from the per-PR smoke matrix because even at one epoch it runs ~30 seconds on CPU. The nightly cron with JAMMI_COOKBOOK_SLOW=1 includes it. Override the gate locally:

JAMMI_COOKBOOK_SLOW=1 python tests/cookbook_smoke.py

Run it

python cookbook/recipes/fine_tune/example.py

Exits 0 on success, prints job_id, model_id, and fine_tune: OK.

Connect via Flight SQL

Run a query against a remote jammi-server over Arrow Flight SQL.

When to use this pattern. You’re connecting from a non-Python client (Tableau, dbt, JDBC tools, Rust binaries), or you want to expose Jammi to multiple readers without each one holding an embedded session. The same protocol is what dbt-flightsql, the official Flight SQL JDBC driver, and BI tools speak natively.

What `example.py` does

Spawns target/release/jammi-server as a child process pointed at a temp artifact_dir
Polls the health endpoint (http://127.0.0.1:8080/healthz) until the server is ready (5 s budget)
Opens a pyarrow.flight.FlightClient against grpc://127.0.0.1:8081
Submits SELECT 1 AS one over Flight SQL and confirms the response
Tears down the server process cleanly
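The readiness wait in the second step can be sketched with the standard library alone. `http_ok` and `wait_until_ready` are hypothetical helpers, and the 0.1 s polling interval is an assumption; only the health URL and the 5 s budget come from the recipe description.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=0.5):
    """Return True when the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # not listening yet, refused, or timed out


def wait_until_ready(probe, budget_s=5.0, interval_s=0.1):
    """Call probe() until it returns True or the time budget runs out."""
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False


# Usage against the recipe's health endpoint (requires a running server):
# ready = wait_until_ready(lambda: http_ok("http://127.0.0.1:8080/healthz"))
```

Passing the probe as a callable keeps the loop testable without a server and lets the same budget logic wrap a different check.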

This recipe is gated out of the per-PR CI matrix — it depends on the jammi-server binary being built (cargo build --release -p jammi-server), and the build cost dominates the test wall-clock. The nightly cookbook job builds the binary and runs the recipe behind JAMMI_COOKBOOK_SLOW=1.

Prerequisites

cargo build --release -p jammi-server — produces target/release/jammi-server
pip install pyarrow (already a jammi-ai dependency)

The script auto-detects JAMMI_BIN (env var) or falls back to the workspace’s target/release/jammi-server.

API surface exercised

pyarrow.flight.FlightClient: get_flight_info(...) followed by do_get(...), carrying the query in the Flight SQL command dialect
jammi-server — the OSS deployment-shape binary entrypoint

Run it

cargo build --release -p jammi-server      # one-time build
python cookbook/recipes/flight_sql/example.py

Exits 0 on success, prints the query result + flight_sql: OK.

Audio search

Run audio-to-audio similarity search over a corpus with a CLAP-format audio model, measure retrieval quality, and domain-tune the audio embeddings on caller-supplied triplets.

When to use this pattern. You have a corpus of sounds (clips, stems, loops, recordings) and want to find the ones most similar to a query clip, along with a number that tells you how good the retrieval is. This is the audio counterpart of the image_search recipe; audio is the third embedding modality the engine supports alongside text and images.

Flow

Load a small audio corpus (inline audio bytes in a Parquet source)
Generate L2-normalized audio embeddings over the audio column
Search the index with an encoded audio query (cosine ANN)
Eval retrieval quality (Recall@K / MRR) against a held-out golden set
Fine-tune a projection head on audio triplets and re-eval (tuned ≠ base)

Model

Any HuggingFace CLAP audio model works — its config.json declares model_type = "clap_audio_model" (or lists ClapModel / ClapAudioModelWithProjection in architectures), its checkpoint exposes the audio_model.audio_encoder.* + audio_projection.* HTSAT-Swin tower keys, and a preprocessor_config.json carries the feature-extractor geometry. The encoder is auto-detected from that config, exactly as the image recipe auto-detects OpenCLIP:

JAMMI_AUDIO_MODEL=<hf-repo-id-or-local-path> \
    python cookbook/recipes/audio_search/example.py

By default (no env var) the recipe runs against the hermetic cookbook/fixtures/htsat_clap_tiny fixture, so it runs offline in CI in a few seconds. That fixture has random weights, so its retrieval numbers are meaningless; it exercises the full pipeline, not model quality. Point JAMMI_AUDIO_MODEL at a real CLAP checkpoint for real numbers.

What `example.py` does

Connects to a temporary artifact dir
Reads the 20 committed mono WAV clips under cookbook/fixtures/tiny_audio_corpus/ into a Parquet corpus source (clip_id, audio bytes)
db.generate_embeddings(source="corpus", model=MODEL, columns=["audio"], key="clip_id", modality="audio")
db.encode_query(model=MODEL, query=wav_bytes, modality="audio") → db.search("corpus", query=vec, k=5) (returns a pyarrow.Table)
Builds the audio-query golden source from tiny_audio_golden.json and calls db.eval_embeddings(source="corpus", golden_source="golden.public.golden", k=5)
Prints the base aggregate Recall@K / precision@K / MRR / nDCG and the per-query records. It reports the metrics; it does not assert a quality bar.
Builds synthetic (anchor, positive, negative) audio triplets from the corpus (positive = same timbre family, negative = a different family) and calls db.fine_tune(source="triplets", base_model=MODEL, columns=["anchor","positive","negative"], method="lora", task="audio_embedding", ...). Empty target_modules ⇒ a trainable projection head on the frozen CLAP audio tower (the cheap, low-risk lightweight mode). It then re-embeds the corpus with the tuned model, re-evals, and prints base-vs-tuned metrics for narrative. For correctness it re-encodes the same query clip through the tuned model and asserts the embedding vector changed (max elementwise |Δ| > 1e-4 versus the base encoding) — the real invariant fine-tuning guarantees, and a deterministic check. (Asserting on the coarse top-k metrics instead is flaky: on this tiny eval set the rankings rarely flip even when the vectors move.) It proves the adapter alters audio retrieval — not that it improves it; the random-weight fixture’s direction is not meaningful, real lift comes from a real checkpoint.
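The "embedding vector changed" check in the last step comes down to a few lines. The sketch below assumes both encodings are plain lists of floats of equal length. `max_abs_delta` and `embedding_changed` are hypothetical helper names and the sample vectors are made up; the 1e-4 threshold is the one quoted above.

```python
def max_abs_delta(base_vec, tuned_vec):
    """Largest elementwise absolute difference between two embeddings."""
    if len(base_vec) != len(tuned_vec):
        raise ValueError("embeddings must have the same dimension")
    return max(abs(a - b) for a, b in zip(base_vec, tuned_vec))


def embedding_changed(base_vec, tuned_vec, threshold=1e-4):
    """True when fine-tuning moved at least one coordinate past the threshold."""
    return max_abs_delta(base_vec, tuned_vec) > threshold


# Made-up vectors standing in for the base and tuned encodings of one query clip.
base = [0.50, -0.25, 0.10]
tuned = [0.48, -0.25, 0.13]
print(max_abs_delta(base, tuned), embedding_changed(base, tuned))
```

Because it compares vectors rather than rankings, the check stays deterministic even when the top-k order does not move.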

The pairing semantics (what a “positive” means) are the caller’s training data, not the trainer’s: the trainer only minimizes the contrastive triplet loss over whatever clips you pair.

Stepwise scripts

example.py runs every phase in one process (this is the version wired into tests/cookbook_smoke.py). The numbered scripts decompose the search-and-eval flow and share a persistent workdir, so run them in order:

python cookbook/recipes/audio_search/01-load-corpus.py
python cookbook/recipes/audio_search/02-generate-embeddings.py
python cookbook/recipes/audio_search/03-search.py
python cookbook/recipes/audio_search/04-eval.py

API surface exercised

Database.generate_embeddings(*, source, model, columns, key, modality="audio")
Database.encode_query(*, model, query, modality="audio") → list[float]
Database.search(source, *, query, k, filter=None, select=None) → pyarrow.Table
Database.eval_embeddings(*, source, golden_source, model=None, k=10)
Database.fine_tune(*, source, base_model, columns, method, task="audio_embedding", ...) → TrainingJob

Audio triplet schema (fine-tune input)

column	type	notes
`anchor`	binary	encoded audio clip
`positive`	binary	a clip the caller deems related
`negative`	binary	a clip the caller deems unrelated

Same column shape as text triplets — task="audio_embedding" is what tells the loader to read the three columns as encoded audio rather than text.
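One way to assemble such triplets from a family-labelled corpus is sketched below. It assumes clip ids shaped like `clip_<family>_<n>` (as in `clip_sine_0`) and uses placeholder bytes instead of real WAV data; `family_of` and `build_triplets` are hypothetical helpers, not part of the library.

```python
import random


def family_of(clip_id):
    # Assumes ids shaped like "clip_<family>_<n>", as in "clip_sine_0".
    return clip_id.split("_")[1]


def build_triplets(clips, seed=0):
    """clips maps clip_id -> encoded audio bytes; returns anchor/positive/negative rows."""
    rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
    ids = sorted(clips)
    rows = []
    for anchor in ids:
        same = [c for c in ids if c != anchor and family_of(c) == family_of(anchor)]
        other = [c for c in ids if family_of(c) != family_of(anchor)]
        if not same or not other:
            continue  # a triplet needs both a positive and a negative
        rows.append({
            "anchor": clips[anchor],
            "positive": clips[rng.choice(same)],
            "negative": clips[rng.choice(other)],
        })
    return rows


# Placeholder payloads; a real corpus would hold encoded WAV bytes here.
clips = {f"clip_{fam}_{i}": f"{fam}-{i}".encode() for fam in ("sine", "square") for i in range(2)}
triplets = build_triplets(clips)
print(len(triplets))
```

Each row carries the three binary columns from the schema above, so the list could be written to a Parquet or Arrow table as-is.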

Input schema

column	type	notes
`clip_id`	utf8	per-row key
`audio`	binary	raw WAV/FLAC/MP3/Ogg bytes (decoded by the encoder)

Preprocessing (decode → resample to the model’s sample rate → CLAP fusion log-mel spectrogram → HTSAT-Swin tower → L2-normalized output) is handled inside the encoder per the model’s preprocessor_config.json feature-extractor geometry. The audio column may also hold file-path strings instead of inline bytes.

Golden source shape (audio mode)

eval_embeddings switches to audio-query mode when the golden source carries a query_audio (binary) column instead of query_text / query_image:

column	type	example
`query_id`	utf8	`q_sine`
`query_audio`	binary	raw WAV bytes of the query clip
`relevant_id`	utf8	`clip_sine_0` (matches `clip_id`)

Fixtures

cookbook/fixtures/tiny_audio_corpus/ — 20 synthetic mono WAV clips in 5 timbre families (sine / harmonic / square / saw / noise), 4 per family, plus a held-out query clip per family under queries/. Synthesised programmatically by cookbook/fixtures/generate.py — no recorded audio (licensing), no tenant data.
cookbook/fixtures/tiny_audio_golden.json — per-query → expected corpus IDs (same timbre family).
cookbook/fixtures/htsat_clap_tiny/ — tiny offline HTSAT-Swin CLAP fixture used as the default CI model, generated by tests/fixtures/generate_htsat_clap.py.
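For a sense of what a programmatically synthesised clip looks like, here is a standard-library sketch that renders a short mono sine tone to WAV bytes. It is not the contents of generate.py; the 440 Hz pitch, 0.25 s duration, 16 kHz sample rate, and the `sine_wav_bytes` name are all assumptions.

```python
import io
import math
import struct
import wave


def sine_wav_bytes(freq_hz=440.0, seconds=0.25, sample_rate=16000):
    """Return a mono 16-bit PCM WAV holding a sine tone, as bytes."""
    n_frames = int(seconds * sample_rate)
    samples = (
        int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * t / sample_rate))
        for t in range(n_frames)
    )
    pcm = b"".join(struct.pack("<h", s) for s in samples)  # little-endian int16
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


wav_bytes = sine_wav_bytes()
print(len(wav_bytes))
```

Bytes produced this way are the kind of inline payload the `audio` column holds.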

Run it

python cookbook/recipes/audio_search/example.py

Exits 0 on success, prints the top-K and the metrics dict + audio_search: OK.

Per-query search audit

Record a tamper-evident audit row for every search: what was queried, with what model, what came back, and when. The substrate signs each record, stores it tenant-scoped, and publishes it to a trigger topic — so you do not hand-roll an audit schema, a signature scheme, and a stream integration in every project.

This is the primitive every audited-ML deployment (financial, healthcare, federal, legal) needs to answer “show me exactly what this model returned for this query, and prove the record hasn’t been altered.”

What this recipe shows

Build a PerQueryAudit record (query id, model id/version, query lineage, top-K result ids, retrieval scores).
db.audit.log([...]) — the substrate injects tenant_id, signs the record with a per-tenant HMAC-SHA256 key, stores it, and publishes it.
db.audit.fetch_by_query_id(...) / db.audit.fetch_recent(...) — typed reads, tenant-scoped.
record.verify() — re-derive the key and check the signature.
Plain SQL over mutable.public."_jammi_search_audit" — same tenant scope.
db.subscribe_collect("jammi.audit.search.v1", ...) — every logged record is also delivered on a trigger topic for alerting / analytics / warehouse sinks.

Run it

The audit master key is required — the substrate refuses to sign without it:

export JAMMI_AUDIT_MASTER_KEY=$(python -c "import secrets; print(secrets.token_hex(32))")
python cookbook/recipes/search_audit/example.py

The key derives a distinct signing secret per tenant via HKDF-SHA256 and is deterministic across restarts, so signatures written today verify after a redeploy. Source it from your secret manager — never hard-code it.
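The derive-then-sign scheme can be illustrated with the standard library. This is a sketch of the general technique (HKDF-SHA256 per RFC 5869, then HMAC-SHA256), not Jammi's wire format: using the tenant id as the HKDF `info` input, the sorted-key JSON canonicalisation, and all function names are assumptions.

```python
import hashlib
import hmac
import json


def hkdf_sha256(master_key, info, length=32, salt=b""):
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    prk = hmac.new(salt or b"\x00" * 32, master_key, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def tenant_key(master_key, tenant_id):
    # Assumption: the tenant id is used as the HKDF "info" input.
    return hkdf_sha256(master_key, info=tenant_id.encode("utf-8"))


def sign(record, key):
    # Assumption: records are canonicalised as sorted-key JSON before signing.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(record, signature, key):
    """Constant-time comparison of the recomputed and stored signatures."""
    return hmac.compare_digest(sign(record, key), signature)


master = bytes.fromhex("00" * 32)  # placeholder only; load the real key from a secret manager
record = {"query_id": "q1", "top_k_result_ids": ["a", "b"], "retrieval_scores": [0.9, 0.7]}
key_a = tenant_key(master, "tenant-a")
signature = sign(record, key_a)
print(verify(record, signature, key_a))
```

Because the derivation is a pure function of the master key and the tenant id, the same signing key is recovered after a restart, which is why old signatures keep verifying.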

Key points

Lineage is capped. query_lineage JSON may not exceed 8 KiB (override with JAMMI_AUDIT_MAX_LINEAGE_BYTES). Store image hashes and row IDs, not raw payloads — compliance posture is structural, not advisory.
top_k_result_ids and retrieval_scores must be the same length. This is checked when you construct the record.
The table is reserved. _jammi_search_audit is created implicitly on the first log; you cannot create or directly INSERT into it (that would bypass signing). Read it freely via SQL.
Tenant isolation is automatic. A record logged under tenant A is invisible to tenant B, through both the typed API and raw SQL.
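The two construction-time checks (lineage size and matching lengths) can be sketched as a dataclass that validates itself. `AuditRecordSketch` is a hypothetical stand-in for PerQueryAudit with illustrative field names, and measuring the lineage as UTF-8 encoded JSON is an assumption about how the cap is applied.

```python
import json
import os
from dataclasses import dataclass

DEFAULT_MAX_LINEAGE_BYTES = 8 * 1024  # the documented 8 KiB cap


@dataclass
class AuditRecordSketch:
    """Hypothetical stand-in for PerQueryAudit; field names are illustrative."""
    query_id: str
    model_id: str
    query_lineage: dict
    top_k_result_ids: list
    retrieval_scores: list

    def __post_init__(self):
        # Every result id needs exactly one score.
        if len(self.top_k_result_ids) != len(self.retrieval_scores):
            raise ValueError("top_k_result_ids and retrieval_scores must be the same length")
        # The cap can be overridden through the documented environment variable.
        cap = int(os.environ.get("JAMMI_AUDIT_MAX_LINEAGE_BYTES", DEFAULT_MAX_LINEAGE_BYTES))
        size = len(json.dumps(self.query_lineage).encode("utf-8"))
        if size > cap:
            raise ValueError(f"query_lineage is {size} bytes; cap is {cap}")


# Lineage holds a hash, not the raw image payload.
ok = AuditRecordSketch("q1", "tiny_bert", {"image_sha256": "ab" * 32}, ["a", "b"], [0.9, 0.7])
print(ok.query_id)
```

Rejecting an oversized lineage at construction time is what makes the "hashes and ids, not raw payloads" rule structural rather than advisory.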

Ephemeral session storage

A session-scoped storage context whose tables are auto-deleted when the session ends — on explicit close(), on context-manager exit, or when the 60-second timeout scanner force-closes a session past its deadline. Every transition publishes to the jammi.audit.session_lifecycle.v1 trigger topic, giving an audit-log aggregator durable proof that the data was deleted.

Run it:

python cookbook/recipes/session_lifecycle/example.py

When to use it

Use an ephemeral session for sensitive transient data that must not outlive the request that produced it: uploaded images, derived embeddings, draft model inputs. The session is always tenant-scoped — tenant A can never see tenant B’s ephemeral tables.

When NOT to use it

Do not store long-lived data in an ephemeral session. The audit record, the persistent corpus, and anything compliance needs to read later belong in ordinary mutable tables. The pattern is: keep the throwaway working set (raw bytes, embeddings) in the ephemeral session, and write only durable lineage (hashes, ids, scores) to a persistent table — before you close the session, while the working data still exists.

API

with db.ephemeral_session(timeout_seconds=3600) as ephem:
    ephem.create_ephemeral_table("imgs", schema=schema, primary_key=["image_id"])
    ephem.insert("imgs", batch=table)
    rows = ephem.sql("imgs", "SELECT image_hash FROM {table}")
# close() runs on exit: tables dropped, `closed` event published

{table} in a sql query is replaced by the tenant-scoped reference to the named ephemeral table. The context manager is the recommended path; Drop is best-effort. Lifecycle events (opened, closed, timed_out, partial_deletion_failure) carry the session id, tenant, table count, and deleted-row count.
