Enrich Results with Joins and Annotations
Search results can be enriched by joining with other data sources and annotating with additional model inference. Every enrichment step is tracked in the evidence provenance columns.
Join with another source
Join search results with a registered source to add context columns (e.g., company name, category labels):
Rust
#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::source::{FileFormat, SourceConnection, SourceType};
async fn ex(session: &Arc<InferenceSession>, query: Vec<f32>) -> jammi_db::error::Result<()> {
session.add_source("assignees", SourceType::File, SourceConnection {
url: Some("file:///data/assignees.csv".into()),
format: Some(FileFormat::Csv),
..Default::default()
}).await?;
let results = session.search("patents", query, 10).await?
.join("assignees", "assignee_id=id", None).await? // left join by default
.run().await?;
// Results now include company_name, country from assignees
Ok(()) }
}
Python
db.add_source("assignees", path="/data/assignees.csv", format="csv")
search = db.search("patents", query=query_vec, k=10)
search.join("assignees", on="assignee_id=id")
results = search.run()
# Results now include company_name, country from assignees
The on parameter is "left_col=right_col". The optional join type is "inner" or "left" (default).
Annotate with model inference
Run a model over search results to add new columns:
Rust
#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::ModelTask;
async fn ex(session: &Arc<InferenceSession>, query: Vec<f32>) -> jammi_db::error::Result<()> {
let results = session.search("patents", query, 10).await?
.annotate(
"sentence-transformers/all-MiniLM-L6-v2",
ModelTask::TextEmbedding,
&["abstract".to_string()],
).await?
.run().await?;
Ok(()) }
}
Python
search = db.search("patents", query=query_vec, k=10)
search.annotate(
model="sentence-transformers/all-MiniLM-L6-v2",
task="text_embedding",
columns=["abstract"],
)
results = search.run()
Evidence provenance
Every search result carries provenance tracking that records how each row was found and enriched:
| Scenario | retrieved_by | annotated_by |
|---|---|---|
| Plain search | ["vector"] | [] |
| Search + annotate | ["vector"] | ["inference"] |
These are List<Utf8> columns — each row has its own list of contributing channels.
Composing everything
All operations compose freely:
Rust
#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use std::sync::Arc;
use jammi_ai::session::InferenceSession;
use jammi_db::ModelTask;
async fn ex(session: &Arc<InferenceSession>, query: Vec<f32>) -> jammi_db::error::Result<()> {
let results = session.search("patents", query, 100).await?
.join("assignees", "assignee_id=id", None).await?
.annotate("all-MiniLM-L6-v2", ModelTask::TextEmbedding, &["abstract".into()]).await?
.filter("country = 'US'")?
.sort("similarity", true)?
.limit(10)
.select(&["title".into(), "company_name".into(), "similarity".into()])?
.run().await?;
Ok(()) }
}
Python
search = db.search("patents", query=query_vec, k=100)
search.join("assignees", on="assignee_id=id")
search.annotate(model="all-MiniLM-L6-v2", task="text_embedding", columns=["abstract"])
search.filter("country = 'US'")
search.sort("similarity", descending=True)
search.limit(10)
search.select(["title", "company_name", "similarity"])
results = search.run()
The pipeline builds a DataFusion execution plan under the hood. No data is processed until .run() — so adding more steps doesn’t cost anything until execution.