Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Query Your Data with SQL

Register data files as named sources, then query them with full SQL. Sources are persisted in the catalog and survive session restarts.

Register a source

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
use jammi_db::source::{FileFormat, SourceConnection, SourceType};

session.add_source("patents", SourceType::File, SourceConnection {
    url: Some("file:///data/patents.parquet".into()),
    format: Some(FileFormat::Parquet),
    ..Default::default()
}).await?;
Ok(()) }
}

Python

db.add_source("patents", path="/data/patents.parquet", format="parquet")

CLI

jammi sources add patents --path /data/patents.parquet --format parquet

Supported formats

FormatRustPython/CLINotes
ParquetFileFormat::Parquet"parquet"Columnar, compressed, recommended for large datasets
CSVFileFormat::Csv"csv"Auto-detected schema
JSONFileFormat::Json"json"Line-delimited JSON

Run a SQL query

Sources are accessible via three-part SQL names: <source_id>.public.<table_name>. The table name is derived from the file name (e.g., patents.parquet becomes patents).

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
let results = session.sql(
    "SELECT id, title, year FROM patents.public.patents WHERE year > 2020 ORDER BY year"
).await?;

for batch in &results {
    println!("{batch:?}");
}
Ok(()) }
}

Python

table = db.sql("SELECT id, title, year FROM patents.public.patents WHERE year > 2020 ORDER BY year")
print(table.to_pandas())

CLI

jammi query "SELECT id, title, year FROM patents.public.patents WHERE year > 2020 ORDER BY year"

Aggregations

SELECT category, COUNT(*) as count, AVG(citation_count) as avg_citations
FROM patents.public.patents
WHERE year > 2020
GROUP BY category
ORDER BY count DESC

Joins across sources

Register multiple sources and join them in a single query:

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
use jammi_db::source::{FileFormat, SourceConnection, SourceType};
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
session.add_source("companies", SourceType::File, SourceConnection {
    url: Some("file:///data/companies.csv".into()),
    format: Some(FileFormat::Csv),
    ..Default::default()
}).await?;

let results = session.sql("
    SELECT p.title, c.company_name
    FROM patents.public.patents p
    JOIN companies.public.companies c ON p.assignee_id = c.id
").await?;
Ok(()) }
}

Python

db.add_source("companies", path="/data/companies.csv", format="csv")

table = db.sql("""
    SELECT p.title, c.company_name
    FROM patents.public.patents p
    JOIN companies.public.companies c ON p.assignee_id = c.id
""")

Source lifecycle

Rust

#![allow(unused)]
fn main() {
extern crate jammi_db;
extern crate jammi_ai;
extern crate tokio;
use jammi_ai::session::InferenceSession;
async fn ex(session: &InferenceSession) -> jammi_db::error::Result<()> {
// List registered sources
let sources = session.catalog().list_sources().await?;

// Remove a source
session.remove_source("patents").await?;
Ok(()) }
}

CLI

jammi sources list

Sources persist in the SQLite catalog at <artifact_dir>/catalog.db. Registering the same source ID twice returns an error — remove it first.

Execution plans

Use EXPLAIN (or the CLI explain command) to see how DataFusion will execute your query:

jammi explain "SELECT * FROM patents.public.patents WHERE year > 2020"