Classify Text

Run a classification model over text columns to assign a label and a confidence score to each row. Any Hugging Face model of a supported architecture (listed below) that defines id2label in its config.json works without extra setup.
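
Before pointing inference at a model, it can be useful to confirm that its config.json really carries an id2label mapping. The sketch below is not part of Jammi: read_id2label is a hypothetical helper, and it assumes the usual Hugging Face layout in which id2label is a JSON object keyed by class id.

```python
import json


def read_id2label(config_text):
    """Return {class_id: label} from the text of a config.json, or None if absent.

    Hypothetical helper for illustration; not a Jammi API.
    """
    config = json.loads(config_text)
    mapping = config.get("id2label")
    if not isinstance(mapping, dict) or not mapping:
        return None
    # JSON object keys are always strings, so convert class ids back to integers.
    return {int(class_id): label for class_id, label in mapping.items()}


# Made-up config contents, used only to exercise the helper.
example = '{"model_type": "bert", "id2label": {"0": "physics", "1": "biology"}}'
labels = read_id2label(example)
```

For a model on disk, pass the contents of its config.json, for example `read_id2label(open("config.json", encoding="utf-8").read())`. A result of None suggests the model has no classification labels to report.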

Basic usage

Rust

use jammi_ai::model::{ModelSource, ModelTask};
use jammi_ai::session::InferenceSession;

// Classify the `abstract` column of the `patents` source, keyed by `id`.
async fn classify_patents(session: &InferenceSession) -> jammi_db::error::Result<()> {
    let model = ModelSource::hf("answerdotai/ModernBERT-base-classification");
    let results = session.infer(
        "patents",                 // source
        &model,
        ModelTask::Classification,
        &["abstract".to_string()], // text columns to classify
        "id",                      // key column
    ).await?;
    Ok(())
}

Python

results = db.infer(
    source="patents",
    model="answerdotai/ModernBERT-base-classification",
    columns=["abstract"],
    task="classification",
    key="id",
)

Output schema

Each RecordBatch has prefix columns plus classification-specific columns:

| Column | Type | Description |
| --- | --- | --- |
| _row_id | Utf8 | Key column value |
| _source | Utf8 | Source identifier |
| _model | Utf8 | Model identifier |
| _status | Utf8 | "ok" or "error" |
| _error | Utf8 (nullable) | Error message if failed |
| _latency_ms | Float32 | Inference latency in milliseconds |
| label | Utf8 (nullable) | Predicted class label |
| confidence | Float32 (nullable) | Confidence score between 0 and 1 |
| all_scores_json | Utf8 (nullable) | JSON with all class scores |
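
As a rough illustration of consuming these columns, the sketch below ranks the class scores for a single result row represented as a plain dict. The column names come from the table above. The shape of all_scores_json is an assumption (a JSON object mapping each label to its score), and top_scores is a hypothetical helper, not a Jammi function.

```python
import json


def top_scores(row, k=3):
    """Return the k highest-scoring (label, score) pairs for one result row."""
    # Failed rows carry null classification columns, so there is nothing to rank.
    if row["_status"] != "ok" or row["all_scores_json"] is None:
        return []
    # Assumption: all_scores_json decodes to an object mapping label -> score.
    scores = json.loads(row["all_scores_json"])
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up row, used only to exercise the helper.
row = {
    "_row_id": "42",
    "_status": "ok",
    "label": "physics",
    "confidence": 0.75,
    "all_scores_json": '{"physics": 0.75, "biology": 0.2, "chemistry": 0.05}',
}
best = top_scores(row, k=2)
```

If the real payload turns out to be shaped differently, for instance a list of objects, only the json.loads handling needs to change.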

Supported model architectures

Classification models must have id2label in their config.json. Supported architectures:

BERT family — BERT, RoBERTa, DistilBERT, CamemBERT, XLM-RoBERTa:

  • Loads classifier.weight + classifier.bias from safetensors
  • CLS token pooling + linear classifier + softmax
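
The BERT-family path above can be sketched in plain Python as follows. This is a minimal illustration of CLS pooling, a linear classifier, and softmax, not Jammi's implementation; the tiny weights are invented, and treating the reported confidence as the top softmax probability is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_cls(hidden_states, weight, bias, id2label):
    """CLS pooling -> linear classifier -> softmax.

    hidden_states: one vector per token, with the CLS token first.
    weight, bias:  stand-ins for classifier.weight and classifier.bias.
    """
    cls_vector = hidden_states[0]  # CLS pooling: keep only the first token
    logits = [
        sum(w * h for w, h in zip(row, cls_vector)) + b
        for row, b in zip(weight, bias)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Assumption: the reported confidence is the winning class probability.
    return id2label[best], probs[best], probs


# Invented two-token, two-class example.
hidden_states = [[1.0, 0.0], [0.3, 0.9]]
weight = [[2.0, 0.0], [0.0, 2.0]]
bias = [0.0, 0.0]
label, confidence, probs = classify_cls(
    hidden_states, weight, bias, {0: "physics", 1: "biology"}
)
```

Here the logits are [2.0, 0.0], so the first class wins and the probabilities sum to 1.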

ModernBERT — uses the built-in ModernBertForSequenceClassification:

  • CLS or MEAN pooling (configured via classifier_pooling in config)
  • Head (dense + GELU + LayerNorm) + classifier + softmax
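
To make the two pooling modes and the head concrete, here is a small plain-Python sketch. It is an illustration under assumptions rather than the actual model code: classifier_pooling is taken to be "cls" or "mean", mean pooling is assumed to skip padding tokens via the attention mask, and the dense bias and LayerNorm scale and shift parameters are omitted for brevity.

```python
import math


def pool(hidden_states, attention_mask, classifier_pooling):
    """Reduce per-token vectors to one vector using CLS or MEAN pooling."""
    if classifier_pooling == "cls":
        return list(hidden_states[0])
    if classifier_pooling == "mean":
        # Assumption: padding tokens (mask == 0) are excluded from the mean.
        kept = [vec for vec, keep in zip(hidden_states, attention_mask) if keep]
        if not kept:
            raise ValueError("attention mask excludes every token")
        return [sum(column) / len(kept) for column in zip(*kept)]
    raise ValueError(f"unsupported classifier_pooling: {classifier_pooling!r}")


def gelu(x):
    """Exact GELU activation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(vector, eps=1e-5):
    """LayerNorm without the learned scale and shift."""
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(variance + eps) for v in vector]


def head(pooled, dense_weight):
    """Dense projection (bias omitted) -> GELU -> LayerNorm."""
    dense = [sum(w * x for w, x in zip(row, pooled)) for row in dense_weight]
    return layer_norm([gelu(v) for v in dense])


# Invented example: two real tokens followed by one padding token.
hidden_states = [[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]]
attention_mask = [1, 1, 0]
cls_vector = pool(hidden_states, attention_mask, "cls")
mean_vector = pool(hidden_states, attention_mask, "mean")
features = head(mean_vector, [[1.0, 0.0], [0.0, 1.0]])
```

The features returned by head would then go through the same linear classifier and softmax as in the BERT-family path.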

Fine-tuning for classification

Train a LoRA adapter with a classification head on your labeled data.

Prepare training data

text,label
"quantum error correction","physics"
"CRISPR gene editing","biology"

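A quick sanity check of the training file before starting a job can catch empty cells early. The sketch below uses only Python's standard csv module and assumes the text and label column names shown above; load_training_rows is a hypothetical helper, not a Jammi API.

```python
import csv
import io
from collections import Counter


def load_training_rows(csv_text):
    """Parse text,label CSV content and reject rows with an empty field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        text = (row.get("text") or "").strip()
        label = (row.get("label") or "").strip()
        if not text or not label:
            # reader.line_num is the last physical line consumed for this row.
            raise ValueError(f"line {reader.line_num}: text and label must be non-empty")
        rows.append((text, label))
    return rows


sample = (
    "text,label\n"
    '"quantum error correction","physics"\n'
    '"CRISPR gene editing","biology"\n'
)
rows = load_training_rows(sample)
label_counts = Counter(label for _, label in rows)
```

Inspecting label_counts also gives a quick view of class balance before fine-tuning.
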
Rust

use jammi_ai::fine_tune::FineTuneMethod;
use jammi_ai::session::InferenceSession;
use jammi_db::ModelTask;

// Fine-tune a LoRA classification adapter on the `training` source.
async fn fine_tune_classifier(session: &InferenceSession) -> jammi_db::error::Result<()> {
    let job = session.fine_tune(
        "training",                               // source
        "sentence-transformers/all-MiniLM-L6-v2", // base model
        &["text".into(), "label".into()],         // columns
        FineTuneMethod::Lora,
        ModelTask::Classification,
        None,
    ).await?;

    // Block until the training job finishes.
    job.wait().await?;
    Ok(())
}

Python

job = db.fine_tune(
    source="training",
    base_model="sentence-transformers/all-MiniLM-L6-v2",
    columns=["text", "label"],
    method="lora",
    task="classification",
)
job.wait()

Fine-tuning trains a LoRA projection plus a linear classification head with a cross-entropy loss. Both are saved to adapter.safetensors.
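
For readers unfamiliar with the loss, the cross-entropy for a single example can be sketched as below. This is a generic illustration of the formula (the negative log of the softmax probability assigned to the true class), not Jammi's training loop; the logits are invented.

```python
import math


def cross_entropy(logits, target_index):
    """Cross-entropy for one example: -log(softmax(logits)[target_index])."""
    peak = max(logits)
    # log-sum-exp, shifted by the peak for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target_index]


loss_confident = cross_entropy([4.0, 0.0], 0)  # true class has the high logit
loss_wrong = cross_entropy([4.0, 0.0], 1)      # true class has the low logit
loss_uniform = cross_entropy([0.0, 0.0], 0)    # no preference: equals log(2)
```

The loss is small when the classifier puts most of its probability on the correct label and grows as that probability shrinks, which is the signal that drives the adapter and head updates.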

Error handling

Classification uses the same per-row error tracking as embeddings:

| Condition | _status | label | confidence |
| --- | --- | --- | --- |
| Valid text | "ok" | Predicted label | Score between 0 and 1 |
| Null/empty text | "error" | null | null |
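
Because failures are reported per row rather than raised, downstream code typically separates the two groups before using the labels. A minimal sketch, assuming result rows have been converted to plain dicts keyed by the column names above; split_by_status is a hypothetical helper and the rows are made up.

```python
def split_by_status(rows):
    """Partition result rows into (ok_rows, failed_rows) using _status."""
    ok_rows, failed_rows = [], []
    for row in rows:
        (ok_rows if row["_status"] == "ok" else failed_rows).append(row)
    return ok_rows, failed_rows


# Made-up rows; the error text is a placeholder, not a real Jammi message.
rows = [
    {"_row_id": "1", "_status": "ok", "_error": None,
     "label": "physics", "confidence": 0.91},
    {"_row_id": "2", "_status": "error", "_error": "placeholder error message",
     "label": None, "confidence": None},
]
ok_rows, failed_rows = split_by_status(rows)
failed_ids = [row["_row_id"] for row in failed_rows]
```

The failed_ids list can then be logged or retried, while ok_rows are safe to read label and confidence from.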