Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Deploy as a Server

Jammi can run as an Arrow Flight SQL server, making all registered sources and embedding tables queryable from any Arrow-compatible client. Use this when multiple services, BI tools, or non-Rust/Python consumers need to query Jammi’s data.

The workflow

The server is a read path. Set up your data with the library or CLI, then deploy the server so other systems can query it:

# 1. Register sources and generate embeddings (CLI or library)
jammi sources add patents --path /data/patents.parquet --format parquet

# 2. Generate embeddings (library or Python — not available over Flight SQL)
python3 -c "
import jammi_ai
db = jammi_ai.connect()
db.generate_embeddings(source='patents', model='sentence-transformers/all-MiniLM-L6-v2', columns=['abstract'], key='id')
"

# 3. Start the server
jammi serve

Connecting with Arrow Flight SQL

Python (pyarrow)

from pyarrow.flight import FlightClient, FlightDescriptor

client = FlightClient("grpc://localhost:8081")

# Run a SQL query
info = client.get_flight_info(
    FlightDescriptor.for_command(b"SELECT id, title, year FROM patents.public.patents WHERE year > 2020")
)
reader = client.do_get(info.endpoints[0].ticket)
table = reader.read_all()
print(table.to_pandas())

Query embedding tables

Embedding tables are registered in DataFusion and queryable via SQL:

# List all embedding tables
info = client.get_flight_info(
    FlightDescriptor.for_command(b"SELECT table_name FROM information_schema.tables WHERE table_schema = 'jammi'")
)

# Query vectors directly
info = client.get_flight_info(
    FlightDescriptor.for_command(b"SELECT _row_id, _model_id FROM \"jammi.patents__embedding__all-MiniLM-L6-v2__20260325\" LIMIT 10")
)

JDBC

Flight SQL is compatible with JDBC drivers that support the Arrow Flight SQL protocol, enabling access from Java applications, BI tools (Superset, DBeaver, Tableau), and SQL editors.

Server configuration

[server]
flight_listen = "0.0.0.0:8081"
preload_models = ["sentence-transformers/all-MiniLM-L6-v2"]

[logging]
level = "info"
format = "json"    # structured logging for production

Preloading models

Models listed in preload_models are downloaded and loaded into memory at startup. This ensures the session is warm before the server accepts connections.

[server]
preload_models = [
    "sentence-transformers/all-MiniLM-L6-v2",
    "BAAI/bge-small-en-v1.5",
]

GPU configuration

For GPU-accelerated inference in production:

[gpu]
device = 0            # CUDA device index
memory_limit = "auto"
memory_fraction = 0.9

[inference]
batch_size = 64
max_loaded_models = 3

Set gpu.device = -1 for CPU-only deployment.

Environment variable overrides

Every config field can be overridden with environment variables, useful for containerized deployments:

JAMMI_SERVER__FLIGHT_LISTEN=0.0.0.0:9081 \
JAMMI_GPU__DEVICE=-1 \
JAMMI_LOGGING__FORMAT=json \
jammi serve

Health, readiness, and metrics

The server exposes three HTTP side-channel endpoints on port 8080:

curl http://localhost:8080/healthz
# {"status":"ok","version":"0.8.0"}

curl http://localhost:8080/readyz
# {"status":"ready"}

curl http://localhost:8080/metrics
# jammi_grpc_requests_total 0
# jammi_flight_queries_total 0
# jammi_eval_invocations_total 0
# jammi_search_latency_seconds_bucket{...} 0

/healthz is a liveness probe — a 200 means the process is running. /readyz is a readiness probe — 200 means the catalog backend responded; 503 means it didn’t and traffic should be drained from this instance. Point your load balancer at /readyz.

/metrics exposes a small, substrate-level set of Prometheus counters (gRPC requests, Flight SQL queries, eval invocations) plus a search- latency histogram. Wider observability lives in the commercial server.

What the server can and cannot do

OperationAvailable over Flight SQL?
SQL queries on source tablesYes
SQL queries on embedding tablesYes
Joins, aggregations, filtersYes
Generate embeddingsNo — use library or CLI
Semantic vector searchNo — use library or CLI
Fine-tuningNo — use library or CLI
EvaluationNo — use library or CLI

The server is a query interface. ML operations (embeddings, search, fine-tuning) are done through the Rust library, Python package, or CLI — then the results are queryable over Flight SQL.

Graceful shutdown

The server drains active connections on SIGTERM / Ctrl+C before exiting. In-flight queries complete; long-running operations started via the library are unaffected.

Deploying as a container

The OSS server ships as a public Docker image at ghcr.io/f-inverse/jammi-ai-server. The image is built from a distroless base, runs as the nonroot user (uid 65532), and exposes the same 8080 / 8081 ports the local binary listens on.

docker run --rm \
  -p 8080:8080 -p 8081:8081 \
  -v jammi_data:/var/lib/jammi \
  -v $(pwd)/jammi.toml:/etc/jammi/jammi.toml:ro \
  ghcr.io/f-inverse/jammi-ai-server:latest

A minimal compose file lives in the workspace at examples/docker-compose/oss-server.yml:

cd examples/docker-compose
docker compose -f oss-server.yml up

Persistence

/var/lib/jammi holds the catalog DB, model weights, and indices. The Dockerfile declares it as a VOLUME — bind mounts work, but the host directory must be writable by uid 65532:

# Bind mount on the host.
sudo chown -R 65532:65532 /opt/jammi/data
docker run -v /opt/jammi/data:/var/lib/jammi ...

A named Docker volume (the compose default) sidesteps that step because Docker provisions ownership for the container’s user automatically.

Configuration

The container’s entrypoint expects /etc/jammi/jammi.toml. Bind-mount your config there:

# oss-server.yml
services:
  jammi-server:
    image: ghcr.io/f-inverse/jammi-ai-server:latest
    volumes:
      - ./jammi.toml:/etc/jammi/jammi.toml:ro
      - jammi_data:/var/lib/jammi
    ports:
      - "8080:8080"
      - "8081:8081"

Building from source

The Dockerfile lives at the workspace root and uses BuildKit cache mounts for the cargo registry and target directory:

DOCKER_BUILDKIT=1 docker build -t jammi-ai-server:dev -f Dockerfile .

Cold builds take ~30 minutes (the workspace is large); warm builds with cache hits land at ~3 minutes.