EU Sovereign AI Inference Glossary

Clear, sourced definitions of the terms used in EU sovereign AI inference - from data residency, GDPR, and zero data retention to TTFT, throughput, and dataflow architecture. Every entry is backed by published sources and real benchmark data from our EU infrastructure.

Sovereignty & Compliance

Data Residency

Where your data is physically stored and processed - a necessary part of sovereignty, but not the same as control over who can legally reach it.

Data Processing Agreement (DPA)

The GDPR Article 28 contract that governs how an inference provider may process the personal data in your prompts - and a basic test of whether a provider is enterprise-ready.

GDPR for AI Inference

What Europe's data-protection law requires when your prompts contain personal data - a lawful basis, a processor agreement, and processing that stays within reach of EU law.

Zero Data Retention (ZDR)

A policy under which an inference provider does not store your prompts or outputs after serving a request, and never trains on them - shrinking your data exposure to the moment of processing.

Performance Metrics

TTFT (Time to First Token)

How long a user waits between sending a request and seeing the first token of the response.

Inter-Token Latency (ITL)

The average time gap between consecutive tokens during generation - also called TPOT (time per output token).
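
As a rough illustration of the two latency definitions above, the sketch below derives TTFT and mean inter-token latency from timestamps. The function name and the sample timestamps are hypothetical, and it assumes you have recorded when the request was sent and when each token arrived on the client.

```python
def latency_metrics(request_sent, token_times):
    """Return (ttft, mean_itl) in seconds.

    request_sent: time the request was sent (seconds).
    token_times:  arrival time of each output token, in order (seconds).
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")
    # TTFT: wait between sending the request and seeing the first token.
    ttft = token_times[0] - request_sent
    # ITL: average gap between consecutive tokens after the first.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0
    return ttft, mean_itl


# Hypothetical timestamps: first token after 0.5 s, then one token every 0.25 s.
ttft, itl = latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25])
print(ttft, itl)  # 0.5 0.25
```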

Tokens per Second

The standard unit for LLM generation speed - and why the same number can mean two different things.

Inference Speed

The umbrella term: TTFT, inter-token latency, and throughput - and which one matters when.
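
One way to see which metric matters when is to estimate total response time from the other two. The sketch below assumes the first token arrives after the TTFT and every later token arrives one mean inter-token gap after the previous one. The function name and all numbers are hypothetical.

```python
def estimated_response_seconds(ttft, mean_itl, output_tokens):
    """Estimate the time until the last token arrives, in seconds."""
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    # The first token costs the TTFT; each remaining token adds one gap.
    return ttft + (output_tokens - 1) * mean_itl


# Hypothetical values: TTFT of 0.5 s, mean inter-token latency of 0.25 s.
short_reply = estimated_response_seconds(0.5, 0.25, 4)
long_reply = estimated_response_seconds(0.5, 0.25, 401)
print(short_reply, long_reply)  # 1.25 100.5
```

With these made-up numbers, TTFT is a large share of the short reply's total time, while the long reply is dominated by inter-token latency.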

Architecture

RDU (Reconfigurable Dataflow Unit)

SambaNova's purpose-built AI processor - a chip designed for dataflow execution instead of instruction-by-instruction processing.

Dataflow Architecture

The execution model where data streams through operations as a pipeline - eliminating the kernel-by-kernel round trips of GPU execution.

Models & Inference

Inference

Running a trained AI model to produce outputs - the production workload of AI, and the one whose cost and speed compound with usage.

Throughput (LLM Serving)

Tokens per second in two senses: per-request output throughput vs. system-wide capacity - and how batching trades one against the other.
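
The two senses of tokens per second can be made concrete with a small calculation. This is a minimal sketch with hypothetical function names and made-up numbers, not measured benchmark data. It assumes a batch of requests that all generate over the same wall-clock interval.

```python
def per_request_throughput(output_tokens, generation_seconds):
    """Tokens per second as seen by a single request."""
    return output_tokens / generation_seconds


def system_throughput(output_tokens_per_request, wall_clock_seconds):
    """Tokens per second summed over all requests served in the interval."""
    return sum(output_tokens_per_request) / wall_clock_seconds


# Hypothetical batch: 8 concurrent requests, each producing 200 tokens in 4 s.
single = per_request_throughput(200, 4.0)
system = system_throughput([200] * 8, 4.0)
print(single, system)  # 50.0 400.0
```

Both figures are "tokens per second", so a quoted number only means something once you know which of the two it is.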

Prefill vs. Decode

The two phases of LLM inference - parallel prompt processing vs. token-by-token generation.

Latency vs. Throughput

The fundamental serving trade-off: total system output vs. each user's speed.

Open-Weight Model

A model whose trained parameters are published so anyone can run it themselves - the technical basis for sovereign inference.

Context Window

The maximum amount of text, in tokens, a model can consider at once - prompt plus output. Its length directly shapes inference speed and cost.
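
Because the window covers prompt plus output, the room left for generation shrinks as the prompt grows. A minimal sketch, assuming a hypothetical 8,192-token window and a hypothetical function name:

```python
def max_output_tokens(context_window, prompt_tokens):
    """Tokens left for the response once the prompt is counted."""
    remaining = context_window - prompt_tokens
    if remaining < 0:
        raise ValueError("prompt alone exceeds the context window")
    return remaining


# Hypothetical 8,192-token window with a 6,000-token prompt.
budget = max_output_tokens(8192, 6000)
print(budget)  # 2192
```

A provider may also cap output length separately, so treat this as an upper bound rather than a guarantee.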

Parameters

A model's learned weights - the rough measure of its size and capacity, and the direct driver of its memory, speed, and cost.
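
The link between parameter count and memory can be estimated as parameters times bytes per parameter. The sketch below covers weights only and ignores runtime state such as the KV cache. The function name and the 70-billion-parameter example are illustrative assumptions.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter):
    """Approximate memory needed to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Hypothetical 70-billion-parameter model.
at_16_bit = weight_memory_gb(70_000_000_000, 2)  # 2 bytes per parameter
at_8_bit = weight_memory_gb(70_000_000_000, 1)   # 1 byte per parameter
print(at_16_bit, at_8_bit)  # 140.0 70.0
```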
