Performance

Europe's Fastest LLM Inference

Don't take our word for it — verify it yourself.

Real benchmark results from our EU infrastructure. Independently verified technology. Open-source benchmarks you can run yourself.

Measured on Infercom EU Infrastructure

These numbers come from our production API, served on our own infrastructure in the EU. Not vendor marketing — real measurements you can reproduce.

Last measured: April 2026

EU Sovereign · Fastest Throughput

gpt-oss-120b

Output Throughput
713 tok/s
Time to First Token
388 ms
End-to-End Latency
1.789 s

10K input / 1K output, single request

Up to 772 tok/s on shorter prompts

EU Sovereign · Best Reasoning

MiniMax-M2.5

Output Throughput
402 tok/s
Time to First Token
619 ms
End-to-End Latency
3.103 s

10K input / 1K output, single request

Up to 426 tok/s on shorter prompts

Server-side metrics (p50). Measured using our open-source benchmark tool. Your client-side results will vary based on network location and conditions.

EU-Hosted Inference Infrastructure

All benchmarks measured on Infercom's production infrastructure in Germany. Your data never leaves European jurisdiction.

Location

Munich, Germany

Hardware

SambaNova SN40L Dataflow Architecture

Certification

ISO 27001 Certified

Ownership

Infercom-Owned Hardware

Your requests terminate on infrastructure we own — not rented cloud capacity from hyperscalers.

Data Residency

100% EU — No CLOUD Act Exposure

Data never leaves European jurisdiction. No US CLOUD Act exposure, no third-country transfers.

What We Measure

Three metrics that matter for production AI inference

Time to First Token (TTFT)

How quickly the model starts responding after your request. Critical for interactive applications and chat interfaces.

Output Throughput

Tokens generated per second after the first token. Determines how fast a complete response is delivered to the user.

End-to-End Latency

Total time from request to complete response. Includes TTFT plus full generation time. The number that matters for batch workloads.
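The three metrics are tied together: end-to-end latency is approximately TTFT plus the time to generate the remaining output tokens at the measured throughput. A quick sanity check against the gpt-oss-120b numbers published above:

```python
# Sanity-check: end-to-end latency ≈ TTFT + output_tokens / throughput.
# Figures taken from the gpt-oss-120b benchmark above (10K input / 1K output).
ttft_s = 0.388        # time to first token, seconds
throughput = 713      # output tokens per second
output_tokens = 1000

e2e_estimate = ttft_s + output_tokens / throughput
print(f"estimated end-to-end latency: {e2e_estimate:.2f} s")  # ≈ 1.79 s
```

The estimate (≈1.79 s) matches the measured 1.789 s, which is what you would expect when the three numbers come from the same run.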

Open Source

Run Your Own Benchmark

Our benchmark tool is fully open source. Run it against our API with your free-tier API key, or clone the repository and run it locally. Same code, same methodology, your results.
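As a minimal illustration of the methodology (this is a sketch, not the actual benchmark tool), the metrics fall out of chunk-arrival timestamps recorded while streaming a response. The timestamps and chunk sizes below are made-up sample data:

```python
# Sketch: deriving TTFT, output throughput, and end-to-end latency from
# streamed chunk arrivals. All timings below are illustrative, not measured.
request_sent = 0.000
# (arrival_time_seconds, tokens_in_chunk) for each streamed chunk
chunks = [(0.390, 1), (0.410, 16), (0.432, 16), (0.455, 16), (0.478, 15)]

ttft = chunks[0][0] - request_sent             # time to first token
total_tokens = sum(n for _, n in chunks)
gen_time = chunks[-1][0] - chunks[0][0]        # generation time after first token
throughput = (total_tokens - chunks[0][1]) / gen_time  # tok/s after TTFT
e2e = chunks[-1][0] - request_sent             # end-to-end latency

print(f"TTFT: {ttft*1000:.0f} ms, "
      f"throughput: {throughput:.0f} tok/s, e2e: {e2e:.3f} s")
```

On this synthetic trace the script reports a TTFT of 390 ms and a throughput of roughly 716 tok/s; the open-source tool applies the same arithmetic to real server-side timestamps.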

Synthetic Performance

Fixed input/output token counts for controlled comparisons across models

Real Workload Simulation

Variable request rates mimicking production traffic patterns

Custom Dataset

Upload your own prompts and measure performance on your actual workload

Interactive Chat

Per-response metrics in a live chat interface — see TTFT and throughput on every reply

Performance Without the Power Bill

Up to 5x more energy efficient than GPU-based inference. Speed doesn't have to come at the planet's expense.

10 kW

Per Rack

vs. 40–50 kW+ for equivalent GPU infrastructure. Dramatic reduction in power consumption and cooling requirements.

Air Cooled

No Liquid Cooling

Standard air cooling simplifies deployment, eliminates water usage, and reduces operational complexity and cost.

Up to 5x

More Efficient

More intelligence per joule of energy consumed. Validated by Stanford Hazy Research methodology.

Ready to Build the Future of AI in Europe?

Join forward-thinking organizations deploying sovereign AI with world-class performance