Performance

Europe's Fastest LLM Inference

Don't take our word for it — verify it yourself.

Real benchmark results from our EU infrastructure. Independently verified technology. Open-source benchmarks you can run yourself.

Measured on Infercom EU Infrastructure

These numbers come from our production API, served on our own infrastructure in the EU. Not vendor marketing — real measurements you can reproduce.

Last measured: April 2026

EU Sovereign · Fastest Throughput

gpt-oss-120b

Output Throughput
713 tok/s
Time to First Token
388 ms
End-to-End Latency
1.789 s

10K input / 1K output, single request

Up to 772 tok/s on shorter prompts

EU Sovereign · Best Reasoning

MiniMax-M2.5

Output Throughput
402 tok/s
Time to First Token
619 ms
End-to-End Latency
3.103 s

10K input / 1K output, single request

Up to 426 tok/s on shorter prompts

Server-side metrics (p50). Measured using our open-source benchmark tool. Your client-side results will vary based on network location and conditions.

EU-Hosted Inference Infrastructure

All benchmarks measured on Infercom's production infrastructure in Germany. Your data never leaves European jurisdiction.

Location

Munich, Germany

Hardware

SambaNova SN40L Dataflow Architecture

Certification

ISO 27001 Certified

Ownership

Infercom-Owned Hardware

Your requests terminate on infrastructure we own — not rented cloud capacity from hyperscalers.

Data Residency

100% EU — No CLOUD Act Exposure

Data never leaves European jurisdiction. No US CLOUD Act exposure, no third-country transfers.

What We Measure

Three metrics that matter for production AI inference

Time to First Token (TTFT)

How quickly the model starts responding after your request. Critical for interactive applications and chat interfaces.

Output Throughput

Tokens generated per second after the first token. Determines how fast a complete response is delivered to the user.

End-to-End Latency

Total time from request to complete response. Includes TTFT plus full generation time. The number that matters for batch workloads.
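The three metrics are tied together: end-to-end latency is approximately TTFT plus the time to generate the remaining output tokens at the measured throughput. A quick sanity check against the gpt-oss-120b numbers published above:

```python
# Sanity-check: end-to-end latency ≈ TTFT + output_tokens / throughput.
# Figures taken from the gpt-oss-120b benchmark above (10K input / 1K output).
ttft_s = 0.388        # time to first token, seconds
throughput = 713      # output tokens per second
output_tokens = 1000

e2e_estimate = ttft_s + output_tokens / throughput
print(f"estimated end-to-end latency: {e2e_estimate:.2f} s")  # ≈ 1.79 s
```

The estimate (≈1.79 s) matches the measured 1.789 s, which is what you would expect when the three numbers come from the same run.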

Open Source

Run Your Own Benchmark

Our benchmark tool is fully open source. Run it against our API with your free-tier API key, or clone the repository and run it locally. Same code, same methodology, your results.
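As a minimal illustration of the methodology (this is a sketch, not the actual benchmark tool), the metrics fall out of chunk-arrival timestamps recorded while streaming a response. The timestamps and chunk sizes below are made-up sample data:

```python
# Sketch: deriving TTFT, output throughput, and end-to-end latency from
# streamed chunk arrivals. All timings below are illustrative, not measured.
request_sent = 0.000
# (arrival_time_seconds, tokens_in_chunk) for each streamed chunk
chunks = [(0.390, 1), (0.410, 16), (0.432, 16), (0.455, 16), (0.478, 15)]

ttft = chunks[0][0] - request_sent             # time to first token
total_tokens = sum(n for _, n in chunks)
gen_time = chunks[-1][0] - chunks[0][0]        # generation time after first token
throughput = (total_tokens - chunks[0][1]) / gen_time  # tok/s after TTFT
e2e = chunks[-1][0] - request_sent             # end-to-end latency

print(f"TTFT: {ttft*1000:.0f} ms, "
      f"throughput: {throughput:.0f} tok/s, e2e: {e2e:.3f} s")
```

On this synthetic trace the script reports a TTFT of 390 ms and a throughput of roughly 716 tok/s; the open-source tool applies the same arithmetic to real server-side timestamps.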

Synthetic Performance

Fixed input/output token counts for controlled comparisons across models

Real Workload Simulation

Variable request rates mimicking production traffic patterns

Custom Dataset

Upload your own prompts and measure performance on your actual workload

Interactive Chat

Per-response metrics in a live chat interface — see TTFT and throughput on every reply

Performance Without the Power Bill

Up to 5x more energy efficient than GPU-based inference. Speed doesn't have to come at the planet's expense.

10 kW

Per Rack

vs. 40–50 kW+ for equivalent GPU infrastructure. Dramatic reduction in power consumption and cooling requirements.

Air Cooled

No Liquid Cooling

Standard air cooling simplifies deployment, eliminates water usage, and reduces operational complexity and cost.

Up to 5x

More Efficient

More intelligence per joule of energy consumed. Validated by Stanford Hazy Research methodology.

Ready to Build the Future of AI in Europe?

Join forward-thinking organizations deploying sovereign AI with world-class performance