Measured on Infercom EU Infrastructure
These numbers come from our production API, served on our EU infrastructure. Not vendor marketing: real measurements you can reproduce.
Last measured: March 2026
MiniMax-M2.5
1K input / 1K output, single request
Variance <1.7%
gpt-oss-120b
1K input / 1K output, single request
Variance <0.1%
DeepSeek-V3.1
1K input / 1K output, single request
Variance <0.1%
Server-side metrics (p50). Measured using our open-source benchmark tool. Your client-side results will vary based on network location and conditions.
What We Measure
Three metrics that matter for production AI inference
Time to First Token (TTFT)
How quickly the model starts responding after your request. Critical for interactive applications and chat interfaces.
Output Throughput
Tokens generated per second after the first token. Determines how fast a complete response is delivered to the user.
End-to-End Latency
Total time from request to complete response. Includes TTFT plus full generation time. The number that matters for batch workloads.
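The three metrics above are simple functions of the timestamps a client records around a streaming response. As an illustration (this is a hypothetical helper, not Infercom's benchmark tool), the arithmetic looks like this:

```python
from dataclasses import dataclass


@dataclass
class StreamMetrics:
    ttft_s: float        # Time to First Token, seconds
    output_tps: float    # output throughput, tokens/sec after the first token
    e2e_s: float         # end-to-end latency, seconds


def compute_metrics(t_start: float, t_first: float, t_last: float,
                    n_tokens: int) -> StreamMetrics:
    """Derive TTFT, output throughput, and end-to-end latency from
    timestamps taken when the request was sent (t_start), when the first
    token arrived (t_first), and when the last token arrived (t_last)."""
    ttft = t_first - t_start
    gen_time = t_last - t_first
    # Throughput counts tokens generated *after* the first one, since
    # TTFT already accounts for the first token's arrival.
    tps = (n_tokens - 1) / gen_time if gen_time > 0 else float("inf")
    e2e = t_last - t_start
    return StreamMetrics(ttft_s=ttft, output_tps=tps, e2e_s=e2e)
```

For example, a request sent at t=0 whose first token arrives at 0.2 s and whose 101st (final) token arrives at 2.2 s has a TTFT of 0.2 s, an output throughput of 50 tokens/sec, and an end-to-end latency of 2.2 s.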
Run Your Own Benchmark
Our benchmark tool is fully open source. Run it against our API with your free-tier API key, or clone the repository and run it locally. Same code, same methodology, your results.
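If you would rather instrument your own client than run the benchmark tool, the core of a client-side measurement loop is parsing the server-sent-event stream and timestamping each token delta. A minimal sketch, assuming an OpenAI-compatible SSE chunk format (the schema here is an assumption, not Infercom's documented wire format):

```python
import json
from typing import Optional


def parse_sse_chunk(line: str) -> Optional[str]:
    """Extract the text delta from one SSE data line.

    Returns None for non-content lines (comments, empty keep-alives,
    and the terminal "[DONE]" sentinel), assuming OpenAI-style chunks:
        data: {"choices": [{"delta": {"content": "..."}}]}
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload.strip() == "[DONE]":
        return None
    delta = json.loads(payload)["choices"][0]["delta"]
    return delta.get("content")
```

In a measurement loop you would record a timestamp the first time this function returns a non-None value (TTFT) and again on the last one (end-to-end latency), then divide token count by the interval for throughput.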
Synthetic Performance
Fixed input/output token counts for controlled comparisons across models
Real Workload Simulation
Variable request rates mimicking production traffic patterns
Custom Dataset
Upload your own prompts and measure performance on your actual workload
Interactive Chat
Per-response metrics in a live chat interface — see TTFT and throughput on every reply
Independent Verification
Infercom runs on SambaNova's dataflow architecture — the same technology independently benchmarked by analysts, researchers, and tech journalists.
Artificial Analysis
Artificial Analysis — SambaNova Benchmarks
Independent, continuously updated speed, latency, and quality benchmarks across all major inference providers. The industry standard.
VentureBeat
SambaNova Breaks 1,000 Tokens/Sec Barrier
How SambaNova's dataflow architecture achieved world-record throughput — the same technology that powers Infercom.
TechRadar
DeepSeek R1 671B with 95% Fewer Chips
Running the world's largest reasoning model on just 16 SambaNova chips vs. 320 GPUs — with faster results.
SambaNova
Speed Record on Llama 3.1 405B
Independently verified by Artificial Analysis: 4x faster than the next closest provider on the largest Llama model.
SambaNova
Intelligence per Joule
Why tokens per second isn't the full story — energy efficiency per unit of intelligence is the metric that matters at scale.
Stanford Hazy Research
Intelligence Per Watt Research
Academic methodology for measuring AI efficiency that independently validates SambaNova's energy claims.
Performance Without the Power Bill
Up to 5x more energy efficient than GPU-based inference. Speed doesn't have to come at the planet's expense.
10 kW
Per Rack
vs. 40–50 kW+ for equivalent GPU infrastructure. Dramatic reduction in power consumption and cooling requirements.
Air Cooled
No Liquid Cooling
Standard air cooling simplifies deployment, eliminates water usage, and reduces operational complexity and cost.
Up to 5x
More Efficient
More intelligence per joule of energy consumed. Validated by Stanford Hazy Research methodology.
Optimize Your Integration
Get the best performance from your Infercom integration with our developer resources.