Revolutionary AI Infrastructure

Infercom is powered by SambaNova's dataflow architecture — purpose-built for AI inference, delivering unprecedented performance and efficiency.

Up to 10x

Faster Inference

Up to 5x

Energy Efficient

Up to 25TB

Memory per Rack

Dataflow vs. GPU Architecture

Why purpose-built dataflow beats general-purpose GPUs for AI inference

Dataflow Architecture

Purpose-built for AI

Builds a custom processing pipeline for each model's entire computation graph, minimizing data movement.

✓ Entire model resident in memory
✓ Data flows through operations without intermediate writes
✓ Operator fusion: hundreds of operations in a single kernel
✓ Software-defined hardware optimizes for each workload

Traditional GPU

General-purpose design

Executes models kernel by kernel, which creates bottlenecks for AI inference workloads.

✗ Kernel-by-kernel execution creates overhead
✗ Excessive data movement between processor and memory
✗ Memory bandwidth bottleneck limits performance
✗ Underutilization of compute resources
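To make the contrast concrete, here is a toy Python sketch. It is not SambaNova's compiler or runtime, and the function names and the three-operation pipeline are invented for illustration. It only counts how many values each strategy writes to a buffer: kernel-by-kernel execution materializes a full intermediate result after every operation, while the fused version chains the operations per element and writes only the final result.

```python
from typing import Callable, List, Tuple

Op = Callable[[float], float]


def run_kernel_by_kernel(ops: List[Op], data: List[float]) -> Tuple[List[float], int]:
    """Run each op as a separate pass, materializing a full buffer per op."""
    buffer_writes = 0
    current = data
    for op in ops:
        current = [op(x) for x in current]  # intermediate result written out
        buffer_writes += len(current)
    return current, buffer_writes


def run_fused(ops: List[Op], data: List[float]) -> Tuple[List[float], int]:
    """Chain all ops per element; only the final value is written."""
    out = []
    for x in data:
        for op in ops:
            x = op(x)  # stays "in flight", no intermediate buffer
        out.append(x)
    return out, len(out)


# Hypothetical pipeline: scale, shift, ReLU
ops = [lambda x: x * 2.0, lambda x: x + 1.0, lambda x: max(x, 0.0)]
data = [-1.0, 0.5, 3.0]

unfused_out, unfused_writes = run_kernel_by_kernel(ops, data)
fused_out, fused_writes = run_fused(ops, data)

print("same result:", unfused_out == fused_out)
print("buffer writes, kernel-by-kernel:", unfused_writes)
print("buffer writes, fused:", fused_writes)
```

In this toy model the number of buffer writes grows with the number of operations in the unfused case and stays constant in the fused case, which is the effect the comparison above describes.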

Deep dive: How dataflow solves the AI inference crisis

SN40L Reconfigurable Dataflow Unit

Built on TSMC's 5nm process with 1,040 compute cores per chip, delivering 638 BF16 TFLOPS per chip — 10.2 PetaFLOPS per rack.

SambaNova SN40L RDU — dual-die CoWoS package on TSMC 5nm

638

TFLOPS/Chip (BF16)

1,040

PCUs/Chip

10.2

PFLOPS/Rack

TSMC 5nm

Process

Three-Tier Memory per Chip

520MB

SRAM

Ultra-fast on-chip cache

64GB

HBM

High-bandwidth co-packaged memory

Up to 1.5TB

DDR

Off-package DIMM memory

Rack Configuration (SN40L-16)

16

RDU Chips

Up to 25TB

Total Memory

~10kW

Typical Power

Air

Cooled
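The rack-level figures follow from the per-chip numbers. The short Python check below assumes 16 chips per rack (inferred from the "SN40L-16" name and the per-chip and per-rack figures, not stated explicitly on this page) and decimal units (1 TB = 1,000 GB).

```python
# Assumption: 16 RDUs per rack, inferred from the "SN40L-16" name.
CHIPS_PER_RACK = 16

# Per-chip figures quoted on this page.
TFLOPS_PER_CHIP_BF16 = 638
SRAM_GB = 0.52   # 520 MB
HBM_GB = 64
DDR_GB = 1500    # "up to 1.5 TB", decimal units assumed

rack_pflops = CHIPS_PER_RACK * TFLOPS_PER_CHIP_BF16 / 1000
rack_memory_tb = CHIPS_PER_RACK * (SRAM_GB + HBM_GB + DDR_GB) / 1000

print(f"Compute per rack: {rack_pflops:.1f} PFLOPS (BF16)")
print(f"Memory per rack:  {rack_memory_tb:.1f} TB")
```

Under these assumptions the totals come out at about 10.2 PFLOPS and about 25 TB, matching the headline numbers.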

World-Record Performance

Performance measured in output tokens per second. MiniMax and DeepSeek-V3.1 benchmarks from Infercom EU infrastructure. DeepSeek-R1 and gpt-oss from Artificial Analysis (SambaNova Cloud).

MiniMax M2.5

NEW — EU Hosted

404 tokens/sec

High-performance multimodal model now hosted on Infercom's EU infrastructure. Independently measured at 400+ tokens/sec by Artificial Analysis.

EU Sovereign

DeepSeek-R1 671B

10x vs GPU

250 tokens/sec

One of the largest open-weight reasoning models, running up to 10x faster than on GPU-based providers.

671B params

DeepSeek-V3.1

EU Hosted

273 tokens/sec

128K context, full function calling, and JSON mode — one of Europe's most capable sovereign LLMs.

EU Sovereign

gpt-oss-120b

Fastest EU Model

772 tokens/sec

OpenAI's open-weight 120B-parameter model. Exceptional throughput for high-volume sovereign workloads.

EU Sovereign

View independent benchmarks on Artificial Analysis

See our own measured results from EU infrastructure
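For a feel of what these throughput numbers mean in practice, the Python sketch below converts output tokens per second into generation time for a response of a given length. The 1,000-token response length is a hypothetical example, and the estimate ignores time to first token, queuing, and network latency.

```python
# Output throughput figures quoted above (tokens per second).
THROUGHPUT_TPS = {
    "MiniMax M2.5": 404,
    "DeepSeek-R1 671B": 250,
    "DeepSeek-V3.1": 273,
    "gpt-oss-120b": 772,
}

OUTPUT_TOKENS = 1000  # hypothetical response length


def generation_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Time to stream `tokens` output tokens at a steady rate."""
    return tokens / tokens_per_sec


for model, tps in THROUGHPUT_TPS.items():
    secs = generation_seconds(OUTPUT_TOKENS, tps)
    print(f"{model:<18} {secs:5.2f} s for {OUTPUT_TOKENS} tokens")
```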

Sustainable AI Infrastructure

Up to 5x better energy efficiency than GPU-based inference

Lower Power Consumption

Typically about 10kW per rack, versus 40–50kW or more for GPU racks running equivalent workloads. The dataflow architecture needs fewer chips, which translates into substantial power savings.
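As a rough illustration of the power comparison, the Python sketch below works out the power ratio and the annual energy per rack. It assumes continuous operation at the quoted typical power for a full year and that one dataflow rack stands in for one GPU rack doing equivalent work; real consumption depends on load and cooling overhead.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, non-leap year

DATAFLOW_RACK_KW = 10           # typical figure quoted above
GPU_RACK_KW_RANGE = (40, 50)    # lower and upper figures quoted above


def annual_kwh(rack_kw: float) -> float:
    """Energy drawn in a year at constant power (an assumption)."""
    return rack_kw * HOURS_PER_YEAR


for gpu_kw in GPU_RACK_KW_RANGE:
    ratio = gpu_kw / DATAFLOW_RACK_KW
    saved = annual_kwh(gpu_kw) - annual_kwh(DATAFLOW_RACK_KW)
    print(f"GPU rack at {gpu_kw} kW: {ratio:.0f}x the power, "
          f"{saved:,.0f} kWh more per year")
```

At 40 to 50kW the ratio works out to 4x to 5x, consistent with the "up to 5x" claim.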

Smaller Footprint

Fewer racks mean less floor space, simpler cooling, and lower total infrastructure costs.

Air-Cooled Design

No liquid cooling infrastructure required. Standard air cooling simplifies deployment, reduces maintenance complexity, and lowers operational overhead.

"Not all tokens are created equal. The real value lies not in measuring tokens generated, but in the quality of intelligence delivered per unit of energy consumed."

SambaNova — "Intelligence per Joule"

Advanced Model Capabilities

Massive Model Support

Run large models with hundreds of billions of parameters on a single rack. Support for Composition of Experts (CoE) with 100+ expert models hosted simultaneously.

100s of B params | CoE support | 100+ models

Long Context Windows

Handle context windows of up to 164K tokens (MiniMax M2.5) and 128K tokens on most other models. The large memory capacity supports long-document analysis, code generation, and reasoning tasks.

Up to 164K tokens | Single node | No truncation
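A simple way to reason about these limits is a token budget check. The Python sketch below is illustrative only: it assumes prompt and output tokens share one window, treats "164K" and "128K" as 164,000 and 128,000 tokens (the exact limits may differ), and the function name is hypothetical rather than part of any Infercom or SambaNova API.

```python
# Nominal context windows from this page; exact token limits are an assumption.
CONTEXT_WINDOW_TOKENS = {
    "MiniMax M2.5": 164_000,
    "DeepSeek-V3.1": 128_000,
}


def fits_in_context(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """True if the prompt plus the reserved output budget fits the window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW_TOKENS[model]


# Hypothetical request: a 140,000-token document plus 8,000 tokens of output.
for model in CONTEXT_WINDOW_TOKENS:
    ok = fits_in_context(model, prompt_tokens=140_000, max_output_tokens=8_000)
    print(f"{model}: {'fits' if ok else 'needs chunking or a larger window'}")
```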

Millisecond Model Switching

Multiple models resident in memory simultaneously with millisecond switching latency — orders of magnitude faster than GPU systems. Perfect for agentic AI and multi-model workflows.

ms switching | Multi-model | Agentic AI

European Infrastructure

Hosted in Equinix Munich 4 — Tier III+ certified, carrier-neutral datacenter

Row of SambaNova racks powering Infercom's EU infrastructure

SambaNova rack infrastructure — air-cooled, 10kW per rack

Munich-Based Hosting

All data and processing remain within German borders under EU jurisdiction.

No US Jurisdiction

Protection from the CLOUD Act, the PATRIOT Act, and foreign intelligence access.

AI Act Ready

Infrastructure prepared for EU AI Act requirements and compliance.

Tier III+ Certified

99.982% uptime guarantee with redundant power and cooling systems.
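To put the 99.982% figure in concrete terms, the Python snippet below converts it into a maximum downtime per year. It assumes a 365-day year and that the percentage applies to total annual time.

```python
AVAILABILITY = 0.99982            # 99.982% uptime, as quoted above
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, non-leap year

max_downtime_minutes = (1 - AVAILABILITY) * MINUTES_PER_YEAR

print(f"Maximum downtime: {max_downtime_minutes:.1f} minutes per year "
      f"(about {max_downtime_minutes / 60:.1f} hours)")
```

That is roughly 95 minutes, or about 1.6 hours, of downtime per year.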

Learn More About the Technology

Explore in-depth resources from SambaNova and independent sources

SambaNova Blog

Intelligence per Joule

Dataflow Architecture for Inference

Why SN40L Is Best for Inference

SambaNova & Infercom Partnership

SN40L Architecture Paper

Independent Performance Benchmarks

Ready to Build the Future of AI in Europe?