gpt-oss-120b — The Production Workhorse

Built for agents, not editors.

OpenAI's first open-weight model, running at 700+ tokens/sec on EU infrastructure. Reliable performance without flagship costs. Best price-to-intelligence ratio.

OpenAI Quality, Open-Weight Freedom

gpt-oss-120b is OpenAI's first open-weight model — Apache 2.0 licensed, designed for production agentic workloads. It's not the flashiest model, but it's the one you can rely on day after day.

Built-in Reasoning

Chain-of-thought reasoning with adjustable effort levels — optimize for speed or accuracy per task.

Production-Ready

Matches GPT-4o on most tasks. Beats it on reasoning-heavy benchmarks.

Best Value

Best price-to-intelligence ratio per Artificial Analysis.

Efficient by Design

Total Parameters: 117B
Active Parameters: 5.1B per forward pass
Architecture: Mixture of Experts (MoE)
Experts: 128 experts, Top-4 routing per token
Layers: 36
Context Length: 131K tokens
License: Apache 2.0
EU Hosted: Fastest Model

Measured on EU Infrastructure

Output Throughput
713 tok/s
Time to First Token
388 ms
End-to-End Latency
1.789 s
Context Length
131K tokens

Benchmark conditions: 10K token input / 1K token output, 1 concurrent connection, 10 requests

Up to 772 tok/s on shorter prompts. Last measured: April 2026.

Why It's So Fast

The MoE architecture delivers 117B-parameter model quality while activating only 5.1B parameters per request:

  • 22x fewer active parameters per inference
  • Lower memory bandwidth requirements
  • Expert routing optimized for each token
  • Same quality, fraction of the compute
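The routing idea behind those numbers can be sketched in a few lines: for each token, a gating function scores all 128 experts, but only the top 4 actually run. This is an illustrative sketch of Top-k gating in plain Python, not the model's actual implementation.

```python
import math
import random

def top_k_routing(gate_logits, k=4):
    """Pick the top-k experts for one token and renormalize their
    gate weights; only those experts' parameters are computed."""
    # Softmax over all expert logits
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k highest-probability experts
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# 128 experts, Top-4 routing per token, as in the spec table above
random.seed(0)
logits = [random.gauss(0, 1) for _ in range(128)]
chosen = top_k_routing(logits, k=4)
print(len(chosen))  # 4 experts active for this token
```

Only 4 of 128 expert blocks are executed per token, which is where the active-parameter savings come from.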

Not for Editors. For Agents.

"If you're building a public-facing AI agent, gpt-oss is your best bet — it's the best privately hostable model that functions on a single high-end GPU in production."

Tigris

Reasoning Control

Adjust thinking effort (low/medium/high) per task
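In practice, effort selection is just one extra field on the request. A minimal sketch of building per-task requests follows; note that the `reasoning_effort` field name is an assumption here, so check the documentation of the stack serving the model for the exact parameter.

```python
def build_request(prompt, effort="medium"):
    """Build a chat completion request for gpt-oss-120b with an
    explicit reasoning effort level (low/medium/high).
    NOTE: the `reasoning_effort` field name is an assumption."""
    assert effort in ("low", "medium", "high")
    return {
        "model": "gpt-oss-120b",
        "reasoning_effort": effort,  # hypothetical parameter name
        "messages": [{"role": "user", "content": prompt}],
    }

# Low effort for simple queries, high effort for hard reasoning
fast = build_request("What's the capital of France?", effort="low")
deep = build_request("Prove that sqrt(2) is irrational.", effort="high")
print(fast["reasoning_effort"], deep["reasoning_effort"])  # low high
```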

Function Calling

Native tool use for agentic workflows
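A tool is declared to the model as a JSON schema in the OpenAI function-calling format. The `get_weather` function below is purely illustrative; the model replies with a structured tool call that your agent code executes.

```python
# A tool definition in the OpenAI function-calling format.
# `get_weather` and its fields are illustrative examples only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Passed via tools=[weather_tool] in chat.completions.create(...);
# the model responds with a tool call instead of free text.
print(weather_tool["function"]["name"])  # get_weather
```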

Structured Outputs

JSON mode for reliable parsing
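With JSON mode enabled via `response_format`, the reply is guaranteed to be valid JSON, so it can be parsed without a fallback path. The request shape below follows the OpenAI chat completions format; the reply string is a stand-in example rather than a live API response.

```python
import json

# Request JSON mode so the model's reply is guaranteed-parseable JSON
request = {
    "model": "gpt-oss-120b",
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system",
         "content": "Reply as JSON with keys 'sentiment' and 'score'."},
        {"role": "user", "content": "This model is fast and cheap!"},
    ],
}

# Example reply content (stand-in for a live response):
reply = '{"sentiment": "positive", "score": 0.97}'
data = json.loads(reply)  # parses directly, no error handling needed
print(data["sentiment"])  # positive
```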

Web Browsing

Built-in capability for research agents

Navigate websites, extract data, and perform multi-step research tasks autonomously.

Code Execution

Python execution for data analysis agents

Run Python in a sandboxed environment for data processing, calculations, and analysis.

The Right Model for the Right Task

Not every request needs your most expensive model. Smart teams use gpt-oss-120b as part of a multi-model strategy.

"The technical quality is undeniable, and the chain-of-thought reasoning system is genuinely innovative in the open-weight space."

Apatero (2026 Review)

Balanced Mode

Matches GPT-4o on most tasks

Deep Mode

Beats GPT-4o on reasoning benchmarks (MATH, HumanEval)

Cost Efficiency

At a fraction of the cost of proprietary models

Scenario → Model Choice
Complex reasoning → gpt-oss-120b (high effort)
Standard tasks → gpt-oss-120b (medium effort)
Simple queries → gpt-oss-120b (low effort)
Premium tasks → MiniMax M2.5
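The routing table above can be expressed as a tiny dispatch function. The tier names and the `reasoning_effort` field name are illustrative assumptions; a real router would classify requests before dispatching.

```python
# A minimal sketch of the routing table: pick model + effort by task
# tier. Tier names and the `reasoning_effort` field are assumptions.
ROUTES = {
    "simple":   ("gpt-oss-120b", "low"),
    "standard": ("gpt-oss-120b", "medium"),
    "complex":  ("gpt-oss-120b", "high"),
    "premium":  ("MiniMax M2.5", None),  # no effort knob assumed
}

def route(tier):
    """Return request fields for a given task tier."""
    model, effort = ROUTES[tier]
    fields = {"model": model}
    if effort is not None:
        fields["reasoning_effort"] = effort  # hypothetical parameter
    return fields

print(route("simple"))  # {'model': 'gpt-oss-120b', 'reasoning_effort': 'low'}
```

Most traffic stays on the cheap low- and medium-effort tiers; only the hardest or most valuable requests pay for more compute.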

"We optimized workflows twice: once for accuracy + latency, and once for accuracy + cost—capturing the tradeoffs that matter most in real-world deployments."

DataRobot

OpenAI Open-Weight on EU Infrastructure

Run OpenAI's open-weight model without sending data to the US:

  • Hosted in Germany on Infercom-owned infrastructure
  • Full GDPR compliance with EU-based DPA
  • No US CLOUD Act exposure
  • ISO 27001 certified
  • Apache 2.0 license — full freedom to deploy

Start Building in Minutes

quickstart.py
from openai import OpenAI

# Point the standard OpenAI client at the Infercom endpoint
client = OpenAI(
    api_key="your-infercom-key",
    base_url="https://api.infercom.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=4096,
)

print(response.choices[0].message.content)

OpenAI-compatible API. Drop-in replacement for your existing code.

€5 free credit. No credit card required.

Ready to Build the Future of AI in Europe?

Join forward-thinking organizations deploying sovereign AI with world-class performance