gpt-oss-120b — The Production Workhorse

Built for agents, not editors.

OpenAI's first open-weight model, running at 700+ tokens/sec on EU infrastructure. Reliable performance without flagship costs. Best price-to-intelligence ratio.

OpenAI Quality, Open-Weight Freedom

gpt-oss-120b is OpenAI's first open-weight model — Apache 2.0 licensed, designed for production agentic workloads. It's not the flashiest model, but it's the one you can rely on day after day.

Built-in Reasoning

Chain-of-thought reasoning with adjustable effort levels — optimize for speed or accuracy per task.

Production-Ready

Matches GPT-4o on most tasks. Beats it on reasoning-heavy benchmarks.

Best Value

Best price-to-intelligence ratio per Artificial Analysis.

Efficient by Design

Total Parameters: 117B
Active Parameters: 5.1B per forward pass
Architecture: Mixture of Experts (MoE)
Experts: 128 experts, Top-4 routing per token
Layers: 36
Context Length: 131K tokens
License: Apache 2.0
EU Hosted: Fastest Model

Measured on EU Infrastructure

Output Throughput
713 tok/s
Time to First Token
388 ms
End-to-End Latency
1.789 s
Context Length
131K tokens

Benchmark conditions: 10K token input / 1K token output, 1 concurrent connection, 10 requests

Up to 772 tok/s on shorter prompts. Last measured: April 2026.

Why It's So Fast

The MoE architecture delivers 117B-parameter model quality while activating only 5.1B parameters per request:

  • 22x fewer active parameters per inference
  • Lower memory bandwidth requirements
  • Expert routing optimized for each token
  • Same quality, fraction of the compute
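The routing idea behind those numbers can be sketched in a few lines: for each token, a gating function scores all 128 experts, but only the top 4 actually run. This is an illustrative sketch of Top-k gating in plain Python, not the model's actual implementation.

```python
import math
import random

def top_k_routing(gate_logits, k=4):
    """Pick the top-k experts for one token and renormalize their
    gate weights; only those experts' parameters are computed."""
    # Softmax over all expert logits
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k highest-probability experts
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# 128 experts, Top-4 routing per token, as in the spec table above
random.seed(0)
logits = [random.gauss(0, 1) for _ in range(128)]
chosen = top_k_routing(logits, k=4)
print(len(chosen))  # 4 experts active for this token
```

Only 4 of 128 expert blocks are executed per token, which is where the active-parameter savings come from.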

Not for Editors. For Agents.

"If you're building a public-facing AI agent, gpt-oss is your best bet — it's the best privately hostable model that functions on a single high-end GPU in production."

Tigris

Reasoning Control

Adjust thinking effort (low/medium/high) per task
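In practice, effort selection is just one extra field on the request. A minimal sketch of building per-task requests follows; note that the `reasoning_effort` field name is an assumption here, so check the documentation of the stack serving the model for the exact parameter.

```python
def build_request(prompt, effort="medium"):
    """Build a chat completion request for gpt-oss-120b with an
    explicit reasoning effort level (low/medium/high).
    NOTE: the `reasoning_effort` field name is an assumption."""
    assert effort in ("low", "medium", "high")
    return {
        "model": "gpt-oss-120b",
        "reasoning_effort": effort,  # hypothetical parameter name
        "messages": [{"role": "user", "content": prompt}],
    }

# Low effort for simple queries, high effort for hard reasoning
fast = build_request("What's the capital of France?", effort="low")
deep = build_request("Prove that sqrt(2) is irrational.", effort="high")
print(fast["reasoning_effort"], deep["reasoning_effort"])  # low high
```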

Function Calling

Native tool use for agentic workflows
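A tool is declared to the model as a JSON schema in the OpenAI function-calling format. The `get_weather` function below is purely illustrative; the model replies with a structured tool call that your agent code executes.

```python
# A tool definition in the OpenAI function-calling format.
# `get_weather` and its fields are illustrative examples only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Passed via tools=[weather_tool] in chat.completions.create(...);
# the model responds with a tool call instead of free text.
print(weather_tool["function"]["name"])  # get_weather
```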

Structured Outputs

JSON mode for reliable parsing
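With JSON mode enabled via `response_format`, the reply is guaranteed to be valid JSON, so it can be parsed without a fallback path. The request shape below follows the OpenAI chat completions format; the reply string is a stand-in example rather than a live API response.

```python
import json

# Request JSON mode so the model's reply is guaranteed-parseable JSON
request = {
    "model": "gpt-oss-120b",
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system",
         "content": "Reply as JSON with keys 'sentiment' and 'score'."},
        {"role": "user", "content": "This model is fast and cheap!"},
    ],
}

# Example reply content (stand-in for a live response):
reply = '{"sentiment": "positive", "score": 0.97}'
data = json.loads(reply)  # parses directly, no error handling needed
print(data["sentiment"])  # positive
```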

Web Browsing

Built-in capability for research agents

Navigate websites, extract data, and perform multi-step research tasks autonomously.

Code Execution

Python execution for data analysis agents

Run Python in a sandboxed environment for data processing, calculations, and analysis.

The Right Model for the Right Task

Not every request needs your most expensive model. Smart teams use gpt-oss-120b as part of a multi-model strategy.

"The technical quality is undeniable, and the chain-of-thought reasoning system is genuinely innovative in the open-weight space."

Apatero (2026 Review)

Balanced Mode

Matches GPT-4o on most tasks

Deep Mode

Beats GPT-4o on reasoning benchmarks (MATH, HumanEval)

Cost Efficiency

At a fraction of the cost of proprietary models

Scenario → Model Choice
Complex reasoning → gpt-oss-120b (high effort)
Standard tasks → gpt-oss-120b (medium effort)
Simple queries → gpt-oss-120b (low effort)
Premium tasks → MiniMax M2.5
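The routing table above can be expressed as a tiny dispatch function. The tier names and the `reasoning_effort` field name are illustrative assumptions; a real router would classify requests before dispatching.

```python
# A minimal sketch of the routing table: pick model + effort by task
# tier. Tier names and the `reasoning_effort` field are assumptions.
ROUTES = {
    "simple":   ("gpt-oss-120b", "low"),
    "standard": ("gpt-oss-120b", "medium"),
    "complex":  ("gpt-oss-120b", "high"),
    "premium":  ("MiniMax M2.5", None),  # no effort knob assumed
}

def route(tier):
    """Return request fields for a given task tier."""
    model, effort = ROUTES[tier]
    fields = {"model": model}
    if effort is not None:
        fields["reasoning_effort"] = effort  # hypothetical parameter
    return fields

print(route("simple"))  # {'model': 'gpt-oss-120b', 'reasoning_effort': 'low'}
```

Most traffic stays on the cheap low- and medium-effort tiers; only the hardest or most valuable requests pay for more compute.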

"We optimized workflows twice: once for accuracy + latency, and once for accuracy + cost—capturing the tradeoffs that matter most in real-world deployments."

DataRobot

OpenAI Open-Weight on EU Infrastructure

Run OpenAI's open-weight model without sending data to the US:

  • Hosted in Germany on Infercom-owned infrastructure
  • Full GDPR compliance with EU-based DPA
  • No US CLOUD Act exposure
  • ISO 27001 certified
  • Apache 2.0 license — full freedom to deploy

Start Building in Minutes

quickstart.py
from openai import OpenAI

# Point the standard OpenAI client at the Infercom endpoint
client = OpenAI(
    api_key="your-infercom-key",
    base_url="https://api.infercom.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=4096,
)

print(response.choices[0].message.content)

OpenAI-compatible API. Drop-in replacement for your existing code.

€5 free credit. No credit card required.

Ready to Build the Future of AI in Europe?

Join forward-thinking organizations deploying sovereign AI with world-class performance