OpenAI Quality, Open-Weight Freedom
gpt-oss-120b is OpenAI's first open-weight model since GPT-2 — Apache 2.0 licensed and designed for production agentic workloads. It's not the flashiest model, but it's the one you can rely on day after day.
Built-in Reasoning
Chain-of-thought reasoning with adjustable effort levels — optimize for speed or accuracy per task.
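As a sketch of what per-task effort selection can look like: gpt-oss models read a "Reasoning: low/medium/high" line from the system prompt (its chat-format convention); whether your endpoint also exposes a dedicated effort parameter is provider-specific, so check the docs. The helper below only builds request kwargs.

```python
# Sketch: choosing a reasoning effort level per task. The
# "Reasoning: <level>" system-prompt line follows gpt-oss's chat-format
# convention; everything else here is illustrative.

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build chat-completion kwargs with a per-task effort level."""
    assert effort in ("low", "medium", "high")
    return {
        "model": "gpt-oss-120b",
        "messages": [
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": prompt},
        ],
    }

# Cheap query -> low effort; hard proof -> high effort.
req = build_request("Prove that 17 is prime.", effort="high")
```

The resulting dict can be splatted into `client.chat.completions.create(**req)`.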
Production-Ready
Matches GPT-4o on most tasks. Beats it on reasoning-heavy benchmarks.
Best Value
Best price-to-intelligence ratio per Artificial Analysis.
Efficient by Design
| Spec | Value |
|---|---|
| Total Parameters | 117B |
| Active Parameters | 5.1B per forward pass |
| Architecture | Mixture of Experts (MoE) |
| Experts | 128 experts, Top-4 routing per token |
| Layers | 36 |
| Context Length | 131K tokens |
| License | Apache 2.0 |
Measured on EU Infrastructure
Benchmark conditions: 10K-token input / 1K-token output, 1 concurrent request, averaged over 10 requests.
Up to 772 tok/s on shorter prompts. Last measured: April 2026.
Why It's So Fast
The MoE architecture delivers 117B-parameter quality while activating only about 5.1B parameters per token.
- ~23x fewer active parameters per token (117B / 5.1B)
- Lower memory bandwidth requirements
- Expert routing optimized for each token
- Same quality, fraction of the compute
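The routing step behind these numbers can be sketched in a few lines. This is an illustrative toy, not the actual gpt-oss kernels: per token, a router scores all 128 experts, only the top 4 run, and their outputs are combined with softmax-normalized weights.

```python
# Toy sketch of top-k expert routing in a Mixture-of-Experts layer.
# Illustrative only — not the real gpt-oss implementation.
import numpy as np

NUM_EXPERTS, TOP_K = 128, 4

def route(router_logits: np.ndarray):
    """Return (expert indices, mixing weights) for one token."""
    idx = np.argsort(router_logits)[-TOP_K:][::-1]  # top-4 expert ids, best first
    top = router_logits[idx]
    w = np.exp(top - top.max())                     # stable softmax over top-4
    return idx, w / w.sum()                         # weights sum to 1

rng = np.random.default_rng(0)
idx, w = route(rng.normal(size=NUM_EXPERTS))
# Only the 4 selected experts execute; the other 124 stay idle,
# which is where the ~23x reduction in active compute comes from.
```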
Not for Developers. For Agents.
"If you're building a public-facing AI agent, gpt-oss is your best bet — it's the best privately hostable model that functions on a single high-end GPU in production."
— Tigris
Reasoning Control
Adjust thinking effort (low/medium/high) per task
Function Calling
Native tool use for agentic workflows
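A tool is declared as a JSON schema and passed via the `tools` parameter of the chat-completions API. The tool name and parameters below are invented for illustration; the outer schema shape is the OpenAI-style function-calling format.

```python
# Hypothetical tool definition in the OpenAI-style function-calling schema.
# "get_weather" and its parameters are made up for illustration.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}
```

At request time you would pass `tools=[get_weather_tool]` and inspect `response.choices[0].message.tool_calls` for the model's chosen call.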
Structured Outputs
JSON mode for reliable parsing
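A minimal JSON-mode sketch, assuming the endpoint supports the OpenAI-style `response_format` parameter (confirm with your provider). The second half shows the parse step you'd run on the returned message content, using a hard-coded sample string in place of a live response.

```python
# JSON mode sketch: request a JSON object, then parse the reply.
# `response_format={"type": "json_object"}` is the OpenAI-style parameter;
# support on a given endpoint is an assumption to verify.
import json

request_kwargs = {
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "List 2 EU capitals as JSON."}],
    "response_format": {"type": "json_object"},
}

# In a real call you'd parse response.choices[0].message.content;
# here we use a sample string standing in for the model output.
sample_output = '{"capitals": ["Berlin", "Paris"]}'
data = json.loads(sample_output)
```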
Web Browsing
Built-in capability for research agents
Navigate websites, extract data, and perform multi-step research tasks autonomously.
Code Execution
Python execution for data analysis agents
Run Python in a sandboxed environment for data processing, calculations, and analysis.
The Right Model for the Right Task
Not every request needs your most expensive model. Smart teams use gpt-oss-120b as part of a multi-model strategy.
"The technical quality is undeniable, and the chain-of-thought reasoning system is genuinely innovative in the open-weight space."
— Apatero (2026 Review)
Balanced Mode
Matches GPT-4o on most tasks
Deep Mode
Beats GPT-4o on reasoning benchmarks (MATH, HumanEval)
Cost Efficiency
At a fraction of the cost of proprietary models
| Scenario | Model Choice |
|---|---|
| Complex reasoning | gpt-oss-120b (high effort) |
| Standard tasks | gpt-oss-120b (medium effort) |
| Simple queries | gpt-oss-120b (low effort) |
| Premium tasks | MiniMax M2.5 |
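The routing table above can be expressed as a tiny dispatch function. The task categories and the premium fallback mirror the table; treating them as fixed keys is a simplification of whatever classifier a real gateway would use.

```python
# Sketch of the model-routing table as code. Categories are the ones
# from the table above; a production router would classify requests
# rather than take a label directly.
def pick_model(task: str):
    """Map a task category to (model, reasoning_effort)."""
    table = {
        "complex_reasoning": ("gpt-oss-120b", "high"),
        "standard": ("gpt-oss-120b", "medium"),
        "simple": ("gpt-oss-120b", "low"),
        "premium": ("MiniMax M2.5", None),  # escalate past gpt-oss
    }
    return table[task]
```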
"We optimized workflows twice: once for accuracy + latency, and once for accuracy + cost—capturing the tradeoffs that matter most in real-world deployments."
— DataRobot
OpenAI Open-Weight on EU Infrastructure
Run OpenAI's open-weight model without sending data to the US:
- Hosted in Germany on Infercom-owned infrastructure
- Full GDPR compliance with EU-based DPA
- No US CLOUD Act exposure
- ISO 27001 certified
- Apache 2.0 license — full freedom to deploy
Start Building in Minutes
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-infercom-key",
    base_url="https://api.infercom.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=4096,
)

print(response.choices[0].message.content)
```