Gemma 4 31B
Google's Most Capable Dense Open Model
Frontier-class reasoning, native multimodal capabilities, and production-grade coding performance. Built from the same research foundation as Gemini 3, now running on EU sovereign infrastructure.
Why Gemma 4 31B
Google DeepMind's most capable dense open model combines advanced reasoning with multimodal understanding. Ideal for agentic workflows requiring both speed and intelligence.
Advanced Reasoning
Configurable thinking mode for multi-step planning and complex problem-solving. Toggle reasoning depth based on whether your workload needs deep deliberation or fast turnaround.
Native Multimodal
Process text and images together for document understanding, visual analysis, chart extraction, and structured data output. Perfect for vision-plus-reasoning workflows.
Agentic Workflows
Native function-calling, structured JSON output, and system-prompt support. Build autonomous agents that reliably interact with tools and APIs using frameworks like OpenClaw and CrewAI.
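Structured JSON output pairs naturally with a system prompt. This is a minimal sketch that assumes the system prompt instructs the model to reply with one JSON object. The names SYSTEM_PROMPT, build_messages, extract_json and the title/total fields are hypothetical. Stripping a surrounding markdown fence is a defensive precaution, not documented model behavior.

```python
import json
import re

# Hypothetical system prompt asking for a single JSON object.
SYSTEM_PROMPT = (
    "You are a data-extraction assistant. "
    'Reply with one JSON object with keys "title" (string) and "total" (number).'
)


def build_messages(user_text: str) -> list[dict]:
    """Return an OpenAI-style message list with the system prompt first."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]


# Matches a reply wrapped in a markdown code fence (three backticks, optional "json").
_FENCE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)


def extract_json(reply: str, required: tuple[str, ...] = ("title", "total")) -> dict:
    """Parse a model reply as a JSON object, tolerating a surrounding code fence."""
    text = reply.strip()
    match = _FENCE.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

In a real call you would pass build_messages(...) as messages= to client.chat.completions.create. You would then feed response.choices[0].message.content to extract_json.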
Configurable Thinking
Toggle thinking mode on or off depending on task requirements. Enable for complex reasoning, disable for latency-sensitive applications needing fast responses.
- 31B parameters (dense)
- 128K context window
- 30%+ faster on Infercom than the next-fastest provider (per Artificial Analysis)
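The context window is shared between the prompt and the generated tokens. A request therefore fits only if the prompt tokens plus max_tokens stay within it. This is a minimal sketch of that budget check. It assumes "128K" means 131,072 tokens, though the limit the API actually enforces may differ. It also assumes you already have a prompt token count, for example from a tokenizer or a previous response's usage.prompt_tokens.

```python
# Assumption: "128K" is read as 131,072 tokens; check the provider's actual limit.
CONTEXT_WINDOW = 128 * 1024


def max_output_budget(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for generation once the prompt occupies the context window."""
    return max(context_window - prompt_tokens, 0)


def fits_context(prompt_tokens: int, max_tokens: int, context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the requested completion length fit in the window."""
    return prompt_tokens + max_tokens <= context_window
```

Under this assumption, a 120,000-token document still leaves room for max_tokens=4096, while a 130,000-token document does not.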
Benchmark Performance
Frontier-class scores across reasoning, coding, and knowledge benchmarks. All scores are from Google DeepMind's evaluations.
| Benchmark | Score | What it measures |
|---|---|---|
| MMLU Pro | 85.2% | Advanced knowledge reasoning |
| AIME 2026 | 89.2% | Mathematical reasoning (no tools) |
| LiveCodeBench v6 | 80.0% | Code generation on recent contest problems |
| GPQA Diamond | 84.3% | Graduate-level science QA |
| Codeforces Elo | 2150 | Competitive programming |
When to Use Gemma 4 31B
Gemma 4 excels at tasks requiring reasoning, vision, or agentic capabilities. The dense architecture enables efficient fine-tuning and deployment.
Code Assistant
Production-Grade Coding
Transform any workstation into a frontier-class code assistant. Strong results on LiveCodeBench v6 (80.0%) and Codeforces (2150 Elo) make Gemma 4 well suited to agentic coding workflows with Claude Code or similar tools.
Document Processing
Vision + Reasoning
Extract structured data from charts, documents, and screenshots. Combine visual understanding with reasoning to return clean JSON output for automated workflows.
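This is a minimal sketch of sending an image alongside a text instruction. It assumes Infercom's OpenAI-compatible endpoint accepts the standard image_url content part with a base64 data URL. This page does not document the image input format, so verify it against the API reference. The helper name image_message is hypothetical.

```python
import base64


def image_message(image_bytes: bytes, instruction: str, mime_type: str = "image/png") -> dict:
    """Build one OpenAI-style user message holding text plus an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            # Inline the image as a data URL instead of hosting it at a public URL.
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }
```

For example, you could read a chart screenshot with open("chart.png", "rb") and pass [image_message(data, "Extract the chart's series as JSON.")] as messages= to client.chat.completions.create.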
Agentic AI
Autonomous Agents
Native function-calling and tool-use support enables building autonomous agents that interact with APIs and external services. Compatible with OpenClaw, CrewAI, and other multi-agent frameworks.
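This is a minimal sketch of one function-calling round trip in the OpenAI chat-completions format. It declares a tool, executes the tool_calls the model returns, and builds role "tool" result messages for the next request. The get_weather tool and its registry are hypothetical. The example feeds a hand-built tool call instead of a live response.

```python
import json
from types import SimpleNamespace

# Hypothetical tool schema in the OpenAI "tools" format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Current temperature for a city in degrees Celsius.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city: str) -> dict:
    # Stand-in implementation; a real agent would call an external API here.
    return {"city": city, "celsius": 18}


REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls) -> list[dict]:
    """Execute each requested tool and return the matching tool-result messages."""
    results = []
    for call in tool_calls:
        func = REGISTRY[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        results.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(func(**args))}
        )
    return results


# Offline check with a hand-built object shaped like message.tool_calls[i].
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Munich"}'),
)
print(run_tool_calls([fake_call]))
```

In a live loop, you would pass tools=TOOLS to client.chat.completions.create. You would then append the assistant message and these results to messages and call the model again.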
Complex Tasks
Mathematical & Scientific Reasoning
89.2% on AIME 2026 mathematical reasoning and 84.3% on GPQA Diamond scientific QA. Enable thinking mode for complex multi-step problems requiring deep deliberation.
Thinking Mode: When to Enable
Thinking On
Complex reasoning tasks, mathematical problems, multi-step planning, code architecture decisions. Worth the extra latency for accuracy.
Thinking Off
Latency-sensitive applications, simple queries, high-throughput pipelines, real-time interactions. Fast turnaround without deliberation overhead.
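The guidance above can be captured as a small routing helper that decides, per request, whether to enable thinking. This is a minimal sketch. The task labels and the should_think and request_kwargs helpers are hypothetical. The chat_template_kwargs / enable_thinking field is the one this page documents for the OpenAI SDK's extra_body.

```python
# Hypothetical task labels mirroring the "Thinking On" list above.
THINKING_TASKS = {"math", "multi_step_planning", "code_architecture", "complex_reasoning"}


def should_think(task: str) -> bool:
    """Enable thinking only for task types that benefit from deliberation."""
    return task in THINKING_TASKS


def request_kwargs(prompt: str, task: str, model: str = "gemma-4-31B-it") -> dict:
    """Keyword arguments for client.chat.completions.create(**kwargs)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # The OpenAI SDK merges extra_body into the JSON request body.
        "extra_body": {"chat_template_kwargs": {"enable_thinking": should_think(task)}},
    }
```

Routing on an explicit task label keeps the latency trade-off visible in code rather than hidden in a prompt.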
How to enable thinking mode
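```python
from openai import OpenAI
client = OpenAI(api_key="your-infercom-key", base_url="https://api.infercom.ai/v1")

response = client.chat.completions.create(
    model="gemma-4-31B-it",
    messages=[{"role": "user", "content": "Your prompt"}],
    # extra_body fields are merged into the JSON request body by the SDK
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)
```

Set enable_thinking to true via chat_template_kwargs. With the OpenAI SDK, pass it inside extra_body; with direct API calls, place it at the top level of the JSON request body.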
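For direct API calls, the same flag sits at the top level of the JSON body rather than inside extra_body. This is a minimal sketch using only the standard library. It builds the request without sending it. It assumes two things: the OpenAI-compatible path /chat/completions under https://api.infercom.ai/v1, and Bearer-token authentication, which is the scheme the OpenAI SDK uses.

```python
import json
import urllib.request

API_URL = "https://api.infercom.ai/v1/chat/completions"  # assumed OpenAI-compatible path


def thinking_request(prompt: str, api_key: str, enable_thinking: bool = True) -> urllib.request.Request:
    """Build (but do not send) a chat request with thinking mode toggled."""
    body = {
        "model": "gemma-4-31B-it",
        "messages": [{"role": "user", "content": prompt}],
        # Top level here; with the OpenAI SDK this dict goes inside extra_body.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen should return the usual chat-completion JSON if the assumptions above hold.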
Pricing
The model is Apache 2.0 licensed, and pricing is transparent and usage-based. No hidden fees.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context |
|---|---|---|---|
| Gemma 4 31B (Infercom) | €0.20 | €0.35 | 128K |
Prices in EUR excl. VAT. EU sovereign deployment with full GDPR compliance.
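This is a quick way to budget with the per-token prices above. The minimal sketch uses Decimal to avoid floating-point rounding. It assumes billing is strictly linear in token counts, with no minimums or rounding rules beyond what this page states.

```python
from decimal import Decimal

# Prices from the table above, in EUR per 1M tokens, excluding VAT.
INPUT_EUR_PER_M = Decimal("0.20")
OUTPUT_EUR_PER_M = Decimal("0.35")
MILLION = Decimal(1_000_000)


def estimate_cost_eur(input_tokens: int, output_tokens: int) -> Decimal:
    """Usage-based cost of a workload in EUR, excluding VAT."""
    return (input_tokens * INPUT_EUR_PER_M + output_tokens * OUTPUT_EUR_PER_M) / MILLION
```

For a single request, response.usage.prompt_tokens and response.usage.completion_tokens give the counts to pass in. For example, 2M input tokens and 1M output tokens come to €0.40 + €0.35 = €0.75 before VAT.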
EU Sovereign Deployment
Gemma 4 31B runs on Infercom's dedicated EU infrastructure. Your data never leaves European jurisdiction.
- Hosted in Germany (Equinix Munich 4)
- Full GDPR compliance with EU-based DPO
- No US CLOUD Act exposure
- ISO 27001 certified infrastructure
- Data processing agreement available
Get Started with Gemma 4
```python
from openai import OpenAI

# Infercom exposes an OpenAI-compatible endpoint, so the standard SDK client works.
client = OpenAI(
    api_key="your-infercom-key",
    base_url="https://api.infercom.ai/v1",
)

response = client.chat.completions.create(
    model="gemma-4-31B-it",
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=4096,  # upper bound on generated tokens
)
print(response.choices[0].message.content)
```

Frequently Asked Questions
What is Gemma 4 31B?
Gemma 4 31B is Google DeepMind's most capable dense open model, built from the same research foundation as Gemini 3. It features 31 billion parameters, a 128K context window, native multimodal capabilities (text and vision), and configurable thinking mode for complex reasoning tasks.
How does Gemma 4 compare to Gemma 3?
Gemma 4 is a significant step up from Gemma 3, with frontier-class benchmark scores: 85.2% on MMLU Pro, 89.2% on AIME 2026 mathematical reasoning, and 80.0% on LiveCodeBench v6. Alongside native text-and-image understanding, it adds configurable thinking mode and improves agentic workflow support with native function-calling.
Is Gemma 4 multimodal?
Yes. Gemma 4 31B natively processes both text and images in the same context. This enables document understanding, visual analysis, chart extraction, and structured data output from images without requiring separate vision models.
What is thinking mode and how do I enable it?
Thinking mode is a configurable feature that enables deeper reasoning for complex tasks. When it is enabled, Gemma 4 reasons through a problem before answering. This helps with multi-step problems, mathematical reasoning, and code architecture decisions. Enable it by passing enable_thinking: true via chat_template_kwargs. With the OpenAI SDK, place it inside extra_body; for direct API calls, place it at the top level of the JSON request body. For latency-sensitive applications, leave it off for faster responses.
Is my data stored in the EU?
Yes. Infercom runs Gemma 4 31B on dedicated infrastructure in Germany (Equinix Munich 4). Your data never leaves European jurisdiction, with full GDPR compliance, no US CLOUD Act exposure, and ISO 27001 certified infrastructure. A data processing agreement is available on request.