Gemma 4 31B

Google's Most Capable Dense Open Model

Frontier-class reasoning, native multimodal capabilities, and production-grade coding performance. Built from the same research foundation as Gemini 3, now running on EU sovereign infrastructure.

Why Gemma 4 31B

Google DeepMind's most capable dense open model combines advanced reasoning with multimodal understanding. Ideal for agentic workflows requiring both speed and intelligence.

Advanced Reasoning

Configurable thinking mode for multi-step planning and complex problem-solving. Toggle reasoning depth based on whether your workload needs deep deliberation or fast turnaround.

Native Multimodal

Process text and images together for document understanding, visual analysis, chart extraction, and structured data output. Perfect for vision-plus-reasoning workflows.

Agentic Workflows

Native function-calling, structured JSON output, and system-prompt support. Build autonomous agents that reliably interact with tools and APIs using frameworks like OpenClaw and CrewAI.

Configurable Thinking

Toggle thinking mode on or off depending on task requirements. Enable for complex reasoning, disable for latency-sensitive applications needing fast responses.

31B

Parameters (Dense)

128K

Context Window

30%+

Faster on Infercom

vs. next fastest provider (Artificial Analysis)

Benchmark Performance

Frontier-class scores across reasoning, coding, and knowledge benchmarks. All scores are from Google DeepMind's evaluation.

MMLU Pro

85.2%

Advanced knowledge reasoning

AIME 2026

89.2%

Mathematical reasoning (no tools)

LiveCodeBench v6

80.0%

Production coding tasks

GPQA Diamond

84.3%

Graduate-level science QA

Codeforces Elo

2150

Competitive programming

When to Use Gemma 4 31B

Gemma 4 excels at tasks requiring reasoning, vision, or agentic capabilities. The dense architecture enables efficient fine-tuning and deployment.

Code Assistant

Production-Grade Coding

Transform any workstation into a frontier-class code assistant. Strong performance on LiveCodeBench and Codeforces benchmarks makes Gemma 4 ideal for agentic coding workflows with Claude Code or similar tools.

Document Processing

Vision + Reasoning

Extract structured data from charts, documents, and screenshots. Combine visual understanding with reasoning to return clean JSON output for automated workflows.
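A minimal sketch of how such an extraction request could be assembled. It assumes the endpoint accepts OpenAI-style `image_url` content parts carrying a base64 data URL, which this page does not confirm. The helper name `build_extraction_messages` and the prompt wording are illustrative only.

```python
import base64


def build_extraction_messages(image_bytes: bytes, mime_type: str = "image/png") -> list[dict]:
    """Build chat messages asking for JSON extraction from one image.

    Assumption: the endpoint accepts OpenAI-style `image_url` parts
    carrying a base64 data URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return [
        {
            "role": "system",
            "content": "Extract the data shown in the image and reply with JSON only.",
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Return the chart values as a JSON object."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        },
    ]


# Placeholder bytes for illustration; in practice, read a real image file.
messages = build_extraction_messages(b"not-a-real-image")
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`. Parsing the reply with `json.loads` and validating it before further processing is a sensible safeguard.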

Agentic AI

Autonomous Agents

Native function-calling and tool-use support lets you build autonomous agents that interact with APIs and external services. Compatible with OpenClaw, CrewAI, and other multi-agent frameworks.
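As a rough illustration of the tool-use loop, the sketch below defines one tool in the OpenAI `tools` schema and a dispatcher that runs whichever tool the model requests. The `get_weather` tool, the `REGISTRY` mapping, and `dispatch_tool_call` are hypothetical names. It also assumes the endpoint returns tool calls in the OpenAI response shape.

```python
import json

# Tool schema in the OpenAI "tools" format. `get_weather` is a hypothetical tool.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real agent would call an external API here."""
    return {"city": city, "forecast": "unknown"}


# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run one requested tool and return its result as a JSON string."""
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        arguments = json.loads(arguments_json)
    except json.JSONDecodeError:
        return json.dumps({"error": "arguments were not valid JSON"})
    return json.dumps(REGISTRY[name](**arguments))


result = dispatch_tool_call("get_weather", '{"city": "Munich"}')
```

With the OpenAI SDK, `TOOLS` would be passed as `tools=TOOLS`. Each requested call then carries a `function.name` and a JSON string in `function.arguments`, which map onto the two dispatcher parameters. Returning errors as JSON rather than raising lets the model see the failure and retry.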

Complex Tasks

Mathematical & Scientific Reasoning

Gemma 4 31B scores 89.2% on AIME 2026 (mathematical reasoning) and 84.3% on GPQA Diamond (scientific QA). Enable thinking mode for complex multi-step problems that require deep deliberation.

Thinking Mode: When to Enable

Thinking On

Complex reasoning tasks, mathematical problems, multi-step planning, code architecture decisions. Worth the extra latency for accuracy.

Thinking Off

Latency-sensitive applications, simple queries, high-throughput pipelines, real-time interactions. Fast turnaround without deliberation overhead.
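One way to encode this guidance is a small lookup that maps a workload label to the `enable_thinking` flag. This is only a sketch: the label names and the `thinking_extra_body` helper are illustrative, and the return value is shaped for the `extra_body` argument of the OpenAI SDK.

```python
# Workload categories taken from the guidance above; the label names are illustrative.
THINKING_ON = {"complex_reasoning", "math", "multi_step_planning", "code_architecture"}
THINKING_OFF = {"simple_query", "high_throughput", "real_time", "latency_sensitive"}


def thinking_extra_body(workload: str) -> dict:
    """Return an `extra_body` value for the OpenAI SDK for a given workload label."""
    if workload in THINKING_ON:
        enabled = True
    elif workload in THINKING_OFF:
        enabled = False
    else:
        # Fail loudly rather than silently picking a latency/accuracy trade-off.
        raise ValueError(f"unknown workload label: {workload!r}")
    return {"chat_template_kwargs": {"enable_thinking": enabled}}
```

A call such as `client.chat.completions.create(..., extra_body=thinking_extra_body("math"))` would then request deliberation only where it is likely to pay off.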

How to enable thinking mode

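# Assumes `client` is an OpenAI client configured as in quickstart.py below.
response = client.chat.completions.create(
    model="gemma-4-31B-it",
    messages=[{"role": "user", "content": "Your prompt"}],
    # extra_body forwards fields the OpenAI SDK does not define itself.
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)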

Set enable_thinking to true via chat_template_kwargs. With the OpenAI SDK, pass chat_template_kwargs inside extra_body; with direct API calls, place it at the top level of the JSON request body. See the reasoning documentation for details.
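For direct API calls, the request body could look like the sketch below, which builds the HTTP request without sending it. The `/chat/completions` path and Bearer authentication are assumptions based on the stated OpenAI compatibility, and `build_thinking_request` is an illustrative helper name.

```python
import json
import urllib.request

# Assumed path: the base URL from the quickstart plus the OpenAI-style route.
API_URL = "https://api.infercom.ai/v1/chat/completions"


def build_thinking_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a raw chat request with thinking mode enabled."""
    body = {
        "model": "gemma-4-31B-it",
        "messages": [{"role": "user", "content": prompt}],
        # Direct API calls: chat_template_kwargs sits at the top level of the body,
        # not nested inside extra_body as with the OpenAI SDK.
        "chat_template_kwargs": {"enable_thinking": True},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_thinking_request("Your prompt", "your-infercom-key")
payload = json.loads(request.data)
```

Passing `request` to `urllib.request.urlopen` would perform the call. Building the request separately makes it easy to inspect the payload first.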

Pricing

Apache 2.0 licensed with transparent, usage-based pricing. No hidden fees.

Model                    Input (per 1M)    Output (per 1M)    Context
Gemma 4 31B (Infercom)   €0.20             €0.35              128K

Prices in EUR excl. VAT. EU sovereign deployment with full GDPR compliance.
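To budget a workload, the listed rates can be turned into a quick estimate. The sketch below assumes "per 1M" means per one million tokens and uses `Decimal` to avoid floating-point rounding in currency amounts. The function name `estimate_cost_eur` is illustrative.

```python
from decimal import Decimal

# Listed prices in EUR per 1M tokens, excluding VAT.
INPUT_PRICE_PER_MILLION = Decimal("0.20")
OUTPUT_PRICE_PER_MILLION = Decimal("0.35")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_eur(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the pre-VAT cost in EUR for the given token counts."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_PRICE_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Example: 50,000 input tokens and 10,000 output tokens.
cost = estimate_cost_eur(50_000, 10_000)
```

For the example above, the estimate is €0.0135 before VAT: €0.0100 for input plus €0.0035 for output.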

EU Sovereign Deployment

Gemma 4 31B runs on Infercom's dedicated EU infrastructure. Your data never leaves European jurisdiction.

  • Hosted in Germany (Equinix Munich 4)
  • Full GDPR compliance with EU-based DPO
  • No US CLOUD Act exposure
  • ISO 27001 certified infrastructure
  • Data processing agreement available

ISO 27001 | GDPR Compliant | Germany | SambaNova RDUs

Get Started with Gemma 4

quickstart.py
from openai import OpenAI

client = OpenAI(
    api_key="your-infercom-key",
    base_url="https://api.infercom.ai/v1"
)

response = client.chat.completions.create(
    model="gemma-4-31B-it",
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=4096
)

print(response.choices[0].message.content)

Drop-in OpenAI API compatibility. Point your existing OpenAI client at the Infercom base URL, swap in your API key and model name, and start using Gemma 4 in minutes. No other code changes required.

Free tier available. Pay-as-you-go with no commitments.

Frequently Asked Questions

What is Gemma 4 31B?

Gemma 4 31B is Google DeepMind's most capable dense open model, built from the same research foundation as Gemini 3. It features 31 billion parameters, a 128K context window, native multimodal capabilities (text and vision), and configurable thinking mode for complex reasoning tasks.

How does Gemma 4 compare to Gemma 3?

Gemma 4 represents a significant leap over Gemma 3 with frontier-class benchmark scores: 85.2% on MMLU Pro, 89.2% on AIME 2026 mathematical reasoning, and 80.0% on LiveCodeBench v6. It adds native multimodal capabilities, configurable thinking mode, and improved agentic workflow support with native function-calling.

Is Gemma 4 multimodal?

Yes. Gemma 4 31B natively processes both text and images in the same context. This enables document understanding, visual analysis, chart extraction, and structured data output from images without requiring separate vision models.

What is thinking mode and how do I enable it?

Thinking mode is a configurable feature that enables deeper reasoning for complex tasks. When enabled, Gemma 4 deliberates before answering, which helps with multi-step problems, mathematical reasoning, and code architecture decisions. Enable it by passing enable_thinking: true via chat_template_kwargs (inside extra_body with the OpenAI SDK, or at the top level of the JSON request body for direct API calls). For latency-sensitive applications, leave it off for faster responses.

Is my data stored in the EU?

Yes. Infercom runs Gemma 4 31B on dedicated infrastructure in Germany (Equinix Munich 4). Your data never leaves European jurisdiction, with full GDPR compliance, no US CLOUD Act exposure, and ISO 27001 certified infrastructure. A data processing agreement is available on request.

Ready to Build the Future of AI in Europe?

Join forward-thinking organizations deploying sovereign AI with world-class performance