How Do You Want Your Inference?

Start with our inference service and scale to dedicated capacity or on-premises as your needs grow.

€5 Free Credit to StartEUR Native PricingNo Hidden Fees or Minimums

All options include a fully managed, OpenAI-compatible API. We handle model deployments, infrastructure, and updates - you just call the endpoint.

How Pricing Works

AI inference is priced per token. A token is roughly 4 characters in English. You pay for what you use - no minimums, no commitments.

What is a token?

Tokens are the basic units LLMs process. In English, 1 token ≈ 4 characters or ¾ of a word. 1,000 words ≈ 1,300 tokens. Other languages may use more tokens per character.

Input vs Output

You pay separately for input (your prompt) and output (the model's response). Output tokens typically cost more because they require more computation to generate.

Choosing a model

Larger models (like DeepSeek V3.1) are more capable but cost more. Smaller models (like gpt-oss-120b) are fast and cheap for simpler tasks. EU-hosted models guarantee data sovereignty.

Access Tiers

FreeSelf-Service

€5 credit

No credit card required. Standard rate limits.

View rate limits →

DeveloperSelf-Service

Pay-as-you-go

Production-ready rate limits. Per-token pricing in EUR.

View rate limits →

EnterpriseCustom pricing

SLA, priority support, custom rate limits, and add-ons.

Contact sales →View security credentials →

🇪🇺

EU Sovereign Models

Full GDPR compliance, no US CLOUD Act exposure

Model	Input/1M	Output/1M	Context
gpt-oss-120b	€0.22	€0.59	128K
Gemma 3 12B	€0.20	€0.35	128K
MiniMax-M2.5	€0.30	€1.20	160K
MiniMax M2.7 Ultraspeed	€0.60	€2.40	192K

Prices in EUR, excl. VAT. EU-hosted models include full data sovereignty.

🌐

Global Model Catalog

Additional models via global infrastructure

Model	Input/1M	Output/1M	Region
Llama 3.3 70B	€0.60	€1.20	JP
DeepSeek V3.1	€3.00	€4.50	JP
DeepSeek V3.2	€3.00	€4.50	JP

Prices in EUR, excl. VAT. Requests processed on global infrastructure outside the EU.

Pricing Calculator

Select Model

Input (characters)

≈ 250 tokens

Output (characters)

≈ 125 tokens

Cost per request

€0.000129

With €5 credit: 38,834 requests

EU Sovereign - data stays in EU

Cost breakdown

Input: €0.000055Output: €0.000074Rate: €0.22/€0.59 per 1M tokens

Estimate only. Token counts are approximate (~4 characters per token for English). Actual tokens vary by language, content type, and model tokenizer. For precise costs, check your usage in the cloud portal after making API calls.

Compare Features Across Tiers

All tiers include EU sovereignty by default. Scale up as your needs grow.

Feature	Inference Service	Dedicated	On-Premises
EU-Hosted by Default			Your Location
Pricing Model	Pay-per-token	Reserved capacity	Custom licensing
Rate Limits	Per plan	Hardware only	Unlimited
Model CatalogStandard models: Our curated selection of production-ready models. Custom: Bring your own fine-tuned models based on supported architectures.	Standard models	Standard + Custom	Any model
Custom Model Hosting
Support	Docs & Community to Priority	Priority	Dedicated 24/7
SLA Guarantee	Best effort to Custom	Custom	Custom
Air-Gapped Deployment
Data Residency ControlEU default: EU-hosted models keep data in EU; Global Catalog models are processed outside EU. EU guaranteed: All processing contractually stays in EU. Your choice: You control where hardware is located.	EU default	EU guaranteed	Your choice
Best For	Prototyping to production	High-volume production	Full physical ownership

The Infercom Advantage

Transparent, fair, and built for European organizations.

EU Sovereignty by Default

EU-hosted models process all data in our Munich datacenter - full GDPR compliance, no US CLOUD Act exposure. Choose Global Catalog models only when you explicitly need them.

Transparent Token Pricing

Clear per-token pricing in EUR with no hidden fees. What you see is what you pay. No surprise charges for data transfer or API calls.

No Performance Throttling

Every request runs at full inference speed regardless of your plan - we never reduce token throughput or deprioritize pay-as-you-go users. Rate limits cap request frequency, not performance.

Clear Upgrade Path

Start with pay-as-you-go, scale to dedicated capacity, deploy on-premises. Move between tiers as your needs evolve.