How Do You Want Your Inference?
Start with our inference service and scale to dedicated capacity or on-premises as your needs grow.
All options include a fully managed, OpenAI-compatible API. We handle model deployments, infrastructure, and updates — you just call the endpoint.
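As a sketch of what "just call the endpoint" looks like, here is a minimal request against an OpenAI-compatible chat endpoint. The base URL, API key, and model name below are placeholders, not actual account values:

```python
import json
import urllib.request

API_BASE = "https://api.example-provider.eu/v1"  # placeholder base URL (assumption)
API_KEY = "YOUR_API_KEY"                         # placeholder credential

def build_chat_request(prompt: str, model: str = "gpt-oss-120b") -> dict:
    # Standard OpenAI-compatible chat payload: a list of role/content messages.
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str, model: str = "gpt-oss-120b") -> str:
    # POST the payload to /chat/completions and return the model's reply text.
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(build_chat_request(prompt, model)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the API is OpenAI-compatible, any OpenAI SDK pointed at the same base URL should work equivalently; only the payload shape shown above is assumed by the API contract.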
How Pricing Works
AI inference is priced per token. A token is roughly 4 characters in English. You pay for what you use — no minimums, no commitments.
What is a token?
Tokens are the basic units LLMs process. In English, 1 token ≈ 4 characters or ¾ of a word, so 1,000 words ≈ 1,300 tokens. Other languages often need more tokens for the same amount of text.
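The rules of thumb above translate into a quick pre-flight estimator. This is a rough heuristic only; the model's actual tokenizer decides the real count:

```python
def estimate_tokens_from_chars(text: str) -> int:
    # ~4 characters per token for English text.
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count: int) -> int:
    # ~1.3 tokens per word (1,000 words ≈ 1,300 tokens).
    return round(word_count * 1.3)

print(estimate_tokens_from_chars("x" * 400))  # → 100
print(estimate_tokens_from_words(1000))       # → 1300
```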
Input vs Output
You pay separately for input (your prompt) and output (the model's response). Output tokens typically cost more because they require more computation to generate.
Choosing a model
Larger models (like DeepSeek V3.1) are more capable but cost more. Smaller models (like gpt-oss-120b) are fast and cheap for simpler tasks. EU-hosted models guarantee data sovereignty.
Access Tiers
Pay-as-you-go
Production-ready rate limits. Per-token pricing in EUR.
View rate limits →
EU Sovereign Models
Full GDPR compliance, no US CLOUD Act exposure
| Model | Input/1M | Output/1M | Context |
|---|---|---|---|
| gpt-oss-120b | €0.22 | €0.59 | 128K |
| MiniMax-M2.5 | €0.30 | €1.20 | 164K |
| DeepSeek-V3.1 | €3.00 | €4.50 | 128K |
Prices in EUR, excl. VAT. EU-hosted models include full data sovereignty.
Global Model Catalog
Additional models via global infrastructure
| Model | Input/1M | Output/1M | Region |
|---|---|---|---|
| Llama 3.1 8B | €0.10 | €0.20 | US |
| Qwen3 32B | €0.40 | €0.80 | JP |
| Qwen3 235B | €0.40 | €0.80 | JP |
| Llama 3.3 70B | €0.60 | €1.20 | US |
| DeepSeek R1 Distill 70B | €0.70 | €1.40 | JP |
| Llama 4 Maverick | €0.63 | €1.80 | JP |
| DeepSeek V3 | €3.00 | €4.50 | US |
| DeepSeek V3.1 Terminus | €3.00 | €4.50 | US |
| DeepSeek V3.2 | €3.00 | €4.50 | US |
| DeepSeek R1 | €5.00 | €7.00 | US |
Prices in EUR, excl. VAT. Requests processed on global infrastructure outside the EU.
Pricing Calculator
Worked example: a request with ≈250 input tokens and ≈125 output tokens on gpt-oss-120b (€0.22 input / €0.59 output per 1M tokens) costs ≈€0.000129. A €5 credit covers about 38,834 such requests.
Estimate only. Token counts are approximate (~4 characters per token for English). Actual tokens vary by language, content type, and model tokenizer. For precise costs, check your usage in the cloud portal after making API calls.
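The calculator's arithmetic is simple enough to reproduce yourself. This sketch uses the gpt-oss-120b prices from the table above (€0.22 in / €0.59 out per 1M tokens) and a 250-input / 125-output token request:

```python
def request_cost_eur(input_tokens: int, output_tokens: int,
                     input_per_m: float, output_per_m: float) -> float:
    # Input and output tokens are billed separately, each per million tokens.
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

cost = request_cost_eur(250, 125, 0.22, 0.59)
print(cost)           # ≈ 0.00012875, i.e. about €0.000129 per request
print(int(5 / cost))  # → 38834 requests on a €5 credit
```

Swap in any input/output prices from the tables above to compare models; remember these are estimates, since real token counts depend on the tokenizer.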
Compare Features Across Tiers
All tiers include EU sovereignty by default. Scale up as your needs grow.
| Feature | Inference Service | Dedicated | On-Premises |
|---|---|---|---|
| EU-Hosted by Default | ✓ | ✓ | Your location |
| Pricing Model | Pay-per-token | Reserved capacity | Custom licensing |
| Rate Limits | Per plan | Hardware only | Unlimited |
| Model Catalog | Standard models | Standard + Custom | Any model |
| Custom Model Hosting | ✗ | ✓ | ✓ |
| Support | Docs & Community to Priority | Priority | Dedicated 24/7 |
| SLA Guarantee | Best effort to Custom | Custom | Custom |
| Air-Gapped Deployment | ✗ | ✗ | ✓ |
| Data Residency Control | EU default | EU guaranteed | Your choice |
| Best For | Prototyping to production | High-volume production | Full physical ownership |
Standard models: our curated selection of production-ready models. Custom: bring your own fine-tuned models based on supported architectures.
Data residency: EU default means EU-hosted models keep data in the EU while Global Catalog models are processed outside the EU; EU guaranteed means all processing contractually stays in the EU; your choice means you control where the hardware is located.
The Infercom Advantage
Transparent, fair, and built for European organizations.
EU Sovereignty by Default
EU-hosted models process all data in our Munich datacenter — full GDPR compliance, no US CLOUD Act exposure. Choose Global Catalog models only when you explicitly need them.
Transparent Token Pricing
Clear per-token pricing in EUR with no hidden fees. What you see is what you pay. No surprise charges for data transfer or API calls.
No Performance Throttling
Every request runs at full inference speed regardless of your plan — we never reduce token throughput or deprioritize pay-as-you-go users. Rate limits cap request frequency, not performance.
Clear Upgrade Path
Start with pay-as-you-go, scale to dedicated capacity, deploy on-premises. Move between tiers as your needs evolve.