Per-request speed: what benchmarks report

When independent benchmarks such as Artificial Analysis report "output speed," they measure the average number of tokens received per second after the first token arrives, for a single request, with time to first token (TTFT) deliberately excluded. This is the number that governs how fast a response streams to the user.
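As a rough illustration of that definition, the sketch below derives TTFT and output speed from the arrival timestamps of one streamed response. The function name and the trace are hypothetical, and it assumes one timestamp per token; real clients usually receive multi-token chunks, and benchmark harnesses may count tokens differently.

```python
def ttft_and_output_speed(sent_at, token_times):
    """Return (ttft_seconds, tokens_per_second) for one streamed request.

    sent_at:     time the request was sent, in seconds
    token_times: arrival time of each output token, in seconds, ascending
    """
    if len(token_times) < 2:
        raise ValueError("need at least two token timestamps")
    first, last = token_times[0], token_times[-1]
    ttft = first - sent_at
    # Only tokens that arrive after the first one count, and only the
    # time after the first token counts, so TTFT drops out of the figure.
    speed = (len(token_times) - 1) / (last - first)
    return ttft, speed


# Hypothetical trace: request sent at t = 0.0 s, first token at 0.5 s,
# then one token every 2 ms, 1,001 tokens in total.
token_times = [0.5 + 0.002 * i for i in range(1001)]
ttft, speed = ttft_and_output_speed(0.0, token_times)
print(f"TTFT {ttft:.3f} s, output speed {speed:.0f} tok/s")
```

In this trace the half-second wait before the first token does not lower the reported speed; it would only show up in a separate TTFT figure.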

On our EU infrastructure we measure per-request output throughput of 713 tok/s on gpt-oss-120b and 428 tok/s on MiniMax M2.7 Ultraspeed (p50, 10,000-token input, single request). For comparability across models with different tokenizers, Artificial Analysis standardizes all speed metrics in OpenAI tokens counted with the tiktoken o200k_base tokenizer.
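A minimal sketch of the normalization idea: recount the generated text with one reference tokenizer and divide by the streaming time, so that models with different native tokenizers land on the same scale. The helper names are hypothetical and the whitespace counter is only a stand-in so the example runs without dependencies. As a simplification it also counts every token in the text, including the first one.

```python
from typing import Callable


def normalized_speed(output_text: str,
                     seconds_after_first_token: float,
                     count_tokens: Callable[[str], int]) -> float:
    """Tokens per second, with tokens counted by a reference tokenizer."""
    if seconds_after_first_token <= 0:
        raise ValueError("streaming time must be positive")
    return count_tokens(output_text) / seconds_after_first_token


def whitespace_count(text: str) -> int:
    """Stand-in counter (assumption): one token per whitespace-separated word."""
    return len(text.split())


# With the tiktoken package installed, the reference counter could instead be:
#   import tiktoken
#   enc = tiktoken.get_encoding("o200k_base")
#   count_tokens = lambda text: len(enc.encode(text))

text = "the quick brown fox jumps over the lazy dog " * 100  # 900 words
print(normalized_speed(text, 2.0, whitespace_count))
```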

System throughput: what providers' economics depend on

The second meaning is aggregate throughput: the total tokens a system produces per second across all concurrent requests. Benchmarking literature distinguishes these explicitly as "TPS per user" versus "TPS per system." A GPU server might deliver 30 tok/s to each of 100 concurrent users - 3,000 tok/s of system throughput, but a much slower experience per request.

This distinction is where speed claims get murky: a vendor can truthfully advertise thousands of tokens per second while each individual request crawls. When you see a tok/s figure, the first question to ask is: per request, or across the whole system?
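The arithmetic behind the two readings can be sketched in a few lines. The function names are hypothetical and the model assumes every concurrent request streams at the same rate. The 30 tok/s, 100 users, 713 tok/s and 1,000 output tokens are the figures quoted on this page.

```python
def system_throughput(per_request_tok_s: float, concurrent_requests: int) -> float:
    """Aggregate tok/s if every concurrent request streams at the same rate."""
    return per_request_tok_s * concurrent_requests


def seconds_to_stream(output_tokens: int, per_request_tok_s: float) -> float:
    """Time one user waits for a full response after the first token."""
    return output_tokens / per_request_tok_s


aggregate = system_throughput(30, 100)         # heavily batched server
wait_batched = seconds_to_stream(1_000, 30)    # one user on that server
wait_single = seconds_to_stream(1_000, 713)    # per-request figure from above

print(f"aggregate: {aggregate:.0f} tok/s")
print(f"1,000 tokens at 30 tok/s:  {wait_batched:.1f} s")
print(f"1,000 tokens at 713 tok/s: {wait_single:.1f} s")
```

The aggregate number is the larger headline, yet the user on the batched server waits roughly half a minute for the same 1,000 tokens.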

How to read a tok/s claim

Check three things: whether it is per-request or aggregate, whether TTFT is included or excluded (Artificial Analysis excludes it by definition), and the workload shape - prompt length and output length both change the number. Our published benchmarks state all three: per-request, server-side p50, 10,000 input / 1,000 output tokens.
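One way to make the checklist concrete is to record each claim with its three qualifiers and only compare claims whose qualifiers match. The sketch below is illustrative: the class, the field names, the vendor claim, and the 0.5 s TTFT are all hypothetical. The conversion treats generation time as output tokens divided by output speed, which is an approximation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedClaim:
    tok_s: float
    per_request: bool    # False means aggregate across the whole system
    ttft_excluded: bool  # True means timing starts at the first token
    input_tokens: int
    output_tokens: int


def comparable(a: SpeedClaim, b: SpeedClaim) -> bool:
    """Two claims are comparable only if scope, TTFT handling and workload match."""
    return (a.per_request == b.per_request
            and a.ttft_excluded == b.ttft_excluded
            and a.input_tokens == b.input_tokens
            and a.output_tokens == b.output_tokens)


def end_to_end_tok_s(claim: SpeedClaim, ttft_seconds: float) -> float:
    """Re-express a TTFT-excluded per-request figure with TTFT included."""
    generation_seconds = claim.output_tokens / claim.tok_s
    return claim.output_tokens / (ttft_seconds + generation_seconds)


ours = SpeedClaim(713, per_request=True, ttft_excluded=True,
                  input_tokens=10_000, output_tokens=1_000)
# Hypothetical vendor claim quoting aggregate throughput.
vendor = SpeedClaim(3_000, per_request=False, ttft_excluded=True,
                    input_tokens=10_000, output_tokens=1_000)

print(comparable(ours, vendor))            # False: different scope
print(round(end_to_end_tok_s(ours, 0.5)))  # assumed TTFT of 0.5 s
```

With the assumed 0.5 s TTFT, the same request reads as roughly 526 tok/s end to end instead of 713 tok/s, which is why the TTFT convention has to be stated alongside the number.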

Related terms

Throughput (LLM Serving)

Tokens per second in two senses: per-request output throughput vs. system-wide capacity - and how batching trades one against the other.

Inter-Token Latency (ITL)

The average time gap between consecutive tokens during generation - also called TPOT.

Inference Speed

The umbrella term: TTFT, inter-token latency, and throughput - and which one matters when.
