Per-request speed: what benchmarks report
When independent benchmarks such as Artificial Analysis report "output speed," they measure the average number of tokens received per second after the first token arrives: a single request, with time-to-first-token (TTFT) deliberately excluded. This is the number that determines how fast a response streams to the user.
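As a rough illustration of that definition, the sketch below computes output speed from per-token arrival timestamps. The function name and the timestamp list are hypothetical, and real benchmarks may differ in details such as counting chunks rather than individual tokens.

```python
from typing import Sequence


def output_speed(token_times: Sequence[float]) -> float:
    """Tokens per second after the first token arrives (TTFT excluded).

    token_times: arrival time in seconds of each output token, in order.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure streaming speed")
    first, last = token_times[0], token_times[-1]
    if last <= first:
        raise ValueError("timestamps must be increasing")
    # Count only the tokens that arrived after the first one,
    # and divide by the time they took to arrive.
    return (len(token_times) - 1) / (last - first)


# Hypothetical request: first token at 0.40 s, then one token every 10 ms.
times = [0.40 + 0.01 * i for i in range(101)]
speed = output_speed(times)  # roughly 100 tok/s; the 0.40 s wait is ignored
```

The 0.40 s before the first token never enters the calculation, which is why a request can have a slow start and still report a high output speed.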
On our EU infrastructure we measure per-request output throughput of 713 tok/s on gpt-oss-120b and 428 tok/s on MiniMax M2.7 Ultraspeed (p50, 10,000-token input, single request). For comparability across models with different tokenizers, Artificial Analysis standardizes all speed metrics in OpenAI tokens counted with the tiktoken o200k_base tokenizer.
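To see why the tokenizer matters, here is a minimal sketch of re-expressing a natively counted speed in o200k_base tokens. It assumes you have both token counts for the same output text. The function name and the example numbers are hypothetical, and this is not a description of Artificial Analysis's own pipeline.

```python
def normalize_speed(native_tps: float, native_tokens: int, o200k_tokens: int) -> float:
    """Convert a tok/s figure from a model's native tokenizer to o200k_base tokens.

    Both counts must describe the same output text, so the elapsed time is
    identical and only the unit of counting changes.
    """
    if native_tokens <= 0 or o200k_tokens <= 0:
        raise ValueError("token counts must be positive")
    return native_tps * o200k_tokens / native_tokens


# Hypothetical output: 1,200 native tokens that re-encode to 1,000 o200k_base tokens.
normalized = normalize_speed(300.0, native_tokens=1200, o200k_tokens=1000)  # 250.0
```

The o200k_base count could be obtained with `len(tiktoken.get_encoding("o200k_base").encode(text))`. A model whose tokenizer splits text into more, smaller tokens looks faster in native units than it does after normalization.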
System throughput: what providers' economics depend on
The second meaning is aggregate throughput: the total tokens a system produces per second across all concurrent requests. Benchmarking literature distinguishes these explicitly as "TPS per user" versus "TPS per system." A GPU server might deliver 30 tok/s to each of 100 concurrent users: 3,000 tok/s of system throughput, but a much slower experience for each individual request.
This distinction is where speed claims get murky: a vendor can truthfully advertise thousands of tokens per second while each individual request crawls. When you see a tok/s figure, the first question to ask is: per request, or across the whole system?
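The arithmetic behind the two readings can be sketched in a few lines. The helper names are hypothetical, and the model assumes every concurrent request streams at the same rate, which is a simplification.

```python
def system_throughput(per_request_tps: float, concurrent_requests: int) -> float:
    """Aggregate tok/s, assuming every request streams at the same rate."""
    return per_request_tps * concurrent_requests


def seconds_to_stream(output_tokens: int, per_request_tps: float) -> float:
    """How long one user waits for a full response at a given per-request speed."""
    return output_tokens / per_request_tps


# The example from the text: 30 tok/s to each of 100 concurrent users.
aggregate = system_throughput(30.0, 100)      # 3000.0 tok/s across the system
slow_wait = seconds_to_stream(1000, 30.0)     # about 33 s for a 1,000-token answer
fast_wait = seconds_to_stream(1000, 713.0)    # about 1.4 s at 713 tok/s per request
```

Both the 3,000 tok/s figure and the 33-second wait describe the same server, which is why the unit alone does not tell you what a user will experience.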
How to read a tok/s claim
Check three things: whether the figure is per-request or aggregate, whether TTFT is included or excluded (Artificial Analysis excludes it by definition), and the workload shape, since prompt length and output length both change the number. Our published benchmarks state all three: per-request, server-side p50, 10,000 input / 1,000 output tokens.
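A short sketch shows how much the TTFT and output-length questions can move a figure. The function name and all numbers are hypothetical, and the model treats generation as a fixed wait followed by a constant streaming rate.

```python
def end_to_end_tps(output_tokens: int, ttft_s: float, output_speed_tps: float) -> float:
    """Tokens per second when the wait for the first token is counted too.

    Simplified model: total time = TTFT + output_tokens / output speed.
    """
    total_seconds = ttft_s + output_tokens / output_speed_tps
    return output_tokens / total_seconds


# Same hypothetical model in both cases: 500 tok/s output speed, 0.5 s TTFT.
long_answer = end_to_end_tps(1000, ttft_s=0.5, output_speed_tps=500.0)   # 400.0
short_answer = end_to_end_tps(100, ttft_s=0.5, output_speed_tps=500.0)   # about 143
```

With TTFT excluded, both requests would report 500 tok/s. With it included, the same system reports 400 tok/s or roughly 143 tok/s depending only on output length, so two figures are comparable only when the measurement convention and the workload shape match.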
Related terms
Throughput (LLM Serving)
Tokens per second in two senses: per-request output throughput vs. system-wide capacity - and how batching trades one against the other.
Inter-Token Latency (ITL)
The average time gap between consecutive tokens during generation - also called TPOT.
Inference Speed
The umbrella term: TTFT, inter-token latency, and throughput - and which one matters when.
See these metrics measured live on our EU infrastructure - real numbers from production hardware, independently verified.