Latency

Simple Definition

Latency is how long it takes for an AI system to start responding after you send a request. It’s the gap between “I clicked send” and “the answer starts appearing.”

Low latency = fast. High latency = slow.

In AI conversations and real-time tools, latency is one of the most noticeable quality-of-life factors.

Two Types of AI Latency

Time to First Token (TTFT): How long until the model starts outputting anything. This is what feels like “thinking time” when you’re waiting for a response to begin.

Tokens Per Second (TPS): How fast the model generates tokens once it starts. A model might have a low TTFT but generate text slowly, or take a while to start and then stream quickly.
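Both numbers can be read off any streaming response by timestamping tokens as they arrive. The sketch below is a minimal illustration, not a vendor API: `measure_stream` and `fake_stream` are hypothetical names, and `fake_stream` only simulates a model with `time.sleep`. The timer starts when the function is called, which stands in for "request sent".

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], Optional[float], int]:
    """Return (ttft_seconds, tokens_per_second, token_count) for one stream."""
    start = clock()          # stands in for the moment the request is sent
    first_at = None
    last_at = None
    count = 0
    for _ in tokens:
        now = clock()
        if first_at is None:
            first_at = now   # arrival of the first token
        last_at = now
        count += 1
    if first_at is None:
        return None, None, 0  # the stream produced nothing
    ttft = first_at - start
    # TPS is measured over the tokens that follow the first one.
    decode_time = last_at - first_at
    tps = (count - 1) / decode_time if decode_time > 0 else None
    return ttft, tps, count


def fake_stream(n_tokens: int, first_delay: float, gap: float):
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(first_delay)          # simulated "thinking time"
    for i in range(n_tokens):
        if i > 0:
            time.sleep(gap)          # simulated per-token generation time
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, tps, n = measure_stream(fake_stream(5, first_delay=0.05, gap=0.01))
    print(f"TTFT: {ttft:.3f}s, TPS: {tps:.1f}, tokens: {n}")
```

With a real service, the iterable would be whatever token or chunk stream the client library exposes. Chunks do not always map one-to-one to tokens, so the TPS figure is approximate in that case.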

For chat applications, TTFT matters most. For batch tasks where you’re waiting for a full response, overall generation speed matters more.
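A rough way to see why the two cases differ is to estimate the time to a complete response as TTFT plus the output length divided by TPS. The function name and all numbers below are hypothetical, chosen only to illustrate the arithmetic, and the formula is an approximation.

```python
def estimated_total_seconds(ttft_s: float, output_tokens: int, tps: float) -> float:
    """Approximate wall-clock time for a full response: TTFT plus decode time."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return ttft_s + output_tokens / tps


# Hypothetical model A: starts quickly, streams slowly.
quick_start = estimated_total_seconds(ttft_s=0.25, output_tokens=500, tps=20.0)
# Hypothetical model B: starts slowly, streams quickly.
slow_start = estimated_total_seconds(ttft_s=2.0, output_tokens=500, tps=100.0)

print(quick_start)  # 25.25
print(slow_start)   # 7.0
```

In this made-up comparison, model A feels more responsive in a chat window, while model B finishes a 500-token answer much sooner, which is what matters when you are waiting for the full output.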

What Causes High Latency

  • Model size — larger models take longer to run
  • Server load — shared infrastructure during peak hours slows responses
  • Long inputs — more input tokens to process = longer wait
  • Network distance — physically distant servers add delay
  • Cold starts — serverless AI deployments may take extra time to “wake up”
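
As a back-of-the-envelope illustration, these causes can be treated as delays that add up before the first token appears. This is a deliberately simplified additive model, not a description of how any particular service behaves: the function name, the parameters, and every number are assumptions. Model size shows up here only indirectly, through a lower input-processing rate.

```python
def estimate_ttft_seconds(
    input_tokens: int,
    prefill_tokens_per_s: float,
    network_round_trip_s: float = 0.0,
    queue_wait_s: float = 0.0,
    cold_start_s: float = 0.0,
) -> float:
    """Sum the main delays before the first token (simplified, assumed model)."""
    if prefill_tokens_per_s <= 0:
        raise ValueError("prefill_tokens_per_s must be positive")
    # Long inputs: time spent processing the prompt before output begins.
    prefill_s = input_tokens / prefill_tokens_per_s
    return network_round_trip_s + queue_wait_s + cold_start_s + prefill_s


# Hypothetical request: 2,000 input tokens processed at 4,000 tokens/s.
warm = estimate_ttft_seconds(
    2000, 4000.0, network_round_trip_s=0.125, queue_wait_s=0.25
)
cold = estimate_ttft_seconds(
    2000, 4000.0, network_round_trip_s=0.125, queue_wait_s=0.25, cold_start_s=8.0
)

print(warm)  # 0.875
print(cold)  # 8.875
```

With these invented numbers, the cold start dominates everything else, which is why a rarely used serverless deployment can feel slow even when the model itself is fast.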

Latency vs. Throughput

These are related but different:

  • Latency — how fast one request gets answered
  • Throughput — how many requests the system can complete in a given amount of time

A system can have low latency for individual requests but struggle with throughput under heavy load, or vice versa.
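Batching is one common reason the two metrics pull apart. The sketch below compares two hypothetical serving modes using made-up timings. It ignores queueing and assumes every request in a batch finishes when the batch does, so treat it as arithmetic rather than a benchmark.

```python
from dataclasses import dataclass


@dataclass
class ServingMode:
    name: str
    batch_size: int            # requests processed together
    seconds_per_batch: float   # time to finish one batch

    @property
    def latency_s(self) -> float:
        # Each request waits for its whole batch to finish.
        return self.seconds_per_batch

    @property
    def throughput_rps(self) -> float:
        # Requests completed per second.
        return self.batch_size / self.seconds_per_batch


# Hypothetical timings, chosen only for illustration.
one_at_a_time = ServingMode("one at a time", batch_size=1, seconds_per_batch=0.5)
batched = ServingMode("batched", batch_size=8, seconds_per_batch=2.0)

for mode in (one_at_a_time, batched):
    print(f"{mode.name}: latency {mode.latency_s}s, throughput {mode.throughput_rps} req/s")
```

Here the batched mode completes twice as many requests per second (4.0 versus 2.0), but each individual request waits four times longer (2.0 s versus 0.5 s).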

Why Latency Matters for Different Use Cases

Use case                 Latency importance
Live chatbot             Critical: users notice every second
Voice AI                 Extremely critical: must feel real-time
Document summarization   Less critical: waiting is acceptable
Batch data processing    Not critical: running overnight is fine

Related Terms

  • Inference — the computation that determines latency
  • LLM — larger models tend to have higher latency
  • SLM — smaller models with lower latency
  • API — where latency is most commonly measured in practice
  • Quantization — can reduce latency by making models lighter
