Latency
Simple Definition
Latency is how long it takes for an AI system to start responding after you send a request. It’s the gap between “I clicked send” and “the answer starts appearing.”
Low latency = fast. High latency = slow.
In AI conversations and real-time tools, latency is one of the most noticeable quality-of-life factors.
Two Types of AI Latency
Time to First Token (TTFT): How long until the model starts outputting anything. This is what feels like “thinking time” when you’re waiting for a response to begin.
Tokens Per Second (TPS): How fast the model generates tokens once it starts. A model might have low TTFT but generate text slowly, or take a while to start but then stream quickly.
For chat applications, TTFT matters most. For batch tasks where you’re waiting for a full response, overall generation speed matters more.
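To make the two measurements concrete, here is a minimal Python sketch that times a stream of tokens and reports TTFT and TPS. It assumes the response arrives as an iterable of tokens. `fake_stream` is a hypothetical stand-in with made-up delays, not a real model API, so you would swap in whatever streaming client you actually use.

```python
import time
from typing import Iterable, Iterator, Optional


def fake_stream(tokens: Iterable[str], first_delay: float, per_token_delay: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model API (not a real client)."""
    time.sleep(first_delay)  # simulated wait before the first token appears
    for i, tok in enumerate(tokens):
        if i > 0:
            time.sleep(per_token_delay)  # simulated time to generate each later token
        yield tok


def measure_stream(stream: Iterable[str]) -> dict:
    """Consume a token stream and return TTFT (s), token count, TPS, and total time (s)."""
    start = time.perf_counter()
    first_token_at: Optional[float] = None
    count = 0
    for _ in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # moment the first token arrived
        count += 1
    end = time.perf_counter()
    if first_token_at is None:
        raise ValueError("stream produced no tokens")

    # TPS: tokens after the first one, divided by the time spent producing them.
    generation_s = end - first_token_at
    tps = (count - 1) / generation_s if count > 1 and generation_s > 0 else None
    return {
        "ttft_s": first_token_at - start,
        "tokens": count,
        "tps": tps,
        "total_s": end - start,
    }


stats = measure_stream(fake_stream(["Hello", ",", " world", "!"], first_delay=0.2, per_token_delay=0.05))
print(f"TTFT: {stats['ttft_s']:.2f} s, TPS: {stats['tps']:.1f}, total: {stats['total_s']:.2f} s")
```

With these simulated delays, TTFT should land near 0.2 s and TPS near 20 (three tokens in about 0.15 s), give or take timer and scheduling overhead.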
What Causes High Latency
- Model size — larger models take longer to run
- Server load — shared infrastructure during peak hours slows responses
- Long inputs — more input tokens to process = longer wait
- Network distance — physically distant servers add delay
- Cold starts — serverless AI deployments may take extra time to “wake up”
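The causes above can be combined into a rough back-of-the-envelope estimate. This sketch assumes the delays simply add up, and every field name and number in it is hypothetical, chosen for illustration rather than measured from any real service. Real systems overlap some of these steps, so treat the result as an order-of-magnitude guide only.

```python
from dataclasses import dataclass, replace


@dataclass
class LatencyInputs:
    """Hypothetical knobs, one per cause listed above. Times are in seconds."""
    input_tokens: int             # long inputs: more tokens to read before answering
    output_tokens: int            # length of the response
    prefill_tokens_per_s: float   # model size: larger models read input more slowly
    decode_tokens_per_s: float    # model size: larger models also generate more slowly (TPS)
    network_rtt_s: float = 0.0    # network distance: round trip to the server
    queue_wait_s: float = 0.0     # server load: waiting for a free slot on shared hardware
    cold_start_s: float = 0.0     # cold starts: loading an idle deployment


def estimate(x: LatencyInputs) -> tuple[float, float]:
    """Return (estimated TTFT, estimated total time), assuming the delays simply add."""
    ttft = (x.network_rtt_s + x.queue_wait_s + x.cold_start_s
            + x.input_tokens / x.prefill_tokens_per_s)
    total = ttft + x.output_tokens / x.decode_tokens_per_s
    return ttft, total


warm = LatencyInputs(input_tokens=2000, output_tokens=300,
                     prefill_tokens_per_s=4000, decode_tokens_per_s=50,
                     network_rtt_s=0.1)
cold = replace(warm, cold_start_s=5.0)  # same request, but the deployment was asleep

for name, scenario in (("warm", warm), ("cold", cold)):
    ttft, total = estimate(scenario)
    print(f"{name}: TTFT {ttft:.1f} s, total {total:.1f} s")
# warm: TTFT 0.6 s, total 6.6 s
# cold: TTFT 5.6 s, total 11.6 s
```

In this made-up scenario, the cold start dominates the wait before the first token, while the 300-token response dominates total time because generating it at 50 tokens per second takes 6 seconds.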
Latency vs. Throughput
These are related but different:
- Latency — how fast one request gets answered
- Throughput — how many requests (or tokens) a system can handle in a given amount of time
A system can have low latency for individual requests but struggle with throughput under heavy load, or vice versa.
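One common source of this trade-off is batching: a server that groups requests together gets more work done per second but makes each request wait for its whole batch. The sketch below uses a hypothetical cost model (a fixed overhead per batch plus a per-request cost), and the numbers are invented to show the shape of the trade-off, not taken from any real system.

```python
def batch_stats(batch_size: int, fixed_s: float, per_request_s: float) -> tuple[float, float]:
    """Hypothetical cost model: one batch takes fixed_s + per_request_s * batch_size seconds.

    Returns (latency of each request in the batch, throughput in requests per second),
    assuming all requests in a batch arrive together and finish together.
    """
    batch_time = fixed_s + per_request_s * batch_size
    latency = batch_time                  # each request waits for the whole batch
    throughput = batch_size / batch_time  # requests completed per second
    return latency, throughput


for b in (1, 8, 32):
    lat, thr = batch_stats(b, fixed_s=0.5, per_request_s=0.05)
    print(f"batch={b:>2}: latency {lat:.2f} s, throughput {thr:.1f} req/s")
```

With these made-up numbers, growing the batch from 1 to 32 raises throughput from about 1.8 to about 15.2 requests per second, while each request's latency rises from 0.55 s to 2.1 s.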
Why Latency Matters for Different Use Cases
| Use case | Latency importance |
|---|---|
| Live chatbot | Critical — users notice every second |
| Voice AI | Extremely critical — must feel real-time |
| Document summarization | Less critical — waiting is acceptable |
| Batch data processing | Not critical — running overnight is fine |
Related Terms
- Inference — the computation that determines latency
- LLM — larger models tend to have higher latency
- SLM — smaller models with lower latency
- API — where latency is most commonly measured in practice
- Quantization — can reduce latency by making models lighter