Local LLM

Simple Definition

A local LLM is an AI language model that runs on your own computer — not on a company’s server in the cloud. You download the model, run it locally, and your data never leaves your machine.

Why Run an LLM Locally?

Privacy — Your inputs and outputs stay on your device. Nothing is sent to a cloud provider or used for training.

Offline access — Works without an internet connection.

Cost — No subscription or per-use API fees after the initial setup.

Control — You choose which model to use and how it behaves.

Speed (sometimes) — On a fast computer with a capable GPU, local inference can be quicker than waiting for a cloud API.

When Local LLMs Are Practical

Running a local LLM takes capable hardware: typically a modern computer with a dedicated GPU or enough RAM (8 GB at a minimum for small models, 16 GB or more for larger ones). This makes local LLMs:

  • Practical for developers, power users, and privacy-conscious professionals
  • Less practical for most people on average laptops (though this is improving rapidly)
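
The RAM figures above follow from simple arithmetic: a model's weights take roughly (number of parameters) × (bits per weight) ÷ 8 bytes. The sketch below is a rough estimate only. It ignores the context cache and runtime overhead, and the function names and the 70% headroom rule are illustrative assumptions, not a standard.

```python
# Back-of-the-envelope estimate of the memory needed just to hold a model's weights.
# Real usage is higher: the context (KV) cache and runtime overhead are ignored here.

BITS_PER_BYTE = 8
BYTES_PER_GB = 1e9  # decimal gigabytes


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB."""
    return num_params * bits_per_weight / BITS_PER_BYTE / BYTES_PER_GB


def fits_in_ram(num_params: float, bits_per_weight: float,
                ram_gb: float, headroom: float = 0.7) -> bool:
    """Hypothetical rule of thumb: weights should use at most `headroom` of total RAM."""
    return weight_memory_gb(num_params, bits_per_weight) <= ram_gb * headroom


# Illustrative case: an 8-billion-parameter model.
full_precision = weight_memory_gb(8e9, 16)  # 16-bit weights: about 16 GB
quantized = weight_memory_gb(8e9, 4)        # 4-bit quantized weights: about 4 GB
```

Under these assumptions, the 4-bit version of an 8-billion-parameter model fits on an 8 GB machine, while the 16-bit version does not fit comfortably even in 16 GB. This is why quantized models are the usual choice on consumer hardware.
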

Tools for Running Local LLMs

  • Ollama — one of the simplest ways to run models locally; handles model download and setup automatically
  • LM Studio — a user-friendly desktop app for running local models
  • Jan — an open-source desktop AI app
  • llama.cpp — an inference engine with command-line tools that runs Llama and many other open-weight models efficiently

Models You Can Run Locally

  • Llama 3 (Meta) — widely used, available in several sizes
  • Mistral — efficient models that run well on consumer hardware
  • Phi (Microsoft) — small but capable models
  • Gemma (Google) — lightweight, designed for local use

Related Terms

  • LLM — large language model, the kind of AI model a local LLM is
  • On-Device AI — the broader concept of AI running on your own hardware
  • Open Weights — publicly available model weights, which make local LLMs possible
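
To make "runs on your own computer" concrete, here is a minimal sketch of sending a prompt to a model served by Ollama, one of the tools listed above. It assumes Ollama is installed and running on its default local address, and that a model tagged "llama3" has already been downloaded. The helper names are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a POST request for the local Ollama server (nothing is sent yet)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Example (needs a running Ollama server with the model already pulled):
# print(ask_local_model("Explain what a local LLM is in one sentence."))
```

The request goes to localhost, so the prompt and the response stay on the machine. This is the privacy property described at the top of this entry.
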
