Colorful illustration of AI vocabulary and language concepts

The AI Glossary: LLM and AI Terms Explained

As AI seeps into our work and personal lives, there are a lot of technical terms that are being thrown around without explanation. Some of the language is coming directly from AI researchers and developers, others are buzzwords that have quickly become tech favorites, and then there's simply marketing-invented words and product names that we have no choice but to accept.

For me, any time I come across unfamiliar AI vocabulary, I initially feel like I'm completely behind some big movement. Did I sleep on something? But then I realize I'm swimming in a super niche community who is doing what niche communities do - use acronyms and jargon that are efficient for insiders and also signal "one of us."

But the fact that these niche communities show up in my feeds, or in news stories, then it's usually a sign that something is brewing. So if you really do feel behind, or you want to get ahead of the curve and begin to understand the language of AI, I've put together this glossary as a start.

I've grouped the terms into categories so you can jump around. Use the search box to filter, or click a category to jump to it. If a term isn't here, it's probably because it's either I forgot or I'm still behind the times (this space moves fast). Let me know if you have suggestions to add to this glossary.

The Basics

The Basics

You can skip this section if you're already fairly AI-fluent. But I don't want to assume everyone knows how AI fits in with DL, ML and GenAI and LLMs.

Artificial Intelligence (AI)

Artificial intelligence is a broad term for software that performs tasks we usually associate with human thinking, such as recognizing images, writing text, solving problems, and making decisions. Modern AI is mostly about training machines on huge amounts of data rather than hand-coding rules.

Machine Learning (ML)

Machine learning is a branch of AI where software learns patterns from data instead of being explicitly programmed. If AI is the field, machine learning is the main technique used to get there today.

Deep Learning (DL)

Deep learning is a type of machine learning that uses neural networks with many layers. Most of the AI you use like ChatGPT, image generators, and voice assistants runs on deep learning under the hood.

Neural Network

A neural network is the type of software at the heart of most modern AI systems. It's loosely inspired by the brain, where many small calculation units are arranged in layers, and each unit takes in some numbers, does a small bit of math, and passes its result to the next layer. With enough layers and enough training data, the network can learn to recognize patterns in almost anything: words, images, sounds, you name it.

Generative AI (GenAI)

Generative AI is AI that creates new content like text, images, code, video, and audio,rather than just classifying or predicting.

Large Language Model (LLM)

A large language model is an AI system trained on enormous amounts of text to predict the next token in a sequence. That simple objective turns out to produce something that can write, reason, translate, and code. ChatGPT, Claude, and Gemini are all LLMs.

Small Language Model (SLM)

A small language model is a stripped-down LLM that's typically small enough to run on local or resource-constrained hardware like phones, laptops, or edge devices. They trade raw power for speed, privacy, and the ability to work offline.

GPT (Generative Pre-trained Transformer)

GPT stands for Generative Pre-trained Transformer, the specific implementation of transformer-based LLMs. "Generative" because it produces new text, "pre-trained" because it learns general patterns from massive datasets before any task-specific tuning, and "transformer" for the architecture underneath. The acronym started as a technical description and became OpenAI's brand (GPT-3, GPT-4, GPT-5), but the same underlying approach powers most modern LLMs.

Foundation Model

A foundation model is a large, general-purpose AI model trained on broad data (often text, sometimes code, images, or multimodal data) that can serve as the starting point for many different tasks. ChatGPT, Claude Opus, and Gemini are foundation models. Most AI apps you use are built on top of one, with extra prompting or fine-tuning to specialize it for a narrower job.

Frontier Model

A frontier model is one of the current most-capable AI systems, the ones pushing the edge of what's possible. It's an informal but widely used term for the state of the art.

SOTA (State of the Art)

A SOTA model is one that achieves the best known performance on a specific benchmark or task, like "SOTA on MMLU" or "SOTA on coding evals." More precise than frontier model, which is used loosely for any model at the cutting edge overall. A model can be SOTA on one benchmark without being a frontier model.

Transformer

The transformer is the underlying design or blueprint of nearly every modern LLM. It defines how the model is structured internally and how data flows through it as the model reads input and produces output. It was introduced in a 2017 Google paper called "Attention Is All You Need", and it's the reason modern AI got so good so fast.

Attention Is All You Need

"Attention Is All You Need" is the 2017 Google research paper that introduced the transformer architecture. It's arguably the most important AI paper of the past decade. Every modern LLM, from GPT to Claude to Gemini, descends from the ideas in those eight pages. If you ever want to point to the exact moment the current AI era started, this is it.

How LLMs Actually Work

How LLMs Actually Work

Token

A token is the basic unit an LLM reads and writes. It's usually a word, part of a word, or a character. The sentence "AI is wild" might be about 3-5 tokens depending on the model. When companies charge for API usage or talk about context windows, they're counting in tokens.

Tokenization

Tokenization is the process of breaking input text into tokens so the model can process it. Different models tokenize differently, which is one reason the same prompt behaves slightly differently across AI tools.

Parameters

Parameters are the internal numbers inside a model that were tuned during training. A "70 billion parameter" model has 70 billion of these knobs. More parameters generally means more capability, but also more compute to run.

Weights

Weights are another word for parameters, the specific values the model learned. When people talk about "open weights" models, they mean the trained model's numbers are released publicly so anyone can run it themselves.

Context Window

The context window is how much text (in tokens) an LLM can consider at once, including both your input and its output. Claude has a 1 million token context window, which is hundreds of "pages." if you think about it as a giant text document. When you run past the context window, earlier parts of the conversation get truncated or dropped from what the model can see.

Training

Training is the process of feeding a model huge amounts of data and adjusting its parameters until it gets good at predicting patterns. Training a frontier model costs tens or hundreds of millions of dollars in compute.

Pretraining

Pretraining is the first and most expensive phase of training, where the model learns general language patterns from massive datasets scraped from the internet, books, and code. Everything you've heard about "trained on the whole internet" refers to pretraining.

Post-training

Post-training is everything that happens after pretraining to shape the model's behavior, like fine-tuning on curated data, teaching it to follow instructions, and making it safer. It's where a raw language predictor becomes something useful like ChatGPT.

Embedding

An embedding is a way of turning a piece of text, image, or audio into a list of numbers that represents its meaning. Texts with similar meanings end up with similar number lists, so a computer can find related content even when the exact words don't match. It's the trick behind search-by-meaning, recommendations, and most of how RAG works.

Vector

A vector is a list of numbers, and in AI, it's almost always an embedding. When people say they're "vectorizing" a document, they mean turning it into a list of numbers a computer can compare to other lists of numbers.

Vector Database

A vector database is a database optimized for storing and searching embeddings. It's the backbone of most RAG systems, because finding "the most similar" content at scale is a different problem from traditional database queries.

Attention

Attention is the trick inside a transformer that lets the model decide which earlier words matter most when predicting the next one. When you write a long prompt, attention is what helps the model connect "it" in the last sentence back to a noun mentioned three paragraphs earlier. It's the reason LLMs can keep track of long, complex inputs.

Inference

Inference is what happens when you actually use a trained model. You give it an input and it produces an output. Training is expensive but happens once, whereas inference runs every time someone uses the model.

Training Data

Training data is the collection of text, images, code, or other content a model learns from during training. The quality and diversity of training data directly shape what a model is good at, and most frontier model makers won't disclose exactly what theirs was trained on.

Temperature

Temperature is a setting that controls how random an LLM's output is. Low temperature means predictable, repetitive answers, and high temperature means more creative (and more unreliable) answers.

Mixture of Experts (MoE)

A Mixture of Experts model is split into many smaller models ("experts"), each trained to specialize in different patterns. When you send a prompt, the model routes each piece of it to just a couple of relevant experts instead of running the whole thing. The result: a model with the knowledge of a giant model that only uses a fraction of itself at any given moment, which makes it much faster and cheaper to run. DeepSeek, Mixtral, and Llama 4 all use this design. The opposite is a dense model.

Dense Model

A dense model is the traditional LLM design where every part of the model is used for every prompt. There's no routing or splitting like in a Mixture of Experts. Llama 3 and most older LLMs are dense. Simpler design, but slower and more expensive to run at the same total size.

KV Cache

The KV cache (short for key-value cache) is a chunk of memory an LLM keeps as it generates a response. In a conversation each new word it produces requires looking back at everything that came before it. Normally this would require reprocessing prior tokens, but the KV cache stores intermediate results so it doesn’t have to.

Tensor

A tensor is a grid of numbers, sometimes a flat list, sometimes a 2D table, sometimes a stack of tables, sometimes higher-dimensional. It's the basic data structure AI models work with: every input, every internal calculation, and every output is a tensor. When you hear "tensor cores" or "TPU," that's hardware built specifically to crunch tensor math very fast.

Getting More Out of Them

Getting More Out of Them

Prompt

A prompt is the input you give an AI to get a response. "Write me a poem about sourdough" is a prompt. The whole discipline of prompt engineering sprung up because how you phrase things matters a lot.

Prompt Engineering

Prompt engineering is the practice of crafting prompts to get better, more accurate, or more specific outputs from an LLM. It started as a buzzword-worthy skill and is now just part of working with AI.

System Prompt

A system prompt is a hidden instruction set that shapes how an AI behaves before any user message. It's the "you are a helpful assistant" layer sitting behind every chatbot, including rules, tone, and constraints.

Zero-shot Prompting

Zero-shot prompting is when you ask a model to do a task without giving it any examples. Modern LLMs are good enough that zero-shot usually works ok.

Few-shot Prompting

Few-shot prompting is when you include a few examples of the task inside your prompt before asking the model to do it. Useful for formatting or style tasks where showing beats telling.

Chain of Thought (CoT)

Chain of thought is a prompting pattern where the model is asked to reason step-by-step before answering. It dramatically improves performance on math and logic problems especially for smaller or earlier models, but is now baked into reasoning models by default.

Reasoning Model

A reasoning model is an LLM trained to "think" before answering, often involving intermediate reasoning steps (sometimes hidden) that improve accuracy on hard problems. They're slower and more expensive to run, but much better at math, coding, and logic.

RAG (Retrieval-Augmented Generation)

RAG is a technique that combines an LLM with a search step. When you ask a question, the system first retrieves relevant documents from a database, then feeds them to the model along with your question. This is how AI tools stay current or answer questions about your private data.

Chunking

Chunking is the process of breaking large documents into smaller pieces before feeding them into a RAG pipeline. How you chunk, whether by paragraph, by topic, or by token count has a huge impact on how well the system finds the right material later. Chunk too small and you lose context, chunk too big and the search gets fuzzy.

Retrieval

Retrieval is the "R" in RAG, the step where the system pulls relevant content from a database to feed the model. It can match by meaning (using embeddings), by exact keywords, or by a combination of both.

Reranker

A reranker is a second pass in a retrieval pipeline that takes a batch of candidate results and reorders them by actual relevance. Rerankers are usually small, purpose-built models and are a cheap way to improve RAG quality a lot.

Context Engineering

Context engineering is the evolving discipline of shaping everything an AI sees before it responds, including system prompt, retrieved documents, prior conversation, tool outputs, and more. It's the more serious descendant of "prompt engineering," reflecting the reality that modern agents consume a lot more than a single user prompt.

Fine-tuning

Fine-tuning is taking a pretrained model and training it further on a smaller, specialized dataset. It's how you get a model that really knows your company's voice, your legal domain, or your customer data.

RLHF (Reinforcement Learning from Human Feedback)

RLHF is a training technique where humans rate the model's outputs, and those ratings are used to nudge the model toward better behavior. It's a big reason modern LLMs are polite, helpful, and mostly refuse to help you do sketchy things.

Distillation

Distillation is the process of training a small model to mimic a big one. You get something faster and cheaper to run, with a lot of the original's capability.

Quantization

Quantization is a compression technique that shrinks a model by storing its weights as smaller, less precise numbers. Imagine the model originally remembered the value 0.7184635 and the quantized version rounds it to 0.72. The file shrinks dramatically, but the model still works almost as well. Quantized models are smaller and faster, and slightly less accurate (often not enough to notice). If you're browsing local model files you'll see labels like Q4_K_M or Q8_0, those are quantization levels. Lower numbers mean smaller and faster, higher numbers mean closer to the original quality.

Modern AI Tools and Interfaces

Modern AI Tools and Interfaces

Agent

An AI agent is a system built around an LLM that can take actions, such as browse the web, call APIs, run code, and update files, not just generate text. Agents loop: take an action, see the result, decide what to do next.

Agentic AI

Agentic AI is the shift from AI that only answers questions to AI that can plan, take actions and complete multi-step tasks on their own.

Tool Use (Function Calling)

Tool use, also called function calling, is when an LLM can invoke external tools such as a calculator, a search engine, or a code runner to produce better answers. It's a common foundation that AI agents are built on.

MCP (Model Context Protocol)

MCP is an open protocol that standardizes how AI agents connect to external tools and data sources. Think of it like USB for AI, where instead of every tool needing a custom integration, MCP gives them a common plug.

Multimodal

A multimodal model can handle more than one type of input or output, like text, images, audio, or video. ChatGPT, Gemini, and Claude are all multimodal.

Vision Model

A vision model is an AI that can interpret images. Most modern LLMs have vision baked in, which is why you can paste a screenshot into ChatGPT and have a conversation about it.

Computer Use

Computer use is a capability where an AI agent can control a computer the way a human would, including clicking, typing, scrolling, navigating apps. For a real-world example, see my post on Claude in Chrome.

Claude Code

Claude Code is Anthropic's command-line tool for working with Claude on software projects. It runs in your terminal, reads and edits files, executes commands, and can be extended with Skills and MCP servers. It's one of the main tools behind the current vibe coding wave.

Cursor

Cursor is an AI-first code editor, forked from VS Code and rebuilt around AI assistance. Alongside Claude Code and GitHub Copilot, it's one of the most popular tools for AI-assisted development.

Skills

In the Claude ecosystem, Skills are reusable capability packs that extend what the AI can do. Each one bundles instructions, tools, and example workflows for a specific job, like reviewing code, summarizing a meeting, or running a security check. You invoke a skill by typing /skill-name (similar to a keyboard shortcut) and Claude switches into that mode for the task.

Diffusion Model

A diffusion model is the type of AI behind many image and video generators. It starts with a canvas of random pixel noise, like TV static, and gradually transforms it, step by step, into a coherent image that matches your prompt. DALL-E and Stable Diffusion are diffusion models.

Text-to-Image

Text-to-image is any AI that generates images from a written prompt. "A corgi astronaut surfing on Mars" becomes a picture.

Text-to-Video

Text-to-video is the same idea but for video. Tools like Sora and Runway can generate short clips from text descriptions, though quality and length are still limited.

Running AI Locally

Running AI Locally

Local LLM

A local LLM is a model that runs entirely on your own computer instead of calling a cloud API. Slower and less capable than the frontier models, but private, offline, and free to run once set up. The gap is closing faster than most people realize, but hardware costs and technical barriers still make local LLMs out of reach for many users.

Hugging Face

Hugging Face is the GitHub of AI, the central hub where researchers and companies publish models, datasets, and demos. If you're downloading an open-weights model, you're almost certainly getting it from Hugging Face.

Ollama

Ollama is the easiest way for most people to run a local LLM. You install it, type ollama run llama3, and you have a chatbot running on your laptop. It handles downloads, quantization, and serving behind the scenes.

LM Studio

LM Studio is a desktop app for running local models with a graphical interface. Like Ollama, but click-based instead of command-line, which might be better for people who don't live in a terminal. Although recently Ollama introduced a GUI as well, so the preference comes down to feature sets.

llama.cpp

llama.cpp is the open-source engine that powers most local LLM tools, including Ollama and LM Studio. It's written in C++ and optimized to run large models on consumer hardware (CPUs, Apple Silicon, ordinary GPUs) instead of data-center chips.

GGUF

GGUF is the file format most local LLMs come in. It bundles the model's weights, metadata, and quantization into a single file that llama.cpp and friends can load. If you see a .gguf extension, that's a quantized model ready to run locally.

Safetensors

Safetensors is a file format for storing model weights that's safer and faster to load than the older .pt or .bin formats. Most models on Hugging Face are published as safetensors alongside or instead of GGUF.

VRAM

VRAM is the memory on your graphics card. It's the main bottleneck for running local LLMs. A model's size in GB needs to roughly fit in your VRAM for it to run fast. 8GB gets you small models, 24GB runs most quantized mid-sized models, 48GB+ starts approaching serious territory.

CUDA

CUDA is Nvidia's software platform for running general-purpose math on their GPUs. It's the reason Nvidia dominates AI, since almost every training framework, inference engine, and local tool is built on CUDA first and other hardware second.

NPU (Neural Processing Unit)

An NPU is a chip designed specifically for AI workloads, built into modern laptops and phones alongside the CPU and GPU. Apple Silicon's Neural Engine and the "Copilot+ PC" chips are NPUs. They make small models run fast and efficiently without a dedicated graphics card.

MLX

MLX is Apple's machine learning framework, designed specifically for Apple Silicon (M-series Macs). It takes advantage of unified memory, which lets Macs run larger models than you'd expect for their price. For example, a 128GB M-series Mac can load models that'd require a very expensive Nvidia rig.

Unsloth

Unsloth is a popular open-source library for fine-tuning LLMs much faster and with far less VRAM than the standard tools. If you're training a LoRA on a single GPU, there's a good chance you're using Unsloth under the hood.

LoRA (Low-Rank Adaptation)

LoRA is a fine-tuning shortcut. Instead of re-training a large model from scratch, which takes a data center, LoRA leaves the original model untouched and trains a small "patch" on top of it. The patch is a tiny file you can attach to or remove from the base model on demand. Popular for character or art-style add-ons in Stable Diffusion, and for cheap custom LLM tuning.

QLoRA

QLoRA is LoRA on a quantized base model. Same idea as LoRA, but even cheaper to train because the base is compressed. It's how hobbyists fine-tune models that would otherwise need a data center.

Inference Server

An inference server is software that hosts a model and answers requests. It's like a web server, but for AI. Ollama is an inference server. So are vLLM, Text Generation Inference (TGI), and the backend behind every cloud AI API.

Model Card

A model card is the documentation published alongside a model, describing what it was trained on, what it's good at, its limitations, known biases, and license. Hugging Face pages are essentially model cards.

Benchmark

A benchmark is a standardized test used to measure and compare model capabilities, such as MMLU for general knowledge, HumanEval for coding, GPQA for science, and many more. When you see a new model claiming "state of the art," it's being measured against benchmarks like these.

Eval

An eval (short for evaluation) is a test you run against a model to measure how well it performs on a specific task, either a standard benchmark or a custom one you wrote for your own use case. "Write evals before you ship" has become a core practice for building AI products.

Popular Model Families

Popular Model Families

GPT

GPT is the model family from OpenAI that powers ChatGPT.

Claude

Claude is Anthropic's model family, known for long context windows, strong reasoning, and a heavy focus on safety and alignment. Available through claude.ai, Claude Code, Claude Cowork, and the API.

Gemini

Gemini is Google DeepMind's flagship model, deeply integrated across Google's products such as Search, Workspace, and Android. It is multimodal from the ground up.

Google DeepMind

DeepMind is the research lab behind Gemini. Originally a UK company Google acquired in 2014, it merged with Google Brain in 2023 to form Google DeepMind. Most of Google's AI research now happens under this banner.

Llama

Llama is Meta's open-weights model family. When Meta released Llama weights publicly in 2023, it effectively kickstarted the modern open-source AI movement. Most local LLM tooling exists because Llama did.

Mistral

Mistral is a French AI lab that ships both open-weight and closed models. Known for small-but-capable models and a European counterweight to the US-China AI duopoly.

DeepSeek

DeepSeek is a Chinese lab that shook the AI world in early 2025 by releasing high-performing reasoning models with open weights at a fraction of the usual training cost. Heavy users of Mixture of Experts.

Qwen

Qwen is Alibaba's open-weight model family, particularly strong at multilingual tasks and coding. Qwen models are among the most downloaded on Hugging Face.

Gemma

Gemma is Google's open-weight family. The smaller, freely-distributable cousin of Gemini. Designed to run locally and be fine-tuned easily.

Kimi

Kimi is the model family from Moonshot AI, a Chinese lab known for pushing extremely long context windows. Popular in Asia and increasingly visible globally.

Grok

Grok is xAI's model, built into X. Known for a less filtered tone and tight integration with real-time X data.

Phi

Phi is Microsoft's family of small language models, designed to punch above their weight. Good choices for running locally on laptops.

Image, Video, and Audio Tools

Image, Video, and Audio Tools

Stable Diffusion

Stable Diffusion is the open-weights image model that democratized AI art when it was released in 2022. It's still the foundation of most local image generation. Every custom model, LoRA, and fine-tune in the scene descends from it.

Flux

Flux is an image model family from Black Forest Labs (founded by several of the original Stable Diffusion researchers). As of 2026 it's the leading open-weights image generator, especially for photorealism and text-in-image.

Midjourney

Midjourney is a closed image generation service, known for a distinct aesthetic quality that has made it a favorite among designers and marketers. Originally Discord-only, now has a proper web app.

DALL-E

DALL-E is OpenAI's image model. It lives inside ChatGPT, though the newer "GPT image" capability built into ChatGPT has largely replaced the standalone DALL-E experience.

Nano Banana

Nano Banana is the image generation model built into Google's Gemini. The name started as a leaked codename on LMArena, got adopted by users while testing it blind, and Google eventually embraced it as the public name. Known for strong character consistency and natural-language image editing.

ComfyUI

ComfyUI is a node-based visual interface for running local image and video generation models. Looks intimidating at first, but it's the power-user tool of choice. Entire workflows get shared on Reddit and Civitai as single JSON files.

Civitai

Civitai is the community hub for sharing Stable Diffusion models, LoRAs, and generated images. Think of it as the Hugging Face of the local image gen world, with a more open posture toward NSFW and fan content.

Sora

Sora is OpenAI's text-to-video model. First shown in early 2024, it set the bar for what AI video could look like and kicked off the current video generation boom.

Veo

Veo is Google's text-to-video model. Available through Gemini and increasingly integrated into Google's creative products.

Kling

Kling is a Chinese video generation model (from Kuaishou), currently considered one of the most capable in the world for text-to-video and image-to-video. Often beats US competitors on motion quality.

Runway

Runway is a creative AI company that predates the current boom. Originally a video editing tool, now a full video generation and editing suite. Their Gen-series models are industry favorites.

Higgsfield

Higgsfield is a video generation app focused on character motion and viral-style social media clips. Heavily popular on TikTok and Instagram for AI creators.

Suno

Suno is a text-to-music generator. Type a description, get a full song with vocals. Controversial among musicians, wildly popular with everyone else.

Udio

Udio is the other major text-to-music generator, often preferred for more controllable, higher-fidelity output. Suno and Udio are the two main players competing for the AI music crown.

ElevenLabs

ElevenLabs is the leading AI voice platform, including text-to-speech, voice cloning, dubbing. If you've heard an uncannily human AI voice in the last year, it was probably ElevenLabs.

ASR (Automatic Speech Recognition)

ASR is AI that converts spoken audio into text. When you dictate into your phone, generate captions on a YouTube video, or transcribe a meeting, that's ASR running under the hood.

TTS (Text-to-Speech)

TTS is the opposite of ASR: AI that turns written text into spoken audio. Modern TTS systems like ElevenLabs are good enough that a casual listener often can't tell it's synthetic.

Whisper

Whisper is OpenAI's open-source speech recognition model, widely considered the best free ASR available. It powers a huge chunk of the transcription and voice-to-text tools you see today, often running locally.

The Downsides

The Downsides

Hallucination

A hallucination is when an AI states something false with full confidence. It's a fundamental quirk of how LLMs work. They don't "know" facts, they predict plausible-sounding tokens. RAG, tool use, and better training all reduce hallucinations but don't eliminate them.

AI Slop

AI slop is low-quality, mass-produced AI content created to occupy space rather than inform. It's the equivalent of pink-slime journalism, flooding search results, YouTube thumbnails, Amazon books, and social feeds.

Prompt Injection

Prompt injection is an attack where malicious instructions are hidden inside content the AI reads like a webpage, a document, an email, tricking the model into doing something its user didn't ask for. It's one of the hardest unsolved problems in AI security.

Jailbreak

A jailbreak is a prompt that tricks an AI into ignoring its safety guidelines. Jailbreaks and safety training exist in a constant cat-and-mouse cycle.

Bias

Bias in AI is when a model's outputs reflect biases in its training data, such as cultural, racial, gender, political, whatever. Every model trained on internet data has some baked in.

Alignment

Alignment is the research area focused on making sure AI does what we want, including in edge cases and over the long term. "Aligned" means the model behaves in line with human values and intentions.

Guardrails

Guardrails are the safety rules and filters built into an AI system, refusing certain requests, filtering harmful output, and blocking sensitive topics. They're implemented through training, system prompts, and external checks.

The New SEO and AI Search World

The New SEO and AI Search World

SEO (Search Engine Optimization)

SEO is the traditional practice of optimizing content so it ranks in search engines like Google. Keywords, backlinks, headers, page speed, all the usual stuff. It's not going away, but it's no longer the whole picture.

GEO (Generative Engine Optimization)

GEO is the practice of optimizing content so AI tools like ChatGPT, Perplexity, and Google's AI Overviews cite and surface your site in their generated answers. It's a different game from SEO, where you're optimizing for a summary, not a ranked list.

AEO (Answer Engine Optimization)

AEO is the practice of structuring content so it can be extracted as a direct answer by an AI system. Clear questions, clean headers, self-contained passages that can be lifted out and cited without losing context.

AIO (AI Optimization)

AIO is a catch-all term for optimizing for AI-powered search surfaces, especially Google's AI Overviews. Overlaps heavily with GEO and AEO depending on who's using the term.

LLMO (LLM Optimization)

LLMO is optimizing how LLMs understand and recall your brand, both in their training data and when they retrieve your pages live. Similar to GEO with more focus on entity and brand recognition over time.

AI Overview

An AI Overview is the AI-generated summary Google shows at the top of search results. It pulls from multiple sources and usually cites a few of them. Getting cited there is the new "ranking on the first page."

Share of Voice (SOV)

Share of voice, in AI search, is how often your brand is mentioned in AI responses compared to competitors for a given set of prompts. It's the AI-era equivalent of ranking position.

AI Visibility

AI visibility is a measure of how often and how prominently your brand appears in AI-generated answers across tools like ChatGPT, Perplexity, and Google AI Overviews.

Semantic Density

Semantic density is how much meaning is packed into a sentence or passage. AI systems favor high-density content because it gives them more signal per token when they're generating summaries.

Citation

A citation, in the AI search world, is when an AI response names or links to your site as a source. Citations are the new backlinks.

Culture, Buzzwords, and Edges

Culture, Buzzwords, and Edges

Vibe Coding

Vibe coding is building software by describing what you want to an AI and letting it write most of the code. The term was coined by Andrej Karpathy in early 2025 and has become shorthand for AI-assisted development in general. I wrote a whole post on whether it's the future.

AI Native

AI native describes products, companies, or people that were built with AI at the center from day one, not bolted on after the fact. It's the 2020s version of "digital native."

Copilot

Copilot is the general term for an AI assistant embedded inside another tool, such as GitHub Copilot for code or Microsoft Copilot for Office.

Chatbot

A chatbot is any conversational software interface. It used to mean scripted decision trees, now it mostly means an LLM behind a chat window.

Open Weights

A model whose trained weights are publicly available. You can download and run it yourself, and sometimes fine-tune it, but that does not necessarily mean the model is open source.

Open Source

A model and its surrounding code are released under a permissive license that allows inspection, modification, and redistribution. In AI, this is rarer than “open weights,” and people often use the two terms interchangeably when they shouldn’t.

Closed Model

A closed model is one where the weights are private and you can only access it through a commercial interface or API, such as OpenAI's GPT, Anthropic's Claude, and Google's Gemini. Most frontier models today are closed.

AGI (Artificial General Intelligence)

AGI is hypothetical AI that matches or exceeds human intelligence across essentially all tasks. It's the long-standing goal of the field and the source of most sci-fi anxiety about AI. Depending on whom you ask, it's five years away or impossible.

ASI (Artificial Superintelligence)

ASI is the step past AGI, it's AI that's dramatically smarter than humans at everything. Right now it's mostly a theoretical discussion, but it's a real topic in alignment research.

Turing Test

The Turing test is a 1950 thought experiment where a machine is considered "intelligent" if a human can't reliably tell they're talking to a machine.

Keeping Up

Keeping Up

I'll update this as the vocabulary keeps growing. If there's a term you keep hearing that isn't here, or a definition you think I got wrong, let me know. A glossary is only useful if it keeps up with the language.

And remember, if you see people start throwing around new AI terms you've never heard of, chances are you're not the only one trying to keep up. Sometimes they're buzz words, sometimes they're marketing names that a company invented or a user coined, and sometimes they're genuinely technical terms that are traveling from niche research circles into mainstream conversation. Either way, it's always worth asking: "What does that actually mean?"