Module 2 · 20 min read · Foundations

How AI Models Work

You don't need to become an engineer to understand what's happening inside an AI. But knowing the basics — what these systems are actually doing when they respond to you — makes you dramatically better at using them. It also helps you understand why they fail, which is just as important.

Start with the brain analogy — then abandon it

You've probably heard that AI is modeled on the human brain. That's true in a limited historical sense — the early architecture was inspired by how neurons connect — but it's deeply misleading as a mental model for how modern AI actually works.

Your brain holds memories, experiences emotions, understands meaning, and builds a model of the world over a lifetime. An AI language model does none of these things. It holds numerical weights — billions of parameters that were adjusted during training — and uses them to calculate the most probable next piece of text given what came before.

The output can look remarkably human. The process underneath is fundamentally mathematical.

The one-sentence version

A language model is a very large mathematical function that takes text as input and outputs the next most likely text — trained on so much human-generated language that the results feel like understanding, even though what's happening is sophisticated pattern completion.
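To make "outputs the next most likely text" concrete, here is a minimal sketch of the final step of that function. The scores below are invented for illustration, not taken from any real model. The model assigns a raw score to every candidate token, a softmax turns those scores into probabilities, and one token is picked.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical raw scores ("logits") for candidate next tokens after the
# prompt "The capital of France is". A real model scores its whole vocabulary.
logits = {" Paris": 9.1, " Lyon": 5.3, " Berlin": 3.2, " banana": -2.0}

probs = softmax(logits)
next_token = max(probs, key=probs.get)  # pick the most probable token

for token, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{token!r}: {p:.4f}")
print("chosen:", repr(next_token))
```

Real systems often sample from this distribution instead of always taking the top token, which is one reason the same prompt can produce different answers.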

How training actually works

Imagine you wanted to teach someone to complete sentences. Instead of one person, though, you have a system with billions of adjustable dials. And instead of a few examples, you have a vast sample of human writing: books, articles, code, conversations, scientific papers, forums, websites.

That's training. Here's the process in plain terms:

Show the model text with the last word hidden

The model sees "The capital of France is ___" and has to guess what comes next based on its current parameter settings.

Compare the guess to the real answer

If the model guessed "Paris," great. If it guessed "Berlin," the system calculates how wrong that was and by how much.

Adjust the parameters

A process called backpropagation works out how much each parameter contributed to the error. The training algorithm then nudges the parameters slightly in the direction that would have made the correct answer more likely.

Repeat billions of times

This process runs across trillions of examples until the model's parameters encode the statistical patterns of human language well enough to be useful.
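The four steps above can be sketched as a toy training loop. This is an illustration under heavy simplifying assumptions, not how a production model is trained. There is a single prompt, a four-word vocabulary, and one adjustable number per word, so the gradient can be written down directly without backpropagating through layers. The names vocab, params, and learning_rate are hypothetical.

```python
import math

vocab = ["Paris", "Berlin", "Rome", "Madrid"]
target = vocab.index("Paris")  # the hidden word in "The capital of France is ___"

# One adjustable number per candidate word. Real models have billions.
params = [0.0, 0.0, 0.0, 0.0]
learning_rate = 0.5

def predict(params):
    """Softmax: convert the parameters into a probability for each word."""
    peak = max(params)
    exps = [math.exp(p - peak) for p in params]
    total = sum(exps)
    return [e / total for e in exps]

losses = []
for step in range(50):
    probs = predict(params)                  # 1. guess
    loss = -math.log(probs[target])          # 2. measure how wrong the guess was
    losses.append(loss)
    # 3. gradient of the loss for each parameter:
    #    (probability - 1) for the correct word, probability for the others
    grads = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    # 4. nudge every parameter a small step against its gradient, then repeat
    params = [w - learning_rate * g for w, g in zip(params, grads)]

final_probs = predict(params)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"P(Paris) after training: {final_probs[target]:.3f}")
```

The model starts out guessing all four words equally and ends up strongly preferring "Paris", purely because each update made the correct answer a little more likely.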

By the end of training, the model hasn't memorized the internet. It has compressed the patterns of human language into its parameters — a kind of statistical summary of how words, ideas, and concepts relate to each other.

What the Transformer actually does

The architecture powering virtually every modern language model is called the Transformer — introduced in 2017 and still dominant today. Its key innovation is something called attention.

When processing your input, the model doesn't treat every word equally. It pays more attention to words that are more relevant to each other — learning which parts of the context matter most when predicting the next token.

Think of it this way

Read this sentence: "The trophy didn't fit in the suitcase because it was too big." What does "it" refer to? The trophy — because your brain automatically connected "it" to the most contextually relevant noun. The Transformer's attention mechanism does something mathematically similar, learning which words in the input are most relevant to each other and weighting them accordingly. This is why modern AI handles context and nuance so much better than earlier systems did.
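A minimal sketch of that weighting, using scaled dot-product attention on made-up two-dimensional vectors. In a real Transformer the query, key, and value vectors are produced by learned projections and have hundreds or thousands of dimensions. The numbers here are chosen by hand so that "it" lines up most closely with "trophy".

```python
import math

def attention_weights(query, keys):
    """Scaled dot-product attention weights for one query over several keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Blend the value vectors, weighted by how relevant each key is."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Hand-picked illustrative vectors, not taken from any real model.
words = ["trophy", "suitcase", "big"]
keys = [[2.0, 0.5], [0.5, 1.0], [1.0, 1.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query_it = [1.5, 0.2]  # the vector standing in for the word "it"

weights = attention_weights(query_it, keys)
context = attend(query_it, keys, values)

for word, w in zip(words, weights):
    print(f"{word}: {w:.2f}")
```

The weights sum to 1, and the word with the largest weight contributes most to the blended context vector the model uses for its next prediction.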

Tokens: what AI actually reads

AI models don't read words the way you do. They read tokens — chunks of text that might be a word, part of a word, or a single character. The word "understanding" might be one token. The word "unbelievably" might be split into two or three.

This matters for a few practical reasons:

Context windows are measured in tokens, not words. When a model says it can handle 100,000 tokens, that's roughly 75,000 words — but it varies based on the content.

Spelling and character-level tasks can trip models up. If you ask "how many r's are in strawberry," the model is working with tokens, not individual letters — which is why it sometimes gets character-counting questions wrong.

Non-English text uses more tokens. Many languages are less efficiently tokenized than English, which affects how much content fits in a context window.
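The points above are easier to see with a toy tokenizer. The sketch below uses greedy longest-match against a tiny hypothetical vocabulary. Production tokenizers use learned schemes such as byte-pair encoding and vocabularies of tens of thousands of pieces, so their actual splits will differ.

```python
def tokenize(text, vocab):
    """Greedy longest-match split of text into vocabulary pieces."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first; fall back to one character.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab or end == i + 1:
                tokens.append(piece)
                i = end
                break
    return tokens

# Hypothetical vocabulary, for illustration only.
vocab = {"straw", "berry", "un", "believ", "ably", "understanding"}

strawberry_tokens = tokenize("strawberry", vocab)
unbelievably_tokens = tokenize("unbelievably", vocab)
understanding_tokens = tokenize("understanding", vocab)

# Rule of thumb from the text: about 0.75 English words per token.
estimated_words = int(100_000 * 0.75)

print(strawberry_tokens)
print(unbelievably_tokens)
print(understanding_tokens)
print(estimated_words)
```

With this vocabulary the model would see "strawberry" as two opaque pieces, not ten letters, which is why counting the r's is harder for it than it looks.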

What happens after training: fine-tuning and RLHF

The base model that comes out of training is powerful but raw. It's good at completing text — but not necessarily at being helpful, safe, or following instructions. That's where post-training comes in.

Fine-tuning

The model is trained further on specific examples of good behavior — high-quality conversations, helpful responses, correct formats. This shapes it from a raw text predictor into something that behaves like an assistant.

Reinforcement Learning from Human Feedback (RLHF)

Human raters compare pairs of responses and indicate which is better. This preference data trains a separate model — a "reward model" — that learns to predict what humans prefer. The main model is then optimized to produce outputs the reward model scores highly.

This is one of the main reasons Claude, ChatGPT, and Gemini feel helpful rather than just generating raw text. It's also one of the reasons they sometimes seem overly cautious — the human feedback that shaped them included strong signals to avoid harmful outputs.
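The reward-model step can be sketched with a pairwise preference loss. This assumes a Bradley-Terry style formulation, which is a common choice but is not stated in this module. The function names and reward values are hypothetical. The loss is small when the reward model scores the human-preferred response higher, and large when it scores the rejected one higher.

```python
import math

def preference_probability(reward_chosen, reward_rejected):
    """Probability the reward model gives to the human-preferred response."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

def preference_loss(reward_chosen, reward_rejected):
    """Negative log-likelihood of the human's choice; lower is better."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))

# Made-up reward scores for two responses to the same prompt.
agrees = preference_loss(reward_chosen=2.0, reward_rejected=-1.0)
disagrees = preference_loss(reward_chosen=-1.0, reward_rejected=2.0)

print(f"reward model agrees with rater:    loss = {agrees:.3f}")
print(f"reward model disagrees with rater: loss = {disagrees:.3f}")
```

Training the reward model means adjusting it to shrink this loss across many rated pairs. The main model is then tuned to produce responses that the reward model scores highly.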

Why this matters for you

Understanding RLHF helps you understand why AI models sometimes refuse reasonable requests, why they can seem overly formal, and why different models have noticeably different personalities. Those differences aren't random — they're the result of different training choices, different human raters, and different values baked into the process.

Why AI hallucinates

Hallucination — when an AI states something false with complete confidence — is one of the most important things to understand about these systems.

It's not a bug in the traditional sense. It's a direct consequence of how these models work. The model isn't retrieving facts from a database and checking them. It's predicting what text is most likely to follow given the context. Sometimes the most statistically plausible continuation of a sentence is factually wrong.

The core problem

If you asked someone to complete the sentence "The CEO of XYZ Corp is ___" and they had no idea but felt pressure to answer, they might confidently say a plausible-sounding name. That's essentially what a language model does when it generates a hallucinated fact — it produces the statistically plausible completion, not a verified truth.
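A small sketch of why this happens, using invented scores and invented names. Choosing the most probable completion always returns something, whether the model's probabilities are sharply peaked (it has seen the fact many times) or nearly flat (it is effectively guessing).

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def answer(logits):
    """Return the most probable completion and its probability."""
    probs = softmax(logits)
    best = max(probs, key=probs.get)
    return best, probs[best]

# Hypothetical scores. "The capital of France is ___": one clear winner.
well_known = {"Paris": 9.0, "Lyon": 2.0, "Berlin": 1.0}
# "The CEO of XYZ Corp is ___": several plausible names, none well supported.
obscure = {"John Smith": 1.2, "Maria Garcia": 1.1, "David Lee": 1.0}

known_answer, known_p = answer(well_known)
guess, guess_p = answer(obscure)

print(f"{known_answer}: {known_p:.3f}")
print(f"{guess}: {guess_p:.3f}")
```

Both calls produce a fluent answer. Only the probability behind it differs, and that number is not shown in the text you read.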

This is why verification matters. AI is a powerful thinking tool — not an oracle. Use it to draft, explore, brainstorm, and structure. Verify anything that matters before acting on it.

The knowledge cutoff problem

Every language model has a training cutoff — a date after which it has no information. Events, research, policy changes, and product launches that happened after that date are simply unknown to the model.

Some models have web search capabilities that let them access current information. But even then, the model's baseline knowledge and reasoning are anchored to its training data. When you need current information, always verify — or use a model with confirmed real-time search.

Key terms from this module

Parameters

The numerical weights inside an AI model, adjusted during training to encode patterns. OpenAI has not published GPT-4's parameter count; unofficial estimates put it above a trillion.

Token

The basic unit AI models read and generate — roughly a word or word fragment. Most models process text as sequences of tokens, not characters or words.

Attention

The mechanism that lets a Transformer model weigh how relevant different parts of the input are to each other when generating output.

Hallucination

When an AI generates confident, fluent text that is factually false. A consequence of optimizing for plausible text rather than verified truth.

Fine-tuning

Additional training on specific examples of good behavior that shapes a base model into a useful assistant.

RLHF

Reinforcement Learning from Human Feedback — using human preferences to train AI models to be more helpful, accurate, and safe.