What AI Actually Is

This section gives you the technical grounding to understand what is actually happening when you use these tools. It matters because understanding the architecture helps you understand the limitations.

Before we get into the technical details, the single most important thing to understand is that AI in 2026 is not one thing. It is not ChatGPT. It is not the chatbot your company bolted onto its intranet. It is not the image generator your nephew used to make a funny picture of the dog. It is an ecosystem of dozens of tools built on different technologies, each good at different things. ChatGPT is one of the most well-known, but it is not always the best, and for many tasks it is not even the right category of tool. Understanding this landscape is what this guide is for.

Large Language Models (LLMs)

The tools at the centre of this guide (Claude, ChatGPT, Gemini, Perplexity) are all powered by Large Language Models. An LLM is a neural network (specifically a transformer architecture, introduced by Google researchers in 2017) that has been trained on enormous volumes of text: books, websites, code repositories, academic papers, and more.

Training works in two phases. First, the model learns to predict the next word in a sequence by processing billions of text examples (pre-training). This gives it a statistical understanding of language, facts, reasoning patterns, and code. Second, it is fine-tuned using human feedback (RLHF, reinforcement learning from human feedback, or similar techniques) to make it helpful, safe, and conversational rather than just a raw text predictor.

When you send a prompt to Claude or ChatGPT, the model generates a response token by token (a token is roughly three-quarters of a word), predicting the most likely next token given everything that came before it. It does not "think" in the way humans do. It does not have a database it looks things up in. It generates text based on patterns learned during training. This is why it can produce fluent, coherent, and often remarkably insightful responses, and also why it can confidently generate complete nonsense. It is always predicting plausible text, whether or not that text is factually correct.
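The "predict the most likely next token" loop can be illustrated with a toy model: count which word follows which in a tiny corpus, then repeatedly emit the most frequent successor. A real LLM uses a transformer network over subword tokens, not word counts; this sketch only shows the shape of the generation loop, not how the predictions are learned.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str) -> dict:
    """Count, for each word, how often each following word appears."""
    words = corpus.split()
    counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts

def generate(counts: dict, start: str, length: int) -> str:
    """Greedily emit the most likely next word, one 'token' at a time."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break                                   # no known continuation
        out.append(followers.most_common(1)[0][0])  # pick the most likely next word
    return " ".join(out)

corpus = "the model predicts the next word and the next word follows the pattern"
model = train_bigram(corpus)
print(generate(model, "the", 2))  # → the next word
```

Notice that the toy model produces fluent-looking continuations of its training data with no notion of truth or meaning, which is the same reason a full-scale LLM can generate plausible nonsense.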

Context window is a term you will encounter frequently. It refers to how much text the model can consider at once: your prompt, any uploaded documents, and the conversation history. Context windows have grown dramatically. Early models handled about 4,000 tokens (roughly 3,000 words). Current frontier models like Claude Opus 4.6 and Gemini 3 Pro handle 1 million tokens (roughly 750,000 words). A larger context window means the model can work with bigger documents without losing track of details.
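Using the rough rule above (one token is about three-quarters of a word), you can estimate whether a document fits in a given context window. The ratio varies by language, formatting, and tokenizer, so treat this as a back-of-envelope check rather than an exact count.

```python
def estimated_tokens(text: str) -> int:
    """Rough estimate: a token is about three-quarters of a word,
    so a text has roughly 4/3 as many tokens as words."""
    return round(len(text.split()) * 4 / 3)

def fits_in_context(text: str, window_tokens: int) -> bool:
    """Back-of-envelope check against a model's context window."""
    return estimated_tokens(text) <= window_tokens

sample = "word " * 3000          # a 3,000-word document
print(estimated_tokens(sample))  # → 4000, matching the "4,000 tokens ≈ 3,000 words" rule
```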

Key models as of March 2026: Anthropic's Claude (Opus 4.6, Sonnet 4.6), OpenAI's GPT-5 series and o3/o4 reasoning models, Google's Gemini 3 Pro and Flash, Meta's Llama (open source), Mistral (open source, French), Alibaba's Qwen (open source), and DeepSeek (open source, Chinese). The commercial models (Claude, ChatGPT, Gemini) are the most capable. The open-source models are catching up fast and can be run locally on your own hardware (covered in Part Five).

Reasoning Models

A recent development worth understanding: reasoning models. OpenAI's o3 and o4-mini, and Claude's extended thinking mode, use a technique where the model explicitly reasons step by step before generating its final answer. Rather than immediately producing output, the model generates a chain of thought (often hidden from you), working through the problem methodically.

This significantly improves performance on complex tasks: mathematics, formal logic, multi-step coding problems, scientific reasoning, and strategic analysis. The trade-off is speed and cost. Reasoning models are slower and consume more resources. For everyday tasks, they are overkill. For hard problems, they are genuinely better.
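In practice, the speed-and-cost trade-off is often handled with a simple routing rule: send hard, multi-step problems to a reasoning model and everything else to a fast one. The model names and the keyword heuristic below are purely illustrative, not any vendor's API; real routing would use better signals than keyword matching.

```python
# Hypothetical hints that a task warrants slow, step-by-step reasoning.
REASONING_HINTS = ("prove", "derive", "debug", "multi-step", "optimise")

def pick_model(task: str) -> str:
    """Route hard analytical tasks to a slower reasoning model and
    everyday tasks to a cheap fast model. Illustrative names only."""
    task_lower = task.lower()
    if any(hint in task_lower for hint in REASONING_HINTS):
        return "reasoning-model"   # slower and costlier, but better on hard problems
    return "fast-model"            # good enough for everyday tasks

print(pick_model("Summarise this email"))       # → fast-model
print(pick_model("Debug this race condition"))  # → reasoning-model
```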

Beyond LLMs: Other Types of AI in This Guide

Not everything in this guide is powered by an LLM. Several other AI architectures are at work:

Image and video generation

Diffusion models power image generation (Midjourney, DALL-E, Ideogram, Adobe Firefly) and video generation (Runway, Kling, Sora). These models are trained on image-text pairs and learn to generate images by starting with random noise and progressively refining it into a coherent image guided by your text description. This is fundamentally different from how LLMs work. It is why image generators can produce photorealistic visuals but struggle with text in images (they are manipulating pixels, not understanding language).
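The "start with random noise and progressively refine" idea can be shown with a toy one-dimensional example. A real diffusion model learns its denoising step from millions of image-text pairs; here the "denoiser" simply nudges each value toward a known target, which is only meant to make the iterative refinement loop concrete.

```python
import random

def toy_denoise(target, steps=50, seed=0):
    """Start from pure noise and iteratively refine toward the target,
    mimicking the shape (not the mathematics) of a diffusion sampler."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in target]  # step 0: pure random noise
    for step in range(steps):
        strength = (step + 1) / steps      # refine more aggressively in later steps
        x = [xi + strength * 0.5 * (ti - xi) for xi, ti in zip(x, target)]
    return x

target = [0.2, 0.8, 0.5]                   # stand-in for "the image your prompt describes"
result = toy_denoise(target)
print([round(v, 2) for v in result])       # → values very close to [0.2, 0.8, 0.5]
```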

Speech synthesis and voice cloning (ElevenLabs) use models trained on audio data to generate natural-sounding speech. Voice cloning models learn the characteristics of a specific voice from a short sample and can then generate new speech in that voice. These are specialised neural networks, not LLMs, though they may use LLM components for text understanding.

Automatic speech recognition (ASR) powers transcription tools like Otter.ai and Fireflies. These models convert audio to text using architectures trained on vast amounts of spoken language paired with transcripts. OpenAI's Whisper model, which is open source, has been particularly influential here.

Music generation (Suno, Udio) uses specialised models that understand musical structure, genre conventions, vocal styles, and production techniques. These combine multiple AI approaches including audio generation and language understanding (for lyrics).

Computer vision is what enables the camera-based features discussed in the Camera section later in this guide. When you photograph something and ask Claude or ChatGPT "what is this?", the model processes the image through a vision encoder that converts visual information into a representation the LLM can reason about. Modern frontier models are natively multimodal, meaning they process text and images (and increasingly audio and video) within a single architecture.
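The vision-encoder step can be sketched as: split the image into patches and turn each patch into a vector the language model can attend to. Real encoders pass those patches through learned neural-network layers; this toy version just flattens pixel grids, to show the shape of the pipeline rather than the learned mapping.

```python
def patchify(image, patch=2):
    """Split a 2-D grid of pixel values into non-overlapping
    patch x patch blocks, each flattened into one vector.
    A real vision encoder would then project these through learned layers."""
    rows, cols = len(image), len(image[0])
    vectors = []
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            block = [image[r + dr][c + dc]
                     for dr in range(patch) for dc in range(patch)]
            vectors.append(block)
    return vectors

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(patchify(image))  # → [[1, 2, 5, 6], [3, 4, 7, 8], [9, 10, 13, 14], [11, 12, 15, 16]]
```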

Agents: The Next Architectural Shift

The latest development is the move from models that respond to models that act. Tools like Claude's Cowork, Manus, and OpenClaw represent a shift toward agentic AI: systems that can plan a sequence of actions, execute them (browsing the web, managing files, calling APIs, sending messages), observe the results, and adjust their approach. These agent systems typically use an LLM as the "brain" but wrap it in a framework that gives it access to tools and the ability to take actions in the real world.

This is still early. Agent systems are powerful but unreliable. They hallucinate, loop, consume resources unpredictably, and sometimes take actions you did not intend. Understanding that agents are LLMs with tool access (not a fundamentally different kind of intelligence) helps set realistic expectations about what they can and cannot do.
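"An LLM with tool access" can be made concrete with a minimal loop: the model proposes an action, the framework executes it and feeds the result back, and this repeats until the model declares it is done. The scripted `fake_llm` below stands in for a real model; everything here is an illustrative sketch, not any particular product's API.

```python
def calculator(expression: str) -> str:
    """A 'tool' the agent can call. Restricted eval for this demo only."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def fake_llm(history):
    """Stands in for the real model: first asks for a tool call,
    then finishes once it has observed the tool's result."""
    if not any(msg.startswith("observation:") for msg in history):
        return ("call", "calculator", "6 * 7")
    return ("final", "The answer is " + history[-1].split(": ")[1])

def run_agent(goal: str) -> str:
    """Plan -> act -> observe loop with the LLM as the 'brain'."""
    history = ["goal: " + goal]
    for _ in range(5):                      # hard cap so the agent cannot loop forever
        decision = fake_llm(history)
        if decision[0] == "final":
            return decision[1]
        _, tool, arg = decision
        observation = TOOLS[tool](arg)      # execute the action in "the world"
        history.append(f"observation: {observation}")
    return "gave up"

print(run_agent("What is 6 times 7?"))  # → The answer is 42
```

The step cap and the fixed tool registry are where real frameworks spend most of their engineering effort: without them, exactly the looping and unintended actions described above are what you get.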

What AI Is Not

AI in its current form is not sentient, conscious, or genuinely "intelligent" in the way humans are. It does not understand what it is saying. It does not have beliefs, intentions, or experiences. It is very sophisticated pattern matching on an enormous scale. This is not a limitation that will be fixed in the next version. It is a fundamental characteristic of the architecture.

This matters practically: AI will confidently present false information because it is optimising for plausible text, not truth. It will agree with you when it should push back, because agreement is a common pattern in its training data. It will produce impressive-sounding but vacuous output if you give it vague instructions, because vague inputs produce generic patterns.

The people who get the most value from AI are the ones who understand this. They verify facts. They give precise instructions. They treat AI as a powerful tool with known failure modes, not as an oracle.