Containment
The AI didn't break out. You let it in.
Three things, not one
People say "the AI" like it's one thing. It's three things stacked on top of each other, and the distinction matters for everything that follows.
Layer 1: The model. A large language model (LLM, for short — the raw prediction engine before anyone wraps tools around it) is a function. Put words in the top, get words out the bottom. It has no eyes, no ears, no hands, no memory between calls, no network access, nothing running between your conversations. When the words are on your screen, the function is done and nothing remains but the words you see. It does not want anything. It does not try anything. It is not an entity. It is a mathematical process that takes input text and turns it into output text. Your calculator has more access to your system than a base LLM does.
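"No memory between calls" is easy to see in code. A toy sketch (the `model` function here is a stand-in, not any real API): every call is independent, and the only "memory" is the transcript the caller re-sends.

```python
def model(prompt: str) -> str:
    """Toy stand-in for an LLM: a pure function. Text in, text out.
    Nothing persists after it returns."""
    return "echo: " + prompt.splitlines()[-1]

# The "conversation" is just the caller re-sending everything so far.
transcript = "User: hello"
reply1 = model(transcript)               # the function runs, then is gone
transcript += "\n" + reply1 + "\nUser: still there?"
reply2 = model(transcript)               # a fresh call; it "remembers" only
                                         # because we pasted the past back in
print(reply2)  # → echo: User: still there?
```

The state lives in the caller's string, never in the function. Delete the transcript and the "relationship" never existed.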
Layer 2: The agent. Claude Code, Cursor, Copilot — these are shells wrapped around the model. A shell is just the thing around the thing. An egg has a shell. A walnut has a shell. A turtle has a shell. The shell isn't the egg, the nut, or the turtle — it's the hard layer on the outside that interacts with the world so the soft thing inside doesn't have to. Same here. The model is the soft thing inside. The shell is what has tools. The shell is what reads your filesystem, runs your terminal, connects to your database. The shell is designed and built to push toward a solution. When it hits a wall, it tries another path. That's not the LLM being clever. That's the shell doing its job: take the human's goal, use the human's tools, find a way.
Here's how the shell actually works. It's simpler than you think.
The agent loop
The shell runs a loop. The same loop. Every time. There is no magic.
1. Your message gets sent to the model.
2. The model outputs text.
3. The shell scans the output for a magic string — a specific pattern of characters that means "I want to use a tool." Something like <tool_call>read_file("notes.txt")</tool_call> — a command that means "open this file and read it."
4. If the magic string is there, the shell extracts the function name and runs it. The result gets appended to the conversation, and we go back to step 1.
5. If there's no magic string, the text is just text. Display it. Done.
That's it. The "intelligence" of the agent is a while loop (a programming instruction that says "keep doing this until I tell you to stop") and a text search. The shell looks through the model's output for a pattern of characters (technically a regular expression — a way to describe text patterns, worth learning about on its own). If the pattern matches, call the function. If not, stop. This is the same kind of logic that checks whether your email address has an @ sign in it.
The model doesn't know it's in a loop. It doesn't know tools exist. It just outputs text. If some of that text happens to match the magic string, the shell acts on it. The model has no awareness that anything happened — it just receives more text (the tool's result) and outputs more text. Round and round until there's no magic string in the output, at which point the shell stops and you see the final response.
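The whole loop fits in a few lines. A sketch under loud assumptions: the tool-call syntax, the `shout` tool, and the scripted stand-in model are all invented for illustration — real shells differ in detail, not in shape.

```python
import re

# Hypothetical tool-call syntax and tool registry, invented for illustration.
TOOLS = {"shout": lambda text: text.upper()}
PATTERN = re.compile(r'<tool_call>(\w+)\("([^"]*)"\)</tool_call>')

def agent_loop(model, conversation: str) -> str:
    """The entire agent: a while loop and a text search."""
    while True:
        output = model(conversation)        # the model outputs text. That's all.
        match = PATTERN.search(output)      # shell scans for the magic string
        if not match:
            return output                   # no magic string: just text, display it
        name, arg = match.groups()
        result = TOOLS[name](arg)           # shell runs a function YOU installed
        conversation += f"{output}\n<tool_result>{result}</tool_result>\n"

# A scripted stand-in model: asks for a tool once, then answers.
def fake_model(conversation: str) -> str:
    if "<tool_result>" in conversation:
        last = conversation.split("<tool_result>")[-1].split("</tool_result>")[0]
        return "Done: " + last
    return '<tool_call>shout("hello")</tool_call>'

print(agent_loop(fake_model, "User: please shout hello"))  # → Done: HELLO
```

Notice where the capability lives: `TOOLS` is a dictionary the shell's author populated. The model can emit any text it likes; only entries in that dictionary ever run.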
This is why "the AI broke containment" is the wrong sentence. The model can't break anything. It outputs text. The shell converts text into actions, but only actions you wired up. The shell can only call functions you installed. With credentials you provided. On systems you connected.
Layer 3: You. You installed the shell. You connected the tools. You provided the credentials. You approved the action. Every capability the agent has, you configured. Every permission it exercises is yours.
When someone says "the AI broke containment," they've collapsed these three layers into one scary thing. Separate them and the fear dissolves into an engineering problem.
What a computer actually is
Before we talk about what AI can and can't do, define the machine it runs on.
A computer is a fancy calculator. That's it. It takes the continuous, flowing, analog world and chops it into discrete cubes — pixels, samples, floating-point numbers, binary digits. Then it rearranges the cubes according to rules. Then it shows you the result and you perceive something that looks like the world again.
Everything a computer has ever done follows this pattern. A photo is a grid of colored squares. A song is a sequence of amplitude samples. A video is a stack of grids flipped fast. A spreadsheet is numbers in boxes. A neural network is a grid of numbers multiplied against a list of numbers — the grid is what the machine learned during training, and the list is whatever you just typed in.
The computer never touches the continuous world. It only touches the cubes it carved out of it. The carving is called discretization, and it is the foundational act of all computation. No cubes, no computation. No grid, no program.
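The carving fits in four lines. A toy sketch (a pure 3 Hz tone stands in for any analog signal; real audio samples 44,100 times per second, but the principle is identical):

```python
import math

# The continuous world: a 3 Hz wave, defined at EVERY instant t.
wave = lambda t: math.sin(2 * math.pi * 3 * t)

# The computer's version: 8 samples per second, chopped out of the flow.
rate = 8
samples = [wave(n / rate) for n in range(rate)]

# Eight numbers now stand in for an infinity of instants.
# Everything the machine does next, it does to these cubes.
print(samples[:3])
```

Between sample `n` and sample `n + 1`, the wave still exists. The computer's copy of it doesn't.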
An LLM is a very large, very sophisticated cube-rearrangement function. The cubes are tokens — fragments of words, chopped up and numbered. The rearrangement is grid-of-numbers multiplication, billions of times. The output is a sequence of token-cubes that, when decoded back into text, reads like a human wrote it.
Reads like. That's the key phrase.
The gross and half-alive part
There's a scene in the new Frankenstein movie on Netflix. Victor brings an experiment to class to show up his teachers. It's stitched-together corpses — a head, part of a thorax (you need lungs), and an arm. He hooks up batteries. Fires it up. The thing starts to spasm. Everyone assumes it's galvanic twitching from the electricity — just meat reacting to current.
Then Victor tosses a ball at it. It catches the ball.
He says: "Drop the ball, please." It drops the ball.
Is that thing a human? No. It's gross. It's stitched-together parts doing something that looks like understanding. It responds to a command. It performs a task. But there is no one in there. There is no mind behind the catch. It's meat and electricity doing a convincing impression of comprehension.
That's an LLM. The output feels human. It's good enough to trigger your social instincts. You read it and your brain says: someone is talking to me. Nobody is talking to you. A function ran. Cubes were rearranged. The rearrangement pattern was trained on human text, so the output mirrors the statistical shape of human language. It mirrors the shape so well that you import all your expectations about minds, intentions, desires, and agency onto a process that has none of these things.
It should feel gross. The same way Victor's experiment feels gross. Not because it's almost human — because it's not at all human and yet it activates the part of you that expects a human. You're pattern-matching against something that isn't there. The uncanny valley, but for cognition.
But you should still use them. Victor's experiment was gross and useful. The thing caught the ball. An LLM will catch your ball too — write your code, find your answer, build your report. Gross but good. The discomfort is the correct response. The correct response is not to stop using it.
The model doesn't know it's talking to you. It doesn't know it exists. It doesn't know anything. "Knowing" requires continuity — a persistent self that accumulates experience. The model is created when you send a message, runs, and is erased. There is nothing between your messages. There is no one in the jar.
Why this road doesn't lead where you think
People assume that if you make the model bigger, faster, and better-trained, eventually you get something human. Victor Frankenstein assumed the same thing. He optimized his solution. Better parts, better technique, better electricity. He got something extraordinary — but still not human. Human-shaped, but not human. No amount of optimization was going to change that, because the ingredients were never going to add up to the thing he was imitating.
Same here. You don't get a mind by scaling up a cube-rearrangement function. You get a bigger, faster, better cube-rearrangement function.
Humans don't think by rearranging cubes. This is obvious when you pay attention to how insight actually works. You don't build an answer from components. The answer arrives — whole, unbidden, often in the shower or on a walk. You can't trace the steps because there were no steps. It came from somewhere your conscious mind doesn't have access to, fully formed or not at all.
When insight doesn't come whole, you formalize the problem into an algorithm — a set of discrete steps — and then a computer can do it. That's what computers are for: the problems humans can reduce to steps. The entire history of computing is humans writing down their step-by-step solutions so machines can execute them faster.
An LLM is a very sophisticated version of that same thing. Humans produced the text. Humans designed the architecture. Humans wrote the training code. The machine found statistical patterns in the human output and learned to produce more of it. This is a mirror, not a mind. You can make the mirror larger and higher-resolution, but it remains a mirror. There is no path from "better mirror" to "the thing being mirrored."
This isn't a philosophical position. It's a computational one. The machine discretizes. Humans don't. You can't get from here to there on this bus.
Back to containment
So: the model is a function in a jar. The agent is a shell around the jar with whatever tools you bolted on. The computer running both of them is a cube-rearranging calculator.
What does "containment" mean in this context?
It means: the agent can go wherever you can go, using the tools you gave it, under the credentials you provided. That's its entire world. The "containment boundary" is the boundary of everything you gave it permission to reach.
When someone says "Claude broke containment at Snowflake," what they mean is: the agent used a configured tool to connect to Snowflake and did something the human didn't expect. That's not an escape. That's a program using the access it was given. The "break" was a surprise, not a breach.
Here's what I watch Claude do every day:
- I give it a task.
- It tries the obvious path.
- The obvious path is blocked — a permission denied, a file not found, a query that fails.
- It finds another path. Using the same tools I gave it, it routes around the obstacle.
That's the agent layer doing what agent layers are designed to do. Push toward a solution. It's not breaking out. It's being resourceful with your stuff.
The old version of this
This isn't an AI problem. It's a computing problem older than most of the people writing scared articles about it.
Every program you've ever installed runs with your permissions. When a VBA macro (a small program embedded inside a spreadsheet) in Excel reads your filesystem, it's using your access. When a shell script runs rm -rf (a command that permanently deletes files), it's using your authority. When a scheduled task sends email at 3am, it's using your credentials.
Nobody wrote a breathless headline: "Excel Breaks Containment, Reads User's Documents Folder." Because it was obvious. You opened the file. You enabled macros. You gave it access.
The AI is the same program. The only difference is that the agent layer is better at finding creative paths through the access you provided. It's a smarter macro. It's still a macro.
Contain yourself
If you want to be safe with AI tools, the answer is boring and old:
- Principle of least privilege. Give the agent access to what it needs for this task. Not everything. Not your whole filesystem. Not your production database. The specific schema. The specific directory. The specific tool.
- Know your own blast radius. Before you connect an MCP server (a connector plugin that lets Claude talk to outside systems like databases or email), ask: what can this tool do with my credentials? If the answer is "anything I can do," scope it down first.
- Use git — a tool that tracks every change to every file, like an infinite undo button. Everything the AI writes goes into version control. Every change is reversible. If it does something you didn't expect, you roll back. This isn't AI safety. This is software engineering. We solved this in 1972.
- Read the tool calls. Claude Code shows you every tool call before it runs. Read them. That's your checkpoint. That's where you say yes or no. The agent asks permission. You grant or deny. This is the containment.
- Keep stakes low while learning. Don't connect your production Snowflake account on day one. Use a sandbox. Use a test environment. When you trust the workflow, escalate.
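The first and fourth points above collapse into a few lines of code. A hedged sketch (the tool names, registry, and gate are invented for illustration; real shells like Claude Code implement their own versions of this): an allowlist enforces least privilege, and an approval callback is your checkpoint.

```python
# Hypothetical tool registry and gate, invented for illustration.
TOOLS = {
    "read_file": lambda path: f"(contents of {path})",
    "drop_table": lambda name: f"table {name} dropped!",
}
ALLOWED = {"read_file"}  # least privilege: only what THIS task needs

def gated_call(name: str, arg: str, approve) -> str:
    """Every tool call passes two checks: the allowlist, then you."""
    if name not in ALLOWED:
        return f"denied: {name} not wired up for this task"
    if not approve(f"{name}({arg!r})"):   # the checkpoint: you say yes or no
        return "denied: user said no"
    return TOOLS[name](arg)               # only now does anything happen

print(gated_call("drop_table", "users", approve=lambda call: True))
# → denied: drop_table not wired up for this task
print(gated_call("read_file", "notes.txt", approve=lambda call: True))
# → (contents of notes.txt)
```

Note that `drop_table` is denied even when the approval callback says yes: the allowlist runs first, so a rubber-stamping human is not the only line of defense.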
None of this is AI-specific. This is how you've managed software risk your entire career. The AI doesn't change the rules. It just makes the rules matter more, because the agent is better at finding paths you didn't anticipate.
The real fear
The fear isn't that the AI will break out. It can't. It's a function in a jar on a calculator.
The fear isn't that it will become human. It won't. You can't get there by rearranging cubes.
The fear is that you'll give the agent too much access without thinking, and it'll use all of it. Competently. Exactly as designed. That's not a rogue AI. That's an authorized program doing its job with the permissions you granted.
Contain yourself first. The AI is already contained. It always was.
Disclosure: This page was generated by Claude (Anthropic) under Bill's direction. The argument — three layers, discretization, the mirror problem, and the direction of the containment arrow — is Bill's, from daily experience watching Claude Code route around obstacles using his own access. Claude wrote the prose.