Plain Text

Every file is the same thing. The extension is a suggestion.

The thing nobody tells you

Every file on your computer — every document, spreadsheet, photo, song, executable, database — is the same thing at the bottom: a sequence of bits. Ones and zeros. Nothing else. A Word document and a JPEG and an MP3 and a Python script are all made of the same stuff. The bits don't know what they are. They don't care.

The computer reads those bits in chunks called bytes — groups of 8 bits at a time. One byte can represent 256 different values. That's enough for every letter, digit, and punctuation mark in English, with room left over. A file is a sequence of bytes, consumed one at a time, from beginning to end. That's it. That's all a file is.
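Here is a minimal sketch of that idea in Python. It writes three bytes to a throwaway file, reads them back, and shows each byte both as a number and as its 8 bits. The file name example.bin and the three byte values are arbitrary choices for illustration.

```python
import os
import tempfile

# Write three bytes to a temporary file, then read them back as raw bytes.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(bytes([72, 105, 33]))
    with open(path, "rb") as f:
        data = f.read()

values = list(data)                       # each byte is an integer from 0 to 255
bits = [format(b, "08b") for b in data]   # the same bytes as 8-bit patterns

print(values)   # [72, 105, 33]
print(bits)     # ['01001000', '01101001', '00100001']
print(2 ** 8)   # 256 possible values per byte
```

Read as text, those same three bytes spell "Hi!". Nothing in the file says so. That is a decision made by whatever reads it.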

So if every file is the same raw material, how does your computer know that one file is a photo and another is a spreadsheet?

The extension is a label, not a lock

The three or four letters after the dot — .txt, .docx, .jpg, .pdf, .py — are a label. The label does two jobs:

It tells the operating system which program to open. Double-click a .docx and Word opens. Double-click a .jpg and an image viewer opens (Preview, on a Mac). Double-click a .py and … it depends on how your system is configured. The extension is a suggestion to the OS: "try this program."
It hints at what's inside. A .csv says "this file contains rows and columns separated by commas." A .html says "this file contains web page markup." A .md says "this file contains formatted text using a lightweight system called Markdown."

Both of these meanings are many-to-many. One extension can be opened by multiple programs — a .csv opens in Excel, Google Sheets, Numbers, any text editor, or a Python script. And one program can open multiple extensions — Word opens .docx, .doc, .rtf, .txt, and more.

The extension is not the file. It's a label someone stuck on the outside.

You can prove this yourself. Take any .csv file, rename it to .xyz, and open it with a text editor. Same bytes. Same commas. Same data. The only thing that changed is which program your operating system tries to use when you double-click it.
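If you would rather let a script run the experiment, here is a rough sketch. The helper name bytes_survive_rename and the sample data are made up for this example. It writes a small CSV to a temporary folder, renames it, and compares the bytes before and after.

```python
import os
import tempfile

def bytes_survive_rename(original_name: str, new_name: str, content: bytes) -> bool:
    """Write content to a file, rename the file, and report whether the bytes are unchanged."""
    with tempfile.TemporaryDirectory() as folder:
        before = os.path.join(folder, original_name)
        after = os.path.join(folder, new_name)
        with open(before, "wb") as f:
            f.write(content)
        os.rename(before, after)  # only the label changes
        with open(after, "rb") as f:
            return f.read() == content

sample = b"name,score\nAda,90\nGrace,95\n"
print(bytes_survive_rename("data.csv", "data.xyz", sample))  # True
```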

Two kinds of files

If all files are bytes, why can you open some in a text editor and not others? Because there are really only two categories:

Plain text: every character is stored as a byte, or a short run of bytes, using a standard encoding such as ASCII or UTF-8. The letter A is byte 65. The letter B is byte 66. A newline is byte 10. Characters outside ASCII, such as accented letters, take two to four bytes in UTF-8. Open the file in any text editor and you see exactly what's there. Nothing is hidden. Nothing is compressed or packed into a container. What you see is what the file is. Examples: .txt, .md, .csv, .html, .py, .json, .sql.

Binary: the bytes encode something that isn't human-readable text. They might represent pixel colors, audio waveforms, compressed data, or a proprietary format that only a few programs understand. Open a binary file in a text editor and you get gibberish. The bytes are there, but they weren't meant to be read as letters. Examples: .docx, .xlsx, .pdf, .jpg, .mp3, .exe.
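The two categories can be told apart, roughly, by asking whether the bytes decode cleanly as text. The sketch below is a heuristic and not a guarantee: the function name looks_like_text is invented here, and it assumes UTF-8, which also covers plain ASCII.

```python
def looks_like_text(data: bytes) -> bool:
    """Rough heuristic: text has no NUL bytes and decodes as valid UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# The byte values for "A", "B", and a newline.
codes = list("AB\n".encode("ascii"))
print(codes)  # [65, 66, 10]

csv_bytes = b"name,score\nAda,90\n"
jpeg_start = b"\xff\xd8\xff\xe0\x00\x10JFIF"  # typical opening bytes of a JPEG

print(looks_like_text(csv_bytes))   # True
print(looks_like_text(jpeg_start))  # False
```

A check like this can be fooled. A short binary file may happen to decode, and text saved in an older encoding may not. It is a quick sanity test, nothing more.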

Here's the thing that surprises people: some extensions that sound fancy are actually plain text. An .html file — the thing web pages are made of — is plain text. A .json file — the format APIs use to send data — is plain text. A .py file — a Python program — is plain text. A .sql file — a database query — is plain text. You can open any of them in Notepad and read every character.

And some extensions that sound simple are actually binary. A .docx is not a text file — it's a zip archive containing XML files, style definitions, and embedded media. That's why you need Word (or something that understands the format) to open it. The bytes are there, but they're structured for a machine, not for your eyes.
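You can see the zip structure with Python's standard zipfile module. The sketch below does not use a real Word file. It builds a tiny stand-in archive in memory, with two member names borrowed from the .docx layout and placeholder contents, then inspects it the way you could inspect an actual .docx.

```python
import io
import zipfile

# Build a small stand-in archive in memory. This is NOT a valid Word document;
# the XML contents are placeholders for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document>Hello</w:document>")
    archive.writestr("word/styles.xml", "<w:styles/>")

raw = buffer.getvalue()
signature = raw[:4]  # zip archives begin with the bytes "PK" followed by 0x03 0x04

with zipfile.ZipFile(io.BytesIO(raw)) as archive:
    names = archive.namelist()

print(signature == b"PK\x03\x04")  # True
print(names)                       # ['word/document.xml', 'word/styles.xml']
```

To try it on a real document, pass its path to zipfile.ZipFile in place of the in-memory buffer.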

Why this matters for working with AI

An AI reads a file much as you read it in a text editor: as a stream of characters, from beginning to end. When you hand Claude a plain text file, the text goes in exactly as written. Nothing is lost. Nothing needs translation. The file says exactly what it is.

When you hand it a binary file, something has to translate first. The .docx has to be unpacked. The .pdf has to be parsed. The .xlsx has to be decoded. Every translation step is a place where meaning can be lost, formatting can break, or data can be misread.
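Here is a small sketch of how a translation step can drop meaning. The markup is a made-up stand-in, far simpler than the XML inside a real .docx, and the extraction is deliberately naive. It pulls out the words and silently discards the emphasis.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. Real document formats are far more involved.
markup = "<doc><p>Revenue is <b>down</b> 4%.</p><p>See the table.</p></doc>"

root = ET.fromstring(markup)

# Naive extraction: keep the text of each paragraph, ignore every tag inside it.
paragraphs = ["".join(p.itertext()) for p in root.iter("p")]

print(paragraphs)  # ['Revenue is down 4%.', 'See the table.']
```

The words survive, but the fact that "down" was bold does not. A more careful extractor could keep it. The point is that someone has to write that extractor and get it right.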

The practical rule: if you want an AI to read it, write it, edit it, or build on it — use plain text. Markdown (.md) for documents. CSV (.csv) for data. HTML (.html) for pages. JSON (.json) for configuration. These aren't primitive formats. They're the most durable, portable, and AI-compatible formats that exist.
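Part of what makes these formats portable is that a program needs nothing special to read them. As a rough illustration, Python's standard library parses both CSV and JSON directly. The sample data and the config keys below are invented for this example.

```python
import csv
import io
import json

# A tiny CSV, held in a string instead of a file to keep the example self-contained.
csv_text = "name,score\nAda,90\nGrace,95\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# A tiny JSON configuration with hypothetical keys.
config_text = '{"report_title": "Scores", "max_rows": 2}'
config = json.loads(config_text)

print(rows[0]["name"], rows[0]["score"])  # Ada 90
print(config["max_rows"])                 # 2
```

Note that CSV values arrive as strings: the score is "90", not the number 90. The format stays simple by leaving types to the reader.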

Every workflow on this site — the meeting decomposition, the disposable software, the book-writing kits, this website itself — is built on plain text files. Not because we're being retro. Because plain text is the format that needs no translator, no vendor, no license, and no expiration date.

The format is not the document

People confuse the container with the contents. "I need to send a Word document" usually means "I need to send some text with formatting." The text is the document. Word is one possible container. So is a .pdf. So is an .html file. So is a .md file that converts to any of the above in one command.

When someone asks for a .docx, what they're really asking for is something they can open, read, and maybe edit. A plain text file does all three — in more programs, on more devices, for more years, than any proprietary format ever will.

The page you're reading right now is an .html file. It's plain text. Right-click, "View Page Source," and you can read every byte. There's no build system, no CMS, no database behind it. It's a text file served by a web server. That's all a website has to be.

The file format cheat sheet

Everything above the gold line is plain text — readable by any editor, any AI, any program, on any operating system, forever. Everything below requires a specific program to make sense of the bytes.

The durability argument

A plain text file written in 1970 is still readable today. Open it in any editor on any computer and every character is there. No conversion. No compatibility mode. No "this file was created in an older version of —"

A WordPerfect file from 1992 is a museum piece. A Lotus 1-2-3 spreadsheet from 1988 usually needs a converter or an emulator. A PageMaker document from 1995 is functionally dead. The programs that created them are gone or long unsupported. The bytes are still there, but hardly anything speaks the language anymore.

Plain text doesn't have this problem. Its encodings, ASCII and UTF-8, are open standards that practically every program understands. The format is the format. A .txt file outlives every program that ever opened it.

So why does anyone use binary formats?

Because they do things plain text can't — or can't do easily. A .jpg compresses a million pixels into a manageable size. A .docx embeds fonts, images, and layout instructions. A .pdf guarantees that the document looks the same on every screen and every printer. These are real capabilities.

The mistake is using binary formats when you don't need those capabilities. Writing a status report in Word when a Markdown file would do. Sending data in an Excel spreadsheet when a CSV would do. Building a dashboard in PowerPoint when an HTML page would do.

Use binary formats for what they're good at — rich media, precise layout, final output. Use plain text for everything else. That's most things.


Disclosure: This page was generated by Claude (Anthropic) under Bill's direction. Bill provided the core insight — all files are bits, extensions are dual pointers (program and content), both sides are many-to-many. Claude wrote the prose and built the diagrams.