I Tried to Widen a Box. I Ended Up 20x Faster Than Pandoc.

A couple of days ago I was looking at mkdown — a tiny markdown-to-HTML converter I built in Go, the kind of toy nobody asked for — and one thing bugged me: the rendered page felt a bit narrow. The text column was clamped at max-width: 800px, and on a wide monitor it looked like a receipt.

So I did the responsible thing. I changed one number to 980px, committed it with the world’s most boring message, and rebuilt the binary. Two minutes. Done. A normal person closes the laptop here and goes outside.

I am not, it turns out, a normal person.

The One-Line Fix That Opened a Trapdoor

Because while I had the thing open, I figured I’d throw a real document at it. Not the dinky README — something chunky. So I generated a gloriously stupid 3.2MB markdown file stuffed with every feature at once: tables, footnotes, math, mermaid diagrams, a thousand code blocks. The markdown equivalent of a kitchen sink with another kitchen sink bolted to it.

Then I converted it.

1.6 seconds. For one file.

My eye did the thing. You know the thing — the involuntary twitch a programmer gets when a number is wrong in a way they can feel but not yet explain. 1.6 seconds to turn text into other text? On an M-series chip that can do billions of things between heartbeats? Something in there was being profoundly lazy, and it wasn’t me for once.

And just like that, the trapdoor opened and I fell through it. Goodbye, afternoon.

Chroma’s Secret 364-Way Filename Beauty Pageant

I did what you’re supposed to do and what almost nobody actually does first: I shut up and profiled it. Wired up pprof, ran the conversion ten times, and stared at the flame graph like it owed me money.
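For anyone who hasn't wired up a CPU profile before, here is roughly what that step looks like with the standard runtime/pprof package. This is a minimal sketch, not mkdown's actual harness: convert is a hypothetical stand-in for the real conversion, and the file name cpu.prof is just an assumption.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
	"strings"
)

// convert is a hypothetical stand-in for the real markdown conversion.
func convert(src string) string {
	return strings.ToUpper(src)
}

// profileRuns records a CPU profile to path while calling work `runs` times.
func profileRuns(path string, runs int, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	// Deferred calls run last-in first-out, so the profile is flushed
	// before the file is closed.
	defer pprof.StopCPUProfile()

	for i := 0; i < runs; i++ {
		work()
	}
	return nil
}

func main() {
	doc := strings.Repeat("# heading\n\nsome text\n", 100_000)
	err := profileRuns("cpu.prof", 10, func() { _ = convert(doc) })
	if err != nil {
		fmt.Fprintln(os.Stderr, "profiling failed:", err)
		os.Exit(1)
	}
	fmt.Println("wrote cpu.prof; inspect with: go tool pprof -http=:8080 cpu.prof")
}
```

Running the work several times inside one profile gives the sampler more to chew on, which makes the hot spots stand out from the noise.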

The number one CPU hog was not in my code. It wasn’t even in the markdown parser. It was a function called path/filepath.Match.

Let me say that again, because it’s genuinely absurd: 34% of the time spent converting a markdown file was spent matching filenames. There are no filenames in markdown. There are no files. I was already holding the text.

Here’s what was happening, and it’s beautiful. My test file had 364 mermaid diagram blocks (```mermaid). The syntax highlighter, Chroma, doesn’t have a lexer for “mermaid” — it’s a diagram language, not code. But instead of shrugging and moving on, Chroma’s lexers.Get("mermaid") does this:

Checks its name table. Nothing.
Checks its alias table. Nothing.
Then treats "mermaid" as if it might be a filename, and globs it against the filename pattern of every single one of its ~250 registered lexers — *.go, *.py, *.rb, the whole parade.
Finds nothing, because of course it doesn’t.
Runs a regex content-analysis pass for good measure.
Gives up.
Emits a plain code block. Which is exactly what it would have done in the first place.

And it did this 364 times, with zero caching of the “nope, never heard of it” result. Every mermaid block in the file kicked off a full filename beauty pageant across 250 contestants, judged nobody a winner, and produced the identical boring output it could’ve produced for free. It was like watching someone re-derive that water is wet, once per paragraph, forever.
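To make the cost concrete, here is a toy model of that name, then alias, then filename-glob fallback. It is emphatically not Chroma's source: the lexer type, the three-entry registry, and the globCalls counter are all made up for illustration, and the real registry is far larger.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// lexer is a simplified, hypothetical stand-in for a registered highlighter.
type lexer struct {
	name     string
	aliases  []string
	patterns []string // filename globs such as "*.go"
}

// registry is a tiny illustrative table; a real one has hundreds of entries.
var registry = []lexer{
	{name: "Go", aliases: []string{"go", "golang"}, patterns: []string{"*.go"}},
	{name: "Python", aliases: []string{"python", "py"}, patterns: []string{"*.py", "*.pyw"}},
	{name: "Ruby", aliases: []string{"ruby", "rb"}, patterns: []string{"*.rb", "Rakefile"}},
}

// globCalls counts how many times filepath.Match runs.
var globCalls int

// lookup tries the name table, then the aliases, then treats lang as a filename
// and globs it against every registered pattern.
func lookup(lang string) (lexer, bool) {
	for _, l := range registry {
		if l.name == lang {
			return l, true
		}
	}
	for _, l := range registry {
		for _, a := range l.aliases {
			if a == lang {
				return l, true
			}
		}
	}
	for _, l := range registry {
		for _, p := range l.patterns {
			globCalls++
			if ok, _ := filepath.Match(p, lang); ok {
				return l, true
			}
		}
	}
	return lexer{}, false
}

func main() {
	for i := 0; i < 364; i++ {
		lookup("mermaid") // unknown language: always falls through to the globs
	}
	fmt.Println("glob calls for 364 unknown fences:", globCalls)
}
```

Even with only five patterns in the table, every miss pays for all five globs, every time. Scale the table up and the profile above stops being surprising.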

The fix was a little AST transformer that catches code fences Chroma can’t recognize, hands them straight to a plain renderer, and — crucially — remembers the answer so each unknown language pays the lookup exactly once instead of once per block. Output came out byte-for-byte identical. The clock went from 1.6 seconds to 0.48.
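The remembering part is the whole trick, so here is a sketch of just that piece. The names langCache and slowKnown are hypothetical, and slowKnown stands in for the expensive highlighter lookup; the real transformer also has to walk the AST and pick a renderer, which is left out here.

```go
package main

import (
	"fmt"
	"sync"
)

// slowLookups counts calls that reach the expensive path.
var slowLookups int

// slowKnown is a hypothetical stand-in for the expensive highlighter lookup.
func slowKnown(lang string) bool {
	slowLookups++
	switch lang {
	case "go", "python", "ruby":
		return true
	}
	return false
}

// langCache remembers hits and misses alike, so each language is resolved once.
type langCache struct {
	mu    sync.Mutex
	known map[string]bool
}

func newLangCache() *langCache {
	return &langCache{known: make(map[string]bool)}
}

// Known reports whether lang has a highlighter, calling slowKnown
// at most once per distinct language.
func (c *langCache) Known(lang string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, seen := c.known[lang]; seen {
		return v
	}
	v := slowKnown(lang)
	c.known[lang] = v
	return v
}

func main() {
	cache := newLangCache()
	plain := 0
	for i := 0; i < 364; i++ {
		if !cache.Known("mermaid") {
			plain++ // this fence would go to the plain renderer
		}
	}
	fmt.Println("plain blocks:", plain, "slow lookups:", slowLookups)
}
```

Caching the negative answer is the important bit: a cache that only stores successful lookups would still pay full price for every unknown fence.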

One bug down. I should have stopped. I did not stop.

goldmark’s Thousand Tiny Acts of Confidence

I re-profiled — because the first rule of optimization is that the bottleneck always moves, like a guilty man changing seats — and the new villain was fmt.Sprintf, eating a third of all memory allocations. Specifically, fmt.Sprintf called from goldmark’s heading-ID generator.

You know how a good blog renderer gives every heading an id so you can link to it? When two headings have the same text, it disambiguates: intro, intro-1, intro-2. Reasonable. Civilized.

Now consider that my cursed test file also had 364 identical headings. Here’s how goldmark resolves the 364th ## Overview:

Try overview-1. Taken? Try overview-2. Taken? Try overview-3…

Every probe a fresh fmt.Sprintf. Every collision starting the count from one, every time. That’s an O(n²) march straight into the sea — roughly sixty-six thousand throwaway strings per heading group, all so a handful of <h2>s could get a number on the end. goldmark wasn’t wrong, exactly. It was just confidently, exhaustively, allocation-meltingly thorough about a problem that has a one-line shortcut.
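If you want to check the sixty-six thousand figure yourself, the sketch below models the probing strategy and counts the throwaway strings. It is a standalone imitation under my own names (naiveIDs, probes), not goldmark's code, and it skips the slugging step entirely by feeding in a ready-made slug.

```go
package main

import "fmt"

// naiveIDs models the probing strategy described above: on a collision,
// count up from 1 until a free "slug-N" turns up.
type naiveIDs struct {
	used   map[string]bool
	probes int // number of candidate strings formatted
}

func (g *naiveIDs) generate(slug string) string {
	if !g.used[slug] {
		g.used[slug] = true
		return slug
	}
	for i := 1; ; i++ {
		g.probes++
		candidate := fmt.Sprintf("%s-%d", slug, i)
		if !g.used[candidate] {
			g.used[candidate] = true
			return candidate
		}
	}
}

func main() {
	g := &naiveIDs{used: make(map[string]bool)}
	last := ""
	for i := 0; i < 364; i++ {
		last = g.generate("overview")
	}
	fmt.Println(last, g.probes) // overview-363 66066
}
```

The first heading takes the bare slug for free, and the k-th duplicate after it costs k probes, so 364 identical headings add up to 1 + 2 + ... + 363 = 363 × 364 / 2 = 66,066 formatted strings.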

The shortcut: remember the next number to try for each slug, instead of re-counting from -1 like a goldfish. Since IDs never get freed, every lower number is already taken — so starting where you left off lands on the same answer the slow way would, minus the sixty-six thousand corpses. I dropped in a custom parser.WithIDs, kept the output byte-identical (this matters — “faster but different” is just a new bug wearing a track suit), and the clock fell to 0.26 seconds.
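Here is the shortcut as a standalone sketch. The type name fastIDs and its fields are my own for illustration; it does not implement goldmark's interface, and the wiring through parser.WithIDs is left out. Note that it still checks the used set before claiming a number, which is what keeps the output identical even when some heading is literally titled overview-2.

```go
package main

import (
	"fmt"
	"strconv"
)

// fastIDs remembers, per slug, the next numeric suffix worth trying.
type fastIDs struct {
	used map[string]bool
	next map[string]int
}

func newFastIDs() *fastIDs {
	return &fastIDs{used: make(map[string]bool), next: make(map[string]int)}
}

func (g *fastIDs) generate(slug string) string {
	if !g.used[slug] {
		g.used[slug] = true
		return slug
	}
	i := g.next[slug]
	if i == 0 {
		i = 1 // first collision for this slug
	}
	for {
		candidate := slug + "-" + strconv.Itoa(i)
		i++
		if !g.used[candidate] {
			g.used[candidate] = true
			g.next[slug] = i // resume from here next time
			return candidate
		}
	}
}

func main() {
	g := newFastIDs()
	for _, slug := range []string{"overview", "overview", "overview-2", "overview", "overview"} {
		fmt.Println(g.generate(slug))
	}
}
```

The loop only ever moves forward, so a long run of identical headings costs about one probe each instead of a full recount.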

That’s 6.2x faster than where I started, and I had transcended the original cosmetic complaint so completely I could no longer see it with the naked eye.

One Process to Rule Them All

Here’s where I remembered what the tool is actually for. Nobody converts one 3.2MB monster file. People have folders. Hundreds of little markdown files — docs, notes, a blog’s worth of posts.

So I taught mkdown to take a glob — mkdown *.md — and convert the whole pile across a worker pool sized to your CPUs. One file became a thousand. And the speedup wasn’t just “more cores go brr.” It was structural, and it’s the part I’m smuggest about:

Every other command-line markdown tool I tried produces one output per invocation, so pointing it at a folder means wrapping it in a shell loop: for f in *.md; do .... For a thousand files, that’s a thousand separate process startups. A thousand times the operating system has to fork, exec, load, initialize, and tear down. That overhead is invisible on one file and catastrophic on a thousand.

mkdown is one process that chews through the whole list. No per-file startup tax. On my machine it converts 1,000 feature-rich files in 404 milliseconds — or 107ms if you pass --no-highlight and skip the fancy code coloring. (Yes, I added a flag to turn off the very feature I’d just spent all evening speeding up. Optimization is a rich tapestry of self-betrayal.)
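The shape of that single-process batch is a plain worker pool. What follows is a minimal sketch under assumptions, not mkdown's code: convert is a hypothetical stand-in for rendering one file, and the inputs are in-memory strings rather than files on disk.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// convert is a hypothetical stand-in for rendering one file.
func convert(src string) string {
	return "<p>" + strings.TrimSpace(src) + "</p>"
}

// convertAll renders every input inside one process using a fixed pool of workers.
// Each result is written to its own index, so output order matches input order.
func convertAll(inputs []string, workers int) []string {
	if workers < 1 {
		workers = 1
	}
	out := make([]string, len(inputs))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = convert(inputs[i])
			}
		}()
	}
	for i := range inputs {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return out
}

func main() {
	inputs := make([]string, 1000)
	for i := range inputs {
		inputs[i] = fmt.Sprintf("file %d\n", i)
	}
	pages := convertAll(inputs, runtime.GOMAXPROCS(0))
	fmt.Println(len(pages), pages[0], pages[999])
}
```

The process starts once, the parser and highlighter are initialized once, and the workers just keep pulling the next index off the channel until it closes.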

One subtle landmine I stepped on so you don’t have to: I’d originally sized the worker pool with runtime.NumCPU(), which cheerfully reports the host’s core count, even inside a container that’s been throttled to 2 CPUs on a 64-core box. So in CI it would’ve spawned 64 goroutines to fistfight over 2 cores. Swapped it for runtime.GOMAXPROCS(0), which on Go 1.25 and later takes the container’s CPU limit into account (and on any version honors an explicit GOMAXPROCS setting), so it actually respects the cage you’ve been put in. Container-aware humility.

The Part Where I Pick a Fight With the Giants

At this point I had a genuinely fast little tool and the emotional regulation of a toddler, so naturally I challenged the entire field to a duel. I installed every markdown converter I could find — pandoc, cmark-gfm, comrak, markdown-it, python-markdown — and ran honest, warmed-up hyperfine benchmarks doing the same job.

I’m going to be straight with you, because pretending otherwise is how you get torn apart on Hacker News by someone with a username like xX_segfault_Xx:

mkdown is not the fastest markdown parser on Earth. cmark-gfm, written in C, parses raw text roughly 3x faster than I do. comrak, in Rust, is right there too. If the game is “turn markdown into a bare HTML fragment as fast as physically possible,” the C and Rust elders win, and they should — that’s the whole job they were born for.

But here’s the thing. That’s not the job.

cmark hands you a naked HTML fragment — no document, no styling, no syntax highlighting, no <head>. It’s an ingredient, not a meal. The moment you ask for what people actually want — a finished, styled, syntax-highlighted, standalone HTML page — the only other tool that does it out of the box is pandoc. And pandoc is a magnificent, sprawling, do-everything Haskell battleship.

On the full job, my dumb little Go binary converts that 3.2MB file in 269ms. Pandoc takes 5.56 seconds.

That’s 20x. Twenty.

And on the batch — a folder of real files, everybody allowed to parallelize however they like — mkdown finishes in 36ms while cmark-gfm-in-a-shell-loop takes 239ms (and it’s still only emitting fragments), and pandoc takes 1.8 seconds. The one-process trick laps a faster parser by 6.7x, just by not paying the startup tax a thousand times.

So no, I didn’t build the fastest parser. I built the fastest way to turn a pile of markdown into a pile of finished web pages, which is the thing I actually wanted, and which apparently nobody had bothered to make snappy. The elders optimized the engine. I optimized the commute.

Shipping the Beautiful Mess

A tool nobody can install is just a diary entry. So I went the rest of the way:

GitHub Releases with prebuilt binaries for macOS, Linux, and Windows — built and fanned out by goreleaser, so a git tag does the whole thing.
A Homebrew tap, so it’s a brew install away.
And the one I’m irrationally pleased with: npx @mkdown/cli, packaged the way esbuild ships its Go binary — a tiny launcher with per-platform packages gated by os/cpu, so npm grabs only the binary you need and you never touch a Go toolchain. A markdown tool’s natural habitat is full of web developers who live in npm, so I went and knocked on their door.

Then, because I have a problem, I recorded a 15-second terminal demo with VHS showing a thousand files convert in under half a second, and dropped the GIF at the top of the README. The README that started this whole thing because its container was, you’ll recall, a bit narrow.

I never did go back and admire the wider box.

The Box Was Never the Problem

Here’s the confession, and it’s the same one every time: none of this needed to happen. Nobody filed a ticket. There is no roadmap, no user, no revenue, no standup where I report that the heading-ID generator now allocates 31% less. The original “bug” was a div that was 180 pixels too skinny, and I’d already fixed it in the first two minutes.

But that twitchy 1.6-second number was a door, and I have never once in my life walked past an open door labeled “something in here is being lazy and you could find out why.” The widening of the box was the excuse. The rabbit hole was the whole point. By the time I looked up, I understood Chroma’s lexer-resolution internals, had a custom AST transformer, a parallel batch engine, a benchmark suite with receipts, and four ways to install the thing — all in service of a complaint I’d resolved before I made any of it.

And honestly? My serious code is better for it. The instinct that catches filepath.Match eating a third of your runtime, or an O(n²) hiding behind an innocent fmt.Sprintf, doesn’t come from sober, professional, well-scoped work. It comes from following the twitch down the hole at 11pm on a Tuesday because a number offended you.

(Full disclosure, very much in the house style: I did all of this — the profiling, the fixes, the benchmarks, the packaging, and this post — alongside Claude Code, then went through and roughed it up by hand so it sounds like me and not like a press release. The agent is a fantastic spelunking partner. It still can’t tell you which holes are worth falling into. That part’s still on you.)

So: go find your too-narrow box. Change the one number. And then, when the trapdoor opens — and it will — do yourself a favor and fall in.

The afternoon was never going to be that productive anyway.

mkdown is on GitHub if you want to convert a pile of markdown faster than is strictly reasonable. It is, by every sensible measure, completely unnecessary. That’s the best kind.