I Built a 10k-Player Game Server in Elixir. I Don’t Speak Elixir.

A few weeks back I started a real-time multiplayer game server. It’s an Elixir umbrella with nine apps. It handles 10,000 concurrent WebSocket connections on one node, with p99 place-bet round-trip under 3 milliseconds. It’s multi-tenant, multi-node, and cluster-aware. It has 364 tests plus one property test, along with a small Python pytest suite for the reference operator. Eleven feature tiers shipped, all closed, all green.

I don’t speak Elixir.

I mean, I’ve picked up enough to recognize the shape: pattern matching, the pipe operator, GenServers, the actor model, OTP supervision trees. I can follow a stack trace if I squint. I can read a module top to bottom and roughly tell you what it does. But if you sat me down and told me “add a feature” or “fix this bug” with no agent open, I’d be stuck in five minutes.

This is fine. This post is about why.

What got built

The system hosts game rounds in real time, fans state out to thousands of concurrent players over WebSockets, and integrates with operator wallets through a pluggable adapter. No real money. This is the substrate, not the casino.

In the box:

Nine umbrella apps, layered from data up through the edge. Each app implements the same five-layer pattern (schemas → DTOs → repositories → services → workers). Layer boundaries enforced by the compiler.
One game shipped end-to-end (DICE), three more specced but parked behind a partner ask.
A load-test rig that proves 10k concurrent WebSocket connections on one BEAM node. The first time I ran it at 10k, it tripped the server’s own rate limiter, because the load test was the attacker.
A Phoenix LiveView admin dashboard: round inspector, wallet ops, bot fleet UI, versioned game-config editor.
An HTTP seamless-wallet protocol for operators with an Elixir adapter, plus a Python FastAPI reference operator that any partner can study before writing their own.
Cluster-wide round affinity via a small custom registry. A bet arriving on the wrong node transparently routes to the owner. (This one closed an open question in the spec; more on that later.)
Sentry + Axiom for errors and structured JSON logs, both env-var gated so dev stays quiet without any config branching.
A docs directory with system overview, onboarding, language primer, runbook, glossary, API reference, deployment guide.

That’s the shape. I won’t pretend it’s done — plenty of gap work, especially production deploy plumbing. But what’s there works.

What I actually do

People assume “AI-assisted coding” means typing prompts and clicking accept. That’s not what this is.

I’ve been a developer for about twenty years. That fact is doing more work in this project than anything else, so I want to put it on the table first.

What I do, mostly, is make decisions. Not implementation decisions: those happen at the keyboard, and I’m not the one at the keyboard. Architecture decisions:

Kubernetes or bare VPS? Bare VPS, because I can’t manage K8s under pressure.
Unfreeze the deferred games or push deeper into infra? Infra. Partners want scale, not catalog breadth.
Ship the versioned game-config editor or a read-only viewer? Viewer. The hot-reload semantics question is bigger than one tier.
Build a Dockerfile or skip it for Ansible-managed releases? Ansible.

These decisions are mine. I’ve never typed a defmodule for this codebase. I’ve read maybe 20% of the source. But I know the shape of the system, the tier roadmap, every locked decision in the spec’s decisions log. I know what’s wired, what’s a gap, what’s deferred and why.

The other thing I do is veto. The agent suggests something. I read enough of the suggestion to understand the trade-off. I say “no, that’s overkill” or “yes, but make X explicit” or “do A first, then B”. The vetoes shape the codebase as much as the agreed plans do.

What I don’t do: I don’t review every line. I don’t audit every test. I don’t read the diffs that don’t matter. I trust the test suite (364 + 1 property) and the tier discipline (each tier closes a coherent slice, verifiable from the outside) to catch what I miss.

The doc set is insurance

There’s a real risk to this way of working: if I get hit by a bus tomorrow, who reads this codebase? Not me. Not the agent in a fresh session — the context is gone. The next human inheritor would be staring at nine umbrella apps in a language they probably don’t know.

So the docs aren’t an afterthought. They’re the only thing that keeps the project recoverable.

Every doc was written with the question “what does a non-Elixir reader need to understand this?” as the constraint. The onboarding doc has a code tour of the seven files that get you 80% of the way into the system. The runbook has forty break-glass recipes for when something is broken, each written from a debugging incident we actually hit along the way rather than from speculation. The glossary defines every domain term with cross-references and opens with an “easy to confuse” section pairing the words newcomers always swap (Game vs Round, Coordinator vs Scheduler, Reserve vs Commit).

Writing them took roughly as long as writing the code they describe. But the cost of not having them — being one bus accident from an unmaintainable system — is higher than the cost of building them.

There’s a rule I keep, now codified in the agent’s long-term memory so it survives session resets: when shipping a user-visible code change, update the relevant docs in the same session. Not “after”. Not “in a docs sprint later”. In the same commit, before the work is considered done. That’s the only way docs don’t drift.

The twenty years is what’s working

I want to be precise about what’s doing the work here. It’s not the agent. The agent is fast and capable; that’s table stakes now. What’s doing the work is twenty years of writing software, applied to evaluating the suggestions that scroll by.

Here is what twenty years gives you that no LLM can give you, and that no new developer has yet:

You know the edge cases. You know what’s likely to break first. You can read a function signature and feel the wrongness before you read the body. You’ve seen Sentry crash on boot because someone passed nil where it expected a string. You’ve seen OTP supervisor shutdowns silently skip terminate/2 because nobody flipped trap_exit and the cleanup never ran. You’ve seen migrations forget primary_key: false and watched a binary-UUID column collide with the default bigserial for half a day before someone caught the type error in the logs. You’ve seen rate limiters trip themselves at 2 a.m. because the load test was the attacker.

Every one of those is a real incident from this project, by the way. In every one of them, the agent confidently proposed something that was mostly right but contained one of those traps. Without scar tissue, you accept the trap. You ship it. It works in test. It works in dev. Six weeks later it breaks production at four in the morning and you have no idea why.

I’ve made every one of those mistakes myself, years before any agent existed. The mistakes are what trained the part of my brain that catches them now — review-by-instinct over the agent’s output. The agent’s first answer to a question I haven’t asked is often wrong in a specific, subtle way that I recognize because I shipped that same wrong answer in 2011 and lived through its consequences.

Without that scar tissue, you will fail eventually.

Not on day one. Day one is the agent’s strongest day. Probably not on day thirty either. But somewhere between month three and month nine, a real production incident lands, and you can’t tell whether the agent’s suggested fix is correct because you don’t have the context to evaluate it. You ship the fix. The fix introduces a worse bug. You ship that fix. The codebase rots from the inside, accruing technical debt you can’t even see because you don’t know what good would have looked like.

The agent is not a substitute for development experience. It is a force multiplier on it. Multiplying by zero gives zero. Multiplying by a junior developer gives a junior developer’s output produced faster — which means more decisions to make, more chances for a confidently-wrong answer to slip through, more bugs accumulating in domains the developer doesn’t have taste in yet. The speedup is real and the cliff is just as real.

If you’re newer than that and you want to build something like this: find a senior. Pair with them. Have them review the agent’s output until your own instincts catch up. Don’t go solo. The cost of being wrong is paid by the people running your production system at 4 a.m., not by you.

This is the unfashionable take in 2026 and I’m going to say it anyway: agentic coding rewards experience violently. The gap between a 20-year senior and a 2-year junior widens when you put an agent in front of both of them, because the senior has the judgment to filter the agent’s output and the junior doesn’t. We’re not democratizing software engineering. We’re concentrating the rewards in people who already knew what they were doing.

The honest limits

I can’t deploy this to production by myself. Right now there is no release config, no reverse-proxy configuration, no Ansible playbook. The deployment doc describes the path, but the path isn’t paved. When we’re ready to ship to a real partner, I’ll need to do that work, and “do that work” means another series of decisions, vetoes, and verifications. Not a click-deploy moment.

I can’t debug a non-trivial production incident without help. The runbook’s forty recipes cover a lot of “thing X is broken” patterns. But if something subtle breaks (a memory leak, a Postgres lock-contention spike, a wallet adapter race), I’d need the agent open to diagnose it. Without the agent, I’m at “tail the logs and pray” tier.

I haven’t proven the docs work for a third party. The proof would be a different developer cloning the repo, reading onboarding, and shipping a useful change without my help. I haven’t tested that. Maybe the docs are sufficient. Maybe they aren’t.

These are real limits. The codebase isn’t magic. My relationship to it isn’t magic. It’s a usable tool I built with another usable tool, and we’re keeping it useful by being honest about what’s there and what isn’t.

Keep going strong

What I’m proud of, looking at this project, isn’t the tier count or the test count. It’s that the system is coherent. Every decision is documented. Every gap is named. Every spec question that started open is either resolved with a pointer to the resolution, or marked clearly open with the trade-offs spelled out. You should be able to pick this codebase up cold, read four pages, and know where you are.

That coherence is the artifact. The code is just the part that runs.

We’ll keep going. The next item is restarting a round by replaying its event log, which finally exercises the hash chain we built but never used. Then auth hardening: failed-login rate limiting, a refresh-token cookie, admin 2FA. Then probably deployment: turning the deployment doc’s patterns into a real Ansible playbook against a real droplet.

I don’t know how to write any of that. I know how to steer it.

So far, that’s been enough.

If you’re a senior engineer using an agent: you already know all this and just got 1,500 words of confirmation. If you’re a junior thinking about going solo with one: please don’t say I didn’t warn you. If you want to argue with any of it, the email is in the footer.