Shipping 20 live agent cards at 60fps on a 512MB VM
How we built Agent Dashboard — the pixel-art observatory for Claude Code agents. Canvas 2D over React, raw ws over Socket.io, and why we migrated off Railway mid-flight.
Agent Dashboard is the control tower for the twenty-plus Claude Code agents that run across our local and remote environments. It is a pixel-art office where every agent is a character at a desk; when an agent opens a file, you see it move. When an agent is thinking, a little thought bubble appears. When it commits, you see the signal ripple across the room.
It sounds whimsical. It is. It is also the single most used tool in our studio. The question other developers ask us most is "why not just a log tail in a terminal?". The answer is in this post, along with every trade-off that got us to the current version.
The problem we were actually solving
We build software with a 6-agent waterfall pipeline (product-manager → business-analyst → architect → developer → reviewer → test-engineer). Each agent runs in parallel across different feature branches. At any moment there might be twenty conversations in flight, each with its own file reads, edits, tool calls, and token burn.
Log tails work when you have one agent. They do not work when you have twenty, because the thing you need to know is never "what did agent #1 do at 09:14:22". It is "which three agents are stuck", or "has the reviewer touched the PR", or "why is the build failing right now".
Off-the-shelf observability (Datadog, Grafana) treats every agent as a log stream. That does not match how humans think about multi-agent systems. We think spatially — there is a team, people have roles, work moves between them. So we built a spatial view.
Constraints that shaped everything
Before a single line of code, we wrote the hard limits:
- Target VM: Fly.io shared-cpu-1x, 512 MB memory. No GPU, no WebGL acceleration we can count on.
- Docker image budget: ≤ 100 MB. Cold starts on Fly.io are measured in seconds, not minutes.
- Cold start: ≤ 1 second from HTTP request to agent data on screen.
- Paint loop: 60fps during active use. Laggy realtime is worse than no realtime.
- No per-agent server process. One VM, N websocket clients, single node event loop.
These are not arbitrary. They were the budget we had and the budget we had to stay inside. Every subsequent decision below is downstream of one of them.
Stack decision #1 — Canvas 2D, not React, not WebGL
The obvious starting point is React. We use it everywhere else. We tried it for about two hours.
At 20 agents × 6 UI elements each (sprite, name, thought bubble, file indicator, tool call, status dot) you have ~120 components re-rendering every frame. React reconciles all of them against the virtual DOM every tick. Even with memoization, you bleed frames. We measured 28–34 fps on our target VM with a trivial scene.
WebGL was the opposite problem. Plenty of headroom, but an iridescent blob for twenty desks is overkill, and the bundle cost of Three.js (~700 KB gz) blows the Docker image budget when you include fonts.
Canvas 2D is the sweet spot. One requestAnimationFrame loop. Manual diff of what changed. Direct draw calls. The whole rendering core is about 400 lines, and it holds 60fps on 20 agents with room to spare — we can scale to 40 before the frame budget starts to hurt.
Trade-off we accepted: we hand-wrote collision, hit-testing, text layout, z-ordering, and sprite animation. Those come free in React + DOM. We estimated two extra weeks of engineering in exchange for a steady 60fps. We took the deal.
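Hand-rolled hit-testing is less work than it sounds. A minimal sketch of the idea (tile shape and names are ours for illustration, not the actual Agent Dashboard code): walk the tiles in reverse draw order so overlapping tiles resolve to the one painted last.

```javascript
// Hit-testing sketch. Tile shape and ids are illustrative,
// not the production code.
function hitTest(tiles, x, y) {
  // Walk from topmost to bottommost so overlapping tiles resolve
  // to the one painted last (highest z-order).
  for (let i = tiles.length - 1; i >= 0; i--) {
    const t = tiles[i];
    if (x >= t.x && x < t.x + t.w && y >= t.y && y < t.y + t.h) return t;
  }
  return null;
}

const tiles = [
  { id: "desk-1", x: 0, y: 0, w: 32, h: 32 },
  { id: "desk-2", x: 16, y: 16, w: 32, h: 32 }, // overlaps desk-1, drawn on top
];
console.log(hitTest(tiles, 20, 20).id); // "desk-2" wins the overlap
console.log(hitTest(tiles, 100, 100)); // null
```

Wire this to a single `click` listener on the canvas and you have replaced the DOM's entire event-delegation machinery with a dozen lines.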
Stack decision #2 — raw ws, not Socket.io
Socket.io is the default. It is also ~80 KB gz on the client, and it ships with a long-polling fallback that can quietly kick in despite your careful work to avoid it.
We run raw ws on the server and native WebSocket on the browser. No reconnection logic out of the box, no rooms, no namespaces — we wrote 120 lines of our own. The result: ~4 KB on the client, deterministic behaviour, zero transport surprises.
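Most of those 120 lines are reconnection. The core is exponential backoff plus a reset on a successful open; a hedged sketch (delay constants and names are ours for illustration, not the production values):

```javascript
// Reconnect-with-backoff sketch. Constants and names are illustrative;
// the production module differs in detail.
const BASE_MS = 250;
const MAX_MS = 10_000;

function nextDelay(attempt) {
  // Exponential backoff capped at MAX_MS: 250, 500, 1000, ... 10000.
  return Math.min(BASE_MS * 2 ** attempt, MAX_MS);
}

function connect(url, onMessage, attempt = 0) {
  const ws = new WebSocket(url); // native browser WebSocket, no wrapper
  ws.onopen = () => { attempt = 0; }; // reset backoff on success
  ws.onmessage = (e) => onMessage(JSON.parse(e.data));
  ws.onclose = () => {
    // Schedule a fresh connection attempt with a growing delay.
    setTimeout(() => connect(url, onMessage, attempt + 1), nextDelay(attempt));
  };
  return ws;
}
```

Deterministic, inspectable, and small enough to read in one sitting, which was the whole point of dropping the framework.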
The loop is stupid-simple. Each agent writes to its log directory. Chokidar watches those directories. When a file changes, the server broadcasts a diff over WS. Every connected browser receives the diff, merges it into its in-memory state, and the canvas redraws only the affected agent tile.
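The client's merge step can be sketched like this (the diff shape is our illustrative assumption; the real wire protocol may differ):

```javascript
// Merge a server diff into in-memory agent state, returning the ids of
// agents whose tiles need repainting. Diff shape is assumed for
// illustration: { agentId: { field: value, ... }, ... }.
function mergeDiff(state, diff) {
  const dirty = [];
  for (const [agentId, fields] of Object.entries(diff)) {
    const prev = state[agentId] ?? {};
    state[agentId] = { ...prev, ...fields }; // shallow merge, fields win
    dirty.push(agentId); // only these tiles get redrawn this frame
  }
  return dirty;
}

const state = { reviewer: { file: "a.js", status: "idle" } };
const dirty = mergeDiff(state, { reviewer: { status: "thinking" } });
console.log(dirty);               // ["reviewer"]
console.log(state.reviewer.file); // "a.js", untouched fields survive
```

The return value feeds straight into the dirty-tile tracking described below: a diff for one agent means one tile repaints, not twenty.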
Stack decision #3 — 23 modules under 200 LOC each
We do not do "one big app.js". The client is 23 JS modules: constants, math, audio, palettes, particles, bubbles, background, effects, agentState, creatures, renderer, state, websocket, ui, tasks, admin, clickAnims, settings, adminPos — and a few more that accumulated over the project.
Each file has one job. None of them exceed 200 LOC. Vite bundles them into a single client chunk that is still under 80 KB gz, and we can reason about any single file in one sitting.
The 200-LOC ceiling is not dogma. It is the point where a module stops fitting on one screen, and once you lose that you lose the ability to refactor without fear.
Three failures on the path to 60fps
Failure 1 — painting everything every frame
Version 1 cleared the whole canvas and repainted all 20 agents every frame. It looked smooth until the tab was backgrounded and restored: Chrome throttles requestAnimationFrame aggressively for hidden tabs, and a full repaint of everything left no headroom to recover, so frames dropped into the teens. Our "60fps" only held under ideal conditions.
Fix: dirty-rect tracking. Only repaint tiles that changed. The main loop now paints two to four tiles per frame on average instead of twenty.
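Dirty tracking at the tile level is only a few lines. A minimal sketch, under our own naming, assuming one rect per agent tile:

```javascript
// Dirty-tile tracker sketch: mark tiles dirty as diffs arrive, flush
// once per animation frame. Names are illustrative.
const dirtyTiles = new Set();

function markDirty(tileId) {
  dirtyTiles.add(tileId); // Set dedupes repeated marks within a frame
}

function flushDirty(paintTile) {
  // Paint only what changed, then clear for the next frame.
  for (const id of dirtyTiles) paintTile(id);
  const painted = dirtyTiles.size;
  dirtyTiles.clear();
  return painted;
}

markDirty("agent-3");
markDirty("agent-7");
markDirty("agent-3"); // duplicate within the frame, collapses
const painted = flushDirty((id) => { /* ctx.drawImage(...) for that tile */ });
console.log(painted); // 2 tiles painted instead of 20
```

The requestAnimationFrame loop calls `flushDirty` once per frame instead of `clearRect` over the whole canvas.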
Failure 2 — GC pauses from per-frame object allocation
We allocated a fresh {x, y, w, h} object inside every draw call for every tile. At 60fps × 20 agents × 6 elements you are generating 7200 objects/second. V8 eventually pays you back with a 40–80ms GC pause, and that pause lands at exactly the wrong moment — when the user is clicking something.
Fix: object pooling. We preallocate scratch buffers for the hot paths and reuse them. The GC pauses went from "noticeable every few seconds" to "not measurable".
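The pooling pattern looks roughly like this (a sketch under our naming assumptions, not the production module):

```javascript
// Rect pool sketch: preallocate scratch rects once, reuse every frame.
// Zero per-frame allocation means nothing for the GC to collect.
function makeRectPool(size) {
  const pool = Array.from({ length: size }, () => ({ x: 0, y: 0, w: 0, h: 0 }));
  let next = 0;
  return {
    acquire(x, y, w, h) {
      const r = pool[next++ % pool.length]; // reuse, never allocate
      r.x = x; r.y = y; r.w = w; r.h = h;
      return r;
    },
    reset() { next = 0; }, // call once per frame, after painting
  };
}

const rects = makeRectPool(128);
const a = rects.acquire(0, 0, 32, 32);
rects.reset();
const b = rects.acquire(5, 5, 16, 16);
console.log(a === b); // true: same object, recycled
```

The only discipline required is never holding a pooled rect across a frame boundary, which is easy to enforce when the whole render core is 400 lines.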
Failure 3 — chokidar on every file in the vault
Our naive first version watched the entire vault directory recursively. With thousands of files, chokidar keeps per-file watcher state and file handles for all of them, and memory climbed past the 512 MB ceiling within an hour. The VM OOM-killed itself.
Fix: scope the watcher to just <vault>/projects/*/logs/. That cut the watched surface from 10k files to ~40. Memory usage flatlines at 180 MB.
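The scoping itself is just a narrower glob passed to chokidar, but the same rule is worth having as a pure predicate you can test. A sketch (paths and names are illustrative):

```javascript
// Watch-scope predicate sketch: true only for files under
// projects/<anything>/logs/ relative to the vault root. Useful both for
// pre-filtering paths and as the basis of a watcher's ignore rule.
// Names and paths are illustrative.
function inWatchScope(relPath) {
  // Normalize Windows separators, then match projects/*/logs/...
  const p = relPath.split("\\").join("/");
  return /^projects\/[^/]+\/logs\//.test(p);
}

console.log(inWatchScope("projects/dashboard/logs/agent-3.jsonl")); // true
console.log(inWatchScope("projects/dashboard/notes/todo.md"));      // false
console.log(inWatchScope("attachments/big-image.png"));             // false
```

Keeping the rule in one small function also means the "what do we watch" decision shows up in code review instead of being buried in a glob string.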
The Railway → Fly.io migration mid-flight
Three weeks into development our Railway trial expired. We had a live URL, a working app, and a deadline the next morning to show it to an internal audience. We had two choices: pay Railway, or migrate.
We migrated. The Dockerfile already existed (multi-stage, alpine, 60 MB). We wrote a fly.toml targeting fra, ran one flyctl launch, and were live in under an hour. The only real work was rewriting a preview-environment webhook that had been Railway-specific (deleted, then re-implemented as a GitHub Actions flow).
Why it was boring: the app was already stateless. Chokidar watches local files; those files come from sync-to-cloud.js, a separate process that POSTs to the server every two seconds with the vault diff. Migration was "set up a new VM, point sync at its URL, done". There was nothing to export, no database to migrate, no DNS to re-verify.
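The diff step that sync-to-cloud.js performs between polls can be sketched like this (the snapshot shape is our assumption for illustration; we have not published the real script):

```javascript
// Vault diff sketch: compare two { path: mtimeMs } snapshots and emit
// only what changed since the last poll. Snapshot shape is assumed.
function vaultDiff(prev, next) {
  const changed = [];
  const removed = [];
  for (const [path, mtime] of Object.entries(next)) {
    if (prev[path] !== mtime) changed.push(path); // new or modified
  }
  for (const path of Object.keys(prev)) {
    if (!(path in next)) removed.push(path); // gone since last poll
  }
  return { changed, removed };
}

const prev = { "logs/a.jsonl": 100, "logs/b.jsonl": 200 };
const next = { "logs/a.jsonl": 150, "logs/c.jsonl": 300 };
const d = vaultDiff(prev, next);
// changed: a.jsonl (modified) and c.jsonl (new); removed: b.jsonl
```

Because the server only ever receives diffs like this, it carries no durable state of its own, which is exactly what made the vendor switch boring.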
This is the benefit of the 512 MB / stateless constraint we set at the start. It bought us the ability to migrate vendors in a morning.
Process receipts — how the pipeline actually looked
The dashboard was built with the same waterfall pipeline it is now observing:
- 5
- 340 commits
- 3
- 3 weeks from spec to production
- 16.6 ms frame budget (1000 ms / 60 frames)
- 4.2 ms avg frame paint time
- 60 MB Docker image
- 180 MB resident memory
Every one of those commits has a linked agent trace in our internal tools. We publish them unredacted on the detail page of this case study — you can see exactly which agent proposed what, and which pull requests got sent back for revision.
What we would do differently
- We would start with TypeScript. We started in plain JS for speed; three weeks in, first-class type definitions for agent state mattered more than the head start, and the migration cost us a day we did not need to spend.
- We would write Playwright e2e earlier. We added them late, after a redeploy broke a critical path. The tests paid for themselves within two weeks. Earlier, and we would have caught more regressions in the preview-env flow.
- We would not build our own preview-environment webhook. Getting it to work with Railway took two days. Getting it to work with Fly took three more. The feature ended up getting replaced by a simple GitHub Actions flow that opens a PR and comments the preview URL. Simpler and more debuggable.
The outcome
- Live since 2026-03-23 at agent-dashboard-ancient-mountain-4835.fly.dev.
- Canvas 2D at 60fps on 20+ live agents.
- 60 MB Docker image, 180 MB resident memory, fra region, shared-cpu-1x.
- Spec to production in 3 weeks. Zero downtime migration from Railway mid-flight.
- Used daily in the studio since — we ship every product with it open.
The full case study with the studio context and stack tags is at /labs. The code is private; the agent pack that ships it is public at github.com/workmailan8n-hash/btw-agents-pack.