There’s a massive snowstorm blanketing the Northeast right now, and with nowhere to go, I figured I’d put my time to good use: comparing AI coding assistants while building something practical. Specifically, I’ve been using both Claude Code and OpenAI’s Codex to take data from a Prometheus and Thanos deployment and generate dashboard JSON that can be imported into Grafana.
The verdict? Claude Code has been significantly more useful for this kind of work.
The Task at Hand
If you’ve worked with Thanos and Prometheus, you know the observability stack can get complex quickly. The goal was straightforward: crawl the Thanos API, understand what metrics are available, and then logically create Grafana visualizations that are actually useful—not just pretty graphs, but panels that tell a story about what’s happening in your infrastructure.
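To make that first step concrete, here’s a minimal sketch of crawling the available metric names from a Thanos Query endpoint via its Prometheus-compatible HTTP API. The URL is a placeholder for your own deployment; the `/api/v1/label/__name__/values` route is the standard way to enumerate metric names.

```python
import json
import urllib.request

# Placeholder endpoint; point this at your own Thanos Query instance.
THANOS_URL = "http://localhost:9090"

def parse_label_values(payload: dict) -> list[str]:
    """Extract the data list from a Prometheus/Thanos label-values response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]

def list_metric_names(base_url: str) -> list[str]:
    """Fetch every metric name exposed by the Prometheus-compatible API."""
    url = f"{base_url}/api/v1/label/__name__/values"
    with urllib.request.urlopen(url) as resp:
        return parse_label_values(json.load(resp))
```

From there, the interesting work is deciding what those metrics mean and how they relate, which is exactly where the two assistants diverged.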
Where Claude Code Shines
What sets Claude Code apart is its ability to reason through the problem. It doesn’t just generate code—it crawls the Thanos API, understands the relationships between metrics, and then makes intelligent decisions about how to visualize that data in Grafana. The dashboards it produces aren’t just syntactically correct JSON; they’re logically structured in ways that make operational sense.
When I asked it to explore what metrics were available and suggest visualizations, it didn’t just dump a list. It grouped related metrics, suggested appropriate panel types (gauges for saturation metrics, time series for rates, stat panels for current state), and organized them into rows that followed a logical troubleshooting flow.
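The dashboard JSON behind those choices can be sketched with a couple of small Python helpers that assemble panels of different types into an importable document. This is a minimal sketch, not a full Grafana schema: field values like `schemaVersion` are assumptions for a recent Grafana release, and real dashboards carry many more fields.

```python
def make_panel(title: str, expr: str, panel_type: str, panel_id: int, y: int) -> dict:
    """Minimal Grafana panel: one PromQL target, a fixed-size grid slot."""
    return {
        "id": panel_id,
        "title": title,
        "type": panel_type,  # e.g. "timeseries" for rates, "gauge" for saturation, "stat" for current state
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": y},
        "targets": [{"expr": expr, "refId": "A"}],
    }

def make_dashboard(title: str, panels: list[dict]) -> dict:
    """Wrap panels in the top-level structure Grafana's dashboard import expects."""
    return {
        "title": title,
        "schemaVersion": 39,  # assumption: recent Grafana schema version
        "panels": panels,
        "time": {"from": "now-6h", "to": "now"},
    }
```

The value the assistant added wasn’t in emitting this boilerplate; it was in choosing sensible `expr` queries and panel types and ordering the panels so a troubleshooting session reads top to bottom.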
The Codex Experience
Codex, by comparison, felt more like a code completion tool that happened to know about Grafana JSON syntax. It could generate valid dashboard JSON, but the intelligence about what to visualize and why wasn’t there in the same way. I found myself doing more of the thinking and using Codex more as a faster way to type.
That’s not nothing—faster typing is valuable. But when you’re trying to build something from scratch and want an assistant that can reason about the problem domain, the difference becomes clear.
The Bottom Line
For tasks that require understanding an API, reasoning about what data is available, and making intelligent choices about how to present it, Claude Code has been the clear winner in my experience. It feels less like autocomplete and more like pair programming with someone who actually understands observability.
And now for the part of the blog where I vent about sports, because what else are snow days for?
I’m genuinely happy that my Giants have finally hired what appears to be a competent coach. It’s been a long time coming, and there’s actual reason for optimism in East Rutherford for the first time in years.
What I’m not thrilled about is watching the New England Patriots somehow stumble their way back to the Super Bowl after a 4-13 season. I don’t know how they keep doing it, but here we are. They’ll face the Seattle Seahawks, and all I can say is: I hope Sam Darnold and Jaxon Smith-Njigba wipe the floor with them.
Go Seahawks.