Colin McNamara

October 22, 2025

I Let AI Agents Upgrade My Production Code (Here's What Actually Happened)

Hi there,

Last night I tried something unusual: I let AI agents handle a production dependency upgrade from start to finish.

Not just the code changes—the entire process. Testing, QA, documentation, and deployment decisions.

What happened surprised me.

The Setup

I needed to upgrade 5 LangGraph implementations to version 1.0. These are running in production for Always Cool AI, handling compliance checks for kosher and FDA regulations.

32 tests needed to keep passing. Custom integrations needed to keep working. No downtime allowed.

I used Claude Code (Anthropic's CLI) with specialized subagents to see how well AI could handle this.

The First Surprise

The agent didn't immediately start making changes. It verified my actual codebase first.

Turns out I was already using 1.0-compatible patterns without realizing it. The agent saved me hours of unnecessary refactoring by checking BEFORE suggesting changes.
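For context, here's a minimal sketch of the kind of pattern it was looking for, assuming LangGraph.js's Annotation-based StateGraph API. The node names and state shape are made up for illustration, not pulled from the Always Cool AI codebase:

```typescript
// A minimal sketch (illustrative names, not the real compliance graph),
// assuming the Annotation-based StateGraph API in @langchain/langgraph.
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

// Graph state: a list of compliance findings, appended to by each node.
const ComplianceState = Annotation.Root({
  findings: Annotation<string[]>({
    reducer: (current, incoming) => current.concat(incoming),
    default: () => [],
  }),
});

// A single node that records a (hypothetical) kosher check result.
const kosherCheck = async (_state: typeof ComplianceState.State) => {
  return { findings: ["kosher: pass"] };
};

const graph = new StateGraph(ComplianceState)
  .addNode("kosherCheck", kosherCheck)
  .addEdge(START, "kosherCheck")
  .addEdge("kosherCheck", END)
  .compile();

// Usage: await graph.invoke({ findings: [] });
```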

Dependency Hell

Installing LangGraph 1.0 triggered a cascade:

  • Peer dependency conflict with @langchain/core

  • Which needed @langchain/openai 1.0

  • Which needed Zod v4

  • Which needed OpenAI SDK v6

The agent handled it iteratively. Update one. Hit error. Learn. Update next. Hit error. Learn. Each mistake taught it something.

Very human-like debugging.
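To make the Zod part of that cascade concrete, here's a hypothetical illustration of the kind of source-level break a v3-to-v4 bump can surface (not the actual error from this upgrade), assuming the v4 change that drops single-argument z.record():

```typescript
import { z } from "zod";

// Fine in both Zod v3 and v4: plain object schemas are unchanged.
const Ingredient = z.object({
  name: z.string(),
  status: z.enum(["approved", "flagged", "unknown"]),
});

// Zod v3 accepted a single-argument record (keys implicitly strings):
//   const IngredientIndex = z.record(Ingredient);
// Zod v4 drops that form, so the key schema has to be explicit.
// The two-argument version below works on both majors.
const IngredientIndex = z.record(z.string(), Ingredient);

// Usage:
// IngredientIndex.parse({ sugar: { name: "sugar", status: "approved" } });
```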

When Agents Disagreed

Here's where it got interesting.

I asked the agent to validate the upgrade. It spun up TWO specialized testing agents:

EvidenceQA (prove everything with tests):

Ran 32 tests. Built production bundle. Checked runtime behavior.

Verdict: ✅ Ship it.

code-reviewer (analyze code quality):

Reviewed all 27 changed files. Checked dependency safety.

Verdict: ❌ Critical issues. Revert to Zod v3.

They contradicted each other.

So I made them prove it. It turned out we had TWO copies of the OpenAI SDK installed, so we checked the peer dependencies of each. The old one didn't support Zod v4. The new one did.

EvidenceQA was right. The runtime tests proved it.
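If you want to reproduce that kind of check yourself, here's a rough sketch, assuming a Node/ESM project (not what the agents actually ran), of one way to see which copy of the OpenAI SDK each package resolves to at runtime:

```typescript
// A rough sketch of spotting duplicate OpenAI SDK installs: resolve "openai"
// from the app root and from inside @langchain/openai, then compare paths.
// A nested node_modules/@langchain/openai/node_modules/openai shows up as a
// different resolved location than the top-level copy.
import { createRequire } from "node:module";
import path from "node:path";

const appRequire = createRequire(import.meta.url);

// The copy our own application code imports.
const appCopy = appRequire.resolve("openai");

// The copy @langchain/openai imports (resolution starts next to its entry
// file, so a nested install wins if one is present).
const langchainEntry = appRequire.resolve("@langchain/openai");
const langchainRequire = createRequire(
  path.join(path.dirname(langchainEntry), "resolve-probe.js")
);
const langchainCopy = langchainRequire.resolve("openai");

console.log({ appCopy, langchainCopy, duplicated: appCopy !== langchainCopy });
```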

The Lesson

Having two agents disagree was actually VALUABLE. It forced verification with evidence rather than assumptions.

When AI agents contradict each other, make them show their work. The one with evidence wins.

Final Results

  • 3 hours total

  • 28 files changed

  • 32/32 tests passing

  • Zero errors

  • Deployed successfully

Would I do this again? Absolutely.

But with caveats.

What AI Agents Did Well:

✅ Dependency resolution

✅ Bulk code changes (21 files)

✅ Testing and validation

✅ Systematic cleanup

✅ Documentation generation

What Needed Human Oversight:

⚠️ Deciding between conflicting advice

⚠️ Verifying custom integrations

⚠️ Understanding business impact

⚠️ Final deployment decisions

The Big Insight

AI agents are like incredibly productive junior engineers.

They can execute complex plans, debug iteratively, and handle systematic refactoring. But they need senior oversight.

They made mistakes. Missed one Zod error. One agent gave wrong advice. But their mistakes were RECOVERABLE because they could test their own fixes.

That's the key difference.

The Future

This isn't about AI replacing developers. It's about AI making developers more effective.

The agents handled systematic work (dependency updates, bulk refactoring, testing). I made judgment calls (which agent to trust, whether to deploy, how to handle custom integrations).

That's the collaboration model that works.

Full Story

I wrote a detailed technical post with all the code examples, agent tools used, and lessons learned: https://colinmcnamara.com/blog/ai-agents-langgraph-upgrade

Including the part where I made two AI agents prove their contradictory claims with evidence.

Worth a read if you're curious about practical AI agent use for real development work.

Best,

Colin

P.S. The agents used 40+ bash commands, 20+ file reads, 15+ edits, and launched two specialized subagents. All tracked with an AI-maintained to-do list. The workflow was methodical: Search → Read → Edit → Test → Repeat.
