ai, code-quality, debugging, developer-experience, lessons-learned

AI doesn't write perfect code (and that's fine, if you're paying attention)

January 11, 2025

I had a dream last night. Not a metaphor - an actual dream, the kind where you wake up and need a moment to realize it wasn't real. I was in a meeting with other engineers, reviewing code from one of my projects. Someone pulled up a file, looked at me, and said "how the hell did you think this was okay?" Then they started pointing out bugs. Edge cases I'd missed. Logic that made no sense. The room got quiet. I had no defense.

I woke up and realized the uncomfortable part wasn't the public embarrassment. It was that I knew they were right. These were mistakes I would have caught instantly if I'd written the code myself. But I didn't write it. I accepted it.

I've been using AI to write code for over a year now. Claude, GPT, Copilot - the whole circus. Since June, Claude has been my daily driver, handling everything from quick fixes to full feature implementations. And somewhere along the way, I got comfortable. Too comfortable. I started treating AI output as "probably fine" instead of "definitely needs review."

This post is about what happens when you do that. Real bugs, from my real codebase, that I shipped because the code looked right.

Bug #1: The level that wouldn't level

Users on my platform earn XP for various activities. When they hit certain thresholds, they level up. Simple enough. I also have achievements that grant bonus XP when unlocked.

The AI wrote perfectly reasonable code for unlocking achievements and granting XP:

typescript
// When an achievement is unlocked, increment the user's XP
await prisma.user.update({
  where: { id: userId },
  data: { xp: { increment: achievement.xpReward } },
});

Looks fine, right? XP goes up. The number in the database increases. Ship it.

Except... the level is also stored in the database. And the code never recalculates it. So users who earned a bunch of achievements during onboarding would see this in their dashboard:

plaintext
Level 1
Progress: 1,255% to next level

Over a thousand percent progress to level 2. Because they had enough XP for level 5, but the level field was still 1. The AI understood "add XP." It didn't understand "XP and level are related, and changing one might require updating the other."

The fix was trivial once I found it - just recalculate the level after processing achievements. But finding it took significantly longer than the AI took to write the original bug.
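Roughly the shape of that fix, where levelForXP stands in for whatever threshold lookup the app actually uses:

typescript
// Award the XP, then recalculate the derived level from the new total.
// levelForXP is a placeholder, not the real helper name.
const updated = await prisma.user.update({
  where: { id: userId },
  data: { xp: { increment: achievement.xpReward } },
});

await prisma.user.update({
  where: { id: userId },
  data: { level: levelForXP(updated.xp) },
});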

Bug #2: The streak that was always 1

Users have contribution streaks - consecutive days of activity. The dashboard shows their current streak and longest streak. The achievement system needs to know about streaks too.

Here's what the AI generated:

typescript
for (const day of sortedDays) {
  if (day.contributionCount > 0) {
    streak++;
    if (day.date === today || !foundToday) {
      foundToday = true;
      current = streak;  // Set current streak
    }
    longest = Math.max(longest, streak);
  } else {
    if (foundToday) {
      break;
    }
    streak = 0;
  }
}

The logic looks reasonable. Iterate through days, count consecutive contributions, track the current and longest streaks. The code is clean, the variable names are good, the comments explain the intent.

But look at when current is assigned. It only updates once, on the first contribution day the loop encounters: after that, foundToday is true and day.date no longer equals today, so the condition never passes again and current stays stuck at 1.

Users with a 7-day streak saw "Current streak: 1" in the UI. The achievement for a week-long streak triggered correctly (because longest was calculated right), but the display was wrong.

The AI wrote code that was almost correct. Close enough to pass a cursory review. But the subtle logic error meant it was fundamentally broken for its primary use case.
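For the record, the corrected loop is only a small change: keep updating current for every day in the most recent run instead of freezing it after the first. A sketch, assuming sortedDays is ordered newest-first and that a gap on the most recent day simply ends the current streak:

typescript
let streak = 0;
let current = 0;
let longest = 0;
let inCurrentRun = true; // still inside the most recent run of contribution days

for (const day of sortedDays) {
  if (day.contributionCount > 0) {
    streak++;
    longest = Math.max(longest, streak);
    if (inCurrentRun) {
      current = streak; // updates on every day of the run, not just the first
    }
  } else {
    inCurrentRun = false; // the most recent run has ended
    streak = 0;           // keep counting older runs for `longest`
  }
}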

Bug #3: The XP that vanished

This one hurt.

The sync function calculates a user's XP based on their GitHub activity - commits, PRs, reviews, streak bonuses. The AI wrote this:

typescript
// Calculate XP from activity
const xp = calculateActivityXP(commits, pullRequests, reviews, streak);

// Update user
await prisma.user.update({
  where: { id: userId },
  data: { xp },
});

Clean. Simple. Completely ignores achievement XP.

Every time the sync ran, it would recalculate XP from activity and overwrite the total. Achievement XP? Gone. That bonus 500 XP for hitting your first milestone? Vanished on the next sync.

The AI understood the immediate task: calculate XP from activity. It didn't understand the system context: there are multiple sources of XP, and they need to be summed, not replaced.
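The shape of the fix is straightforward once you know the problem exists: sum the sources instead of replacing the total. Here totalAchievementXP is a placeholder for however the achievement bonuses are tracked - a separate column, a sum over an achievements table, whatever fits:

typescript
const activityXP = calculateActivityXP(commits, pullRequests, reviews, streak);

await prisma.user.update({
  where: { id: userId },
  data: { xp: activityXP + totalAchievementXP }, // combine sources, don't overwrite
});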

Bug #4: The fix that broke something else

This is my favorite because it's a chain reaction.

I had a React hydration error - the classic "server and client rendered different content" warning. The culprit was using Date.now() directly in the component to check if a user was new.

The AI suggested a fix:

typescript
// Before: hydration error
const isNewUser = Date.now() - user.createdAt < ONE_HOUR;

// After: AI's fix - move to useEffect
const [isNewUser, setIsNewUser] = useState(false);

useEffect(() => {
  setIsNewUser(Date.now() - user.createdAt < ONE_HOUR);
}, [user.createdAt]);

This fixed the hydration error. No more console warnings. Ship it.

Except now isNewUser starts as false on the first render. The onboarding modal seeds its own state from that value to decide whether to show - and React's useState only reads its initial value once, so the modal latched onto false.

So the onboarding modal never appeared. For anyone. The fix for one bug created a worse bug - new users didn't see the onboarding flow they needed.
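One way out - a sketch, not necessarily the exact change I shipped, with OnboardingModal as a stand-in name - is to treat "not yet determined" as its own state and only mount the modal once the client-side check has actually run:

typescript
import { useEffect, useState } from 'react';

function NewUserGate({ user }: { user: { createdAt: number } }) {
  // null = not determined yet; the check only runs on the client
  const [isNewUser, setIsNewUser] = useState<boolean | null>(null);

  useEffect(() => {
    setIsNewUser(Date.now() - user.createdAt < ONE_HOUR);
  }, [user.createdAt]);

  // The modal isn't rendered until the answer is known,
  // so it can't seed its own state from a stale false.
  if (!isNewUser) return null;
  return <OnboardingModal />;
}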

The AI solved the problem I asked about without considering the downstream effects. Which is exactly what you'd expect from something that doesn't actually understand your application.

Bug #5: The infinite loop nobody saw coming

This one is from a different project - a training app for horse owners. I had a component that fetched benchmark data and displayed comparative analytics.

The AI wrote a perfectly reasonable useCallback with dependencies:

typescript
const fetchBenchmarks = useCallback(async () => {
  try {
    const result = await getBenchmarks(horse, timeRange);
    setData(result);
  } catch (err) {
    setError(t('tools.analytics.comparative.error'));
  } finally {
    setLoading(false);
  }
}, [horse, timeRange, t]);  // <-- spot the problem

useEffect(() => {
  fetchBenchmarks();
}, [fetchBenchmarks]);

The t function is from next-intl for translations. It looks like a stable reference. It's not. It gets recreated on every render.

So useCallback recreates fetchBenchmarks on every render. Which triggers useEffect. Which calls the API. Which updates state. Which triggers a re-render. Which recreates t. Which recreates fetchBenchmarks. Forever.

The fix was to remove t from the dependency array and use a static error string. But the AI didn't know that t was unstable - it just followed the ESLint rule that says "include all dependencies."
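The corrected version, with the exact error copy as a stand-in:

typescript
const fetchBenchmarks = useCallback(async () => {
  try {
    const result = await getBenchmarks(horse, timeRange);
    setData(result);
  } catch (err) {
    setError('Failed to load benchmark data'); // static string, no t() needed
  } finally {
    setLoading(false);
  }
}, [horse, timeRange]); // t removed - its identity changes on every render

useEffect(() => {
  fetchBenchmarks();
}, [fetchBenchmarks]);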

Bug #6: The calculation that wouldn't stop calculating

Same project, similar pattern. A feed calculator component that computes nutritional requirements for horses.

typescript
useEffect(() => {
  const timer = setTimeout(() => {
    calculateFoderstat();
  }, 200);
  return () => clearTimeout(timer);
}, [weight, age, breed, /* ...many inputs... */ calculateFoderstat]);

The AI included calculateFoderstat in the dependency array because ESLint said so. But calculateFoderstat was wrapped in its own useCallback, which recreated the function whenever any of its dependencies changed.

The result: the calculation ran, updated state, which triggered the useCallback to recreate, which triggered the useEffect, which ran the calculation again. The UI flickered constantly as results updated in an endless loop.

The fix was an ESLint disable comment - intentionally excluding the function from dependencies because we know it updates when its inputs change. Sometimes the linter is wrong. But the AI doesn't know when to break the rules.
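In code, that's one removed dependency and one comment explaining why:

typescript
useEffect(() => {
  const timer = setTimeout(() => {
    calculateFoderstat();
  }, 200);
  return () => clearTimeout(timer);
  // calculateFoderstat is deliberately excluded: it already changes identity
  // whenever its own inputs change, and including it causes the loop above.
  // eslint-disable-next-line react-hooks/exhaustive-deps
}, [weight, age, breed /* ...many inputs... */]);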

The pattern

If you look at all these bugs, there's a common thread: the code is locally correct but globally broken. Each piece does what it says. The XP increment increments. The streak loop loops. The hydration fix fixes hydration. The dependency array follows the linter rules exactly.

That last one is particularly interesting. The AI followed ESLint's exhaustive-deps rule perfectly - and created infinite loops. Sometimes the "correct" thing to do is break the rules. But knowing when requires understanding the system, not just the syntax.

Software isn't a collection of isolated functions. It's a system. And AI, at least right now, is really good at writing functions and really bad at understanding systems.

What I do differently now

I stopped treating AI-generated code as "probably correct." I now treat it as "definitely needs review." A few things that help:

Ask "what else does this affect?" Every time AI writes code that modifies state, I mentally trace through what else depends on that state. XP changed? What about level? User created? What about onboarding flags? The AI won't ask these questions, so I have to.

Be suspicious of "clever" solutions. When AI writes something that looks elegant, I pay extra attention. The streak bug was in a particularly clean-looking piece of code. Clean doesn't mean correct.

Test the unhappy paths. AI tends to write code that works for the primary use case. Edge cases, error conditions, state transitions - these are where the bugs hide. If the AI didn't explicitly handle something, assume it's broken.

Use a workflow that forces thinking. Having a structured conversation before writing code catches most of these issues. If I'd asked "what happens to level when XP changes?" before accepting the achievement code, I would have caught that bug in seconds.

The uncomfortable truth

Here's the thing nobody wants to admit: using AI to write code doesn't mean you can understand less about what you're building. It might mean you need to understand more.

When you write code yourself, you're forced to think through the logic. The act of typing makes you consider edge cases, dependencies, and side effects. When AI writes code for you, you skip that thinking phase entirely - and then you have to do it anyway during review, but now you're reviewing someone else's logic instead of developing your own.

This is why I started writing blueprints - a mix of pseudo-code and real code that forces me to think before the AI types. As I wrote in my previous post: "The hard part of programming isn't typing - it's thinking." When I sketch out function signatures, data shapes, and the general approach first, I'm doing the thinking. The AI just fills in the syntax.
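To make that concrete, a blueprint for the achievement-XP change might have looked something like this - a hypothetical sketch, half comments and half signatures, written before any real code:

typescript
// Blueprint (thinking, not implementation):
// - user.xp is the sum of activity XP and achievement XP
// - user.level is DERIVED from user.xp -> recalculate it whenever xp changes
//
// awardAchievementXP(userId, achievement):
//   1. increment xp by achievement.xpReward
//   2. recompute level from the new total   <-- the step I originally missed
//   3. persist both fields
async function awardAchievementXP(userId: string, achievement: { xpReward: number }) {
  // ...the AI can fill this in once the shape of the change is agreed on
}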

The developers who will thrive with AI tools are the ones who already understand their systems deeply enough to spot when the AI gets it wrong. The ones who think they can skip that understanding are going to ship a lot of bugs.

Still worth it

I want to be clear: I'm not going back to writing everything myself. AI-assisted development is genuinely useful. It's faster for boilerplate. It's great for exploring unfamiliar APIs. It helps me prototype ideas quickly.

But it's a collaboration, not a delegation. The AI writes code. I understand it. The AI suggests solutions. I evaluate them. The AI generates a lot of text very quickly. I read it slowly and carefully.

That's the workflow that works. Not "let the AI handle it" but "let the AI help, then verify everything."

Remember that dream I mentioned? The one where someone pulled up my code and asked how the hell I thought it was okay?

The uncomfortable part wasn't the public embarrassment. It was knowing they were right. That I could have caught it. That I didn't because I trusted code I hadn't really read.

I'd rather lose twenty minutes reviewing AI output than relive that dream in a real meeting.


All bugs in this post are real and came from my actual codebases - across multiple projects and even different frameworks. They've all been fixed. I'm sharing them because learning from mistakes - especially embarrassing ones - is how we get better. If you want to see how I try to prevent these issues upfront, check out my post on AI workflows.