In a spec-driven world, every change to the running system has a reason. Either the system does something it should not do, or we want it to do something new. The AI Unified Process (AIUP) captures this with two work item types that sit next to the main Use Case flow: Bug and Enhancement.

At first glance the difference looks obvious. A bug is a defect; an enhancement is new behaviour. But once you work with real teams, the line gets blurry. What if the code matches the spec, but the spec is wrong? Is that still a bug?

The short answer is yes. And the way you handle it tells you a lot about the health of your specification process.

The three flavours of a Bug

A Bug in AIUP is anything where the running system does not deliver the value the stakeholder expects. The root cause can sit in three different places:

  1. Code bug. The specification is correct, but the code does not match it.
  2. Specification bug. The code matches the spec, but the spec itself is wrong or incomplete.
  3. Test bug. The test encodes the wrong expectation, so a real defect slipped through.

All three are Bugs. They share the same intent: we are correcting a mistake, not changing direction. What differs is the fix path.
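
As a rough illustration, the three root causes and the artifact each one sends you to first could be modelled like this. The names `RootCause`, `Bug`, and `FIRST_FIX` are hypothetical and not part of AIUP; the mapping only mirrors the default fix paths described in the sections that follow.

```python
from dataclasses import dataclass
from enum import Enum


class RootCause(Enum):
    CODE = "code"
    SPECIFICATION = "specification"
    TEST = "test"


# Default first artifact to correct per root cause (illustrative mapping).
# A one-off code mistake such as a typo may stay at the code level.
FIRST_FIX = {
    RootCause.CODE: "harness",
    RootCause.SPECIFICATION: "use case specification",
    RootCause.TEST: "test",
}


@dataclass(frozen=True)
class Bug:
    title: str
    use_case: str          # hypothetical Use Case identifier
    root_cause: RootCause

    def first_fix(self) -> str:
        """Return the artifact that is corrected first for this bug."""
        return FIRST_FIX[self.root_cause]


bug = Bug("Listener is never detached", "UC-001", RootCause.CODE)
print(bug.first_fix())
```

The work item type is the same in all three cases; only the root cause, and therefore the entry point of the fix, changes.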

Fixing a code bug

This is the case that looks the most familiar, and the one most teams get wrong.

The naive flow is: reproduce, write a failing test, fix the code, done. The Use Case stays untouched, the test goes green, the ticket gets closed. That works for a one-off project, but in AIUP it misses the real problem.

In AIUP, code is shaped by the Harness. Skills, MCP servers, and guidelines are what turn a Use Case into running code. If a code bug appears, the question is not just “how do I patch this method” but “why did the Harness produce wrong code in the first place?”

The flow is:

  1. Reproduce the problem and write a failing test.
  2. Trace the defect back to its source in the Harness:
    • Did a skill generate the wrong pattern? For example, a Vaadin view skill that forgets to detach a listener.
    • Did an MCP server give the AI wrong or missing context? For example, a jOOQ MCP that did not return a column the AI then had to guess.
    • Did a guideline allow or even encourage the buggy pattern? For example, a naming rule that hides the intent of a method.
  3. Fix the Harness first. Update the skill, fix the MCP integration, tighten the guideline.
  4. Then fix the code so the test passes.
  5. Re-run the test suite. If the defect was systemic, other tests may now fail or other code may need regeneration.
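
Step 1 can be sketched with the listener example. The following is plain Python, not Vaadin, and `EventBus` and `OrderView` are invented stand-ins. The test is the kind of regression test that would fail against a generated view whose `detach` never unsubscribes, and pass once the marked line is in place.

```python
class EventBus:
    """Minimal publish/subscribe registry (hypothetical stand-in)."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def unsubscribe(self, listener):
        self._listeners.remove(listener)

    def listener_count(self):
        return len(self._listeners)


class OrderView:
    """Hypothetical view that listens to the bus while it is attached."""

    def __init__(self, bus):
        self._bus = bus

    def attach(self):
        self._bus.subscribe(self._on_event)

    def detach(self):
        # The line the generated code is assumed to have omitted.
        self._bus.unsubscribe(self._on_event)

    def _on_event(self, event):
        pass


def test_detach_removes_listener():
    bus = EventBus()
    view = OrderView(bus)
    view.attach()
    view.detach()
    # Fails if detach() leaves the listener registered.
    assert bus.listener_count() == 0


test_detach_removes_listener()
```

The test only pins the symptom. Following the flow above, the skill that generated the view would be corrected first, so that the next generated view unsubscribes on detach as well.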

This is the part that distinguishes AIUP from “AI-assisted coding”. A code bug is rarely just a code bug. It is a signal that the executable architecture has a gap. Patching only the symptom leaves the gap open, and the next AI-generated module will reproduce the same defect.

There are exceptions. A pure typo, a copy-paste error, or a one-off mistake that no Harness change could have prevented stays at the code level. But the default question must be: what in the Harness let this happen?

Fixing a specification bug

This is where many teams get confused. The flow is:

  1. Reproduce the unwanted behaviour and capture expected vs actual.
  2. Trace it back to the Use Case and identify which part of the spec is wrong. It could be a precondition, a main flow step, an alternative flow, a business rule, or an acceptance criterion.
  3. Update the Use Case specification first. This is the real fix.
  4. Update or add a test that reflects the corrected spec. It should fail against the current code.
  5. Adjust the code until the new test passes and the other use case tests still pass.
  6. Close the Bug with links to the spec change, the test, and the commit.
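
Step 6 is easy to skip under time pressure. A minimal sketch of a closing check, assuming the three links are stored as plain strings, might look like this. `SpecBugClosure` and the sample values are hypothetical and do not describe any AIUP tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpecBugClosure:
    """The three links a specification bug should carry when it is closed."""

    spec_change: Optional[str] = None
    test: Optional[str] = None
    commit: Optional[str] = None

    def missing_links(self) -> List[str]:
        """Return the names of the links that are still empty."""
        return [name for name, value in vars(self).items() if not value]


closure = SpecBugClosure(
    spec_change="UC-007, business rule BR-3",   # invented reference
    test="OrderDiscountTest",                   # invented test name
)
print(closure.missing_links())
```

An empty result means the Bug can be closed with the full chain from spec change to test to commit in place.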

The spec changes, but it is still a Bug. We are not adding new value; we are correcting a description that was wrong from the start.

Fixing a test bug

A test bug is the most dangerous of the three because it hides other defects. The fix is to correct the test so it matches the specification, then run it and see what breaks. Often a test bug uncovers a code bug or a spec bug underneath.

What an Enhancement actually is

An Enhancement is a deliberate change to existing behaviour. The old specification was correct for the old intent. Now we want a new intent.

Typical examples: adding a field to a form, changing a validation rule from optional to mandatory, adjusting a calculation because the business decided on a new rounding rule.

The flow is straightforward:

  1. Identify the Use Case that owns the behaviour.
  2. Update the Use Case specification to reflect the new intent.
  3. Update or add tests that match the new spec.
  4. Implement the change.
  5. Verify that all related use case tests pass.

If the change is large enough to stand on its own, with its own actors, preconditions, and main flow, it is not an Enhancement. It is a new Use Case.

Bug or Enhancement? The decision rule

The mechanics of fixing a specification bug and implementing an enhancement look almost identical. Both update the spec, the tests, and the code. So how do you decide?

The rule is about intent, not mechanics:

  • Bug: The spec was always meant to describe behaviour X, but it accidentally described Y. We correct a mistake.
  • Enhancement: The spec correctly described the old intent, and now we deliberately want new behaviour.

Ask yourself: if a stakeholder had reviewed the spec a year ago with full knowledge of what they wanted, would they have signed it off? If yes, and we now want something else, it is an Enhancement. If no, and the spec was wrong all along, it is a Bug.
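
The rule can be written down as a small function. This is only a sketch: the two boolean inputs are judgments a person has to make, and the names `WorkItemType` and `classify` are hypothetical.

```python
from enum import Enum


class WorkItemType(Enum):
    BUG = "Bug"
    ENHANCEMENT = "Enhancement"
    NEW_USE_CASE = "Use Case"


def classify(spec_matched_intent_when_written: bool,
             stands_on_its_own: bool = False) -> WorkItemType:
    """Classify a change that touches the spec, based on intent.

    spec_matched_intent_when_written: would a fully informed stakeholder
        have signed off the old spec at the time?
    stands_on_its_own: does the change need its own actors,
        preconditions, and main flow?
    """
    if not spec_matched_intent_when_written:
        return WorkItemType.BUG           # the spec was wrong all along
    if stands_on_its_own:
        return WorkItemType.NEW_USE_CASE  # too large for an Enhancement
    return WorkItemType.ENHANCEMENT       # old intent was right, new intent now


# The business decided on a new rounding rule: the old spec was correct.
print(classify(spec_matched_intent_when_written=True).value)
```

Note that neither input asks which artifacts get touched. Two changes with identical diffs can land in different categories.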

Why this distinction matters

You could lump everything together and call it “work”. Many teams do. But AIUP keeps Bug and Enhancement separate for a reason: the numbers tell you where your process is leaking.

If you classify specification bugs as Enhancements, your defect rate looks artificially low. You lose the signal that your elicitation, review, or AI-generated spec audit is producing wrong content. The feedback loop goes silent.
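
A small worked example shows the effect. It assumes that the defect rate is measured as the share of closed work items classified as Bug, which is one possible definition among several, and the counts are invented for illustration.

```python
def defect_rate(bugs: int, enhancements: int) -> float:
    """Share of work items that are Bugs (assumed definition)."""
    total = bugs + enhancements
    return bugs / total if total else 0.0


# Invented counts for one release.
code_bugs, spec_bugs, enhancements = 3, 4, 3

# Specification bugs counted as Bugs.
honest = defect_rate(code_bugs + spec_bugs, enhancements)

# Specification bugs filed as Enhancements instead.
misfiled = defect_rate(code_bugs, enhancements + spec_bugs)

print(f"{honest:.0%} vs {misfiled:.0%}")
```

The work done is identical in both cases. Only the second number hides the fact that more than half of the defects originated in the specification.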

If you keep them as Bugs, you can sub-categorise them:

  • Code bugs point to gaps in the Harness: a weak skill, a missing MCP context, or a guideline that allows the wrong pattern.
  • Specification bugs point to weak requirements work, missing stakeholder involvement, or sloppy reviews of AI-generated specs.
  • Test bugs point to a test suite that needs review itself, or to a testing skill that produces brittle tests.

Each category has a different upstream fix. Mixing them all together means you cannot improve any of them.

How traceability holds it all together

In AIUP every Use Case, Bug, and Enhancement is linked to code and tests through the @UseCase annotation. The AIUP Navigator plugin shows the full chain from requirement to test to code.

This is what makes the distinction practical instead of theoretical. When you close a Bug, the link stays. Six months later you can ask: how many spec bugs did we have in this Use Case? Did the same Use Case attract multiple Enhancements? Is one part of the system a defect magnet?
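
Once the links exist, these questions become simple queries. The sketch below assumes the linked work items have been exported as flat records; it does not use the @UseCase annotation or the AIUP Navigator plugin, and `WorkItem` and the sample data are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class WorkItem(NamedTuple):
    use_case: str    # hypothetical Use Case identifier
    kind: str        # "bug" or "enhancement"
    root_cause: str  # "code", "specification", "test"; "" for enhancements


def spec_bugs_per_use_case(items):
    """How many spec bugs did each Use Case have?"""
    return Counter(i.use_case for i in items
                   if i.kind == "bug" and i.root_cause == "specification")


def use_cases_with_multiple_enhancements(items):
    """Which Use Cases attracted more than one Enhancement?"""
    counts = Counter(i.use_case for i in items if i.kind == "enhancement")
    return sorted(uc for uc, n in counts.items() if n > 1)


def defect_magnet(items):
    """Which Use Case has the most Bugs, if any?"""
    counts = Counter(i.use_case for i in items if i.kind == "bug")
    return counts.most_common(1)[0][0] if counts else None


items = [
    WorkItem("UC-001", "bug", "specification"),
    WorkItem("UC-001", "bug", "code"),
    WorkItem("UC-001", "bug", "specification"),
    WorkItem("UC-002", "enhancement", ""),
    WorkItem("UC-002", "enhancement", ""),
    WorkItem("UC-003", "bug", "test"),
]

print(spec_bugs_per_use_case(items)["UC-001"])
print(use_cases_with_multiple_enhancements(items))
print(defect_magnet(items))
```

None of these queries is possible if closed work items lose their link to the Use Case, or if specification bugs were filed as Enhancements in the first place.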

Without traceability, Bug and Enhancement are just labels. With traceability, they become the raw data for improving how you work.

Summary

  • A Bug corrects a mistake. The mistake can be in the code, in the spec, or in a test. A code-level defect is usually a symptom of a Harness gap, not a standalone issue.
  • An Enhancement changes direction. The old spec was right for the old intent.
  • A new Use Case is bigger than an Enhancement and stands on its own.
  • The decision is about intent, not about which artifacts get touched.
  • Keeping the categories separate gives you the signal to improve your upstream process.

Spec-driven development is not just about writing specs before code. It is about treating the spec as a living artifact that can itself be defective. Once you accept that, the Bug category gets a lot more interesting.