How does software development powered by LLM-based tools impact the software engineering discipline?

It is not Vibe Coding; it is Spec-Driven Development! In this article, Birgitta Böckeler presents three approaches to spec-driven, LLM-based development:

  • Spec-first: the specification is used to generate the initial code; afterwards, the developer works on the code directly and the specification is discarded.
  • Spec-anchored: the specification is kept in sync with the code after generation, so it can drive future maintenance and evolution.
  • Spec-as-source: the specification is the source; development happens at the specification level only, and the code is regenerated from it.

It is debatable whether the last approach, spec-as-source, can be fully applied to software development, with the entire process conducted at the specification level. Proponents argue that LLMs function as a new high-level language, with code generation acting as a compiler. While this mirrors claims once made for model-driven and low-code development, a crucial distinction remains: compilers are deterministic and certifiable, whereas LLM outputs can shift unexpectedly, not only across model versions but even within the same version. Furthermore, despite their successes, low-code tools never replaced traditional development, nor did they turn business experts into developers. If this new shift succeeds where those failed, it will likely be for reasons the analogy does not capture.
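
The compiler/LLM distinction can be made concrete with a toy sketch. Both functions below are illustrative stand-ins (not real tools): `compile_spec` is a pure function, as a compiler is, while `llm_generate` simulates sampling, which makes the emitted text vary from run to run:

```python
import hashlib
import random

def compile_spec(spec: str) -> str:
    # Toy "compiler": a pure function of its input.
    # Same spec in, byte-identical output out, on every run.
    return "\n".join(f"print({line!r})" for line in spec.splitlines())

def llm_generate(spec: str) -> str:
    # Toy stand-in for an LLM: sampling makes the emitted text
    # run-dependent, even when the behavior happens to be equivalent.
    name = random.choice(["result", "output", "value"])
    return f"{name} = {spec!r}\nprint({name})"

spec = "greet the user"
h1 = hashlib.sha256(compile_spec(spec).encode()).hexdigest()
h2 = hashlib.sha256(compile_spec(spec).encode()).hexdigest()
assert h1 == h2  # the compiler is reproducible, hence certifiable

texts = {llm_generate(spec) for _ in range(100)}
print(len(texts))  # usually > 1: the generated text is not reproducible
```

Reproducibility is what makes a toolchain certifiable: you can pin a compiler version and verify its output byte for byte, which is exactly the property the simulated generator lacks.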

The impact plays out at several levels:

  • Writing and reviewing code: While spec-first and spec-anchored approaches reduce the manual coding burden, developers must still review the output to ensure quality. In contrast, the spec-as-source model treats the specification as the primary codebase, shifting all writing and review strictly to the specification level.
  • Code robustness: Can generated code truly be trusted? In a spec-as-source workflow, the LLM functions as a compiler, theoretically exempting the developer from inspecting the underlying code. However, this raises a critical question: how do we certify that output? While “prompt engineering” is often cited as the solution, it must constantly contend with the inherent non-determinism of LLMs.
  • Code quality: Code quality becomes a liability when developers must modify the output after generation. Some argue that LLMs actually increase the need for rigorous code reviews: the initial productivity boost may be offset by long-term maintenance costs and the technical debt AI introduces. In my experience, when an LLM is asked to debug its own output, it often fixes errors by adding layers of "noise" or redundant logic. Consequently, even once the code is functional, it typically needs manual refactoring; an LLM can assist with that refactoring, but it still requires close developer supervision to ensure a clean result. This type of behavior is reported in "Assessing internal quality while coding with an agent".
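
One partial answer to the certification question raised above is to derive executable checks from the specification and rerun them against every regeneration, rather than inspecting the generated text. A minimal sketch, in which both the generated snippet and the `add_vat` spec are hypothetical (in practice the code string would come from the LLM):

```python
# Hard-coded here for illustration; normally produced by the generator.
generated_code = """
def add_vat(net: float, rate: float = 0.2) -> float:
    return round(net * (1 + rate), 2)
"""

# Load the generated code without inspecting it manually.
namespace: dict = {}
exec(generated_code, namespace)
add_vat = namespace["add_vat"]

# Executable checks derived from the (hypothetical) spec:
# "add_vat applies the rate to a net price and rounds to cents."
assert add_vat(100.0) == 120.0      # default 20% rate
assert add_vat(10.0, 0.05) == 10.5  # explicit rate
assert add_vat(0.0) == 0.0          # zero net price
print("all spec checks passed")
```

Such spec-derived checks do not certify the code in the compiler sense, but they give a repeatable acceptance gate that survives the non-determinism of regeneration.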