Refactoring

For years the cost of change in software development was considered to increase exponentially with how late it occurs in the development process. Due to this belief, methodologies promoted an extensive and detailed requirements engineering phase, so that the development team would not be surprised by the need to change the code during development. Write it once, never need to change it, was the desired rule. Besides requirements, the design of the system was also done upfront, to avoid late changes to the code structure. As a result, analysis paralysis became a negative behaviour associated with the desire to make all the correct decisions before implementation starts.

Software is a digital artefact, in the sense that a change in a single bit can break it, unlike other engineering artefacts that tolerate a certain level of change. For instance, it is possible to add new floors atop old buildings. Therefore, the cost of software change is due to the ease with which software breaks when changed and the effort necessary to fix it. Debugging is an activity that consumes time and human resources. Identifying the fault associated with a failure takes time, and that time increases with the distance between the introduction of the fault and the manifestation of the failure. Moreover, the effort spent identifying the fault, either by using a debugger or, more prosaically, by printing messages on the standard output, is lost once the bug is fixed. However, if the code has good coverage by regression tests, any change that breaks the code is immediately identified and the effort to fix it is reduced.
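To make this concrete, the following is a minimal sketch of a regression test, written in Python with the standard unittest module; the Account class and its withdrawal rule are assumptions invented for the example. With such tests in place, a change that breaks the behaviour is reported the moment the suite runs, instead of surfacing later as a failure far from the fault.

    import unittest

    class Account:
        """Hypothetical class, invented for this sketch."""
        def __init__(self, balance=0):
            self.balance = balance

        def withdraw(self, amount):
            # Business rule assumed for the example: no overdraft.
            if amount > self.balance:
                raise ValueError("insufficient funds")
            self.balance -= amount

    class AccountRegressionTest(unittest.TestCase):
        def test_withdraw_reduces_balance(self):
            account = Account(balance=100)
            account.withdraw(30)
            self.assertEqual(account.balance, 70)

        def test_overdraft_is_rejected(self):
            with self.assertRaises(ValueError):
                Account(balance=10).withdraw(50)

    if __name__ == "__main__":
        unittest.main()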

The existence of an extensive set of automatic regression tests revokes the old rule of the cost of software change, because the cost becomes uniform along the whole software development cycle. As a consequence, refactoring, a software design technique that allows the identification and definition of abstractions in the context of the implemented functionality, becomes possible. The complexity introduced by the design is confined to what is required by the implemented functionalities, no more, no less.

Refactoring the code, which means changing the structure of the code while preserving its functionality, should be done in very small steps such that the code never breaks, and if it does, the last changes are so small that the time required to identify the fault is short, keeping the cost of change uniform. Therefore, a few rules should be followed when refactoring code:

  • The code that is going to change is only removed at the end. If it were removed at the beginning, the functionality would break.
  • Introduce auxiliary code if it enables smaller refactoring steps, as shown in the sketch after this list. This code exists only to avoid big gaps during the change, which might break the code, and it is removed at the end, when the refactoring is complete.
  • When a bug is found for which there is no failing test, write the test before starting to debug. This increases test coverage and reduces the cost of change in the future.
  • Either introduce new functionality or refactor the code; do not do both at the same time. Refactoring is finding the structure that best fits a set of working functionality.
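As an illustration of the first two rules, the following Python sketch renames a method in small steps; the Invoice class and all names are assumptions made for the example. The new method is introduced first, the old one temporarily delegates to it, callers migrate one by one under a green test suite, and only at the end is the old method removed.

    class Invoice:
        """Hypothetical class; names are assumptions for this sketch."""
        def __init__(self, subtotal):
            self.subtotal = subtotal

        # Step 1: introduce the new, better-named method beside the old one.
        def total_with_tax(self):
            return self.subtotal * 1.23

        # Auxiliary code: the old method now delegates, keeping every caller
        # working while the change is in progress.
        def compute(self):
            return self.total_with_tax()

    # Step 2: migrate callers one by one, running the tests after each change:
    #     invoice.compute()  ->  invoice.total_with_tax()
    # Step 3: only when no caller remains is compute() deleted (removal last).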

The integration of test-first programming and refactoring results in test-driven development, a method with three steps: (1) write a test that fails but specifies a required functionality; (2) implement the functionality such that the test passes; (3) refactor the code such that its structure has the abstractions that fit the implemented functionality.
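A minimal sketch of one such cycle, again in Python with unittest; the discount rule and all names are assumptions chosen for the example.

    import unittest

    # Step 1: a failing test that specifies the required functionality.
    class PriceTest(unittest.TestCase):
        def test_discount_is_applied_over_threshold(self):
            self.assertEqual(discounted_price(200), 180)

        def test_no_discount_below_threshold(self):
            self.assertEqual(discounted_price(50), 50)

    # Step 2 would be the simplest code that passes, e.g. an if statement
    # with the literals 100 and 0.9 inline. Step 3 refactors it: the magic
    # numbers become named abstractions that fit the implemented functionality.
    DISCOUNT_THRESHOLD = 100
    DISCOUNT_RATE = 0.10

    def discounted_price(amount):
        rate = DISCOUNT_RATE if amount >= DISCOUNT_THRESHOLD else 0
        return amount * (1 - rate)

    if __name__ == "__main__":
        unittest.main()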

Refactoring has other goals besides design. It is the basis of an idea of living code, code that is constantly changed by the development team, fostering shared ownership of the code. The workflows of refactoring describe its different applications: test-driven development, litter-pickup, comprehension, preparatory, planned, and long-term refactoring.