As part of our weekly hour's worth of group practice we have been working on the Gilded Rose kata. We worked with this kata for 5-6 weeks with different pairs. We found it a very interesting kata with a number of lessons.
Gilded Rose
The Gilded Rose kata (we chose the C# version) is a kata where there is some existing code, with certain restrictions, which needs changing. The code tracks the price, quality and sell-by dates of fantasy items for sale in the “Gilded Rose” store as they change over a number of days. The required change is that a new class of items, “Legendary items”, with their own behaviour for price, quality and sell-by date, must be added. By most measures the existing code smells, and it is not exactly clear where the changes should be made.
Kata overview
With this kata our aim was to practise controlled refactoring. For us this meant creating characterisation tests for the existing code base, with suitable coverage, before moving on to add the new feature. Whilst adding the new feature the existing code should be refactored to reduce its smell. The testing approaches, the approaches to the new feature and the extent of refactoring were the interesting lessons that came out of this kata.
Testing Approaches
Individual Tests
The main approach taken for characterisation tests was to take each existing item category and create individual tests that checked each item’s behaviour after a certain number of days had passed. The resulting tests ended up like the sort of tests that might have been written if TDD had been used to develop the Gilded Rose code in the first place. Each test aimed to test a single item under certain conditions, often checking only one property of the item.
This resulted in clean understandable tests that also served as a guide to the expected behaviour of the Gilded Rose system. Each test aimed to test only part of the behaviour and a failure in a test indicated the part of the behaviour that had been changed. Once sufficient coverage (~100%) had been achieved our pairs looked at refactoring and adding new behaviour. From these tests it was easy to start adding new tests for the new class of ‘Legendary’ items and develop the new code using TDD.
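To illustrate the style, here is a minimal sketch in Python (our kata work was in C#; the `Item` class and `update_quality` rule below are simplified stand-ins for the kata code, not the real implementation). Each test pins down one item under one condition, checking a single property:

```python
class Item:
    def __init__(self, name, sell_in, quality):
        self.name = name
        self.sell_in = sell_in
        self.quality = quality

def update_quality(item):
    # Simplified stand-in rule for a normal item: quality drops by 1
    # per day, twice as fast once the sell-by date has passed, and
    # never goes below 0.
    item.sell_in -= 1
    drop = 2 if item.sell_in < 0 else 1
    item.quality = max(0, item.quality - drop)

def test_normal_item_quality_drops_by_one_before_sell_by():
    item = Item("+5 Dexterity Vest", sell_in=10, quality=20)
    update_quality(item)
    assert item.quality == 19

def test_normal_item_quality_drops_twice_as_fast_after_sell_by():
    item = Item("+5 Dexterity Vest", sell_in=0, quality=20)
    update_quality(item)
    assert item.quality == 18

def test_quality_is_never_negative():
    item = Item("+5 Dexterity Vest", sell_in=5, quality=0)
    update_quality(item)
    assert item.quality == 0
```

A failure in any one of these immediately names the behaviour that broke, which is exactly the property that made the individual-test suite readable as a specification.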
The downside of this approach was that it took a considerable amount of time and dedication to produce these tests, along with a coverage tool to make sure that the code was covered before refactoring started. The production of the tests was certainly tedious at times. These tests typically took one hour’s practice time, and we would work as the same pair the following session to start refactoring and adding the new code.
On reflection a team could spend a lot of time producing extensive tests for a minor change in a small sub-system worth only a small amount to the business. If this was the only approach available then the cost may just have to be paid as the result of working with legacy code. However, other approaches may be available.
Golden Output Tests
Another approach was to take the existing program, which runs through 30 days for a list of items covering all the existing item types and prints the results to the console. By taking that output and saving it to a file – a golden output file – we were able to write a test that redirected standard out from the console to a text buffer and compared this buffer with the captured file. This test took slightly longer to write than a single individual test, but once written we were left with 100% coverage according to our tool and could look at implementing the new feature and refactoring the existing code. Not a unit test at all, but a characterisation test nonetheless.
From this point on we used a TDD approach: write a small test for one part of the behaviour of the new Legendary items, then implement it. We focused on the area of the existing code base where we wanted to add the new functionality, added it, and then refactored the code to make it cleaner. We would only refactor the parts that we had changed, and perhaps some small sections around the area of change. We repeated this until all the new functionality had been added.
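One such TDD step might look like this, again sketched in Python with simplified stand-in code rather than the kata’s actual C# (in the standard kata the legendary item is “Sulfuras, Hand of Ragnaros”, whose quality and sell-by date never alter):

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    sell_in: int
    quality: int

LEGENDARY_NAMES = {"Sulfuras, Hand of Ragnaros"}

def update_quality(item):
    # First slice of the new behaviour: legendary items never change.
    if item.name in LEGENDARY_NAMES:
        return
    item.sell_in -= 1
    item.quality = max(0, item.quality - 1)

def test_legendary_item_never_changes():
    item = Item("Sulfuras, Hand of Ragnaros", sell_in=5, quality=80)
    update_quality(item)
    assert (item.sell_in, item.quality) == (5, 80)
```

Write the test, watch it fail, add the early return, watch it pass – and the golden output test, running alongside, confirms that nothing else moved.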
If our refactoring or new code failed the characterisation test, then either a quick inspection told us the problem, or we hit ctrl-z and tried again with a smaller step. With this approach we were able to add the new functionality and refactor parts of the code well within a single practice session.
Yet our characterisation test did not serve as a good example of the expected behaviour of the system. Another developer could not take the test and start to understand the system from it. Only our coverage tool indicated that we had 100% coverage – with individual tests it was possible to take the system spec and write each individual test to confirm it. Failures in the test did not point to a specific behaviour that had broken. Also, the approach made us focus only on the area of change – to make a reasonably clean change and relatively minor refactoring.
Comparison
The difference in approach between a test suite and a single golden output characterisation test is important when choosing to employ these approaches in production code. Both approaches bring the code under control and allow the system to be refactored where necessary. I find the difference is a judgement based on the needs of the business and the impact on the whole software system. If the module that needs changing is small, with little history of change, then the golden output approach would be a suitable starting point: it brings the module under control and allows suitable changes to be made without spending undue effort on a small change.
Where the module is a critical part of the software, or is subject to repeated change, then individual tests are more likely to be suitable. These tests are our aim – they should be fast to run, operate in isolation and isolate the behaviour under test. This investment is best made when we know that we will be making further changes to the module and want a good suite of tests to control those changes.
Experience can be the judge of how to bring code under control. When choosing between techniques, lean principles would tend towards doing just enough to bring the code under control in the least time, deferring the creation of individual tests until subsequent changes to the module are necessary.
(In order to introduce any of the tests, very careful small changes without surrounding tests were necessary. See Michael Feathers’ “Working Effectively with Legacy Code” for techniques for introducing a testing ‘seam’.)
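One of Feathers’ seam techniques, “subclass and override”, can be sketched as follows (a Python illustration with hypothetical classes, not the kata code): the hard-to-test dependency – here, writing to the console – is first extracted into a small overridable method, and a test subclass then intercepts it.

```python
class GildedRose:
    def report(self, items):
        for item in items:
            self.write_line(f"{item}")  # seam: extracted from a direct print()

    def write_line(self, text):
        # Production behaviour is unchanged: still writes to the console.
        print(text)

class TestableGildedRose(GildedRose):
    def __init__(self):
        self.lines = []

    def write_line(self, text):
        # Override the seam: capture output instead of printing it.
        self.lines.append(text)

shop = TestableGildedRose()
shop.report(["Aged Brie", "Elixir of the Mongoose"])
assert shop.lines == ["Aged Brie", "Elixir of the Mongoose"]
```

The only change made without test protection is the mechanical extraction of `write_line`, which is exactly the kind of careful small step the note above refers to.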