“Treat Your Code as a Crime Scene”

Recently I was fortunate enough to attend SwanseaCon, and the highlight speaker for me was Adam Tornhill.

With his own style of delivery, his talk “Treat Your Code as a Crime Scene” always promised to be interesting: a mix of coding, psychology and forensics.

Adam set his talk against the backdrop of Jack the Ripper’s unsolved murders and the variety of suspects identified. Those he considers under greatest suspicion are the ones who were near Whitechapel at the time of not just one of the offences, but several.

So what was Adam’s point?

Analysing legacy code

The wisdom of incorporating tests into our code is well known, but what about code without tests, the legacy code written by our predecessors and not particularly well documented? Just because it’s legacy code doesn’t necessarily make it bad code!

With this in mind, consider the version-control system as a crime scene:

“You’ll create a geographic profile from your commit data to find hotspots, and apply temporal coupling concepts to uncover hidden relationships between unrelated areas in your code. You’ll also measure the effectiveness of your code improvements. You’ll learn how to apply these techniques on projects both large and small. For small projects, you’ll get new insights into your design and how well the code fits your ideas. For large projects, you’ll identify the good and the fragile parts. If we reveal the wealth of information that’s stored in our version-control systems then we can learn to predict bugs, detect architectural decay and find the code that is most expensive to maintain.”
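To make the ideas of hotspots and temporal coupling a little more concrete, here is a minimal Python sketch. It is not Adam's method or the way his tools work; it simply counts how often each file changes and how often two files change in the same commit. The input format (one blank-line-separated block of file paths per commit, such as a cleaned-up `git log --name-only` export might give you), the function names and the sample file paths are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def parse_commits(log_text):
    """Split log text into commits; each commit is the set of files it touched.

    Assumed format: one file path per line, commits separated by a blank line.
    """
    commits = []
    for block in log_text.strip().split("\n\n"):
        files = {line.strip() for line in block.splitlines() if line.strip()}
        if files:
            commits.append(files)
    return commits


def change_frequency(commits):
    """Count how many commits touched each file (a rough hotspot signal)."""
    counts = Counter()
    for files in commits:
        counts.update(files)
    return counts


def co_changes(commits):
    """Count how often each pair of files changed in the same commit."""
    pairs = Counter()
    for files in commits:
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        pairs.update(combinations(sorted(files), 2))
    return pairs


# Hypothetical history, invented purely for this example.
SAMPLE_LOG = """
src/billing.py
src/report.py

src/billing.py
tests/test_billing.py

src/billing.py
src/report.py
"""

commits = parse_commits(SAMPLE_LOG)
hotspots = change_frequency(commits).most_common(2)
coupled = co_changes(commits).most_common(1)
print(hotspots)  # most frequently changed files first
print(coupled)   # the pair of files that most often change together
```

In this made-up history the billing file changes in every commit, and it changes alongside the report file twice, which is the kind of hidden relationship that temporal coupling is meant to surface. A real analysis would also need to weigh these counts against the size and age of the codebase.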

Adam has written various tools that make it possible to mine and analyse data from version-control systems:

  • Code Maat: a command line tool used to mine and analyse data from version-control systems
  • Ownership-fractals: a Quil sketch designed to visualize knowledge ownership by generating fractal figures from ownership metrics mined by Code Maat
  • Indent-complexity-proxy: a tool to calculate complexity metrics using the indentation of the source code as a proxy for complexity
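The idea of using indentation as a proxy for complexity can be sketched in a few lines of Python. This is only an illustration of the principle, not how Indent-complexity-proxy is implemented; the function names, the four-spaces-per-level default and the sample source are assumptions.

```python
def indentation_levels(source, spaces_per_level=4):
    """Return the indentation depth of every non-blank line.

    Assumption: one logical level equals `spaces_per_level` spaces,
    and tabs are expanded to that same width.
    """
    levels = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no indentation information
        expanded = line.expandtabs(spaces_per_level)
        leading = len(expanded) - len(expanded.lstrip(" "))
        levels.append(leading // spaces_per_level)
    return levels


def indentation_stats(source, spaces_per_level=4):
    """Summarise indentation depth as a crude complexity indicator."""
    levels = indentation_levels(source, spaces_per_level)
    if not levels:
        return {"lines": 0, "total": 0, "mean": 0.0, "max": 0}
    return {
        "lines": len(levels),
        "total": sum(levels),
        "mean": sum(levels) / len(levels),
        "max": max(levels),
    }


# Hypothetical snippet to measure, invented for this example.
SAMPLE = (
    "def f(items):\n"
    "    for item in items:\n"
    "        if item:\n"
    "            print(item)\n"
    "\n"
    "    return None\n"
)

stats = indentation_stats(SAMPLE)
print(stats)
```

Deeply nested code tends to push the maximum and mean depth up, so tracking these numbers for a file over its history gives a cheap, language-neutral hint about whether it is getting harder to follow.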

In conclusion

By applying forensic techniques to our version-control systems we can identify the areas of code that developers repeatedly have to work on, and from that deduce:

  • Which parts of the codebase really matter
  • Which parts of the code become productivity bottlenecks
  • Which parts are hard to maintain
  • Where the bugs will be