Very often, you need to improve or modify source code written by someone else, or even by yourself a long time ago, and you find it really difficult to follow the underlying theme. It is rather like trying to find your way around a large foreign city without a map and without anyone to ask. The act of understanding the structure and intent of software is formally known as software comprehension or reverse engineering.
While there are many free and commercial products that can automatically generate diagrams of source code, I haven’t yet found one that adds significant value in helping you actually understand it better. The chief reasons are that when documenting code graphically, you often need to make annotations, highlight important features, and eliminate parts that add noise and are not part of the fundamental structure. I also find that the act of physically mapping out the important parts of the code builds familiarity with its inner workings, enabling you to better grok it.
Tools and techniques
There are various articles that cover strategies for finding the main threads and features in code and following them through. This can often be done by following where and how important variables are manipulated and passed around, which is a useful basic strategy for focusing on what is important. This article will not explain these techniques in any depth, as other resources focus on that aspect of analysis (for example, this Stack Exchange discussion has a variety of ideas and approaches, and here is another interesting resource). Instead, it focuses on building a visual representation of the software you are analyzing.
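As a rough illustration of the "follow the variable" strategy, here is a minimal Python sketch that lists every line in a source tree mentioning a given identifier. It is only a textual, whole-word search, so it also matches comments and strings and knows nothing about C scoping; an IDE's indexer does this far better. The function name `find_references` and the example identifier are hypothetical.

```python
import re
from pathlib import Path


def find_references(root, identifier, suffixes=(".c", ".h")):
    """Return (path, line_number, line_text) for each line under root
    that mentions identifier as a whole word."""
    pattern = re.compile(r"\b" + re.escape(identifier) + r"\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if pattern.search(line):
                    hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_refs.py <source_dir> <identifier>
    for path, number, line in find_references(sys.argv[1], sys.argv[2]):
        print(f"{path}:{number}: {line}")
```

Reading the hits in order often gives a first impression of where a variable is written and where it is only read.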
When doing analysis, it is useful first to get a good understanding of the layout of the code. This can be done easily and cheaply with automated tools that focus only on the file level of your code. For example, in a previous article, I demonstrated the use of a Perl script (ScanDepends) that uses Graphviz to show how C files are connected and how they interact, based on what their #include statements specify. In very large projects, even this may be overwhelming, and you may want a higher-level overview. Doxygen may also give you a rough picture, although compared to the ScanDepends tool it gives less visibility of the underlying C files being referenced (only header files are physically included, so the dependency on the corresponding C file is implicit). At other times, you may want to look at a very small part of the code containing only a few functions, and in these cases Doxygen does a great job.
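To make the file-level idea concrete, here is a small Python sketch that scans a source tree for quoted #include directives and writes the result as a Graphviz DOT graph. It is not the ScanDepends script, just an illustration of the same principle. It assumes file names are unique across directories (only base names are used) and ignores system headers in angle brackets; the function names are my own.

```python
import re
from pathlib import Path

# Matches lines such as:  #include "can.h"   (angle-bracket includes are skipped)
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def include_edges(root):
    """Return sorted (including_file, included_file) pairs found under root."""
    edges = set()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in (".c", ".h"):
            text = path.read_text(encoding="utf-8", errors="replace")
            for target in INCLUDE_RE.findall(text):
                edges.add((path.name, Path(target).name))
    return sorted(edges)


def to_dot(edges):
    """Render the edge list as a left-to-right Graphviz digraph."""
    lines = ["digraph includes {", "    rankdir=LR;"]
    for source, target in edges:
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    import sys

    # Usage: python include_graph.py <source_dir> > includes.dot
    print(to_dot(include_edges(sys.argv[1])))
```

The output can then be rendered with Graphviz, for example `dot -Tpng includes.dot -o includes.png`.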
I have found that the basic tool that best helps you follow how functions are called and find references to variables is a decent IDE. For analyzing C code, Eclipse CDT is my favourite, although other people may prefer other IDEs. It is quick to use once you learn some of the more useful keyboard shortcuts, and the latest versions seem to index the code well and accurately. The key to a good IDE is its ability to understand dependencies and references and to let you navigate between them quickly and easily. Another tool that can be very useful is Doxygen when used with the Graphviz Dot tool. If set up correctly, it will show you a nice call graph that you can easily navigate with your mouse. It also allows you to jump to the source code, and all this is done from your internet browser. However, I find Eclipse on its own to be adequate and also faster and slicker to use.
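For reference, these are the Doxyfile settings I would expect to matter for call graphs. Treat it as a starting point rather than a complete configuration: the INPUT path is an assumption about where your sources live, and Graphviz must be installed for HAVE_DOT to have any effect.

```
# Where the sources are (assumed path) and whether to descend into subdirectories
INPUT           = src
RECURSIVE       = YES

# Document everything, including static functions, even without comments
EXTRACT_ALL     = YES
EXTRACT_STATIC  = YES

# Allow jumping from the documentation to the source code
SOURCE_BROWSER  = YES

# Use Graphviz dot to draw call and caller graphs for each function
HAVE_DOT        = YES
CALL_GRAPH      = YES
CALLER_GRAPH    = YES
```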
When the code you are working with is really messy and difficult to follow, with far too many dependencies, complicated constructs, and unclear goals, you need to resort to a more sophisticated strategy.
Something not offered by the automated tools mentioned above (or any others I have tried so far) is a way to map out call graphs and manipulate them: suppress unwanted items, add items, and generally annotate. Martin Fowler originally introduced me, through his blog, to UMLet, an open source tool for quickly sketching UML diagrams. I mostly used it until recently, when I found something that is even quicker and easier for me to use.
The tool is called yEd, from yWorks. It is free, very useful and powerful, and runs on Windows, Linux, and Mac. Although it is a general purpose diagramming tool, in the context of this tutorial it is extremely good at quickly and easily documenting the parts of the code that matter to you, in as much detail as you need, and it allows you to make annotations as you go.
I strongly suggest getting yEd if you haven’t already and need to produce quality diagrams easily and in a maintainable way, even if it’s for general purpose tasks unrelated to this article. I have found it highly useful for many diagramming activities, whether formal designs or quick informal sketches like the ones I explain below.
A Quick Tutorial
In this tutorial, I will go through a step-by-step guide to familiarize you with the concepts. For the example, I have randomly chosen a relatively simple C project found on GitHub called CANtact. In reality, you may not need much diagramming for something well written and self-documenting, but for larger projects where the code hasn’t been well polished, this activity becomes a lot more useful. I will explain the concept and leave you to continue as you wish, or to apply the techniques to your own comprehension exercises as they arise. I have used Windows 7 and version 3.14.4 of yEd for this example.
First, from your IDE, copy the name of the file containing the function you want to document onto your clipboard (Ctrl+C). Then go into yEd and press Ctrl+Alt+G (the shortcut for “Group”) or use Right Click/Grouping/Group. This creates an empty group into which you can paste the filename (Ctrl+V).
Now switch back to your source code and copy the name of the function you want to document. Back in yEd, create a new group (Ctrl+Alt+G) under the file you just created. Paste the function name into the title bar of the new group as before. Enlarge this new group if required so the text is not obscured.
Now, while holding down the Shift key, use your mouse to drag this function on top of the file name group you created earlier. If done correctly, you will see tiny corners standing out around the file group before you go to the next step.
Drop the function into the file; the file group’s size will automatically grow to accommodate the function.
Now do the same with the next function (which is called by the main function).
Move the new function so it is nicely spaced away from the first function (that calls it).
Click your mouse on the edge of the calling function and while holding the button down, drag an arrow over to the called function. If done correctly, little corners will again appear as is shown below. While those corners are showing, drop the arrow. It should attach itself.
In the same way as before, create a new file referenced by your first file.
As before, place the functions you want to document into that file. Always remember to hold the Shift key when moving a function into or out of a group.
As before, draw an arrow from the first function to the new function.
Now let’s add a variable local to the new file. This can be done in the same way as a function. To show it’s a variable, choose an appropriate color for it by clicking Fill Color in the Properties View as is shown.
After choosing a color, draw an arrow to the variable and change its line type to make it clear how the variable is being referenced. Click the Line Type in the Properties View as is shown. Use a convention that is clear to you.
You can also annotate the arrow to make it clearer what is being done. This is done by clicking the text item in the Properties View at the bottom right. Clicking the dots on the right of the text item will allow you to make a multiline annotation.
The key is just to document what is important to you, in the way that makes the best sense to you. Don’t get too carried away: the activity can be a lot of fun, but trying to document everything could take forever, and the result would no longer be the overview you set out to create. As with a map, you just want enough detail to help you find your way around when you later need to start modifying the code.
You are not limited to using Group Nodes as I have shown. There is a wide range of other choices, and you can even import or create your own. However, the nice thing about using group nodes is that they can nest other group nodes recursively. For example, a function could also contain a group node describing a case statement, and so forth. You can even paste in lines of source code (and make the font size very small so it fits), zoom in to read, etc. – it’s very flexible.
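The structure built up in the tutorial (file groups containing functions, a colored variable node, and arrows with different line types and annotations) can also be written down as plain data. The Python sketch below is only an illustration of that structure, not something yEd produces or consumes: it renders such a map as Graphviz DOT, using clusters for file groups and dashed edges for variable accesses. All file, function, and variable names in the example are hypothetical, and node names are assumed to be unique across files.

```python
def quote(name):
    """Quote a string for use as a DOT identifier or label."""
    return '"' + name.replace('"', '\\"') + '"'


def map_to_dot(files, edges):
    """files: {file_name: {"functions": [...], "variables": [...]}}
    edges: list of (source, target, kind, note), kind is "call" or "access"."""
    lines = ["digraph code_map {", "    node [shape=box];"]
    for index, (file_name, members) in enumerate(files.items()):
        # One cluster per source file, like a group node in the tutorial
        lines.append(f"    subgraph cluster_{index} {{")
        lines.append(f"        label={quote(file_name)};")
        for function in members.get("functions", []):
            lines.append(f"        {quote(function)};")
        for variable in members.get("variables", []):
            # Variables get a fill color to set them apart from functions
            lines.append(f"        {quote(variable)} [style=filled, fillcolor=lightyellow];")
        lines.append("    }")
    for source, target, kind, note in edges:
        # Dashed arrows for variable accesses, solid arrows for calls
        style = "dashed" if kind == "access" else "solid"
        lines.append(f"    {quote(source)} -> {quote(target)} [style={style}, label={quote(note)}];")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example, not taken from any real project
    files = {
        "main.c": {"functions": ["main", "init_hardware"]},
        "can.c": {"functions": ["can_init"], "variables": ["bus_state"]},
    }
    edges = [
        ("main", "init_hardware", "call", ""),
        ("main", "can_init", "call", ""),
        ("can_init", "bus_state", "access", "sets initial state"),
    ]
    print(map_to_dot(files, edges))
```

A text form like this is easy to keep under version control, but it loses the free-form layout and annotation that make the hand-drawn map useful in the first place.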
Diagramming has its limitations when it comes to describing software: one is its two-dimensional nature, which software is not constrained to; another is that abstract concepts often have no physical representation. Even so, it is often a useful tool for conveying certain ideas and aspects of code, such as inter-function relationships and variable manipulations. As Martin Fowler said in his blog (also mentioned above), automatically generated diagrams of complicated systems are usually useless, as they are often at the wrong level of abstraction and contain too much unnecessary detail, most of which you are either not interested in or should be getting directly from the source code.
Perhaps the best solution is self-documenting code combined with proper use of tools such as Doxygen. However, in most cases, when working with unfamiliar code, we do not have that luxury. Perhaps a semi-automated solution exists that makes these activities easier. Have you found any tools or techniques that help you build a map of the structure and hidden features of unknown source code more quickly and painlessly? Are there any automated or semi-automated silver-bullet solutions out there? Or have you come to find that the best solution is to trust your own ability, combined with a good diagramming tool and a good IDE, as I have demonstrated above? If you know of any better tools and techniques, I would be very interested in hearing your recommendations.