Chapter 1. Reading Code
Despite the way coding is taught, developers spend far more time reading code than writing it. In most beginner coding courses, you jump immediately into writing code, focusing on core language concepts and idioms without acknowledging that you’d never learn Polish or Portuguese in a similar manner. And while most academic projects start from a blank slate, practicing developers are almost always working within the confines of code that has taken years to arrive at its current state. While it may not be your first choice, you will work with code you did not write. Take heart: there are techniques to help you orient yourself in a new codebase.
Working With Existing Code
“Some of the most valuable experience I gained was from supporting a legacy app. I highly recommend it but I wouldn’t wish it on anyone.” —Dalia Shea
Software engineering is an amazing career; you can work in a variety of industries solving challenging problems by building creative solutions. But like any job, there are parts you aren’t so fond of. The obvious targets like too many meetings and impossible deadlines will likely top your list, but odds are reading existing code will crack the top ten.
Whether you learned to code in a bootcamp or on a college campus, you probably never had a course (or even a lecture) about how to effectively read code. Instead you spent much of your educational time in the blissful space known as “greenfield development,” unencumbered by the baggage of the past. Yet you’ve likely had vanishingly few opportunities to build an application from a blank editor in your professional life. As a practicing software engineer, you quickly come to understand that much of your time will be spent on “brownfield development,” that is, working within the limits of an existing codebase.
Despite how it is often taught, programming is first and foremost a communication activity—and not between the coder and the compiler. Don’t forget, the computer understands any code (at least if it’s syntactically correct), but that doesn’t mean a human will follow what you’re trying to accomplish. The best software engineers focus on the person reading the code. Good writers (of any kind) always keep the audience front of mind.
It isn’t surprising that many developers loathe reading other programmers’ code. When you encounter existing code, you actually have four problems to solve:
- First and foremost, you need to understand the domain, the problem you are trying to solve. And the domains developers work in are very demanding! Software is eating the world, meaning software engineers are tasked with increasingly more challenging business problems; much of the proverbial low hanging fruit has already been picked. But that is only one part of the problem when dealing with existing code.
- Second, you must see the problem through the eyes of the developer who came before you, and that is often the most challenging aspect of software development.
- Third, it’s possible the code isn’t the right abstraction. Perhaps it is modeled in a way that is too generic or fails to capture the proper nuance of the domain.
- Fourth, much like an archaeologist, you are often peeling back layer after layer of technical debt, old technologies and approaches that may now be considered anti-patterns. Over time, languages and best practices evolve¹ and you will have to see the code through the lens of when it was written. You may even be able to “carbon date” the code simply by noticing what frameworks or language features are (or are not) used!
It also doesn’t help that you are almost always dealing with patches on top of patches. Maybe the last developer didn’t have a full understanding of the problem or they weren’t up to speed on some new language feature that could greatly simplify the job at hand. Add in the typical demands of fix-it-fast, and you could spend an afternoon deciphering a single method.
Cognitive Biases
Of course you don’t write bad code, do you? On more than one occasion, we, your humble authors, have struggled with some code, uttering less polite variations of “what idiot wrote this” only to discover that it was actually written by none other than ourselves. And frankly, if you read code you wrote a few years ago, you should be a little disappointed. That’s a sign of growth: you know more today than you did then. That is a good thing!
You also have a couple of cognitive biases working against you when you work with existing code. First is the IKEA effect, which says, in a nutshell, that you place a higher value on things you create. One study found people would pay 63% more for a product they successfully assembled themselves versus the identical product put together by someone else. There are actually several examples of companies profiting off the IKEA effect! If you’ve ever picked your own strawberries or apples, you were often paying a premium to, well, do some of the work yourself.
Additionally, there is the mere exposure effect: you tend to prefer the things you are already familiar with. This leads to the dogmatism many developers have around programming languages. Developers tend to think time began with whatever language they learned first. When Java first introduced lambda expressions, someone on a language-specific mailing list asked why Java needed these “newfangled lambdas,” not realizing that lambdas are not a new concept in programming languages and were part of the original plan for Java itself!
Developers can be very provincial around their preferred tools, which is something Paul Graham touches on in his essay Beating the Averages. Graham says programming languages exist on a power continuum, but you often can’t recognize why one language is more powerful than another. To demonstrate his point he introduces the hypothetical Blub language and a very productive Blub programmer. When the Blub programmer looks down the power continuum, all they see are languages that lack features they use every day, and they can’t understand why anyone would choose such an inferior tool. When they look up the power continuum, all they see are a bunch of weird features they don’t have in Blub, and they can’t imagine why anyone would need those to be productive since they aren’t in Blub and they are a very good Blub programmer!
As you work with code as well as your fellow developers, keep these biases in mind. If you aren’t sure why a colleague is so adamant about a certain tool or approach, ask if it might be an instance of the IKEA effect or the Blub paradox. Of course you should also reflect on your own assumptions to ensure you aren’t exhibiting one of the predispositions yourself.
Approaching a New Codebase
As much as you may wish you could spend all your work hours focused on crafting new code, you will encounter existing code bases throughout your career. How can you get up to speed on a new project without losing your mind? First, start with your teammates. A basic project overview should be part of any onboarding experience.
Don’t be afraid to spend some time with the documentation. Many projects have a README file that will help you get your bearings, while others have wikis or websites designed to give you a concise overview.² You could learn more in a few minutes with the docs than in hours with the debugger. Reading the project’s coding standards will prepare you for the patterns you will encounter as you wade through the code base. Architectural decision records (ADRs) form an architectural decision log over the life of a project, providing invaluable context and the all-important “why” that often vanishes in the rush of the latest defect or outage.
If your project’s documentation is out of date, update it as you learn; if it is nonexistent, consider building your own as you go. Creating the documentation will help you learn the project and it will also serve the developers that come after you.
With the Golden Rule³ in mind, write your documentation for those who will come after you. Favor lightweight, low-ceremony approaches that answer common questions such as:
- What does your service do?
- How does it work?
- What does it depend on?
- How do you run the application?
Wait, we can hear you now: documentation may be (and often is) out of sync with the code. But believe us, that doesn’t actually have to be the case. Documentation can evolve with the code. The best way to ensure that it does is to use tests as documentation. Tests written in a behavior-driven style, if written properly, can produce executable documentation, a topic discussed at greater length in Chapter X.
Metrics Can Mislead
Code coverage (how much of the code base is executed when the tests are run) can be a very useful metric on a project. However, there are no silver bullets in software, and it is possible to fail even with 100% code coverage. A friend of ours joined a project that was having regressions with every release. As he was getting up to speed on the code he asked the tech lead if there were any tests. The tech lead very proudly said, “yes, we have right around 92% code coverage.” Very impressed, our friend was somewhat surprised they had so many regressions but he continued his analysis.
Looking at the test code he found some startling patterns. At first he thought these were isolated, but eventually he discovered they were endemic to the code base. He went back to the tech lead and said, “I couldn’t help but notice your tests don’t have any asserts.” The tech lead responded by repeating the code coverage statistic.
The meta lesson is: be wary of any metric, because any metric can mislead. But don’t lose sight of the value and purpose of a practice. If it is just about ceremony, you are unlikely to get the benefit you expect. Project teams should regularly challenge themselves and their approach; don’t be afraid to change course when warranted.
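To make the anecdote concrete, here is a minimal sketch in plain Java (no test framework, so it runs on its own). The names `DiscountCalculator` and `CoverageDemo` and their methods are hypothetical, invented for illustration; they are not taken from the project in the story. Both checks execute every line of the buggy method, so a coverage tool would score them identically, but only one of them can ever fail.

```java
// Hypothetical example; none of these names come from a real project.
class DiscountCalculator {
    // Intended behavior: take `percent` off the price.
    // Deliberate bug: the discount is added instead of subtracted.
    static double applyDiscount(double price, double percent) {
        return price + price * percent / 100.0;
    }
}

class CoverageDemo {
    // Runs every line of applyDiscount, so it counts toward coverage,
    // yet it checks nothing and therefore always "passes".
    static boolean testWithoutAssert() {
        DiscountCalculator.applyDiscount(100.0, 10.0);
        return true;
    }

    // Same coverage, but the result is compared with the expected value.
    static boolean testWithAssert() {
        double result = DiscountCalculator.applyDiscount(100.0, 10.0);
        return Math.abs(result - 90.0) < 1e-9;
    }

    public static void main(String[] args) {
        System.out.println("without assert passes: " + testWithoutAssert());
        System.out.println("with assert passes: " + testWithAssert());
    }
}
```

Running `CoverageDemo` reports that the first check passes and the second does not, even though the two exercise exactly the same code. When you survey an unfamiliar test suite, checking what the tests actually verify tells you more than the coverage number does.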
A Sample Process
Over time, you will develop a feel for working with existing code. However, it can help to have a process. Christopher Judd, CTO and Java Champion, teaches the following method in his bootcamps.
- Clone the project from the source code management system
- Review the README, coding standards, architecture & other supporting documentation
- Take notes (architecture, build commands, common SQL, industry standards, terms, etc.) as you go. Don’t be afraid to share these notes with your teammates!
- Review build scripts
- Review the project dependencies
- Review the project structure (packages, namespaces, modules, artifacts, etc.)
- Review the CI/CD pipelines
- Install any project dependencies (build tools, runtimes, languages)
- From your IDE
  - Run & debug application
  - Add breakpoints to interesting methods connecting them back to the running application
  - Run & debug unit tests
  - Add breakpoints to interesting methods connecting them back to the running application
- From the command line
  - Build the project artifacts
  - Run the unit tests
  - Start any required containers (such as the datastore)
  - Run application locally (ports, URLs, etc.)
Feel free to modify these steps as you see fit.
Software Archeology
Once you’ve surveyed the team and familiarized yourself with any existing documentation, it is time to open your editor of choice and practice some software archeology. Roll up your sleeves and root around in the code base! To paraphrase Sir Isaac Newton, look for smoother pebbles and prettier shells. Look at the code structure: how is the code organized? Some languages have first-class constructs for packaging code; others rely on conventions. How does the code fit together? Is this a monolith or a distributed architecture with dozens or hundreds of services? What domain concepts are expressed in the code? Read the tests: what do they tell you about the functionality? With that information, do you understand the intent of this class?
If the intent isn’t clear, dig further. Modern editors can make it trivially simple to see who calls a given function allowing you to work your way backwards. Callers should help you determine what a given class does and how it is used. Your backtracking may take you all the way to a service endpoint like an HTTP call but eventually you should find the connection between a given user action and the code.
Once you have your bearings, run the application. What does it do? Find a specific element, be it something on a user interface or a parameter to a service call, and map that back to the code. Hunt for a landmark; if you know a given action results in an update to the datastore, find that in the code. Use your debugger to walk through the code. Did it work the way you anticipated? Did you end up on a vastly different code path? Ultimately, you are building a mental model of the code; you are loading the application into your brain.
What might this look like in practice? Let’s use the Spring PetClinic app as an example; even if you aren’t an expert in Java or Spring, it should be fairly straightforward to navigate, plus it has excellent documentation. Once you’ve cloned and run the application, you see there is the ability to search for owners (see Figure 1-1). If you explore the owner templates, there is one helpfully named⁴ findOwners.html which references an /owners endpoint. Searching the project for /owners returns ample results, but a little intuition might lead you to the @GetMapping("/owners") annotation on the processFindForm method in the OwnerController file.
Put in a breakpoint, execute the search from the browser, and see what happens! Sure enough, your debugger should look something like Figure 1-2. If your intuition was wrong, repeat the previous steps; eventually you will make the connection, allowing you to walk your way through the code, building your understanding as you go.
Note
“The goal of software design is to create chunks or slices that fit into a human mind. The software keeps growing but the human mind maxes out, so we have to keep chunking and slicing differently if we want to keep making changes.” —Kent Beck
Use your editor to navigate the code. Many editors make it very easy to jump to methods in other classes (see Figure 1-3) as you work your way through the code. Consider collapsing all the method bodies to give you a smaller surface area to peruse (see Figure 1-4). Alternatively, many IDEs can show you an outline of a given class providing you with a higher level view of the code. Read the method names. What does that tell you about the purpose of the module?
Do not assume the code does what the name implies. Naming things is hard and, as code evolves, variable and method names may no longer reflect reality. Don’t rush; confirm your hunches. It is tempting to cut corners, but take your time.⁵
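As a small illustration of a name that no longer matches the behavior, consider the sketch below. `OrderBook` and its methods are hypothetical names made up for this example. The method reads like a harmless query, but the body also empties the list, which is the kind of surprise you only find by reading the implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, invented to illustrate a misleading method name.
class OrderBook {
    private final List<String> pending = new ArrayList<>();

    void add(String orderId) {
        pending.add(orderId);
    }

    // The name suggests a read-only query...
    int getPendingCount() {
        int count = pending.size();
        pending.clear(); // ...but this side effect empties the list.
        return count;
    }
}
```

Calling `getPendingCount()` twice in a row on a book with two orders returns 2 and then 0. A caller who trusted the name would never expect the second answer.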
Exceptions can mislead. More than once we have encountered exception handling that made incorrect assumptions about possible error conditions. Pay extra attention to code that catches very high-level exceptions; while expedient for the author, such handlers tend to obscure the actual problem.
Your IDE may also include tools or have plugins that help you analyze the code. For example, IntelliJ IDEA can quickly show you dependencies, giving you a sense of how the code works together (see Figure 1-5). Modern developer tools are powerful; let them help you understand the code.
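The sketch below shows one way a broad catch can hide what really happened. `ConfigReader`, its two methods, and the `port` setting are all hypothetical, chosen only for illustration. The broad version assumes the only possible failure is a missing setting, so a malformed value gets reported with the wrong message.

```java
import java.util.Map;

// Hypothetical helper; the names and messages are illustrative assumptions.
class ConfigReader {
    // Broad catch: every failure is reported as a missing setting,
    // even when the value is present but not a number.
    static String readPortBroad(Map<String, String> settings) {
        try {
            return "port " + Integer.parseInt(settings.get("port").trim());
        } catch (Exception e) {
            return "missing setting";
        }
    }

    // Narrow handling: each failure mode gets its own, accurate message.
    static String readPortSpecific(Map<String, String> settings) {
        String raw = settings.get("port");
        if (raw == null) {
            return "missing setting";
        }
        try {
            return "port " + Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return "invalid number: " + raw;
        }
    }
}
```

Given a map where `port` is set to `80a`, the broad version claims the setting is missing while the narrow version reports the invalid number. When you meet a `catch (Exception e)` in existing code, it is worth asking which failures the author had in mind and which ones are being swallowed by accident.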
Use your source code management tool as well (see Figure 1-6). Many modern tools allow you to quickly move about your project. Look at the change history of the files. What changes frequently? What do the commit logs tell you about the updates? Start with the most frequently modified classes, something git can show you with a command like this:
git log --pretty=format: --since="1 year ago" --name-only -- "*.java" | sort | uniq -c | sort -rg | head -10
You can also use tools like git blame to visualize modifications to the code. Who on your team made the most recent modification or the most frequent changes? Your IDE can also show you the change history if you don’t feel like using the command line. However you choose to investigate the code, don’t be afraid to reach out to your teammates with questions!
While purpose-built project models often fall out of sync with the code as the application evolves, you can always extract diagrams from the code base. Some IDEs will do this with a simple key combination⁶ (see Figure 1-7 for an example), but you can also use tools like Umbrello, Doxygen, or Structurizr to create a visual representation of the code. Consider adding a step to your build pipeline that automatically generates fresh diagrams whenever code is committed.
Practice Makes Perfect
At the end of the day, practice some grace with yourself. Modern code bases are often sprawling. No one person can understand an entire code base, and that isn’t the goal. Your knowledge will grow over time. Rinse and repeat the process as you encounter new parts of your project. It can be intimidating, but every developer has gone through it. You will be fine!
How do you improve your code reading skills? As much as you may dread it, practice reading the code. There are so many well written, publicly available, open source options in a variety of languages for you to choose from. There aren’t any shortcuts; you cannot improve without practice. It does get easier over time, and you will get faster.
Wrapping Up
Arguably, coding is taught backwards: you learn to write before you learn to read, and yet you will spend a significant amount of your career reading code written by someone else. While you may not enjoy existing code as much as greenfield development, it comes with the paycheck. Rather than run from the situation, learn to embrace it; there is much to gain professionally. Be aware of cognitive biases. Don’t be afraid to roll up your sleeves and root around in an unfamiliar codebase—you will learn something. As your understanding grows, leave the code better than you found it, easing the path of the next developer...which just might be you!
Put it Into Practice
If you want to get better at reading code, there are no shortcuts: you need to read more code. Luckily, you have a veritable plethora of open source projects at your disposal! Block out a couple of hours to read some of the code in the framework you use (or the one you wish you used) at work or some other project you’re interested in. If you’re not sure where to start, check out the trending repositories on GitHub. Apply the techniques you learned in this chapter. In a couple of months, pick another part of the project you explored or try a completely different one; was it easier than the first time? Keep at it; over time your code reading skills will improve.
Working with existing code is also a skill that needs to be developed and, once again, open source software gives you a massive playground to explore. Contributing to open source is an excellent learning laboratory, and it isn’t nearly as hard to get started as you may think.⁸ Pick a project and spend a few hours working with it.
1 Or as Neal Ford said: “today’s best practice is tomorrow’s antipattern”.
2 Said documentation may be out of date; trust but verify.
3 Or more commonly “Do unto others as you would have them do unto you.”
4 Of course not all applications have such cleanly named files - you may have to make ample use of your favorite search tools.
5 In other words, take the time it takes so it takes less time.
6 For example, ⌥⇧⌘U (macOS) / Ctrl+Alt+Shift+U (Windows/Linux) in IntelliJ IDEA will generate a UML diagram.
7 Though your organization’s lawyers likely have strong opinions about the use of such tools, double check your corporate policies before you paste your pricing algorithm into one of them!
8 Many projects have lists of bugs marked as “for first time contributors” but don’t be afraid to reach out to the contributors; most will happily help you get up and running.