Why Your PRs Aren’t Improving Quality | by Charles Chen | Apr, 2022

It’s simple: you’re missing the forest for the trees


The ubiquity of Git in the world of software engineering means that the tooling and processes around it have coalesced into a handful of patterns for developer workflows and software releases.

At the core of each of these patterns is the pull request, or PR, and many a team attempts to enforce quality, of both code and product, at this layer.

Guess what? It rarely works.

It should be obvious why this approach rarely works without much more rigor and discipline: by the time a PR is opened, the code is already written, and all who participate in the review are subject to the sunk cost fallacy.

In other words, at the point of the PR, it is unlikely that a team lead or peer will point out that an approach is sub-optimal, should be scrapped, or needs major changes. Not just because of the sunk cost fallacy, but also because the PR review itself is likely a big context switch for the reviewer.

Instead, most PRs will end up focusing on:

  • Obvious isolated logic errors
  • Bad practices and style (though lint rules can be used for this)
  • Unsafe practices (again, lint rules, static analysis, or tools like CodeQL can be used instead; see the sketch after this list)
  • Violations of DRY when a reviewer knows that such a piece of logic may exist elsewhere in the system
  • Or — the worst — just routine motions.
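
For the style and safety bullets in particular, a small lint configuration can take these checks off the reviewer’s plate entirely. Below is a minimal sketch assuming ESLint’s flat config format; the rule selection is purely illustrative:

```js
// eslint.config.js: a minimal sketch; the rules chosen are illustrative
export default [
  {
    files: ["src/**/*.js"],
    rules: {
      // Style nits a human should never have to flag in a PR
      eqeqeq: "error",            // require === / !== over == / !=
      "no-var": "error",          // require let/const over var
      // Unsafe practices caught mechanically, not by a reviewer
      "no-eval": "error",         // eval() is an injection risk
      "no-implied-eval": "error", // setTimeout("string") and friends
    },
  },
];
```

Run in CI on every push, this class of feedback arrives before a human ever opens the diff.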

Such exercises ultimately feel like a chore and rarely lead to the kind of code or product quality improvements that make a real difference: lower defect rates, faster iteration, better architecture, and increased developer productivity.

More often than not, such exercises are mere ceremony that misses the forest for the trees.

If this feels all too familiar, read on!

There is often a mistaken equivalence drawn between PRs and code reviews. While you do indeed review code in the context of a PR, the PR is not a code review.

For starters, it is often useless to review changes in code outside of the context of the larger flow of the application and a deeper review of the product, feature, or business requirements. In other words, deciding how the code should be written requires knowing the business or product context in which the code was written.

But even at a very fundamental level, it is often impossible to understand whether a 1-line diff in isolation is correct or misses edge cases or requires more test cases without understanding the larger context of the code around it and how the data flows through that code.
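
As a contrived sketch of this problem, consider a one-line change (the function, names, and business rules here are hypothetical):

```ts
// Hypothetical diff under review: one line changed from
// `return total - discount;` to the clamped version below.
function applyDiscount(total: number, discount: number): number {
  return Math.max(total - discount, 0);
}

// In isolation, clamping to zero looks like a safe, defensive fix.
// But whether it is correct depends on context the diff cannot show:
// can a discount legitimately exceed the total (gift cards, store
// credit, refund adjustments)? Should that case be an error rather
// than a silent zero? Only the surrounding data flow and the business
// requirements can answer that.
console.log(applyDiscount(20, 30)); // 0. Correct? The diff alone cannot say.
```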

I posit that no amount of trying to make the PR easier for the reviewer with smaller PRs, PR templates, more frequent PRs, draft PRs, and so on will meaningfully impact product and code quality.

The reason is simple: code is dense, and many features and bug fixes are complex, requiring a deep understanding of the flow of logic and the business requirements. It is likely impossible for a peer working on some other sub-system (or even the same sub-system) to know whether your code is correct, complete, and high quality.

And here, we get to the real core of the question: how exactly do we determine code and product quality?

To answer this question, it’s instructive to consider how nearly every other industry achieves quality, and it is almost universally the same two levers: specifications and testing.

It’s as simple as asking the question: did the output meet the specified product design? Whether it’s a food manufacturing line, a machine assembly line, pharmaceutical manufacturing, construction, a seamstress sewing a wedding dress, or software engineering, it is clear that meaningful quality can only be achieved through specifications and testing.

These are two sides of the same coin. Without specifications, testing is meaningless. Without testing, specifications cannot be verified. The specifications determine the testing strategy and the testing verifies that the specifications have been implemented correctly.
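
As a minimal sketch of what this looks like in code (the spec clause, function, and numbers are hypothetical), note how each test traces to a sentence in the specification rather than to a line of the implementation:

```ts
import assert from "node:assert";

// Hypothetical spec clause: "Orders of $50 or more ship free;
// all other orders pay a flat $5 shipping fee."
function shippingFee(orderTotal: number): number {
  return orderTotal >= 50 ? 0 : 5;
}

// Each assertion verifies a phrase of the spec, so the tests check
// the specification itself, not merely exercise the code.
assert.equal(shippingFee(50), 0);    // boundary: "or more" includes exactly $50
assert.equal(shippingFee(49.99), 5); // below the threshold pays the flat fee
assert.equal(shippingFee(120), 0);   // comfortably above the threshold
```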

Likewise, code reviews without specifications are empty exercises because it is impossible to determine the correctness or suitability of an implementation without the business context in which it was designed and developed (if it was designed at all!). Without specifications, code reviews focus on the wrong things.

So how can we meaningfully improve code and product quality? How do we have more meaningful code reviews rather than PRs that just go through the motions?

Fix the Specification Process

In the age of agile, no one wants to hear this, but the root of many code and product quality issues comes from a broken (or non-existent) product specification process. Often, this is a result of a lazy or inexperienced product team that lacks the rigor to fully explore and document a feature or a product.

This often manifests as “we’re agile so let’s build a prototype and we’ll review it and iterate”. More often than not, the sunk cost fallacy creeps in once again and the prototype becomes the shipped product.

Without sufficient specifications, deciding on the most suitable architecture or defining the test procedure is an impossible task because “edge cases” will abound. Without a strong specification to start from, it is often impossible to determine the correct design pattern to apply to solve the problem at hand because the scope of the problem has not been fully revealed to the implementation team.

Personally, I am a proponent of Basecamp’s Shape Up model. I’ve used it. I know it works.

This model requires that the product function within an organization accept more responsibility and rigor up front to fully encapsulate the parameters of work so that an engineering team can work from a full context. Conversely, it also requires the engineering function to carefully review the specifications and make determinations of feasibility, tradeoffs, and timelines.

Having specifications makes code reviews much more meaningful because, rather than reviewing the code without context, the objective of a code review becomes determining whether the implementation meets the specifications.

Require and Review Technical Designs

Rather than reviewing code once it’s been written and cut, it would seem that the software engineering discipline could learn a lot by understanding how other industries manage matters of output quality: review the design.

Imagine building a house or skyscraper without a blueprint. Imagine building a Tesla without a CAD model of every component. Imagine building a SpaceX rocket without rigorous design and specification of each part that has to withstand incredible forces. It would seem an impossibility.

Yet we build software as if it’s acceptable to simply not give a damn about technical design and architecture.

Agile in the software engineering discipline is often applied to the code, but rarely to the design. Wouldn’t it be much cheaper to iterate on the design up front than to spend the time fixing broken, buggy code later?

The rigor of the design is specific to the domain, of course. Fast Company had a great article, They Write the Right Stuff, about software engineering at NASA:

But how much work the software does is not what makes it remarkable. What makes it remarkable is how well the software works. This software never crashes. It never needs to be re-booted. This software is bug-free. It is perfect, as perfect as human beings have achieved.

When the code is part of a one-shot, once-in-a-generation mission, such systems must strive for near perfection.

For most teams, this is obviously too much rigor, but that does not mean “no design”.

Write Better Code

Ironically, the answer to improving code quality is almost certainly “write better code.” The best way to do that isn’t PRs or code reviews, because at that point the code is already written; this is the fundamental logical flaw in thinking about code reviews and code quality. By then the pressure to ship is too high, and teams will be hesitant to say “let’s refactor this into cleaner code and a better design.” In fact, that code will never be rewritten (every product has that pile of code that no one dares touch).

This is no easy feat, and it requires more diligence in the hiring process and in the structuring of teams. Rather than a free-for-all division of work, it often means that core framework bits and APIs should be handled by a smaller group of more experienced minds.

To be fair, no team ever gets it right on the first cut; that’s not the point. The point is to build a system in such a way that its core is pliable and malleable to the shifting business requirements.

More Team-Oriented Development

Developers working together on a bug fix or a feature will have a better grasp of the larger context of the business requirements and the code flow in that area of the system. They are therefore better equipped to provide meaningful feedback to each other than a reviewer pulled in with no contextual understanding of the specific business problem a piece of code is solving.

Organizing development work into small teams is a way to automatically put more eyes on each unit of work and ensure that there are multiple individuals who understand the larger context of the feature being implemented.

Interactive Code Reviews

Rather than looking at diffs in isolation, having interactive code reviews where the submitter walks through the code in the larger context of the product or feature and discusses the tradeoffs which were made can lead to much more meaningful feedback from a reviewer.

Often there are nuances to the decision making process that are not visible when looking at a diff or even a change set because the code is interacting within the flow of a larger system or state machine.

Having a developer or team walk through their technical design decisions and challenges they accounted for during their implementation can be far more productive than just reviewing a PR.

Focus on Testing in Code Reviews

In addition to looking at the details of the implementation, it is perhaps even more important to review the testing strategy to verify that it provides sufficient coverage of the requirements rather than coverage of the code.
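
As a sketch of the difference (the function and its spec are hypothetical), line coverage can look complete while requirements coverage is nowhere near:

```ts
import assert from "node:assert";

// Hypothetical spec: "Usernames are 3 to 20 characters,
// lowercase letters and digits only."
function isValidUsername(name: string): boolean {
  return /^[a-z0-9]{3,20}$/.test(name);
}

// This single test yields 100% line coverage of the function...
assert.equal(isValidUsername("alice42"), true);

// ...but requirements coverage demands a case per clause of the spec:
assert.equal(isValidUsername("ab"), false);           // shorter than 3
assert.equal(isValidUsername("a".repeat(21)), false); // longer than 20
assert.equal(isValidUsername("Alice42"), false);      // uppercase letter
assert.equal(isValidUsername("al_ce"), false);        // disallowed character
```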

No amount of code review will account for the edge cases because, by their nature, these are cases that were not in a specification. Similarly, evaluating in a code review whether the testing (whether automated unit tests or automated end-to-end tests) is sufficient requires a sufficiently detailed functional specification from which to determine the test coverage.

Code reviews should focus on whether the developer has designed and implemented the proper test cases to account for the specifications. But of course, this requires that such specifications exist in the first place.
