Write Reproducible, Reversible, Tested ML Code With No Time Lost | by Joseph Gardi | Jun, 2022

Plus how to speed up your iteration speed

(Image caption) I’m not kidding. But the difference between a pro and a junior dev is using vim’s y, x, and p rather than Ctrl-C. Image from: https://pin.it/7liORTR

I often see new coders saying they’re not sure whether they should write code quickly or write good code. This is a false dichotomy. The whole point of good code is to save you time.

It feels like writing good code takes a long time because you’re still not sure what good code is, so it requires deliberation and experimentation. But this pays off in the long run! If you put in the work to build the proper habits, you will eventually get to the point where you can write reasonably good code without having to think about it.

The best way to learn to write good code is to build your own project of several thousand lines. You don’t start understanding the long-term consequences of your decisions until you have worked on a single project of that size. But perhaps these tips can speed up your progress.

Test-driven development does not have to require so much discipline. There is a natural way to code that leads to:

  1. Many small automated tests
  2. Many small pure (as described in the practical functional programming book Grokking Simplicity) reusable functions.

Since I spend a lot of time writing experimental code, I seldom write unit tests just because I feel I am supposed to. I do it because:

  1. The cost of buggy experimental code is actually very high, even though some coders think they can be sloppy with experimental code that isn’t going to production.
    The nice bugs that don’t scare me are the ones that give me an exception and crash my whole program. Then I know what the problem is and can fix it. What keeps me up at night is the bugs that cause poor accuracy without producing any error messages.
    The worst possible outcome of any experiment is a bug that causes a negative result where the idea was fundamentally good and could have worked. This will mislead you. If the idea was wrong, but the execution was good, then at least we learn what doesn’t work, and that should be appreciated.
    Publishing null results should be encouraged. Researchers not publishing their null results is one of the worst errors in our current use of the scientific method because it leads to a confirmation bias where we only see the results that match what the researchers wanted to see.
    In ML, I don’t fault people for not knowing ahead of time what will work on a specific dataset. But I do fault them for producing incorrect results. Good science is not about getting the result you wanted to see; it’s about getting accurate results. I admit I am biased toward prioritizing reliable code because I hate debugging and enjoy writing good code. If you enjoy having bugs and don’t enjoy writing good code, this article may not be for you.
  2. Each experiment is easily reproducible because it is in its own unit test. For every result I report, I link to the corresponding unit test.
  3. It is easier to find the problem when I have a small piece of code tested in isolation.

Unit tests are painfully slow to start, and they get even slower when you load a dataset or do other setup work. This kills your productivity. I’m not concerned about the few seconds it takes to start the test; I’m concerned that it breaks my flow state and makes coding less fun. Jupyter, by contrast, is particularly helpful for visualization because you can see your plot right below the corresponding code.

The best solution to this problem is to get your code working in fewer iterations. Type hints help here because they give you instantaneous feedback as you type.
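For instance, a type checker such as mypy (or PyCharm’s built-in inspections) can flag a mistake the moment you type it, before any test runs. A minimal sketch, where the function name and signature are purely illustrative:

```python
def mean_squared_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# A type checker flags a call like mean_squared_error("0.1", [0.2])
# as you type, long before a unit test would catch it.
```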

One approach is to first write your code in Jupyter and then copy it into unit tests. With the autoreload magic command, you can seamlessly keep most of your code in Python files while calling it from Jupyter or the REPL. To enable autoreload, run:

%load_ext autoreload
%autoreload 2

Simply calling your unit tests from the REPL with autoreload gives you way faster iteration speed than restarting unit tests. In your IPython session, Jupyter notebook, or debugger console, do:

%load_ext autoreload
%autoreload 2
import sys
sys.path.insert(0, '<path to your tests directory>')
import <unit test module name>
<unit test module name>.<unit test class name>('<unit test method name>').debug()

Furthermore, I recommend using PyCharm. In PyCharm, go to Preferences | Build, Execution, Deployment | Console | Python Console. Then add the following to the existing startup script:

%load_ext autoreload
%autoreload 2
sys.path.insert(0, 'tests')

Then open View | Tool Windows | Python Console. Notice the debug button on the left side of the Python Console in PyCharm.

Another advantage of Jupyter is that it makes sharing your results easy. To replicate this with unit tests, you can use the export-test-results option in PyCharm and store the resulting file in your Git repo.

However, you can’t keep your REPL running forever. Joblib caching persists expensive results to disk so they survive restarts, and Joblib Parallel spreads the remaining work across cores, so I recommend both to speed things up.
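A minimal sketch of both, assuming a scratch cache directory; the function and paths here are illustrative, not from the original article:

```python
from joblib import Memory, Parallel, delayed

# Illustrative cache location; in practice, point this at a project directory.
memory = Memory("/tmp/joblib_cache_demo", verbose=0)

@memory.cache
def slow_feature(x: int) -> int:
    # Stand-in for an expensive step such as preprocessing a dataset shard.
    return x * x

# The first run computes and writes results to disk; after a REPL or test
# restart, the same calls are served from the cache instead of recomputed.
squares = Parallel(n_jobs=2)(delayed(slow_feature)(i) for i in range(4))
```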

When modifying existing code, I recommend setting a breakpoint at the part you want to change and running the PyCharm debugger with the Python console. That gives you a REPL with access to all the variables and the whole call stack. I try out my code in that REPL and then copy it into the file.

When you make one small change at a time, every bug is instantly obvious: the last change you made is the problem. But sometimes I forget how the code was before. So rather than changing the code in place, copy and paste it into a new unit test and change the copy. This leads to a series of increasingly complex unit tests that get closer to your final product, and you find issues by comparing the new unit test to the last one that passed. You have been told not to copy and paste, but it is the most effective way to produce reusable code.

In PyCharm, your duplicated code will be underlined in yellow. Put your cursor there and hit Option-Enter to show all duplicates. Then select the duplicated code, right-click the selection, and hit Cmd-Option-M to extract a method. You might then see a warning on the declaration of the newly generated method that it could be static. In that case, put the cursor on the declaration, hit Option-Enter, and hit Enter again to convert it to a function outside the class.

The process I’ve outlined here organically grows many small, fast unit tests and small reusable pure functions.

Another reason I advocate copy and paste is that flag arguments are a code smell. Unfortunately, many library developers in the Python community do not understand this.

I often find that extremely simple functions in the Python standard library have implementations that are hard to follow because of all the logic needed to support different flags and configurations.

I understand that some of this is necessary because Python doesn’t support overloading. But many flag arguments can be avoided by copying and pasting to create separate functions for the different use cases.

Copying and pasting is especially beneficial for research and experimentation because it allows you to preserve your old experiments in case you want to revisit them. This approach makes reversibility (as described in the timeless must-read book The Pragmatic Programmer) a breeze.

Many unit tests only cover the cases people think of off the top of their heads, but the hardest bugs tend to come from the cases we don’t foresee. Try testing on randomly generated data instead. You usually won’t have labels for the correct output on randomly generated data, but assertions can reveal many issues automatically.

Property-based testing is the approach of testing on randomly generated data in combination with assertions that check certain properties hold true. In property-based testing, we don’t check that the code gives a specific output because we often don’t know what the output should be for the generated data.

Assertions are fantastic because they run every single time and find the error before the code even finishes running.
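A minimal sketch using only the standard library; the `normalize` function is a hypothetical example, and the Hypothesis library automates this style of testing (including shrinking failing inputs to minimal counterexamples):

```python
import random

def normalize(xs: list[float]) -> list[float]:
    """Scale positive values so they sum to 1."""
    total = sum(xs)
    return [x / total for x in xs]

# We don't know the exact output for random inputs, but we do know
# properties every correct output must satisfy.
for _ in range(100):
    xs = [random.uniform(0.1, 10.0) for _ in range(random.randint(1, 20))]
    ys = normalize(xs)
    assert abs(sum(ys) - 1.0) < 1e-9   # results sum to one
    assert all(y > 0 for y in ys)      # positivity is preserved
    assert len(ys) == len(xs)          # length is preserved
```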

I got some initial inspiration for this from sklearn. The project_dir variable defines a hidden directory within the home directory where I cache everything and store datasets. As I mentioned before, you should speed up your tests with Joblib caching; here I define a Joblib cache within project_dir.

I used methods with the cached_property decorator rather than variables so that the ordering of initialization is figured out for you. Methods can call methods defined lower down in the file, while a variable can’t depend on a variable that is not yet defined.

Moreover, cached properties are lazily instantiated so that we only load the resources we need. I define a singleton in the global variable project_res. Here’s an example of my approach:
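A minimal sketch of this pattern, with hypothetical names (`ProjectResources`, a `~/.my_project` directory) standing in for the original snippet:

```python
from functools import cached_property
from pathlib import Path

from joblib import Memory

class ProjectResources:
    """Lazily loaded project resources; each property is computed once."""

    @cached_property
    def dataset_path(self) -> Path:
        # May freely reference project_dir even though it is defined further
        # down; with plain module-level variables, definition order would matter.
        return self.project_dir / "dataset.csv"

    @cached_property
    def memory(self) -> Memory:
        # Joblib disk cache stored inside the project directory.
        return Memory(str(self.project_dir / "cache"), verbose=0)

    @cached_property
    def project_dir(self) -> Path:
        # Hidden directory in the home directory for caches and datasets.
        d = Path.home() / ".my_project"
        d.mkdir(exist_ok=True)
        return d

# Singleton: every module imports and shares this one instance.
project_res = ProjectResources()
```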

For my bulleted list of tips and tricks for ML engineering, see:

I also can’t recommend The Pragmatic Programmer enough. It’s such a quick read and yet so valuable.
