Using Bitwise Operations to Improve Python Performance | by Ryan Louis Stevens | May, 2022

Or why a little C++ can get you a long way

Photo by Possessed Photography on Unsplash

I’ll walk through an algorithm to do addition using Python. Using bitwise operators and C++ extension modules results in nearly on-par performance with base Python addition.

I’ve recently been dealing with bytes and bits a lot. I thought I’d revisit some of the old algorithm problems I remember seeing back in the day. One classic problem is this:

Create an integer addition ‘+’ operator. Assume you can not use multiplication, division, or subtraction. For simplicity, assume inputs can only be non-negative integers.

Let’s first think of a simple naive solution to this. One way to do this would be to replicate long addition but optimize it using some simple memoization.

For long addition, you align the two numbers you want to add. Starting on the far right (first digit), you keep adding and carrying over terms if the two terms you’ve added are greater than 9.

We describe the algorithm in pseudo-code, then walk through a simple example.

Pseudo Code

We will explain more in-depth what this pseudo-code means using an example.

Lookup Addition Pseudo Code

Walking through the pseudo-code

Let’s take an example for adding the two numbers 23 and 48. Think of these two numbers as two arrays of strings:

[‘2’, ‘3’] and [‘4’, ‘8’]

We align the numbers, then go right to left. At each position, we will only be adding two single digit numbers (such as 3+8 in position 1, and 2+4 in position 0).

We will need to determine two things:

  1. Reminder term
  2. Carryover term

We discuss what they mean in this example.

Position 1

  • Start with two terms: (3,8)
  • Remainder Step: remainder(3,8) = 1 (because 3+8 = 11 mod 10 = 1)
  • Carryover Step: carryover(3,8) = 1 (because 3+8 > 9)
  • Add remainder step to output array: [1]

Position 0

  • Start with two terms: (2, 4)
  • Bring forward remainder term from prior step: (2+1, 4)
  • Remainder Step: remainder(3, 4) = 7
  • Carryover Step: carryover(3,4) = 0
  • Add remainder step to output array: [7, 1]

Final Step

  • Return concatenated array: 71

What we see is that in position 1, the remainder step gives us the digit in the output, whereas the carry-over step will increment the digits in the next position (ie position 0). We repeat this process, appending an output array using the remainder step until we have exhausted both arrays.

Python Code

Now for the python code. Unfortunately, it’s very complex. We won’t spend time refactoring this, because another solution is both more readable and faster.

Simple speed up

We can memoize the remainder and carryover steps, by defining lookup arrays: Reminder and Carryover. We won’t delve any deeper into this, but it was the main speed improvement for this algorithm.

Reminder Array

Carryover Array

Another avenue for a solution is using bitshift operators. Bit shifting is equivalent to multiplying or dividing a number by some base number. For example, in base-2 the number 4 is represented as [1,0,0]. Remember, bits are just coefficients, so 4=1*2²+0*2¹+0*2⁰.

Bit shifting 4 to the left is equivalent to multiplying 4 by the number 2, thus BitshiftLeft(4)=8=[1,0,0,0].

Pseudo Code

Below we have our pseudo-code for a bitwise-based algorithm. You notice right off the bat that bitshift addition does not rely on lookup arrays, and its pseudo-code is much much shorter. It turns out its python/c++ code is also very very short.

Walking through the pseudo-code

I’ll walk through the logic of how this works. Let’s do a simpler addition than before, assuming we want to add 5+3.

Let’s covert 5, 3, and 8 (the answer) into base-2 binary representation:

  • 5=1*2² +0*2¹+1*2⁰
  • 3=0*2² +1*2¹+1*2⁰
  • 8=1*2³+0*2²+0*2¹ +0*2⁰

What we will do is basically replicate our carry-over logic from above, but this time in binary. This allows us to do the carry-over in a “vectorized” fashion.

The approach attempts to find two-bit sets such that they have no overlapping bits whose values ​​are greater than 0. To do so, we alternate using bitshifts and bitmasks. The algorithm’s base case defines x, y as the input values ​​we’re trying to add together:

  • x=[1, 0, 1] (ie 5 in base-2)
  • y=[0,1,1] (ie 3 in base-2)

Iteration 1

Step 1: Carryover step using AND and Bitshift bitwise operator

This step simply looks for bits shared between 3 and 5, and then carries them forward in the next step. We combine AND and bitshift left to redefine x:

  • x=BitshiftLeft(And(x,y)) = [0,1,0]

Breaking these steps down:

  • And(x,y) = [0, 0, 1] (because 5 and 3 share 2⁰)
  • BitshiftLeft([0,0,1]) = [0, 1, 0] (this is equivalent to 2¹=2⁰+2⁰)

Step 2: Prevent doubling counting step using OR and Bitmask bitwise operator

In step 1, we eliminated bits that were shared between x and y. We want to remove those bits from y.

  • y=BitMask(Or(x,y), ~And(x,y))=[1,1,0]

Breaking these steps down:

  • ~And(x,y) = [1, 1, 0] (these are bits we did not eliminate in step 1, we want to keep)
  • Or(x,y) = [1, 1, 1] (these are bits we way need to mask)
  • BitMask([1, 1, 1], [1, 1, 0]) = [1, 1, 0] (mask the bits in 2⁰ position)

Stop Condition

We keep doing these iterations until x and y do not share any bits, ie we can simply add together the bit representation. In this example, this is the final values ​​of x and y:

  • Final x=[1, 0, 0, 0]
  • Final y=[0, 0, 0, 0]

We use the OR operator to get our solution:

  • 8 = Or(x, y) = [1, 0, 0, 0]

Python Code

The python code is relatively short and straightforward.

Simple Speedups — C++ Extension Module

One “easy” speed up is to code this up in C++ and create a C++ extension module. The code for the extension module is below. The actual C++ function we will call in python is defined as c_bitadd_methodthe rest of the code is to define c++ classes/methods so it can integrate with Python.

I do a simple experiment, varying two parameters:

  • Total Number of Samples: use three values: 100k, 1mil, 10mil
  • Largest Possible Number: use three values: 100k, 10mil, 1bil

These two parameters vary the total number of additions being done, as well as, the size of the integers. For example, when Largest Possible Number = 1bil this means that potentially the add function will need to add together integers as large as 1 billion.

I test the 4 algorithms:

  1. Python Base Add Operator
  2. Lookup Addition (LookupAdd)
  3. Bitshift Addition — Python (PyBitshiftAdd)
  4. Bitshift Addition — C++ (CPyBitshiftAdd)

The results are below and fairly clear:

  • Lookup Addition is really slow: this is expected
  • Bitshift Addition compiled in C++ performs very close to the Python Base Add operation, which is pleasant to see given this was the first version:

This post was a simple walk-through of using bitwise operations and C++ to create an additio function that has on-par performance with Python’s addition operators.

Want to Connect?I discuss on similar topics on my personal website.

Leave a Comment