Find Extreme Values in Arrays – Real Python

You’ve now seen examples of all the basic use cases for NumPy’s max() and maximum(), plus a few related functions. Now you’ll investigate some of the more obscure optional parameters to these functions and find out when they can be useful.

Reusing Memory

When you call a function in Python, a value or object is returned. You can use that result immediately by printing it or writing it to disk, or by feeding it directly into another function as an input parameter. You can also save it to a new variable for future reference.

If you call the function in the Python REPL but don’t use it in one of those ways, then the REPL prints out the return value on the console so that you’re aware that something has been returned. All of this is standard Python stuff, and not specific to NumPy.

NumPy’s array functions are designed to handle huge inputs, and they often produce huge outputs. If you call such a function many hundreds or thousands of times, then you’ll be allocating very large amounts of memory. This can slow your program down and, in an extreme case, might even cause an out-of-memory error.

This problem can be avoided by using the out parameter, which is available for both np.max() and np.maximum(), as well as for many other NumPy functions. The idea is to pre-allocate a suitable array to hold the function result, and keep reusing that same chunk of memory in subsequent calls.

You can revisit the temperature problem to create an example of using the out parameter with the np.maximum() function. You’ll also use the dtype parameter of np.empty() to control the data type of the buffer that receives the result:

>>> temperature_buffer = np.empty(7, dtype=np.float32)
>>> temperature_buffer.shape
(7,)

>>> np.maximum(temperatures_week_1, temperatures_week_2, out=temperature_buffer)
array([ 7.3,  7.9,  nan,  8.1,  nan,  nan, 10.2], dtype=float32)

The initial values in temperature_buffer don’t matter, since they’ll be overwritten. But the array’s shape is important in that it must match the output shape. The result displayed looks like the output that you received from the original np.maximum() example. So what’s changed? The difference is that you now have the same data stored in temperature_buffer:

>>> temperature_buffer
array([ 7.3,  7.9,  nan,  8.1,  nan,  nan, 10.2], dtype=float32)

np.maximum() has written its result into temperature_buffer, which you previously created with the right shape to accept it, and the value it returns is that same array object. Since you also specified dtype=np.float32 when you declared this buffer, NumPy will do its best to convert the output data to that type.

Remember to use the buffer contents before they’re overwritten by the next call to this function.
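Here’s a minimal sketch of the reuse pattern. It assumes a hypothetical weeks array holding three weeks of readings, one week per row, and a constant baseline. Neither comes from the article. The buffer is allocated once and then refilled on every pass through the loop:

```python
import numpy as np

# Hypothetical readings: three weeks of seven daily values, one week per row.
weeks = np.array([
    [4.1, 6.3, 2.0, 8.8, 5.5, 3.9, 7.0],
    [9.2, 1.5, 4.4, 6.1, 3.3, 2.8, 5.9],
    [2.2, 3.0, 7.7, 4.0, 1.1, 6.6, 3.5],
])
baseline = np.full(7, 5.0)  # hypothetical floor to compare against

# Allocate the output buffer once, with the shape of a single weekly result.
buffer = np.empty(7, dtype=np.float32)

weekly_peaks = []
for week in weeks:
    # np.maximum() writes into buffer and returns that same array object.
    result = np.maximum(week, baseline, out=buffer)
    assert result is buffer
    # Use the contents now, because the next call overwrites them.
    weekly_peaks.append(float(buffer.max()))
```

The inputs are float64 and buffer is float32, so NumPy casts each result as it stores it. Its default same_kind casting rule for ufuncs allows this.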

Filtering Arrays

Another parameter that’s occasionally useful is where. This applies a filter to the input array or arrays, so that only those values for which the where condition is True will be included in the comparison. The other values will be ignored. For an element-wise function such as np.maximum(), the corresponding elements of the output array are left unaltered. Unless you’ve supplied an out array, they’ll hold arbitrary values.
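For an element-wise function like np.maximum(), you can see what “left unaltered” means by passing a pre-filled out array along with where. In this small sketch, the arrays a, b, and mask are made up for illustration:

```python
import numpy as np

a = np.array([1, 8, 3, 9, 2])
b = np.array([4, 5, 6, 7, 0])
mask = np.array([True, False, True, False, True])

# Pre-fill the output so you can see which elements where= leaves alone.
out = np.full(5, -1)
np.maximum(a, b, out=out, where=mask)
# out is now [4, -1, 6, -1, 2]: only the True positions were computed.
```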

For the sake of the example, suppose you’ve decided, for whatever reason, to ignore all scores less than 60 for calculating the per-student maximum values in Professor Newton’s class. Your first attempt might go like this:

>>> n_scores
array([[63, 72, 75, 51, 83],
       [44, 53, 57, 56, 48],
       [71, 77, 82, 91, 76],
       [67, 56, 82, 33, 74],
       [64, 76, 72, 63, 76],
       [47, 56, 49, 53, 42],
       [91, 93, 90, 88, 96],
       [61, 56, 77, 74, 74]])

>>> n_scores.max(axis=1, where=(n_scores >= 60))
ValueError: reduction operation 'maximum' does not have an identity,
            so to use a where mask one has to specify 'initial'

The problem here is that NumPy doesn’t know what to do with the students in rows 1 and 5, who didn’t achieve a single test score of 60 or better. The solution is to provide an initial parameter:

>>> n_scores.max(axis=1, where=(n_scores >= 60), initial=60)
array([83, 60, 91, 82, 76, 60, 96, 77])

With the two new parameters, where and initial, n_scores.max() considers only the elements greater than or equal to 60, plus the initial value of 60, which takes part in every row’s comparison. For the rows where there is no such element, the initial value is all that’s left, so the result is 60. So the lucky students at indices 1 and 5 got their best score boosted to 60 by this operation! The original n_scores array is untouched.
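Because initial takes part in every row’s comparison, not just the empty ones, its value matters. This sketch copies rows 0, 1, and 7 of n_scores into a standalone array so that it runs on its own, then compares two choices of initial:

```python
import numpy as np

# Rows 0, 1, and 7 of n_scores.
scores = np.array([
    [63, 72, 75, 51, 83],
    [44, 53, 57, 56, 48],
    [61, 56, 77, 74, 74],
])
passing = scores >= 60

# With initial=0, a row with no passing score falls back to 0.
best_from_zero = scores.max(axis=1, where=passing, initial=0)    # [83, 0, 77]

# A higher initial acts as a floor for every row, not only the empty one.
best_from_80 = scores.max(axis=1, where=passing, initial=80)     # [83, 80, 80]
```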

Comparing Differently Shaped Arrays With Broadcasting

You’ve learned how to use np.maximum() to compare arrays with identical shapes. But it turns out that this function, along with many others in the NumPy library, is much more versatile than that. NumPy has a concept called broadcasting that provides a very useful extension to the behavior of most functions involving two arrays, including np.maximum().

Whenever you call a NumPy function that operates on two arrays, A and B, it checks their .shape properties to see if they’re compatible. If they have exactly the same .shape, then NumPy just matches the arrays element by element, pairing up the element at A[i, j] with the element at B[i, j]. np.maximum() works like this too.

Broadcasting enables NumPy to operate on two arrays with different shapes, provided there’s still a sensible way to match up pairs of elements. The simplest example of this is to broadcast a single element over an entire array. You’ll explore broadcasting by continuing the example of Professor Newton and his linear algebra class. Suppose he asks you to ensure that none of his students receives a score below 75. Here’s how you might do it:

>>> np.maximum(n_scores, 75)
array([[75, 75, 75, 75, 83],
       [75, 75, 75, 75, 75],
       [75, 77, 82, 91, 76],
       [75, 75, 82, 75, 75],
       [75, 76, 75, 75, 76],
       [75, 75, 75, 75, 75],
       [91, 93, 90, 88, 96],
       [75, 75, 77, 75, 75]])

You’ve applied the np.maximum() function to two arguments: n_scores, whose .shape is (8, 5), and the single scalar parameter 75. You can think of this second parameter as a 1 x 1 array that’ll be stretched inside the function to cover eight rows and five columns. The stretched array can then be compared element by element with n_scores, and the pairwise maximum can be returned for each element of the result.

The result is the same as if you had compared n_scores with an array of its own shape, (8, 5), but with the value 75 in each element. This stretching is just conceptual—NumPy is smart enough to do all this without actually creating the stretched array. So you get the notational convenience of this example without compromising efficiency.
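You can make the stretching visible with np.broadcast_to(), which returns a read-only view of the broadcast array. Its strides are zero, so every element refers to the same stored value and nothing is copied. The small scores array here is hypothetical:

```python
import numpy as np

scores = np.array([[63, 82, 75], [44, 91, 57]])  # hypothetical 2 x 3 scores

# A read-only view that presents the scalar 75 with the shape of scores.
stretched = np.broadcast_to(75, scores.shape)

# Zero strides: every element of the view refers to the same stored value.
assert stretched.strides == (0, 0)

# Comparing with the scalar or with the explicit view gives identical results.
assert np.array_equal(np.maximum(scores, 75), np.maximum(scores, stretched))
```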

You can do much more with broadcasting. Professor Leibniz has noticed Newton’s skulduggery with his best_n_scores array, and decides to engage in a little data manipulation of her own.

Leibniz’s plan is to artificially boost all her students’ scores to be at least equal to the average score for a particular test. This will have the effect of increasing all the below-average scores—and thus producing some quite misleading results! How can you help the professor achieve her somewhat nefarious ends?

Your first step is to use the array’s .mean() method to create a one-dimensional array of means per test. Then you can use np.maximum() and broadcast this array over the entire l_scores matrix:

>>> mean_l_scores = l_scores.mean(axis=0, dtype=int)
>>> mean_l_scores
array([79, 68, 71, 69, 64])

>>> np.maximum(mean_l_scores, l_scores)
array([[87, 73, 71, 69, 67],
       [79, 68, 82, 80, 64],
       [92, 85, 71, 79, 77],
       [79, 79, 71, 69, 87],
       [86, 91, 92, 73, 64],
       [79, 68, 71, 79, 64],
       [83, 68, 71, 69, 64],
       [89, 68, 72, 69, 64]])

The broadcasting happens in the call to np.maximum(). The one-dimensional mean_l_scores array has been conceptually stretched to match the two-dimensional l_scores array. The output array has the same .shape as the larger of the two input arrays, l_scores.
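The same pattern works on any two-dimensional array. Since l_scores isn’t reproduced here, this sketch uses a small hypothetical array with four students and three tests. The means are truncated to whole numbers with .astype(int):

```python
import numpy as np

# Hypothetical stand-in for l_scores: 4 students (rows) x 3 tests (columns).
scores = np.array([
    [90, 60, 70],
    [70, 80, 50],
    [60, 70, 90],
    [80, 50, 70],
])

# One mean per test (column), truncated to integers: shape (3,).
test_means = scores.mean(axis=0).astype(int)  # [75, 65, 70]

# The (3,) array is broadcast across all four rows of the (4, 3) array.
boosted = np.maximum(test_means, scores)
```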

Following Broadcasting Rules

So, what are the rules for broadcasting? A great many NumPy functions accept two array arguments. np.maximum() is just one of these. Arrays that can be used together in such functions are termed compatible, and their compatibility depends on the number and size of their dimensions, that is, on their .shape.

The simplest case occurs if the two arrays, say A and B, have identical shapes. Each element in A is matched, for the function’s purposes, to the element at the same index address in B.

Broadcasting rules get more interesting when A and B have different shapes. The elements of compatible arrays must somehow be unambiguously paired together so that each element of the larger array can interact with an element of the smaller array. In the examples in this section, the output array has the .shape of the larger of the two input arrays. More generally, each dimension of the output matches whichever input dimension isn’t 1, so shapes (3, 1) and (1, 4) produce an output of shape (3, 4). Compatible arrays must follow these rules:

  1. If one array has fewer dimensions than the other, only the trailing dimensions are matched for compatibility. The trailing dimensions are those that are present in the .shape of both arrays, counting from the right. So if A.shape is (99, 99, 2, 3) and B.shape is (2, 3), then A and B are compatible because (2, 3) are the trailing dimensions of each. You can completely ignore the two leftmost dimensions of A.

  2. Even if the trailing dimensions aren’t equal, the arrays are still compatible if one of those dimensions is equal to 1 in either array. So if A.shape is (99, 99, 2, 3) as before and B.shape is (1, 99, 1, 3) or (1, 3) or (1, 2, 1) or (1, 1), then B is still compatible with A in each case.
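One way to check your understanding is to code the two rules directly. The hypothetical broadcast_shape() helper below pads the shorter shape with leading 1s, which is equivalent to ignoring the extra leading dimensions under rule 1. It then applies rule 2 to each pair of dimensions. Its answers are checked against NumPy’s own np.broadcast_shapes(), which is available in NumPy 1.20 and later:

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: return the broadcast shape, or raise ValueError."""
    # Pad the shorter shape with leading 1s so both have the same length.
    ndim = max(len(shape_a), len(shape_b))
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(padded_a, padded_b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
    return tuple(result)


A_shape = (99, 99, 2, 3)
for B_shape in [(2, 3), (1, 99, 1, 3), (1, 3), (1, 2, 1), (1, 1)]:
    # Each case from the rules above agrees with np.broadcast_shapes().
    assert broadcast_shape(A_shape, B_shape) == np.broadcast_shapes(A_shape, B_shape)

# The output shape can differ from both input shapes.
print(broadcast_shape((3, 1), (1, 4)))  # (3, 4)
```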

You can get a feel for the broadcasting rules by playing around in the Python REPL. You’ll be creating some toy arrays to illustrate how broadcasting works and how the output array is generated:

>>> A = np.arange(24).reshape(2, 3, 4)
>>> A
array([[[ 0,  1,  2,  3], [ 4,  5,  6,  7], [ 8,  9, 10, 11]],
       [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]])

>>> A.shape
(2, 3, 4)

>>> B = np.array(
...     [
...         [[-7, 11, 10,  2], [-6,  7, -2, 14], [ 7,  4,  4, -1]],
...         [[18,  5, 22,  7], [25,  8, 15, 24], [31, 15, 19, 24]],
...     ]
... )

>>> B.shape
(2, 3, 4)

>>> np.maximum(A, B)
array([[[ 0, 11, 10,  3], [ 4,  7,  6, 14], [ 8,  9, 10, 11]],
       [[18, 13, 22, 15], [25, 17, 18, 24], [31, 21, 22, 24]]])

There’s nothing really new to see here yet. You’ve created two arrays of identical .shape and applied the np.maximum() operation to them. Notice that the handy .reshape() method lets you rearrange an array’s elements into any shape with the same total number of elements. You can verify that the result is the element-by-element maximum of the two inputs.

The fun starts when you experiment with comparing two arrays of different shapes. Try slicing B to make a new array, C:

>>> C = B[:, :1, :]
>>> C
array([[[-7, 11, 10,  2]],
       [[18,  5, 22,  7]]])

>>> C.shape
(2, 1, 4)

>>> np.maximum(A, C)
array([[[ 0, 11, 10,  3], [ 4, 11, 10,  7], [ 8, 11, 10, 11]],
       [[18, 13, 22, 15], [18, 17, 22, 19], [20, 21, 22, 23]]])

The two arrays, A and C, are compatible because the new array’s second dimension is 1, and the other dimensions match. Notice that the .shape of the result of the maximum() operation is the same as A.shape. That’s because C, the smaller array, is being broadcast over A. Whenever every dimension of the smaller array is either 1 or equal to the matching dimension of the larger one, as it is here, the result has the .shape of the larger array.
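To convince yourself that broadcasting C over A is just a shortcut, you can build the stretched array by hand with np.repeat() and compare the results. Unlike broadcasting, np.repeat() really does copy the data. This sketch recreates A and C so that it runs on its own:

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)
C = np.array([[[-7, 11, 10, 2]], [[18, 5, 22, 7]]])  # same as B[:, :1, :]

# Explicitly copy C three times along axis 1 to get shape (2, 3, 4).
C_stretched = np.repeat(C, 3, axis=1)

# Broadcasting gives the same result without making those copies.
assert np.array_equal(np.maximum(A, C), np.maximum(A, C_stretched))
```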

Now you can try an even more radical slicing of B:

>>> D = B[:, :1, :1]
>>> D
array([[[-7]],[[18]]])

>>> D.shape
(2, 1, 1)

>>> np.maximum(A, D)
array([[[ 0,  1,  2,  3], [ 4,  5,  6,  7], [ 8,  9, 10, 11]],
       [[18, 18, 18, 18], [18, 18, 18, 19], [20, 21, 22, 23]]])

Once again, the trailing dimensions of A and D are all either equal or 1, so the arrays are compatible and the broadcast works. The result has the same .shape as A.

Perhaps the most extreme type of broadcasting occurs when one of the array parameters is passed as a scalar:

>>> np.maximum(A, 10)
array([[[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 11]],
       [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]])

NumPy automatically converts the second parameter, 10, to a zero-dimensional array with .shape (). It determines that this converted parameter is compatible with the first and duly broadcasts it over the entire 2 x 3 x 4 array A.
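You can inspect the converted scalar with np.asarray() and check its compatibility with np.broadcast_shapes(). This sketch recreates A so that it runs on its own:

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)

# The scalar becomes a zero-dimensional array with an empty shape.
ten = np.asarray(10)
print(ten.ndim, ten.shape)  # 0 ()

# An empty shape is compatible with any shape, so the result takes A's shape.
print(np.broadcast_shapes(A.shape, ten.shape))  # (2, 3, 4)
```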

Finally, here’s a case where broadcasting fails:

>>> E = B[:, 1:, :]
>>> E
array([[[-6,  7, -2, 14], [ 7,  4,  4, -1]],
       [[25,  8, 15, 24], [31, 15, 19, 24]]])

>>> E.shape
(2, 2, 4)

>>> np.maximum(A, E)
Traceback (most recent call last):
...
ValueError: operands could not be broadcast together with shapes (2,3,4) (2,2,4)

If you refer back to the broadcasting rules above, you’ll see the problem: the second dimensions of A and E don’t match, and neither is equal to 1, so the two arrays are incompatible.
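If you’d rather test compatibility before calling a function, np.broadcast_shapes() raises the same kind of ValueError without computing anything. In this sketch, E is a placeholder array of zeros with the same shape as the E above. Slicing it down to size 1 along the second axis makes it compatible with A:

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)
E = np.zeros((2, 2, 4), dtype=int)  # placeholder with the shape of B[:, 1:, :]

# np.broadcast_shapes() checks compatibility without touching any data.
try:
    np.broadcast_shapes(A.shape, E.shape)
    compatible = True
except ValueError:
    compatible = False  # axis 1 has sizes 3 and 2, and neither is 1

# Keeping only one slice along axis 1 makes that dimension 1, so it broadcasts.
result = np.maximum(A, E[:, :1, :])
```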

You can read more about broadcasting in Look Ma, No For-Loops: Array Programming With NumPy. There’s also a good description of the rules in the NumPy docs.

The broadcasting rules can be confusing, so it’s a good idea to play around with some toy arrays until you get a feel for how they work!
