Handling Large Arrays in Golang: Should You Use For Range or For Loop? | by Dwen | May, 2022

Golang loop control method analysis

Photo by Nathan Dumlao on Unsplash

Go is all about “one way to do one thing”! For example, Go keeps only one type of loop control statement, which is the classic version of for loop.

And loop control statements such as while, do…whilesupported by C language are excluded from Go’s concise syntax.

But in order to facilitate Go developers to iterate on composite data types, such as arrays, slices, channels, and maps, Go provides a variant for range loopwhich can even traverse maps and channels.

However, while range brings convenience, it also brings some troubles to Go beginners. For example, when for range iterates compound type variables, there are some common and very easy traps to fall into.

For example: participating in the loop is a copy of the range expression.

Let’s look at a code block.

Can you guess what the code will output? Do you think this code will output the following result.

However, the actual output of running the program is:

We thought that during the first iteration, i = 0our modification to a ( a[1] = 12, a[2] = 13) would be replaced by v in the second and third iterations Take out but from the result, v takes out the value of a before being modified: 2 and 3.

Why is this the case? The reason is that participating in the for range loop is a copy of the range expression. That is, in the above example, it is the copy of a that is actually participating in the loop, not the real a.

To make it easier for you to understand, let’s rewrite the for range loop in the above example in an equivalent pseudocode form:

Now the truth is finally revealed: In this example, each iteration copies the elements of acfrom the value of array a.

ac is a contiguous sequence of bytes temporarily assigned by Go, and is not a memory area at all with a.

Therefore, no matter how ais modified, the copy acthat it participates in the cycle still maintains the original value, so what v takes out of acis still the original value of anot the modified value.

A question, for large arrays, since the for range is a copy of the array, will use for range consume more resources and have worse performance than the classic for loop?

Let’s verify it with a benchmark example: for large arrays, does for range necessarily run slower than the classic for loop? Let’s look at the first example:

In this example, we use for loop and for range to traverse an array with 10w elements of type int respectively. Let’s take a look at the results of the benchmark:

$ go test -bench . benchmark1_test.go
BenchmarkClassicForLoopIntArray-8 22080 55124 ns/op 0 B/op 0 allocs/op
BenchmarkForRangeIntArray-8 34808 34433 ns/op 0 B/op 0 allocs/op
ok command-line-arguments 3.321s

From the output results, we can see that the for range loop is not only not affected by the large array copy operation, but its performance is actually better than that of the for loop.

This is obviously the result of optimization at the compiler level (usually static single assignment, or SSA link).

Let’s turn off the optimization switch and run the stress test again:

$ go test -c -gcflags '-N -l' .  
$./demo.test -test.bench .
BenchmarkClassicForLoopIntArray-8 6248 187773 ns/op 0 B/op 0 allocs/op
BenchmarkForRangeIntArray-8 4768 246512 ns/op 0 B/op 0 allocs/op

We see: that without optimization, the performance of both loops drops significantly, the for range drops even more, and the performance is significantly worse than the classic for loop.

You can compare the assembly code snippet of the BenchmarkForRangeIntArray function with normal optimization (go tool compile -S xxx.go) and with optimization turned off (go tool compile -S -N -l).

You will find that with optimization turned off, the assembly code uses a lot of intermediate variables to store intermediate results, and the optimized codes these intermediate states.

Then you may ask the following question: Is it possible that for range iterating over large arrays of any element type is no worse than a classic for loop? Let’s look at an example of traversing an array of structures:

In this example, we define 5 types of structures: U1~U5the difference between them lies in the number of int-type fields they contain.

Let’s use the classic for loop and for range loop to traverse a large array of elements of these types and see what the result is:

$go test -bench . benchmark3_test.go
BenchmarkClassicForLoopLargeStructArrayU5-8 22030 54116 ns/op 0 B/op 0 allocs/op
BenchmarkClassicForLoopLargeStructArrayU4-8 22131 54145 ns/op 0 B/op 0 allocs/op
BenchmarkClassicForLoopLargeStructArrayU3-8 22257 54001 ns/op 0 B/op 0 allocs/op
BenchmarkClassicForLoopLargeStructArrayU2-8 22063 54580 ns/op 0 B/op 0 allocs/op
BenchmarkClassicForLoopLargeStructArrayU1-8 22105 54408 ns/op 0 B/op 0 allocs/op
BenchmarkForRangeLargeStructArrayU5-8 3022 391232 ns/op 0 B/op 0 allocs/op
BenchmarkForRangeLargeStructArrayU4-8 4563 265919 ns/op 0 B/op 0 allocs/op
BenchmarkForRangeLargeStructArrayU3-8 6602 182224 ns/op 0 B/op 0 allocs/op
BenchmarkForRangeLargeStructArrayU2-8 10000 111966 ns/op 0 B/op 0 allocs/op
BenchmarkForRangeLargeStructArrayU1-8 35380 34005 ns/op 0 B/op 0 allocs/op
ok command-line-arguments 15.907s

We see a strange phenomenon: no matter what type of structure, the performance of classic for loop traversal is the same, but the traversal performance of for range will decrease as the number of structure fields increases.

With doubts, I found an issue related to this problem: cmd/compile: optimize large structs, this issue roughly says that for struct types containing a certain number of fields, it is currently unable. If SSA is not possible, then It cannot pass SSA optimization, which is also an important reason for the above benchmark results.

In Go, almost all uses of arrays can be replaced by slices. The author still recommends replacing iteration over arrays with iterative slices as much as possible, so that consistent and stable traversal performance can always be achieved.

Leave a Comment