How to Shuffle a Generator in Python…Without Converting to a List

Can you shuffle a generator in Python without going through a list or other data type first? A quick Google search seems to say no, sort of. I suspect a more in-depth search might yield more results. Excuse the yield pun. But life’s too short for Google searches. So I thought I’d give it a try.

What’s the context? I often like to dive randomly into documentation pages, find a function that I’ve not seen or used before, and explore it. Some hobby I have, you’re thinking. I came across itertools.tee(), which had a curious name. Is this the golfing term tee or the letter ‘T’, or perhaps the tea you drink spelt incorrectly? Who knows? (It’s actually the second option, I later found out, but by now, I was sufficiently intrigued to explore further.)

Exploring itertools.tee()

So I read the docs for itertools.tee(). The function “returns n independent iterators from a single iterable”. OK, seems simple enough, no? The documentation goes on to show code that’s equivalent to what tee() does. This is one of those instances where the Python docs weren’t enough for me to say “Ah, great, it’s all very clear now.”

So I googled a bit more and found lots of dry examples showing how tee() works in a four-line-code-snippet-type example. They show what itertools.tee() does. But they don’t shed any light on why you’d want to use it and when.

Luckily, it didn’t take long to find David Amos’s RealPython article. Finally, some sense. Do read this overview of itertools through lots of great examples. But first, finish reading this article, of course!

This gave me inspiration for exploring itertools.tee() and one thought led to another, which led to the question:

“Can you shuffle a generator in Python without going through a list or some other data type?”

This takes us back to the first line of this article, but this article is not on loops, so I won’t go there!

The boring bit

So, I’m now contractually obliged to give you one of those dry examples that shows you what itertools.tee() does, but nothing else. Don’t worry. Better examples are coming later on!

>>> import itertools
>>> original_generator = (number for number in range(10))
>>> original_generator
<generator object <genexpr> at 0x7fec3027a4a0>

>>> first, second, third = itertools.tee(original_generator, 3)
>>> first
<itertools._tee object at 0x7fec3028a500>
>>> second
<itertools._tee object at 0x7fec3028a140>
>>> third
<itertools._tee object at 0x7fec3028acc0>

As the documentation said, tee() returns independent iterators from the original iterable. All three will iterate through all the items in the original iterable. The writers returned are _tee objects. In this case, the original iterable is a generator.

The second argument in tee() determines how many independent iterators the function returns. Let’s check they’re independent:

# Get the first two values from `first`
>>> next(first)
0
>>> next(first)
1

# Now exhaust `second` fully
>>> for item in second:
...     print(item)

0
1
2
3
4
5
6
7
8
9

# And get a value from `third`
>>> next(third)
0

Each of the three iterators first, secondand third go through values ​​independently from each other. When you looped through secondthe code printed all numbers from 0 to 9 even though you had already used up 0 and 1 in first. And third was still untouched!

Note that the three iterators are independent of each other, but They’re not independent of the original generator:

# Recreate the original generator and the three independent iterators
>>> original_generator = (number for number in range(10))
>>> first, second, third = itertools.tee(original_generator, 3)

# Use up the first two values from the original generator
>>> next(original_generator)
0
>>> next(original_generator)
1

# The iterators from tee() start from where you've just left off!
>>> next(first)
2
>>> next(second)
2

You’ll return to tee() later to see how and when it can be useful. You’ll also revisit the issue of when generators are and aren’t independent of each other.

Exploring itertools.islice()

I’ve promised I’ll dive into how to shuffle a generator in Python without storing all the items in memory. You’ll get there soon. But first, a quick dive into another function in itertools.

You can create a slice in an iterable by using itertools.islice(). This returns an iterator. The concept is similar to slicing through sequences in the normal way with the difference that the result is an iterator:

>>> import itertools
>>> original_generator = (number for number in range(10))
>>> a_slice = itertools.islice(original_generator, 4, 8)
>>> a_slice
<itertools.islice object at 0x7fec3026d720>

>>> next(a_slice)
4
>>> next(a_slice)
5
>>> next(a_slice)
6
>>> next(a_slice)
7
>>> next(a_slice)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
StopIteration

# But, be careful!
>>> next(original_generator)
8

The iterator slice you created starts from the value at index 4 and goes up to, but excluding, the value at index 8. You’ve set these values ​​using the second and third arguments in islice().

You can see how you call next(a_slice) four times successfully. These calls return 4, 5, 6and 7. However, when you call next(a_slice) again, you get a StopIteration error as the islice iterator is exhausted.

What about original_generator? So far, you haven’t explicitly used original_generator except for creating the islice. However, the result of next(original_generator) is 8. This means that original_generator and a_slice are not independent. When you advanced through a_sliceyou also advanced through original_generator.

How to Shuffle a Generator in Python

I will stick with the simple generator with numbers from 0 to 9 in this example. Of course, if you wanted a generator with random numbers from 0 to 9, you could create one directly. However, this is not the case for other generators you may have in your code. I’ll keep using this example as it’s easy to demonstrate what’s going on.

You can’t use functions such as random.shuffle() or numpy.random.shuffle() on a generator:

>>> import random
>>> original_generator = (number for number in range(10))

>>> random.shuffle(original_generator)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/random.py", line 359, in shuffle
    for i in reversed(range(1, len(x))):
TypeError: object of type 'generator' has no len()

A generator doesn’t have a __len__ attribute. Therefore, these functions can’t work.

The solution in many cases is to convert the generator into a list, shuffle the list, and then convert it back into a generator if that’s what you’d like to have:

>>> import random
>>> original_generator = (number for number in range(10))

>>> numbers = list(original_generator)
>>> random.shuffle(numbers)
>>> numbers
[3, 7, 6, 5, 2, 0, 8, 9, 1, 4]

>>> new_generator = (number for number in numbers)

Often, this is a perfectly good solution. But if you have a large generator and don’t want to commit the resources in time and memory to convert into a list, you’re a bit stuck.

lucky, itertools and the two functions you’ve explored earlier in this article can come to the rescue.

How to Shuffle a Generator in Python Without Converting into A List

I’ll consider the case where you know how many values ​​there are in your generator. I’ll say a few words at the end about generalising this further.

The technique you’ll use here is the following:

  • Create two independent editors from the original generator
  • Choose a random index and slice the two iterators using this index so that one has the first part of the original and the other has the second part
  • Yield the value at the location of the split
  • Merge the remaining parts back into a single iterator and repeat the process until you’ve used up all the values ​​in the original generator

This method is inspired by David Amos’s example in the article I mentioned in the introduction.

You can start by creating the generator you’ve already used several times in this article and define a generator function using the yield keyword. I’ll use a script for this example rather than the console sessions I used earlier.

# shuffle_generators.py

n = 10
original_generator = (number for number in range(n))

def randomise_generator(original, length):
    while True:
        yield

new_generator = randomise_generator(original_generator, n)

for number in new_generator:
    print(number)

The generator function randomise_generator() yields None forever for the time being. You’ll fix this soon.

You’ve also written code to create a new generator from the generator function randomise_generator() and test it by going through the new generator using a for loop.

If you run this code now, it will print None forever!

First attempt: Just using islice()

Let’s try to use itertools.islice() directly on the original generator first. Spoiler alert: this won’t work. But let’s see why:

# shuffle_generators.py

import itertools
import random

n = 10
original_generator = (number for number in range(n))

def randomise_generator(original, length):
    while True:
        idx = random.randint(0, length - 1)
        first_part = itertools.islice(original, idx)
        second_part = itertools.islice(original, idx, None)
        yield next(second_part)
        original = itertools.chain(first_part, second_part)
        length -= 1
        if length == 0:
            return

new_generator = randomise_generator(original_generator, n)

for number in new_generator:
    print(number)

You’re first choosing a random index where you’ll split your generator. Next, you use this index to create two iterator slices from the original generator. Note that when you use islice() with two arguments, the second argument is the stop parameter and the start defaults to index 0. Therefore, first_part is a slice from the beginning of the original generator up to, but excluding, the value with index idx.

When you call islice() with three arguments, the second and third are the start and stop parameters. If the third is Nonethe slice goes to the end.

Next, you yield the first value of second_part. This is the value just after the point where you split the generator into two.

Following the yield statement, you put the two remaining parts together again using itertools.chain(). The plan is to merge the remaining parts of the original iterator minus the one value you’ve already removed.

You decrease the value of length by 1 to account for the element you’ve removed and yielded already and put in a condition to end the generator function when there are no more elements left.

You run this code, and you get this:

0
4
9
Traceback (most recent call last):
  File "<file_path>", line 15, in randomise_generator
    yield next(second_part)
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<file_path>", line 23, in <module>
    for number in new_generator:
RuntimeError: generator raised StopIteration

Both the values ​​and the number of outputs you’ll get before the error will be different each time you run this code. But you’ll always end up with the StopIteration error.

Let’s investigate this problem by going back into the console. In this example, you’re splitting the generator at index 6:

>>> import itertools
>>> original_generator = (number for number in range(10))
>>> first_part = itertools.islice(original_generator, 6)
>>> second_part = itertools.islice(original_generator, 6, None)

>>> for number in first_part:
...     print(number)

0
1
2
3
4
5

>>> for number in second_part:
...     print(number)


>>> 
# There's no output from the second `for` loop

You intend to create two iterator slices. One from 0 to 5 and the other from 6 to 9. The first islice seems to be correct. When you iterate through first_partyou get the expected numbers.

However, when you iterate through second_part you get no output. The iterator second_part is empty.

You can check if the same thing happens if you use second_part before first_part. Remember you’ll need to recreate the original generator and the slices each time:

>>> original_generator = (number for number in range(10))
>>> first_part = itertools.islice(original_generator, 6)
>>> second_part = itertools.islice(original_generator, 6, None)

>>> for number in second_part:
...     print(number)

6
7
8
9

>>> for number in first_part:
...     print(number)

>>>
# Again, no output from the second loop

This time, it’s first_part that’s empty. This is because the iterator slices are not independent of the original generator. When you exhaust an iterator slice, you’re also using up the original generator. You’ve seen this issue earlier in this article when you first read about itertools.islice()

Second attempt: tee() to the rescue

This is where itertools.tee() comes in useful. This function creates two independent iterators from an iterable. The independence is the important part here!

To be able to shuffle a generator in Python, you can update the code to include itertools.tee():

# shuffle_generators.py

import itertools
import random

n = 10
original_generator = (number for number in range(n))

def randomise_generator(original, length):
    while True:
        idx = random.randint(0, length - 1)
        first_iter, second_iter = itertools.tee(original, 2)
        first_part = itertools.islice(first_iter, idx)
        second_part = itertools.islice(second_iter, idx, None)
        yield next(second_part)
        original = itertools.chain(first_part, second_part)
        length -= 1
        if length == 0:
            return

new_generator = randomise_generator(original_generator, n)

for number in new_generator:
    print(number)

First, you create first_iter and second_iter using itertools.tee(). Both iterators go through all the elements of the original generator, but they’re independent of each other.

Next, you create iterator slices from first_iter and second_iter. You no longer have the problem you encountered in the previous section as these are independent iterators now.

You can verify this in the console:

>>> import itertools
>>> original_generator = (number for number in range(10))
>>> first_iter, second_iter = itertools.tee(original_generator, 2)
>>> first_part = itertools.islice(first_iter, 6)
>>> second_part = itertools.islice(second_iter, 6, None)

>>> for number in first_part:
...     print(number)

0
1
2
3
4
5

>>> for number in second_part:
...     print(number)

6
7
8
9

In this example, first_part goes from 0 to 5 and second_part goes from 6 to 9. Independence problem solved!

You can run the shuffle_generators.py script now. You’ll verify that new_generator is a generator which has all the values ​​in original_generatorbut they’ve been shuffled:

5
8
6
7
1
0
2
3
9
4

You Can Now Shuffle a Generator in Python Without Converting to a List

Note that you’re never converting the generator into a list or retrieving items from the generator except at the very end when you’re testing that your code works. If you’re working with a large generator, this will save memory and possibly some time, too.

The code shown in this article works if you know how many values ​​there are in the generator. I’m not going to extend this further in this article.

However, you should be able to deal with the case where length is not known by starting with a random value for length in a try...except block and look for valid values ​​of length. If you find an invalid value, you can go lower, and if you find a valid value, you can go higher. Full disclosure: I’ve not tried this version. I may at some point in the future. Or I may not!

Final Words

You’ve used itertools.tee() and itertools.islice() to shuffle a generator in Python without creating a list along the way.

I’m sure there are other ways of solving this problem—possibly others that are neater than this solution. If you have another way of shuffling a generator in Python without converting it into a list, please do share.


You may also like the article on stacks, queues, and deques


Get the latest blog updates

No spam promise. You’ll get an email when a new blog post is published


Leave a Comment