Appreciating Python’s match-case by parsing Python code — Python Morsels

Trey Hunner

9 minute read


Python 3.10

I stayed up past my bedtime recently and made a script and later a web app to convert a dataclass to a non-dataclass. The web app is powered by a WebAssembly build of Python (which also powers my Python pastebin tool).

While making this script I found excuses to use odd Python features, the most interesting being Python’s match-case statement.

Python 3.10 added a match-case block that folks often assume to be equivalent to the switch-case blocks from other programming languages. While you can use match-case like switch-case, you usually wouldn’t: match-case is both more powerful and more complex than switch-case. Python’s match-case blocks are for structural pattern matching, and that phrase sounds complex because it is!

I’ll write a follow-up post soon on how this script works at a high level, but right now I want to share my adventures using structural pattern matching to write this code.

Why remove dataclasses?

First let’s briefly talk about why I made this tool.

Why would anyone want to convert a dataclass into “not a dataclass”?

There are trade-offs with using dataclasses: performance concerns (which don’t usually matter) and edge cases where things get weird (__slots__ and slots=True are both finicky). But my reason for creating this dataclass to regular class converter was to help me better teach dataclasses. Seeing the equivalent code for a dataclass helps us appreciate what dataclasses do for us.

Okay let’s dive into match-case.

I knew the adventure I was embarking on involved parsing Python code. I don’t usually parse Python code: I leave that up to tools like Black, flake8, and the Python interpreter itself.

But I did know that Python’s ast module had a parse function which could accept a string representing Python code and return an “abstract syntax tree” (often shortened to AST) that represented that Python code.

Using ast.parse to get a tree of AST nodes was easy. The hard part came in making sense of those deeply-nested AST nodes.
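To give a sense of what those nodes look like, here’s a minimal sketch that parses a tiny dataclass and pokes at the resulting tree. The Point class is made up for illustration and isn’t from the undataclass script.

```python
import ast

# A made-up example class; any string of Python source code works here.
source = '''
import dataclasses

@dataclasses.dataclass
class Point:
    x: int
    y: int = 0
'''

tree = ast.parse(source)

# The module body holds one node per top-level statement.
print([type(node).__name__ for node in tree.body])  # ['Import', 'ClassDef']

# A class definition keeps its decorators in its decorator_list attribute.
class_node = tree.body[1]
decorator = class_node.decorator_list[0]

# ast.dump shows the nested structure of a node as a string.
print(ast.dump(decorator))
```

The decorator here is an Attribute node wrapping a Name node, which is the kind of nesting the rest of this post spends its time matching.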

I found myself writing a lot of if-elif blocks with very complex conditions. Take this code for example:

if isinstance(node, ast.Call):
    if (isinstance(node.func, ast.Attribute)
            and node.func.value.id == "dataclasses"
            and node.func.attr == "dataclass"):
        return True
    elif node.func.id == "dataclass":
        return True
elif (isinstance(node, ast.Attribute)
        and node.value.id == "dataclasses"
        and node.attr == "dataclass"):
    return True
elif isinstance(node, ast.Name) and node.id == "dataclass":
    return True
else:
    return False

That code checks for 4 different uses of the dataclass decorator:

  1. dataclasses.dataclass(...)
  2. dataclass(...)
  3. dataclasses.dataclass
  4. dataclass

After writing the above code I remembered playing with match-case shortly after Python 3.10 was released. Seeing those isinstance checks in particular made me think “wait a minute, match-case was made for this!”

After introspecting (via breakpoint and Python’s debugging friends), I found that I could refactor the above if-elif into this equivalent match-case block:

match node:
    case ast.Call(
        func=ast.Attribute(
            value=ast.Name(id="dataclasses"),
            attr="dataclass",
        ),
    ):
        return True
    case ast.Call(func=ast.Name(id="dataclass")):
        return True
    case ast.Attribute(
        value=ast.Name(id="dataclasses"),
        attr="dataclass"
    ):
        return True
    case ast.Name(id="dataclass"):
        return True
    case _:
        return False

With each of the case statements I wrote above, assertions were made about:

  1. The type of object being matched
  2. The types of specific attribute values
  3. The types and values of sub-attributes: we matched deeply nested attributes such as node.func.value.id

That first case statement nicely demonstrates the power of match-case for matching deeply-nested data structures. We’re using a single expression to confirm that node is a Call statement and the expression it’s calling is an attribute lookup of dataclasses.dataclass:

    case ast.Call(
        func=ast.Attribute(
            value=ast.Name(id="dataclasses"),
            attr="dataclass",
        ),
    )

Compare that to these nested if statements, which do the same thing:

if isinstance(node, ast.Call):
    if (isinstance(node.func, ast.Attribute)
            and node.func.value.id == "dataclasses"
            and node.func.attr == "dataclass")

Both of those blocks of code say “I have a Call object which contains an Attribute object which has a specific attr and also contains a Name with a certain id”. But the match-case statement does that so much more succinctly and I found it much more readable than the equivalent if-elif.

Using “or patterns” to match multiple sub-patterns

During this match-case refactoring I realized I needed an easy way to say “this attribute could be either A or B”.

I dug through the structural pattern matching tutorial PEP (PEP 636) and (fortunately) found just what I needed: the | operator. The | operator allows a single case statement to match against multiple patterns at once.

Instead of this giant if statement (note that giant elif clause):

if subnode.value is None:
    field = dataclasses.field()
elif (isinstance(subnode.value, ast.Call) and (
        isinstance(subnode.value.func, ast.Name)
        and subnode.value.func.id == "field"
        or
        isinstance(subnode.value.func, ast.Attribute)
        and isinstance(subnode.value.func.value, ast.Name)
        and subnode.value.func.value.id == "dataclasses"
        and subnode.value.func.attr == "field")):
    field = dataclasses.field(**{
        kwarg.arg: parse_field_argument(kwarg.arg, kwarg.value)
        for kwarg in subnode.value.keywords
    })
else:
    field = dataclasses.field(default=ast.unparse(subnode.value))

I wrote this match statement:

match subnode:
    case ast.AnnAssign(value=None):
        field = dataclasses.field()
    case ast.AnnAssign(
        value=ast.Call(
            func=
                ast.Name(id="field")
                |
                ast.Attribute(value=ast.Name(id="dataclasses"), attr="field")
        )
    ):
        field = dataclasses.field(**{
            kwarg.arg: parse_field_argument(kwarg.arg, kwarg.value)
            for kwarg in subnode.value.keywords
        })
    case ast.AnnAssign():
        field = dataclasses.field(default=ast.unparse(subnode.value))

That match statement is very complex, but it’s much less visually dense than that if statement was. That second case statement ensures that the annotated assignment node we’re matching has a value attribute which is either field(...) or dataclasses.field(...).

    case ast.AnnAssign(
        value=ast.Call(
            func=
                ast.Name(id="field")
                |
                ast.Attribute(value=ast.Name(id="dataclasses"), attr="field")
        )
    ):

Writing this 8-line-long case statement with that “or pattern” felt very silly. But I found that I prefer it over the alternative elif logic:

elif (isinstance(subnode.value, ast.Call) and (
        isinstance(subnode.value.func, ast.Name)
        and subnode.value.func.id == "field"
        or
        isinstance(subnode.value.func, ast.Attribute)
        and isinstance(subnode.value.func.value, ast.Name)
        and subnode.value.func.value.id == "dataclasses"
        and subnode.value.func.attr == "field")):

The Zen of Python says “simple is better than complex” but it also says “complex is better than complicated”. Both the elif and case statements above are complex because making sense of abstract syntax trees is an inherently complex activity. But that case statement seems a bit less complicated than the elif equivalent.

Conditional patterns with guard clauses

The last match-case feature I discovered caught me by surprise.

In this if statement, the third condition can’t be boiled down to a simple structural pattern in match-case land:

if isinstance(node, ast.ImportFrom) and node.module == "dataclasses":
    continue  # Don't import dataclasses anymore
elif isinstance(node, ast.Import) and node.names[0].name == "dataclasses":
    continue  # Don't import dataclasses anymore
elif isinstance(node, ast.ClassDef) and any(
    is_dataclass_decorator(n)
    for n in node.decorator_list
):
    need_total_ordering |= update_dataclass_node(node)
    new_nodes.append(node)
else:
    new_nodes.append(node)

At first I thought I needed to give up on using match-case for that condition and resort to a nested if-else statement. But then I stumbled upon guard clauses.

Guard clauses are handy when you need a case clause that has some actual boolean logic in it. Using a guard clause, the above if-elif can be rewritten like this (note that third case statement with that if condition on the end):

match node:
    case ast.ImportFrom(module="dataclasses"):
        continue  # Don't import dataclasses anymore
    case ast.Import(names=[ast.alias("dataclasses")]):
        continue  # Don't import dataclasses anymore
    case ast.ClassDef() if any(
        is_dataclass_decorator(n)
        for n in node.decorator_list
    ):
        need_total_ordering |= update_dataclass_node(node)
        new_nodes.append(node)
    case _:
        new_nodes.append(node)

In that third case statement we’re checking the type of the node (something match-case statements are great at) and we’re also asking a complex question about the decorator_list attribute of that node (thanks to that if guard clause with that any(...) logic).

While I did find a guard clause helpful here, this feature does feel like an escape hatch that should only be used when there’s not a more readable alternative.

Structural pattern matching visually describes the structure of objects

This undataclassing adventure was not my first time using match-case. But before I wrote undataclass.py most of my match-case statements involved matching iterables.

For example while prepping a talk on match-case for my local meetup, I noticed that this Django template tag parsing function:

def do_get_available_languages(parser, token):
    args = token.contents.split()
    if len(args) != 3 or args[1] != "as":
        raise TemplateSyntaxError(
            "'get_available_languages' requires 'as variable' (got %r)" % args
        )
    return GetAvailableLanguagesNode(args[2])

Could be rewritten like this:

def do_get_available_languages(parser, token):
    match token.contents.split():
        case [_, "as", variable]:
            return GetAvailableLanguagesNode(variable)
        case args:
            raise TemplateSyntaxError(
                f"'get_available_languages' requires 'as variable' (got {args!r})"
            )

Even if you don’t understand how structural pattern matching works, that second block of code is likely easier to make guesses about at a glance. Just like with tuple unpacking, that match-case statement visually demonstrates the shape of our code.

Python’s match-case statement can even be used to match nested dictionary items. For example this nested dictionary-processing code:

if webhook_data["event_type"] == "order_created":
    customer_id = webhook_data["content"]["customer"]["id"]
    order = webhook_data["content"]["order"]
    process_order(customer_id, order)
elif webhook_data["event_type"] == "payment":
    customer_id = webhook_data["content"]["customer"]["id"]
    payment = webhook_data["content"]["payment"]
    process_payment(customer_id, payment)
else:
    process_other(webhook_data)

Could be refactored to use structural pattern matching like this:

match webhook_data:
    case {
        "event_type": "order_created",
        "content": {
            "order": order,
            "customer": {"id": customer_id},
        },
    }:
        process_order(customer_id, order)
    case {
        "event_type": "payment",
        "content": {
            "payment": payment,
            "customer": {"id": customer_id},
        },
    }:
        process_payment(customer_id, payment)
    case _:
        process_other(webhook_data)

Is that clearer? I’m not sure. But it’s definitely much more visually oriented: those case statements kind of look like the webhook_data object that we’re trying to describe.

Along with tuple unpacking and list comprehensions, match-case results in code that looks like the objects we’re describing. Matching a nested dictionary results in code that looks like a nested dictionary. Matching a list or tuple of length N involves writing a list of length N. And in my case, matching an abstract syntax tree involves writing code that looks like an abstract syntax tree.

When writing parsers and matchers, consider match-case

Python’s match-case statement is both complex and amazing.
I do not recommend using match-case in cases where an if-elif block is simpler (which is most of the time). But, like many complex abstractions, match-case does have its uses.

In particular, using structural pattern matching can make the intent of AST-matching code easier to understand at a glance.

You should consider match-case statements when:

  • You end up in a scary land full of isinstance checking
  • You’re matching lists/tuples by their size and contents
  • You’re pattern matching against dictionary keys and values

Though in all likelihood, you don’t need match-case and your code would be simpler without it.

Python’s structural pattern matching definitely makes parsing Python code much easier and I’m grateful I thought to use it when creating my undataclass tool.

P.S. If you’re wondering how that undataclass.py script works, stay tuned (hop on my Python tips newsletter) as I’ll write up a further explanation soon.
