1. Recently, as an aside to the proposal for static type annotations in Python, Guido offhandedly noted another suggested improvement to Python, adding ADTs (Algebraic Data Types). This has also come up in discussions about pattern matching, and about existing run-time and compile-time static-typing libraries.

    What is an ADT?

    Instead of trying to fully define an ADT in formal terms (see Wikipedia, or, better, work through a Haskell tutorial, for that), I'll try to define it in terms of values as you might use them in Python.

    Basically, an ADT is either a simple type, a product of ADTs, or a sum of ADTs. That's what makes it "algebraic".

    Product types

    What is a product of types? It's a tuple: (42, "spam") is of type int * str, or, written more simply, (int, str).*

    Often you want to add names to the values, like (answer: 42, manna: "spam"), which is of type (answer: int) * (manna: str), or (answer: int, manna: str). Note the equivalence to the "implicit namedtuple literals" that people suggest for Python every few months.

    And often you want to add a name to the tuple itself, making this a full "record" type, like ImportantStuff(answer: 42, manna: "spam"), which is of type ImportantStuff:((answer: int) * (manna: str)) or ImportantStuff(answer: int, manna: str). This is exactly like the namedtuple that Python has today. It's also like a C struct (or Python ctypes.Structure).
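
    In today's Python, that last one is a two-liner. A minimal sketch with the stdlib, using the names from above:

        from collections import namedtuple

        # The named record type from above, as an ordinary namedtuple.
        ImportantStuff = namedtuple('ImportantStuff', ['answer', 'manna'])
        stuff = ImportantStuff(answer=42, manna="spam")
        stuff.answer   # 42
        tuple(stuff)   # (42, 'spam') -- it really is just a tuple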

    Note that none of these is like an OO class, especially Python's rendition of that concept. Besides the fact that classes have methods, they have an arbitrary bag of attributes, not an ordered sequence of them. They may have private attributes that are part of the structure but not part of the interface, and computed attributes that are part of the interface but not part of the structure. And so on. (The classes in weaker OO languages like C++ and Java are closer to record types—which is why C++ just defines "class" and "struct" to mean the same thing, except for the default visibility of their members.)

    * Notice that, since a tuple type is just a product of types, a single-element tuple is the exact same thing as its one value. And an empty tuple is a singleton value of some unit type that's the result of not multiplying anything (equivalent to 1 when multiplying integers, the multiplicative identity). Python of course distinguishes single-element tuples from their values: (spam,) != (spam). And Python's closest thing to a unit type is NoneType, whose value, None, is often used inside tuples. So, Python's tuple isn't quite a purist version of an untagged product type. But I think we can ignore that difference here.

    Sum types

    A sum type is a union of two or more types. This is the part Python doesn't really have today.

    Consider a URL-fetching function, which can return either a successful download, an error download, or a failure to even get an error. (Ignore for now the fact that Python would use an exception for the third case.) So, it might return something like:

        ((headers: Sequence[str], body: Iterable[str])
         | (code: int, description: str, headers: Sequence[str], body: Iterable[str])
         | (errno: int,))

    On its own, this isn't very usable; to know which of the three you've gotten, you need to either LBYL the members of the returned record with getattr, or EAFP it by using a try:/except AttributeError:. And consider that, at least with HTTP, successful downloads actually have a code and description, just like failed downloads, meaning the two record types are identical, giving you no way to distinguish them! So often, you will want to tag the values:
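
    For instance, EAFP dissection of that untagged result might look like this (result and handle_error are hypothetical names, just for illustration):

        # Assume `result` is one of the three record shapes above.
        try:
            errno = result.errno
        except AttributeError:
            pass  # a success or a failure; probe `code`/`headers` next
        else:
            handle_error(errno)  # hypothetical error handler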

        (success: (headers: Sequence[str], body: Iterable[str])
         | failure: (code: int, description: str, headers: Sequence[str], body: Iterable[str])
         | error: (errno: int,))

    But notice that a tagged union of plain (named) tuples is exactly equivalent to an untagged union of records. You could just as well write that type as:

        (Success(headers: Sequence[str], body: Iterable[str])
         | Failure(code: int, description: str, headers: Sequence[str], body: Iterable[str])
         | Error(errno: int,))

    Either way, a language could allow you to dissect unions with the same member syntax used by records. But the latter allows you to dissect them by pattern matching, or sometimes just by comparison, which is often much simpler.

    In practice

    Most languages that make heavy use of ADTs go with a sort of hybrid: the tags are part of the named record type, but named record types can only be defined inside sum types, and therefore it doesn't really matter.

    For example, here's how cons lists are generally defined:

        List(Cons(head: T, tail: List[T]) | Nil())

    Here Cons and Nil are called constructors for List. Whichever way you interpret this, Nil() is an instance of type Nil (or of the Nil kind of type List), and also an instance of type List.* Cons(1, Nil()) is an instance of type Cons[int] (or the Cons kind of type List[int]), and also an instance of type List[int].

    However, this does raise a question of namespacing: is the name Cons available in the same scope as List, or do you have to write List.Cons? (It's not a coincidence that this same question comes up with enum in many languages, which I'll get to later.)

    * It's worth noting that in most ADT-based languages, this Nil() can have a type, like List[int]; the fact that it isn't always inferable just means that sometimes you have to specify it explicitly.
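
    You can fake the dynamic half of this List today with namedtuples. A sketch; nothing here checks that a tail is actually a Cons or a Nil:

        from collections import namedtuple

        Cons = namedtuple('Cons', ['head', 'tail'])
        Nil = namedtuple('Nil', [])

        lst = Cons(1, Cons(2, Nil()))  # the list [1, 2]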

    Optional/Maybe

    One special case that people often propose for Python isn't really special at all.

    A value that can be either an int or None is just int | NoneType.

    A value that can be either an int or None, with the type tagged by a name, is (Just(int) | Nothing(NoneType)).*

    In languages that make sum types very concise, like Haskell,** it's perfectly reasonable to write Maybe int as a type, and Just 42 or Nothing as values of that type, and to pattern match the value as Just n | Nothing.

    Many other languages add syntactic sugar: int? means Maybe[int]; n! means the value of n, or raises an exception if it's Nothing; n? means nil-chaining (an expression containing n? has the value of the expression using the Just value if n is not Nothing, or the value Nothing if n is Nothing); there may be shortcuts to extract a value (e.g., Swift's "if let n: int = m {} else {}"); and so on. But under the covers, these are all doing things that can be expressed directly in terms of the sum type.

    * Note that the direct mapping of Haskell's Maybe to Python would be (Just(int,) | Nothing()). But because Python's tuples aren't mathematical product types, and NoneType isn't a real unit type, as described earlier, it's more convenient to tag non-record-typed values in the Maybe. And there's really no theoretical reason you can't do that; it just means either treating the tags as part of the union rather than part of a record constructor, or a bit of extra syntactic sugar.

    ** In Haskell, tags are available in the outer namespace, and type parameters and constructors, like functions, don't need parentheses around their arguments.

    Fitting ADTs into Python

    Product types

    As noted above, Python already has tuples and named record types, but not anonymous record types. How do we add those?

    namedtuple

    The existing namedtuple works fine for named record types.

    The proposals for implicit namedtuple literals have, as far as I know, been coherent (at least Christian Tismer's worked out all of the tricky details and showed that they weren't so tricky after all); the only reason they haven't gone anywhere is that no one could explain a real need for them. So let's just add that, and use it for our anonymous tagged records:

        (answer: 42, manna: "spam")

    The type of that is:

        (answer: int, manna: str)

    Alternative: typed __slots__

    A class with __slots__ is also pretty close to a named record type, and in some ways seems like a better fit, assuming we had a syntax to type the slots, which seems pretty easy.

    However, the fact that, unlike a namedtuple, it's not a tuple is at least a theoretical problem, if not a practical one. The fact that there is nothing requiring the constructor arguments to map to the slots is a bigger problem. And it's hard to imagine a way to write these anonymously and inline (short of some kind of horrible expression using the type class's constructor).
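
    For concreteness, here's a minimal sketch of the named version, using ordinary class-level annotations as a stand-in for typed slots:

        class ImportantStuff:
            __slots__ = ('answer', 'manna')
            answer: int  # annotations only; nothing ties them to the slots,
            manna: str   # or the slots to the constructor arguments below
            def __init__(self, answer, manna):
                self.answer, self.manna = answer, manna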

    Sum types

    Python doesn't have sum types at all, except implicitly. And explicit sum types, especially either tagged sums of anonymous products or sums of named products, are exactly what people are missing when they ask for ADTs in Python.

    So, how could we add them?

    MyPy Union

    MyPy adds a Union type: Union[int, str] (or, more compactly, int | str) is a value that can be either an int or a str.

    While this only gives us untagged unions, as noted above, tagged unions are equivalent to untagged unions of record types. So, if we wanted to define a Haskell-like Maybe, we could do this:

        Just = NamedTuple('Just', [('value', T)])
        Nothing = NamedTuple('Nothing', [])
        Maybe = Just | Nothing

    Also, without pattern matching, the only way to actually use this is with isinstance checks (or EAFP try/except):

        def spam(eggs: Maybe[int]):
            if isinstance(eggs, Just[int]):
                print('Spam with {} eggs'.format(eggs.value))
            else:
                print('The spam stands alone')

    There's nothing theoretically wrong here, but it certainly is ugly.

    Swift-style Enum

    Consider that a sum type of tagged parameterless type constructors is equivalent to a Python-style Enum, with the constructors mapping to enumeration members:

        Color = (Red | Blue | Green)

        Color = Enum('Color', 'Red Blue Green')

    You can do the same things with each (although with slightly different syntax). The question of whether Red, Blue, and Green leak into the parent scope or are attributes of Color arises in exactly the same way for the two cases, and has the same pro and con arguments.
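
    The Enum half of that already works today:

        from enum import Enum

        Color = Enum('Color', 'Red Blue Green')
        Color.Red      # <Color.Red: 1> -- members are attributes of Color
        Color['Blue']  # lookup by name
        Color(3)       # lookup by value: <Color.Green: 3>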

    So, what if Enum were extended to allow values to be attached to each member? This is exactly what Swift enumerations are, and this is how Swift handles sum types.

    I'm going to assume some extension to the language that allows statically typing variables and attributes, rather than doing it in comments a la MyPy, and I'm going to assume that tuples of types are specified as tuples of types rather than some other syntax, but feel free to translate.

        class Barcode(Enum):
            UPCA: (int, int, int, int) = 1
            QRCode: str = 2

    So now, you can do this:

        >>> barcode = Barcode.UPCA(8, 85909, 51226, 3)
        >>> barcode
        <Barcode.UPCA(8, 85909, 51226, 3): 1>
        >>> barcode.value
        (8, 85909, 51226, 3)

    Or, if you wanted to name the parts of the UPC-A:

        class Barcode(Enum):
            UPCA: (LL: int, L: int, R: int, RR: int) = 1
            QRCode: str = 2

        >>> barcode = Barcode.UPCA(8, 85909, 51226, 3)
        >>> barcode
        <Barcode.UPCA(LL: 8, L: 85909, R: 51226, RR: 3): 1>
        >>> barcode.LL
        8

    Of course you still need isinstance, pattern matching, or EAFP code to distinguish whether you've got a UPCA or a QRCode:

        >>> barcode = Barcode.QRCode('ABCDEFGHIJKLMNOP')
        >>> barcode
        <Barcode.QRCode('ABCDEFGHIJKLMNOP'): 2>
        >>> barcode.LL
        AttributeError: 'Barcode.QRCode' object has no attribute 'LL'

    But still, it looks a lot prettier than using Union.

    What about recursion?

    One of the important features of ADTs is that they can be recursive. See the List example above: the second element of a Cons is a List.

    In Python, this means we either need some kind of explicit forward reference (as used by MyPy), or a way to define sum types and record types in such a way that they can automatically use the type being defined.

    Notice that it's relatively easy to write an "auto-numbered" Enum class that doesn't require you to specify a value; referencing an unknown name just gives it the next number in sequence and binds it. With a slightly smarter/hackier metaclass, I believe you could also write an "auto-recursive" Enum that assumes any unknown name is the class itself, and then checks at construction time that the only unknown name seen was the class name. I haven't actually tried to do this, but I'm going to just naively assume it's possible; if not, something like MyPy's forward references may be necessary.
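
    The easy half, auto-numbering, is essentially the AutoNumber recipe from the enum docs (members still need a throwaway = (); the bare-name version would need the namespace trick described above):

        import enum

        class AutoNumber(enum.Enum):
            # __new__ hands each new member the next integer value.
            def __new__(cls):
                value = len(cls.__members__) + 1
                obj = object.__new__(cls)
                obj._value_ = value
                return obj

        class Color(AutoNumber):
            Red = ()
            Green = ()
            Blue = ()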

    So ADTs are just a subset of classes?

    Each namedtuple type is a class, and each Enum type is a class, and I've defined ADTs in terms of (implicit) namedtuples and Enums. This seems at first to go against the spirit of ADTs, which are a completely alternative form of typing to OO classes.

    But note that in Python, tuple is a class, int is a class, NoneType is a class, every type is a class.

    More importantly, ADTs are a strict subset of classes. The "strict subset" part is what makes them useful: they have a static structure, they have no behavior beyond construction and deconstruction, and that restricted interface is exactly what makes them easy to match and decompose.

    Does Python really need this?

    The big question is: under what circumstances would you want to use an ADT (whether implemented as an Enum with anonymous namedtuple values, or otherwise) instead of a class?

    If the language had pattern matching that only worked on ADTs, the answer might well be "quite often".

    But without pattern matching, or with pattern matching that worked on classes that chose to opt in (as described in my previous blog post), I think it wouldn't be that often.

    A paradigm case: JSON

    JSON is (quite accidentally) an ADT. You can define it as:

        class JSON(Enum):
            null: NoneType = 0
            boolean: bool = 1
            number: numbers.Number = 2
            string: str = 3
            array: Sequence[JSON] = 4
            object: Mapping[str, JSON] = 5
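
    (For contrast, the untagged version of the same type as a MyPy-style union. This is just a sketch: the strings are forward references, and checker support for recursive aliases like this has historically been limited.)

        from typing import Mapping, Sequence, Union

        JSON = Union[None, bool, float, str,
                     Sequence['JSON'], Mapping[str, 'JSON']]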

    So, what would your code look like if json.loads returned one of these, instead of just returning a dict?

    For the quick&dirty case, it would obviously be a whole lot more inconvenient to parse out a JSON than a plain dict. Today, you can already write:

        j = json.loads(s)
        j['aliases'][0]['first']

    But if you want to specify a schema, map it to objects in your hierarchy, and validate it, that would be a huge pain with dynamically walking the structure, but comparatively easy treating JSON as an ADT:

        class Name((first: str, last: str)):
            def tojson(self):
                return JSON.object({'first': self.first, 'last': self.last})
            @classmethod
            def fromjson(cls, j):
                case j:
                    of JSON.object as o:
                        return cls(o['first'], o['last'])
                raise ValueError('{} is not a name'.format(j))

        class Person((aliases: Sequence[Name], age: float)):
            def tojson(self):
                return JSON.object({'aliases': JSON.array(map(Name.tojson, self.aliases)), 'age': self.age})
            @classmethod
            def fromjson(cls, j):
                case j:
                    of JSON.object as o:
                        case o['aliases']:
                            of JSON.array as a:
                                return cls([Name.fromjson(name) for name in a], o['age'])
                raise ValueError('{} is not a person'.format(j))

    (Of course there's no reason Name and Person couldn't be "pure" ADTs, with free functions instead of methods, but it seems more Pythonic this way.)

    This pattern matching syntax is nowhere near as simple as the ML or Haskell equivalent, which implies that maybe I didn't do as good a job with my straw-man syntax as I'd hoped. But note that a JSON parsing framework could automate all of that pattern matching based on the attributes and types of the class. Assuming I'd written such a thing, or found it on PyPI, or inherited it from the web framework I'm using, I could write the above as:

        class Name(JSONable[(first: str, last: str)]):
            pass

        class Person(JSONable[(aliases: Sequence[Name], age: float)]):
            pass

    So:

        person = Person.fromjson(json.loads(s))
        person.aliases[0].first

    This looks almost as simple as the non-validating case, but it will raise an exception if s doesn't actually specify a Person.

    So, is that a good enough use case to include ADTs in Python?


  2. The idea of a pattern-matching case statement has come up a few times recently (first in a proposal to add a C-style case statement, more recently as part of the proposal to add static type annotations), but the discussion ends as soon as people realize that it's not quite as easy as it appears.

    The core problem is that the key to pattern matching is unpacking constructors of algebraic data types, and Python has arbitrary OO-style classes, not ADTs. Everything else is bikesheddable, but ultimately obvious. And there are two potential solutions to that core problem.

    I'll start by laying out a straw-man proposal.

    (This was originally written 17 August 2014, and then updated 7 April 2015. The major change is the new section at the end, although I also fixed a couple typos and restored some formatting that Blogspot broke because it sucks.)

    (Also see Pattern matching again, written a year and a half later, after I became less ignorant about a few more languages, most notably Scala.)

    Semantics

    You can match any of the following:
    • Any assignment target: Always matches, and the target is bound to the match value.*
    • A constructor call: This is the tricky bit that most of this post is about.
    • Any expression that isn't ambiguously an assignment target or constructor call: Matches iff the match value == the expression value. (Generally this means literal constants.)
    • Any of the above followed by an "as" clause: Matches iff the pattern matches, and if so, binds the as target to the match value. (That is, "(0, spam) as eggs" matches iff (0, spam) matches, and if so binds eggs. This is mainly useful if you need to further decompose a subpattern, but also get a name for the whole thing.)
    • Any comma-separated list of the above: Unpacks the match value just like in an assignment target list, then matches iff each pattern matches the corresponding value.
    • Any of the above followed by a guard clause: Matches iff the pattern list matches and the if expression (identical to the conditional clause in a comprehension) is truthy. The if test can, and usually will, use values bound by the pattern list match. The "and" mentioned is of course short-circuiting.
    * Note that this counts as an assignment for function-locals rules.

    If you've seen a case expression in Haskell or ML, all of this should be familiar. Except for one conspicuous absence:

    Lists

    The very first use case anyone learns for pattern matching in Haskell or ML is matching the head and tail of a list.

    Of course in Haskell and ML, a list is, under the covers, just syntactic sugar for the Cons record type. When you match (x:xs), those x and xs are really the two members of the list object, and (x:xs) is just syntactic sugar for Cons(x, xs). While we could come up with a syntax to match lists similarly in Python, it would be misleading, because a list is not made up of a head and a tail; it's an array made up of N elements.

    On top of that, the main use for pattern-matching lists is to write recursive functions over lists, which is a decidedly non-Pythonic thing to do in the first place.

    So, I've left out list decomposition intentionally.*

    * Note that you can actually match the empty list or tuple, because those are both unambiguously not targets—or, of course, you could match the constructor call list(). And you can match x, *xs against any iterable of 1 or more elements. So, if you really want to recursively decompose a list, you can. But it's a bad idea, so there's no reason to add syntactic sugar to make it easier.

    Non-literal values

    (It's a bit of an oversimplification to say that only literals can be used as values to match, but it's a useful oversimplification for this part of the discussion.)

    In many cases, you're going to want to match the value in, say, spam[3].eggs, but because that's a valid target, you can't. Instead, you have to do it with a guard clause.

    In ML or Haskell, that isn't as much of a problem. For one thing, only identifiers are actually binding targets. For another, each case (or let) is a new scope, and you can't actually rebind variables (although you can shadow them with inner-scope variables of the same name).

    One possibility is to add a special notation that forces something to be a value rather than a target. In the comments, I suggested a hideously ugly straw-man, @target, which draws misleading parallels to not only Haskell pattern matching syntax, but also languages like Perl and its cousins. But I'm sure someone could come up with some better syntax. And that might be worth doing.

    An alternate possibility is to always treat any valid expression as a value, and only treat something as a target if the expression would raise a NameError or LookupError. If you want to rebind an existing variable (or a list member), that would be the special case that requires some special notation. However, I think that would very quickly lead to confusion. Anything that allows "spam" to sometimes mean "bind spam to the value" and sometimes "match the value against the current value of spam" would be hard to read without absorbing a lot more context. Worse, a lot of newcomers to Python would see identifiers being used in case statements for their values and assume it's like a C or Java switch statement, which would be very misleading. And then there's the question of what happens if the expression has a global binding.

    A more extreme, but much simpler, version would be to have no rule at all: there's always special syntax for a value, even if it's a literal--or, alternatively, there's always special syntax for a binding, even if it's a currently-unbound identifier. (Note that libraries that add pattern matching into Python without new syntax seem to go that way—e.g., pypatt uses "quote(x)", rather than just "x", to mean "match anything and bind to x", but they may not have much choice.) But I don't think this is necessary. I think the "anything that looks like a target is a target" rule is easy to understand, remember, teach, and even discover through experimentation.

    Syntax

    Example

        class Breakfast:
            def __init__(self, spam, eggs, beans):
                self.spam, self.eggs, self.beans = spam, eggs, beans

        # ...

        case order:
            of _, Breakfast(0, _, _):
                raise WaitressError("You can't have it without spam")
            of _, Breakfast(_, _, beans) if beans:
                raise WaitressError("Beans are off")
            of _, Breakfast(spam, eggs, _) if eggs > spam:
                raise WaitressError("That'd be more eggs than spam, can't have that")
            of customer, Breakfast(spam, eggs, beans) as breakfast:
                kitchen.fry_spam(spam)
                kitchen.fry_eggs(eggs)
                kitchen.fry_beans(beans)
                waitress.deliver(customer, breakfast)
            else:
                raise WaitressError("I'm confused by that order")

    This syntax obviously isn't ideal, what with the double indent and the of and else statements being statements rather than clauses, but this should be enough to discuss the real issue.

    Note that, unlike Haskell or ML, this is a statement, not an expression.*

    Like an if/elif/else chain, this just tries each case until it finds one that matches, then executes its suite. If none of them match and there's no else, you just fall off the statement, having done nothing.**

    All of these cases are 2-element pattern lists (or 2-element pattern lists with a guard clause), so order[0] gets matched against the first pattern, and order[1] against the second.***

    The first thing is always a target (_ or customer)****, so it always succeeds, and binds the target to order[0].

    The second thing in every case is a Breakfast constructor (or a Breakfast constructor with an as clause). That gets (recursively) matched against order[1], and it's the tricky bit that will be examined in detail.

    The as clause just means that, if Breakfast(spam, eggs, beans) matches, breakfast gets bound to the whole Breakfast object it matched.

    The if clauses are checked only after the pattern list is matched, so eggs and spam are the matched values. If the if condition is false, the whole pattern clause fails.

    Note that a failed clause may still end up binding some names. And the case statement doesn't happen in its own scope. If you wanted to do something confusing, like writing a pattern that always fails and then using the names it bound in the suite of the next pattern, well, consenting adults and all that.

    When the Breakfast constructor gets matched, it will in some way (described later) try to match its arguments. Some of these are targets, but not all of them—i.e., the literal constant 0. Note that matching Breakfast(0, _, _) is equivalent to matching Breakfast(spam, _, _) if spam == 0 (except that the latter will bind spam to the value).

    * Of course in those languages, almost everything is an expression. Just as Python has if statements, often used for side effects, instead of if expressions that have a value, Python would presumably have case statements often used for side effects. Note that Python did eventually add a ternary conditional expression. I suppose it might be possible to add a case expression later, but I don't think it would be very useful; the whole point of the conditional expression is to use it in very short statements that would be overly verbose if split into a 4-line statement.

    ** In most languages with pattern matching, either else is required, or falling off a case without matching anything is an error. But again, that's because case is an expression, and must have a value, and the same thing is true for if expressions.

    *** As with target unpacking, order can be any iterable, not just a sequence. I'd have to look at how assignment unpacking works under the covers, but if it's not building a list but just calling next once for each argument, we could still do the same here. Basically it would just need to try to unpack to up to the Nth element each time it first sees a pattern list of length N, and remember all previously unpacked values and whether it's received StopIteration.

    **** Note that (unlike Haskell) _ doesn't have any special meaning here; we're just binding some of the match values to the name "_". If you bind multiple values to the same name, it gets one of them arbitrarily.

    Grammar

    I've actually built a GRAMMAR that adds this syntax, and from some simple tests there don't seem to be any problems parsing any of it. But of course it would need new AST nodes, at least dummy code generation, and a real test suite, and I didn't build any of that, so don't take that as too strong of a promise.

    Matching constructor calls

    The big question is, how can Python know what to do with Breakfast(spam, eggs, beans)?

    In Haskell or ML, a case statement matches by deconstruction. Breakfast in those languages would be akin to a namedtuple('Breakfast', 'spam eggs beans'), so it's really just as easy as tuple unpacking, which Python already does, except with the addition of checking the type.

    But in Python, Breakfast is a class. The fact that its constructor happens to take three arguments, and it happens to assign them to attributes with the same name in the same order (even if that order were recoverable…) is just a lucky accident, not something the language requires, or even encourages.

    Option 1: Matching by construction

    One alternative is to actually call the Breakfast constructor with special values that override __eq__, then trust that Breakfast's __eq__ method will do the right thing. Like this:
        class MatchParam:
            def __eq__(self, other):
                self.matched = other
                return True
    
    So, to match order[1] against, say, Breakfast(0, eggs, _), Python would do this:
        eggs, _ = MatchParam(), MatchParam()
        if Breakfast(0, eggs, _) == order[1]:
            eggs, _ = eggs.matched, _.matched
            # if there's a guard clause, check it here
            # run the suite code here
    
    To give a more complicated example, if Order were a class rather than a tuple, we could match Order(customer, Breakfast(0, eggs, _)) like this:
        eggs, _ = MatchParam(), MatchParam()
        customer = MatchParam()
        if Order(customer, Breakfast(0, eggs, _)) == order:
            # etc.
    
    The "as" clause makes this marginally more complicated, but it should be obvious how to make that work.

    Unfortunately, while this works for many simple classes, there are plenty of classes for which it does the wrong thing. Basically, any class that does anything with the arguments in the constructor beyond storing them will probably fail. (Even static type checking will complain that a MatchParam is not a str… although that could be handled by making MatchParam act like a subclass of any type.)

    Worse, some classes would succeed, but only by doing something dangerous or expensive. For example, I don't think you'd want to create a new FileIO(path, 'w') just to match an existing one!

    So, I think this is ultimately a non-starter. Which is a pity, because it's so simple, and so close to what ML-style languages do…

    Option 2: Matching by match protocol

    Any automatic form of matching is going to have the same "FileIO problem". Really, matching has to be something that classes opt in to. The obvious way to do that is to add a match protocol: a new dunder method that's called to get back the constructor arguments, so they can be checked. Like this:
        class Breakfast:
            def __init__(self, spam, eggs, beans):
                self.spam, self.eggs, self.beans = spam, eggs, beans
            def __match__(self):
                return self.spam, self.eggs, self.beans
    
    Now we can match order[1] against Breakfast(0, eggs, _) like this (showing the semantics by transforming to existing Python in a way that could be implemented programmatically, although of course the actual implementation wouldn't do that--which means there's no need to worry about "macro hygiene" problems with those underscore-prefixed names affecting existing variables):
        if isinstance(order[1], Breakfast):
            _0, eggs, _ = order[1].__match__()
            if _0 == 0:
                # run the suite code here
    
    … except that if _either_ if fails, we want to fall through to the next pattern, not just the outer one. I won't try to show how to write that in any of the examples, just keep it in mind. (Note that if case statements evaluate to a function definition and call, like comprehensions, this is a lot simpler to write. But that has the side effect of not exposing bound variables to the outer scope--which may be something you actually want, but if not, it's not something you want to add incidentally...)

    To match Order(customer, Breakfast(0, eggs, _)):
        if isinstance(order, Order):
            customer, _breakfast = order.__match__()
            if isinstance(_breakfast, Breakfast):
                _0, eggs, _ = _breakfast.__match__()
                if _0 == 0:
                    # etc.
    

    And to match Order(0, Breakfast(0, eggs, _) as breakfast) if eggs:
        if isinstance(order, Order):
            _0, _breakfast = order.__match__()
            if _0 == 0:
                if isinstance(_breakfast, Breakfast):
                    _0, eggs, _ = _breakfast.__match__()
                    if _0 == 0:
                        breakfast = _breakfast
                        if eggs:
                            # suite here
    
    This transformation shows exactly why pattern-matching can be attractive. It's just syntactic sugar, but it's syntactic sugar for code you wouldn't want to write explicitly, and you certainly wouldn't want to read.

    The obvious downside here is that most simple classes won't be matchable. But if matching has to be opt-in, that seems pretty much unavoidable.

    Keyword arguments?

    One obvious problem with __match__ returning a tuple is that it can't match keyword arguments. Which means classes with keyword-only parameters in their constructors can't be matched at all.

    I think this could be solved by returning something like an inspect.ArgSpec instead of a tuple, but I haven't investigated to make sure this is feasible. Anyway, it should be obvious how to fit it into the case statement syntax; it's only the semantics that need to be worked through.

    Can we remove the boilerplate?

    Clearly, we could trivially add __match__ support to namedtuple, which helps in cases where you really do just want an ML-like record type and were already going to use namedtuple for it.

    It would be easy to define a @simpleclass decorator that generates a __match__ automatically by, e.g., inspecting the parameter names to __init__. Or, alternatively, the decorator could just take the attribute names and generate both the __init__ and __match__ methods for you.
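
    A sketch of that decorator, assuming each constructor argument is stored under an attribute of the same name:

        import inspect

        def simpleclass(cls):
            # Derive __match__ from __init__'s parameter names (skip self).
            names = list(inspect.signature(cls.__init__).parameters)[1:]
            def __match__(self):
                return tuple(getattr(self, name) for name in names)
            cls.__match__ = __match__
            return cls

        @simpleclass
        class Breakfast:
            def __init__(self, spam, eggs, beans):
                self.spam, self.eggs, self.beans = spam, eggs, beans

        Breakfast(1, 2, 0).__match__()  # (1, 2, 0)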

    There could also be special decorators for dealing with __slots__ classes, classes with only @property attributes, etc.

    But somehow, after the first one, each seems increasingly less Pythonic. It might be appropriate for certain frameworks; you might even want to build a framework with base classes (and/or metaclasses) that do this automatically, instead of using a decorator. (For example, a Django model seems like it could define matching in terms of fields—except for the fact that you generally can't actually construct them directly, you have to call a create method… but there's no reason that Model.objects.create couldn't itself be a matchable type, rather than Model.)

    Can we leverage the copy/pickle protocol?

    The match protocol is really doing the same thing as the copy/pickle protocol: it's asking a class instance for its values, in a form that could be used to construct an equal instance. Could this be leveraged?

    The copy/pickle protocol is very flexible. While classes can implement __getnewargs__ or __getnewargs_ex__ to provide constructor arguments, they can also implement __getstate__ and __setstate__, or even __reduce_ex__ and a custom constructor-from-reduction function. And it's not clear how anything other than __getnewargs_ex__ could be used for matching unless the class also implemented __matchstate__ or a custom matcher-from-reduction function.

    Still, this might be a good answer: Make __getnewargs__ and __getnewargs_ex__ automatically provide match information, and provide a way for the other alternatives to optionally specify match information (and if they don't, the class just can't be matched).

    However, there aren't all that many classes that use __getnewargs__. Most classes that are simple enough to not need __getstate__ or __reduce__ just rely on the default pickler in object.__reduce_ex__, which of course stores the __dict__ and a _reconstructor function that calls __new__ but not __init__ and then updates the __dict__, which won't be of any use to matching.

    And encouraging people to implement __getnewargs__ if they want to match means that they will no longer get the default pickler. For a class that does expensive or dangerous work in the constructor, this means copy.copy will now do that expensive or dangerous work—so we're right back to the FileIO problem.

    So, I think that, despite the obvious similarities between the two protocols, they have to remain separate. (Maybe in a new Python-like language with both ideas there from the start, it would be a different story.)

    Beyond case

    In languages that rely heavily on pattern matching, it can often be used in two other places: function definitions, and let expressions. The equivalents of both are feasible in Python.

    Function definitions

    In Haskell, you can define a function as, in effect, a series of separate definitions that use patterns as their arguments.

    Putting it in Python terms:
        def fact(0):
            return 1

        def fact(n):
            return n * fact(n-1)
    
    In Haskell, this is explicitly syntactic sugar for declaring a single function with a case statement. In other words, the compiler turns that into:
        def fact(_n):
            case _n:
                of 0:
                    return 1
                of n:
                    return n * fact(n-1)
    
    Note that this is different from overloading by type in languages like C++ (or in Python's functools.singledispatch or any of the multiple dispatch modules on PyPI), because it's overloading by structure (or value, in the case of constants). But it can be used to overload by type just by matching different type constructors.

    In Python, we would need some way to mark these as pattern overloads, as opposed to redefinitions, but the answer to that is obvious, and already used by singledispatch: use a decorator:
        @pattern
        def fact(0):
            return 1

        @fact.pattern
        def fact(n):
            return n * fact(n-1)
    
    (The specific "@fact.pattern" syntax isn't absolutely necessary; "@pattern" alone could look up __name__ and find the right thing to attach to, and some of the alternative overloading modules on PyPI do this. I've just written it this way to parallel singledispatch.)

    This would need pretty minimal language support: a function definition can take a single case pattern instead of a normal argument list. Normally, such a function is not callable (it always raises a TypeError), but the pattern decorator can inspect it to build something callable out of it. I'll leave the implementation of the pattern function as an exercise for the reader.

    Assignments

    Haskell of course doesn't have assignments; it's a pure (meaning purely-immutable) functional language. And let expressions aren't the same as assignment statements. So in this case, rather than start off with the parallel and then show how it maps to Python, I'll just show the Python:
        Breakfast(spam, eggs, beans) = order[1]
    
    It should be obvious what this means, and how to implement it given the above.

    What if there are constants?
        Breakfast(0, eggs, beans) = order[1]
    
    That matches eggs and beans if order[1].spam == 0, and raises MatchException("0 does not match 42") if order[1].spam == 42.

    There's no ambiguity here. A call is not a valid target today, nor is a literal. A target list is a valid target, but since a pattern list is a superset of a target list, this isn't a problem. (At least I'm pretty sure there aren't any assignments that are valid today but that would have a different meaning if treated as a pattern assignment.)

    And yes, you could even make as clauses and if clauses work here for consistency, although I don't think they'd be useful that often.

    In fact, the easiest way to add pattern-matching assignments is to redefine assignments in terms of case statements, so target_list = expression is equivalent to:
        case expression:
            of target_list:
                pass
            else:
                raise ValueError
    

    What about adding ADTs to Python?

    There are really two parts of ADTs: record types, which are basically tuples with named elements, and sum types, which are tagged unions with named values. The first is exactly the same thing as collections.namedtuple, which Python already has. Adding pattern matching for namedtuples would be trivial (with the match protocol described above, or otherwise). The second, Python doesn't have—but enum.Enum is halfway there. The only part that's missing is the ability to attach a value to an enumeration member (and, maybe, statically specify a type for each member). This is exactly the approach that Swift takes. In Python terms, imagine you could do this:
        class Barcode(enum.Enum):
            UPCA = 1
            QRCode = 2

        >>> barcode = Barcode.UPCA(17, 23, 42, 42)
        >>> barcode
        <Barcode.UPCA: 1 = (17, 23, 42, 42)>
        >>> barcode.value
        (17, 23, 42, 42)
    
    For more on this idea, see my separate post ADTs for Python.

    If we had this, would it be worth adding pattern matching only for Enum and namedtuple? I don't think so. Neither of these types is common enough to even be a builtin; adding new syntax just for them seems silly.

    However, the ability to match on these types in particular--as well as any other types that opt in--might be a good argument for adding a general match protocol, as described above.

    Is any of this a good idea?

    I'm not actually sure it is.

    A big part of the reason pattern matching is so important in functional languages is that it's declarative. Like let, it's an expression that binds values within a subexpression, and has the value of that subexpression.

    There doesn't seem to be any good reason you couldn't use pattern matching to bind names applicatively for the rest of the scope, or to execute side effects, any more than there's a good reason against having for statements and if statements. But it's also hard to find a nice paradigm case from a pattern-matching language that isn't better rewritten in Python in a more traditional iterative style. Someone would need to find a good use case that really is more readable as a pattern decomposition.

    It's possible that things like singledispatch and the proposed new static type annotations will gradually change Python into a language where pattern matching is a more obvious win (as the addition of iterators and generators changed it into a language where other features borrowed from Haskell and friends turned out to be a bigger win), but it's hard to anticipate that. Certainly adding features like not-quite-sum union types and not-quite-parametric generic types looks like a step in that direction, but it's way too early to tell how big of a step, and how far Python will go, and what it will look like when it gets there.

    Szymon Pyzalski's proposal

    After reading Szymon Pyzalski's proposal on python-ideas, there are a few ideas worth stealing from him, or at least considering.

    No targets in patterns

    Szymon solves the tricky problem with distinguishing values from targets in patterns by just not having targets. Instead, the "as" clause handles the existing Python tuple decomposition, which handles some simple cases, and when that isn't sufficient, the match object can dump a bunch of variables into the case statement's scope. I don't like that, for multiple reasons, but let's not get into those here. However, it does make some of his other ideas fit in a lot more simply, so keep that in mind.

    "for" instead of "case"

    Trying to pick a good keyword for an addition to Python is always tricky. Adding a new keyword means that any code that used that keyword as an identifier is now illegal, and I'm sure there's code out there that has variables named "case" or "switch" or "match" or anything else you might pick. On the other hand, reusing an existing keyword runs the risk of making the syntax ambiguous to the parser and/or to a human. Szymon uses "for", and I think you can make that unambiguous to the parser... but it looks odd to a human. Especially if the pattern (or the iterable, in an existing loop statement) is long and complex, so you have to hunt for an "in" (and one not inside another expression!) to tell which type of "for" statement it is. I think introducing "case" is, all things considered, less bad, but of the existing keywords, "for" does seem to be the best bet.

    No keyword for subclauses

    In his proposal, there's no "of" or "else" keyword, you just put the pattern followed by a colon. (For the "else" case, you use an ellipsis followed by a colon.) And this isn't just more concise, it looks cleaner.

    I think this means the parser has to actually treat these as clauses of the case statement, instead of (as I did) treating them as separate statements, then having a post-parser rule (equivalent to the "break must be inside a loop" rule) to only allow them inside a case, which would be harder to implement--but then again, the way I implemented it is clearly a hack, so if his way is doable, that's no big loss.

    Matching on types

    There are a lot of cases in Python where you'd like to match by type, without decomposing. In fact, going back to the duality between function overloading and pattern matching, all of the examples for single and multiple dispatch are also potentially examples for type-based pattern matching. Here's the motivating example from Matthew Rocklin's multipledispatch library converted to pattern matching:
        def add(*args):
            case args:
                of (x: int, y: int):
                    return x+y
                of (x, y):
                    return "%s + %s" % (x, y)
    
    Of course this requires another extension to the language, being able to use function-style type annotations in assignment target lists, one that we're not likely to get any time soon. (Note that PEP 484 explicitly puts off even considering a syntax like "x: int = 3" for future consideration.) Then again, we're not likely to get pattern matching any time soon, either, so maybe that's not a problem.

    Without that, by adding the rule that values match pattern values if they're equal or if the pattern value is a type and the value is an instance of that type, you could write it as "of (@int, @int) as x, y:". But that only works in simple cases (again, where tuple decomposition works), and it's more verbose, and it requires the horrible "@" syntax for disambiguating a value from a target (which I hoped would be so rarely needed that it wouldn't matter how ugly it is, but it seems like this case would be pretty common if it were allowed).

    Regular expressions

    Many newer scripting languages have C-style case statements that can match regular expressions. Szymon includes something similar in his proposal, and takes it even farther, by providing a way to decompose the regular expressions: (?P<name>) named subexpressions act like variable names. For example:
        case xy:
            of re.compile(r'(?P<x>[0-9]+)-(?P<y>[0-9]+)'):
                return range(x, y)
    
    This works with his design of dumping variable names into the scope. But with ML-style decomposition with targets, it could only work if regexp syntax were part of the language syntax--as in, say, Perl.

    Also, I'm not entirely sure this is a good idea anyway. There's a reason regexps aren't embedded in the language in Python as they are in Perl. While it makes a few things harder or more verbose, it also discourages people from using regexp solutions to non-regexp problems. And, in fact, Szymon's motivating example seems to be exactly one of the cases where regexps are an attractive nuisance, not an actual solution: he wants to match two numbers as either a tuple, a dict of values named x and y, or a string, separated by a hyphen. Like this (translated to my case ... of syntax and slightly simplified):
        case point:
            of (Number, Number) as x, y: return x+y
            of {'x': Number, 'y': Number}: return x+y
            of re.compile(r'(?P<x>[0-9]+)-(?P<y>[0-9]+)'): return x+y
    
    That looks nice, except that the third one doesn't actually match two numbers, it matches two Arabic numerals representing nonnegative integers. It won't work for "-5-3", or "3.1-3.5", or "1+2j-1-2j". The first two, on the other hand, work for any Number type--even one I didn't know about (say, if the user has imported a quaternion library).

    It's also worth noting that he presents the first two as parallels to "a tuple (x, y)" and "a mapping {'x': x, 'y': y}" (in his proposal, other objects are also matched by effectively converting themselves into a tuple or a mapping, so this covers more than it appears to), and it's pretty clear that (with the addition of "where x and y are Numbers") the Python pattern is an exact translation of the for-humans pattern. But the third is presented as a parallel to "a string '{x}-{y}'.format(x, y)", which not only doesn't actually match (as explained above), but is not at all readable as doing so.

    Can this be hacked into Python as a module?

    MacroPy comes with pattern matching as an example of the kinds of things you could add to Python using its macro facility. It essentially uses the "decorator to define both __init__ and __match__" idea above, and only classes defined with that decorator can be matched, and then it (ab)uses the with and if statements, but it seems pretty nice.

    If you don't want a full macro-processing import hook, there are hacks that can accomplish some of the same by, e.g., ignoring the code and inspecting the function's source, or even decompiling the function. Or, if you're willing to give up some niceties, you can write things a bit more clumsily and use a portable (non-hacky) decorator to convert. Or, with even more clunky syntax, you can do it with just a special type with overloaded operators. There are lots of implementations using every one of these tricks; Grant Jenks' PyPatt helpfully provides a quick survey of them (as well as being a good one in itself).

  3. Weak typing

    A language with weak typing is one where programs can escape the type system. Another way to describe it is that values can change types.

    For example, in C, I can create a pointer to characters, then tell the compiler I want to use it as a pointer to integers:
        char sz[] = "abcdefg";
        int *i = (int *)sz;
    
    On a little-endian platform with 32-bit integers, this makes i point to an array of the numbers 0x64636261 and 0x00676665. In fact, you can even cast pointers themselves to integers (of the appropriate size), or vice-versa:
        char *spam = (char *)0x12345678;
        spam[0] = 0;
    
    And this means I can overwrite memory anywhere in the system.*

    * Of course modern OS's use virtual memory and page protection so I can only overwrite my own process's memory, but there's nothing about C itself that offers such protection, as anyone who ever coded on, say, Classic Mac OS or Win16 can tell you.

    Traditional Lisp allowed similar kinds of hackery; on some platforms, double-word floats and cons cells were the same type, and you could just pass one to a function expecting the other and it would "work".

    Most languages today aren't quite as weak as C and Lisp were, but many of them are still somewhat leaky. For example, any OO language that has an unchecked "downcast"* has a type leak: you're essentially telling the compiler "I know I didn't give you enough information to know this is safe, but I'm pretty sure it is," when the whole point of a type system is that the compiler always has enough information to know what's safe.

    * A checked downcast doesn't make the language's type system any weaker just because it moves the check to runtime. If it did, then subtype polymorphism (aka virtual or fully-dynamic function calls) would be the same violation of the type system, and I don't think anyone wants to say that.

    Very few "scripting" languages are weak in this sense. Even in Perl or Tcl, you can't take a string and just interpret its bytes as an integer.* But it's worth noting that in CPython (and similarly for many other interpreters for many languages), if you're really persistent, you can use `ctypes` to load up `libpython`, cast an object's `id` to a `POINTER(Py_Object)`, and force the type system to leak. Whether this makes the type system weak or not depends on your use cases—if you're trying to implement an in-language restricted execution sandbox to ensure security, you do have to deal with these kinds of escapes…

    * You can use a function like struct.unpack to read the bytes and build a new int out of "how C would represent these bytes", but that's obviously not leaky; even Haskell allows that.
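
    For example, the explicit spelling of the C trick from above:

        import struct

        # Reinterpret the bytes of "abcdefg\0" as two little-endian
        # 32-bit ints--explicitly, by building new int objects.
        low, high = struct.unpack('<2i', b'abcdefg\x00')
        hex(low), hex(high)  # ('0x64636261', '0x676665')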

    Implicit conversion

    Implicit conversion is really a different thing from a weak or leaky type system.

    Every language, even Haskell, has functions to, say, convert an integer to a string or a float. But some languages will do some of those conversions for you automatically—e.g., in C, if you call a function that wants a float, and you pass it an int, it gets converted for you. This can definitely lead to bugs with, e.g., unexpected overflows, but they're not the same kinds of bugs you get from a weak type system.

    C isn't really being any weaker here; you can add an int and a float in Haskell, or even concatenate a float to a string, you just have to do it more explicitly.

    Or, using the alternate definition, the value's type isn't changing from int to float, the program is creating a new float by calling a conversion function on the int, but the original int is still sitting around in whatever variable (or temporary location) it was always in.

    With dynamic languages, the whole idea of implicit conversion is pretty murky. There's no such thing as "a function that wants a float" in Python or Perl. But there are overloaded functions that do different things with different types, and there's a strong intuitive sense that, e.g., adding a string to something else is "a function that wants a string".

    In that sense, Perl, Tcl, and JavaScript appear to do a lot of implicit conversions ("a" + 1 gives you "a1"), while Python does a lot fewer ("a" + 1 raises an exception, but 1.0 + 1 does give you 2.0*).

    * Actually, in modern Python, that can be explained in terms of OO subtyping, since `isinstance(2, numbers.Real)` is true. I don't think there's any sense in which `2` is an instance of the string type in Perl or JavaScript… although in Tcl, it actually is, since _everything_ is an instance of string.

    But it's pretty hard to put that into anything beyond a loose, intuitive sense. Why shouldn't there be a + function that takes a string and an int? Obviously (string, int)->string is a perfectly valid type for a function to have, since it's the type of, e.g., the index function.

    Another meaning for strength

    There's another, completely orthogonal, definition of "strong" vs. "weak" typing, where "strong" means powerful/flexible/expressive.

    For example, Haskell lets you define a type that's a number, a string, a list of this type, or a map from strings to this type, which is a perfect way to represent anything that can be decoded from JSON. There's no way to define such a type in Java. But at least Java has parametric (generic) types, so you can write a function that takes a List of T and know that the elements are of type T; other languages, like early Java, forced you to use a List of Object and downcast. But at least Java lets you create new types with their own methods; C only lets you create structures. And BCPL didn't even have that. And so on down to assembly, where the only types are different bit lengths.

    So, in that sense, Haskell's type system is stronger than modern Java's, which is stronger than earlier Java's, which is stronger than C's, which is stronger than BCPL's.

    So, where does Python fit into that spectrum? Again, dynamic types make this murky.

    In many cases, duck typing allows you to simulate everything you can do in Haskell, and even some things you can't; sure, errors are caught at runtime instead of compile time, but they're still caught. However, there are cases where duck typing isn't sufficient.

    For example, in Haskell, if you have a doubly-optional value (e.g., from calling zip_longest* on a list of optional values), you can distinguish None (filled in by zip_longest) from Just(None) (an actual None value from the original list). In Python, they're both None—that's why zip_longest needs a fillvalue parameter, so you can pass an alternate sentinel.

    * I'm using Python terminology and syntax instead of Haskell here, so people who don't know Haskell can understand. I assume people who do know both languages can figure out what I mean.
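
    To see the ambiguity concretely:

        from itertools import zip_longest

        # The None filled in by zip_longest is indistinguishable from the
        # None that was really in the input:
        list(zip_longest([1, None], [2]))  # [(1, 2), (None, None)]

        # ...which is why you can pass a unique sentinel instead:
        missing = object()
        list(zip_longest([1, None], [2], fillvalue=missing))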

    For another example, in Haskell, the concept of an empty list of integers makes perfect sense; in Python, an empty list is an empty list. So, in Haskell, you could decide that reducing + over that list should return 0*; in Python, you can't.

    * In fact, Haskell doesn't let you do this; if you call the reduce function with no start value on an empty list, you get an error. But its type system is powerful enough that you _could_ make this work, and Python's isn't.
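
    You can see Python's version of the problem directly; functools.reduce has the same restriction, and the only fix is to supply the start value yourself:
        >>> from functools import reduce
        >>> import operator
        >>> reduce(operator.add, [])
        Traceback (most recent call last):
          ...
        TypeError: reduce() of empty sequence with no initial value
        >>> reduce(operator.add, [], 0)  # works, but only because you supplied the 0
        0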

    On top of that, in many cases it's arguable whether Python really is simulating a stronger type system at all. For example, in Haskell, if you have a function that returns either a string or nothing, it returns Maybe String; to access the string, you have to unwrap it from the Just, even after testing for Nothing. Normally, you do this with pattern matching:
        case spam:
            of None:
                print('I got nothing')
            of Just(s):
                print('I got {}'.format(s))
    
    In Python, the same function would return either None or a str; to access the str, you just access it directly after testing for None, with no Just to pull it out of. So, does this mean Python is simulating disjunctive types, and its None test is doing implicit lowering? On the one hand, calling, e.g., len(spam) when spam is None gives you effectively the same error in Python as in Haskell. On the other hand, if you call, say, __sizeof__ or __lt__ without testing for None, it will work in Python, and the equivalent would definitely not work in Haskell.
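
    For comparison, the Python counterpart of that pattern match is just:
        if spam is None:
            print('I got nothing')
        else:
            print('I got {}'.format(spam))  # spam is the str itself; nothing to unwrap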

    I think this is a matter of degrees; the cases where Python gets it "wrong" according to Haskell are just more examples of where duck typing isn't as powerful as Haskell's type system. (And, conversely, the places where Python gets it "right" are examples of where Python is more concise and readable without giving up any power.)

  4. The official tutorial does a great job explaining list comprehensions, iterators, generators, and generator expressions at a high level. Since some people don't want to read even that much, I wrote a post on comprehensions for dummies to summarize it.

    And if you want to know the lowest level nitty gritty of how they work, there's documentation to point you to the right place, and then the code is relatively readable, at least if you understand C well and know the basics of how CPython works.

    But what if you're somewhere between the two extremes? People occasionally ask this on StackOverflow, and the questions invariably get closed because "explain to me how this works" is not a good question for the SO format, but it's a perfectly good question for, say, a blog.

    List comprehensions

    Under the covers, Python compiles the guts of your list comprehension into a function, and then compiles the comprehension itself into a call to that function. I'll oversimplify it a bit, but the basic idea is this:
        [random.choice(string.ascii_lowercase + " ") for i in range(n)]
    
    … turns into:
        def _comprehension(source):
            result = []
            for i in source:
                result.append(random.choice(string.ascii_lowercase + " "))
            return result
        _comprehension(iter(range(n)))
    

    Of course the function isn't defined in the middle of the expression; it's sort of, but not quite, as if it were defined in the previous statement. (It is defined in the local scope, of course, so it can access your variables as closure cells.) And it doesn't have a visible name like _comprehension. Also, because comprehensions are limited in form, the compiler can optimize things a bit. But that's the basic idea.

    Set and dict comprehensions

    A set comprehension looks exactly like a list comprehension, except it starts with result = set() and does add instead of append.

    A dict comprehension is basically the same, but there's no visible method call for each colon-separated key: value pair; just think of each iteration as doing result[key] = value.
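
    Spelled out the same way as the list version (again oversimplified), {i*2 for i in source} and {i: i*2 for i in source} look something like this:
        def _set_comprehension(source):
            result = set()
            for i in source:
                result.add(i*2)
            return result

        def _dict_comprehension(source):
            result = {}
            for i in source:
                result[i] = i*2
            return result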

    Generator expressions

    A generator expression looks a lot like a list comprehension, except that it's a generator function instead of a regular function. In other words:
        def _comprehension(source):
            for i in source:
                yield random.choice(string.ascii_lowercase + " ")
        _comprehension(iter(range(n)))
    
    In fact, generator expressions (even though they came later than list comprehensions in Python history) are really the core concept; you can almost build everything else on top of them:
        a = [i*2 for i in range(10)]
        b = list(i*2 for i in range(10))
        assert a == b
        a = {i*2 for i in range(10)}
        b = set(i*2 for i in range(10))
        assert a == b
        a = {i: i*2 for i in range(10)}
        b = dict((i, i*2) for i in range(10))
        assert a == b
    
    But the other comprehensions aren't quite pure syntactic sugar.

    First, the comprehensions are up to 40% faster (in the extreme case, where the loop body is basically doing nothing, so the loop overhead dominates).

    Second, raising StopIteration just ends a generator expression early, but in a list/set/dict comprehension it ends the whole containing expression; you don't get a list up to the point where you raised StopIteration, you get nothing. (This difference is apparently actually a bug, but not one worth fixing; see my post Can you optimize list(genexp) for further details on what it would take to fix it.) So, in the rare cases where you need to handle StopIteration properly (which I personally have only come across once, and that was while writing code to experiment with an idea for changing the language, not real code…), you have to use list(…), not […].
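
    A minimal demonstration, on the versions this post covers (pre-PEP 479; since Python 3.7, a StopIteration escaping a generator is converted to a RuntimeError instead):
        def stop():
            raise StopIteration

        list(x if x < 3 else stop() for x in range(5))  # [0, 1, 2] -- ends early
        [x if x < 3 else stop() for x in range(5)]      # raises StopIteration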

    Finally, if you're writing code that's backward compatible with Python 2.x, generator expressions worked the same way they do in 3.x, but list comprehensions didn't; they were inlined into the current function, meaning they leaked the iteration variable into the outer scope (overwriting any existing variable with the same name). And if you're still using Python 2.6 or earlier, you don't have set or dict comprehensions at all, so you have no choice but to use set(…) or dict(…).
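
    For example, in Python 2.7:
        >>> i = 'before'
        >>> [i for i in range(3)]
        [0, 1, 2]
        >>> i                            # clobbered by the list comprehension
        2
        >>> list(j for j in range(3))
        [0, 1, 2]
        >>> j                            # the generator expression didn't leak
        Traceback (most recent call last):
          ...
        NameError: name 'j' is not defined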

    Disassembling comprehensions

    If you want to dive in further yourself, you might think about disassembling the comprehension. But if you try that, you're just going to be disassembling the code that builds and calls a function, not the actual internal comprehension function's code.

    Of course you run into that problem with any local function, and you can get around that easily by having the outer function return the local function (or its disassembly, or whatever), but that doesn't work here, because you don't have a name for the local to return (and it's not in locals() or anywhere else introspectable).

    Well, you can always put a comprehension in a module and disassemble everything in the module.

    Or you can use this quick&dirty hack:
        >>> import dis, sys
        >>> frame = [sys._getframe() for _ in range(1)][0]
        >>> dis.dis(frame.f_code)
          1           0 BUILD_LIST               0
                      3 LOAD_FAST                0 (.0)
                >>    6 FOR_ITER                18 (to 27)
                      9 STORE_FAST               1 (_)
                     12 LOAD_GLOBAL              0 (sys)
                     15 LOAD_ATTR                1 (_getframe)
                     18 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
                     21 LIST_APPEND              2
                     24 JUMP_ABSOLUTE            6
                >>   27 RETURN_VALUE
    
    If you know Python bytecode, you can see some of the optimizations I mentioned—there's no GET_ITER before the FOR_ITER, because the source is always passed in as an iterator; it uses special BUILD_LIST and LIST_APPEND bytecodes instead of calling the list constructor and looking up and calling its append method and then having to push the list again; it jumps straight from the loop to the RETURN_VALUE rather than doing the extra POP_BLOCK work and pushing some value, because the list set up in BUILD_LIST is already guaranteed to be on top of the stack.

    You can probably guess how set and dict comprehensions work: they do BUILD_SET and BUILD_MAP, and SET_ADD and MAP_ADD instead of the list equivalents. And MAP_ADD requires two things on the stack instead of one.

    Generator expressions are similar. No BUILD_LIST, a YIELD_VALUE followed by POP_TOP instead of LIST_APPEND, and they have to LOAD_CONST None at the end because they have nothing to return; but those are the only differences.
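
    If you want to verify that yourself, generator expressions are actually the easiest case to disassemble, because the generator object carries its code around with it (the exact output varies by version, so I won't paste it here):
        >>> g = (i*2 for i in range(3))
        >>> dis.dis(g.gi_code)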