About a year and a half ago, I wrote a blog post on the idea of adding pattern matching to Python.

I finally got around to playing with Scala semi-seriously, and I realized that they pretty much solved the same problem, in a pretty similar way to my straw man proposal, and it works great. Which makes me think that it's time to revisit this idea. For one thing, using Scala terminology that people may be familiar with (and can look up) can't hurt. For another, we can see whether any of the differences between Python and Scala will make the solution less workable for Python.

Extractors


In my post, I described the solution as a matching protocol. A class that wants to be decomposable by pattern matching has to implement a method __match__, which returns something like a tuple of values, or maybe an argspec of values and optional values, that the pattern-matching code can then match against and/or bind to the components of a pattern.

For simple cases, this could be automated. Both namedtuple and Namespace could define a __match__, and we could write decorators to examine the class in different ways--__slots__, properties, copy/pickle, or even runtime introspection--and build the __match__ for you. But when none of these simple cases apply, you have to implement it yourself.
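For instance, the __slots__ case might be automated with a decorator along these lines (a minimal sketch; match_from_slots and the Point3D class are hypothetical names for illustration, not part of the proposal):

```python
def match_from_slots(cls):
    """Hypothetical decorator: build a __match__ that returns the
    instance's slot values, in declaration order."""
    def __match__(self):
        return tuple(getattr(self, name) for name in cls.__slots__)
    cls.__match__ = __match__
    return cls

@match_from_slots
class Point3D:
    __slots__ = ('x', 'y', 'z')
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z
```

A Point3D(1, 2, 3) then decomposes to (1, 2, 3), which is all the case machinery needs to bind a pattern's components.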

In Scala, the equivalent is the extractor protocol. A class that wants to be decomposable by pattern matching supplies an associated extractor (normally a singleton object, but that isn't important here), which has an unapply method that returns an optional value or tuple of values, as well as an optional apply method that constructs instances of the class from the same arguments. A case class (a simple kind of type that's basically just a record) automatically gets an extractor built for it; otherwise, you have to build it yourself.

In the simplest case, this is no different from just using __match__ for unapply and, of course, the existing __init__ for apply. But it's actually a lot more flexible.

  • You can write multiple extractors that all work on the same type.
  • You can add an extractor for an existing type, without having to monkeypatch the type. For example, you could extract the integer and fraction parts of a float.
  • You can write an extractor that doesn't match all values of the type.

Example


Daniel Westheide's great Neophyte's Guide to Scala shows some of this in action. I'll borrow one of his examples, with minor variations:

trait User {
  def name: String
  def score: Int
}
class FreeUser(val name: String, val score: Int, val upgradeProbability: Double)
  extends User
class PremiumUser(val name: String, val score: Int) extends User

object FreeUser {
  def unapply(user: FreeUser): Option[(String, Int, Double)] =
    Some((user.name, user.score, user.upgradeProbability))
}
object PremiumUser {
  def unapply(user: PremiumUser): Option[(String, Int)] = Some((user.name, user.score))
}

val user: User = new FreeUser("Daniel", 3000, 0.7d)

user match {
  case FreeUser(name, _, p) =>
    if (p > 0.75) name + ", what can we do for you today?" else "Hello " + name
  case PremiumUser(name, _) => "Welcome back, dear " + name
}

If you're not used to Scala's syntax, this might be a bit hard to read, but it's not too hard to muddle through. A trait is like an ABC, a class is like a normal class, and an object defines a singleton class and the one instance of it. So:

First, we define a class hierarchy: FreeUser and PremiumUser are both subclasses of the abstract base class User, and FreeUser adds a new attribute.

Next, we define two extractors. The fact that they're named FreeUser and PremiumUser, just like the classes, is convenient for readability, but it's the type annotations on their unapply methods that make them actually work on those respective types.

Then we construct a User, who happens to be a free user with a 0.7 probability of upgrading.

Then we pattern-match that user, first as a free user, then as a premium user if that fails. Here, the syntax works like my proposal, except for minor spelling differences (match for case, case for of, braces for indents, => for :). But, instead of checking isinstance(user, FreeUser) and then calling instance.__match__ and binding to name, _, p, Scala tries to call FreeUser.unapply(user), which works, and binds to name, _, p. The result is the same, but the way it gets there is a little more flexible (as we'll see).

And then, inside the free user case, we just do an if-else, so the result is "Hello Daniel".

Another example


In case it isn't obvious what the optional-value bit is for, here's another example that makes use of that, as well as applying an extractor to a builtin type without having to monkeypatch it. This is a pretty silly example, but it's in the official Scala tutorial, so...

object Twice {
    def unapply(x: Int): Option[Int] = if (x%2 == 0) Some(x/2) else None
}

val i: Int = 42
i match {
  case Twice(n) => n
  case _ => -1
}

Here, Twice.unapply(42) returns Some(21), so the case Twice(n) will match, binding n to 21, which we'd return.

But if we tried the same thing with 23, then Twice.unapply(23) returns None, so the case Twice(n) would not match, so we'd hit the default case, so we'd return -1.

As for the optional values, I'll get to that below.

Can we do this in Python?


Some of the details obviously aren't going to fit into Python as-is.

We could easily add a @singleton class decorator that makes the class anonymous and returns a singleton instance. But we still can't use the same name as the class without a conflict in Python (the instance would just unbind the type). And really, do we need to define a class here just to wrap up a function in a method (which is, or might as well be, a @staticmethod, since it doesn't touch self, which has no state to touch anyway)? I think it would be more pythonic to just have decorated functions. (If you disagree, it's pretty trivial to change.)

Python normally uses dynamic (per-instance) attributes on classes. You can use __slots__ or @property, and you can even define an ABC that makes those abstract, matching Scala's traits, but I think it's more Pythonic to just skip that here. (I'm pretty sure it's orthogonal to everything related to pattern matching and extracting; it just means our most Pythonic equivalent example isn't exactly like the Scala example.)

Python doesn't have optional types. It does have "implicit nullable types" by just returning None, but since None is often a valid value, that doesn't do any good. Of course the answer here is exceptions; everywhere that Scala returns Option[T], Python returns T or raises, so why should this be any different?

In my previous proposal, we used AttributeError (or a new subclass of it, MatchError) to signal a failed match (partly because you get that automatically if __match__ doesn't exist), so let's stay consistent here. We can even make a special unapply_simple decorator that lets us return None and have it turned into a MatchError when we know None isn't a valid value.
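A minimal sketch of that piece (MatchError and unapply_simple are names from the proposal, not an existing API):

```python
import functools

class MatchError(AttributeError):
    """A failed match. Subclassing AttributeError means a missing
    __match__ counts as a failed match for free."""

def unapply_simple(func):
    """Wrap an extractor so that returning None means 'no match'."""
    @functools.wraps(func)
    def wrapper(value):
        result = func(value)
        if result is None:
            raise MatchError(value)
        return result
    return wrapper

@unapply_simple
def twice(x):
    # falls off the end (returns None) for odd values, which the
    # wrapper turns into a MatchError
    if x % 2 == 0:
        return (x // 2,)
```

So twice(42) gives (21,), while twice(23) raises MatchError instead of silently returning None.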

Finally, Python doesn't have static typing. Is that necessary here? Let's try it and see.

So, using my earlier case syntax:

class User:
    pass

class FreeUser(User):
    def __init__(self, name: str, score: int, upgrade_probability: float):
        self.name, self.score, self.upgrade_probability = name, score, upgrade_probability

class PremiumUser(User):
    def __init__(self, name: str, score: int):
        self.name, self.score = name, score

@unapply
def free_user(user):
    return user.name, user.score, user.upgrade_probability

@unapply
def premium_user(user):
    return user.name, user.score

user = FreeUser("Daniel", 3000, 0.7)

case user:
    of free_user(name, _, p):
        if p > 0.75:
            return '{}, what can we do for you today?'.format(name)
        else:
            return 'Hello {}'.format(name)
    of premium_user(name, _):
        return 'Welcome back, dear {}'.format(name)

@unapply_simple
def twice(x):
    if x % 2 == 0: return (x // 2,)

i = 42
case i:
    of twice(n): print(n)
    else: print(-1)

There are a few problems here, but I think they're easily solvable.

My earlier proposal matched Breakfast(0, _, eggs) by first checking isinstance(breakfast, Breakfast), and then calling Breakfast.__match__; here, there's no type-checking at all.

As it turns out, that free_user will fail to match a PremiumUser because user.upgrade_probability will raise an AttributeError--but the other way around would match just fine. And we don't want it to. But with nothing but duck typing, there's no way that Python can know that premium_user shouldn't match a FreeUser. (The only reason a human knows that is that it's implied by the names.)

So, is the lack of static typing a problem here?

Well, we can do the exact same isinstance check inside the unapply decorator; just use @unapply(FreeUser) instead of just @unapply. This seems perfectly reasonable. And it's no more verbose than what you have to do in Scala.

Or we could even have @unapply check for annotations and test them. It's even closer to what you do in Scala (and in most other FP/hybrid languages), and, if we're going to encourage type annotations in general, why not here? On the downside, PEP 484 specifically says that its annotations don't affect anything at runtime. Also, not every PEP 484/mypy static annotation will work dynamically (e.g., Iterable[int] can only check for Iterable). But I don't think that's a problem. The decorator version seems more conservative, but either one seems reasonably Pythonic if we wanted to go with it.
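Both variants are easy to sketch. Here, unapply is the decorator factory form (@unapply(FreeUser)), and unapply_annotated is the annotation-reading variant; the User classes mirror the earlier example, and MatchError is the proposal's failed-match signal (all hypothetical names, not an existing API):

```python
import functools
import inspect

class MatchError(AttributeError):
    """The proposal's failed-match signal."""

def unapply(cls):
    """Decorator factory: @unapply(FreeUser) adds an isinstance check
    before calling the extractor."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(value):
            if not isinstance(value, cls):
                raise MatchError(value)
            return func(value)
        return wrapper
    return decorate

def unapply_annotated(func):
    """Variant that reads the type to check from the first parameter's
    annotation instead of a decorator argument."""
    first_param = next(iter(inspect.signature(func).parameters.values()))
    return unapply(first_param.annotation)(func)

class User: pass

class FreeUser(User):
    def __init__(self, name, score, upgrade_probability):
        self.name, self.score, self.upgrade_probability = \
            name, score, upgrade_probability

class PremiumUser(User):
    def __init__(self, name, score):
        self.name, self.score = name, score

@unapply(PremiumUser)
def premium_user(user):
    return user.name, user.score

@unapply_annotated
def free_user(user: FreeUser):
    return user.name, user.score, user.upgrade_probability
```

With either spelling, premium_user now correctly refuses to match a FreeUser, raising MatchError instead of silently extracting the wrong thing.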

Can the two proposals be merged?


It's pretty nice that the earlier proposal uses the type name as the pattern name, especially for simple types like namedtuple.

But it's also pretty nice that the Scala-influenced proposal allows you to have multiple pattern extractors for the same type.

Scala gets the best of both worlds by letting you give one of the extractors the same name as the type (and by giving you one automatically for case classes).

We could get the best of both worlds by just implementing both. The relevant bit of case-statement logic from my previous proposal did this to match of Breakfast(0, _, eggs):

if isinstance(case_value, Breakfast):
    try:
        _0, _, eggs = case_value.__match__()
        if _0 != 0: raise MatchError(_0, 0)
    except AttributeError:
        pass # fall through to next case
    else:
        case body here

To extend this to handle extractors:

if isinstance(case_value, Breakfast):
    # ... code above
elif isinstance(Breakfast, unapply_function):
    try:
        _0, _, eggs = Breakfast(case_value)
    # ... rest of the code is the same as above

We need some way to avoid calling Breakfast when it's a class (again, consider the FileIO problem from my previous post). That's what the second isinstance check is for. That unapply_function type is what the @unapply decorator (and any related decorators) returns. (It's a shame that it can't inherit from types.FunctionType, because, as of Python 3.6, that's still not acceptable as a base type. But it can implement __call__ and otherwise duck-type as a function. Or, alternatively, just use a function and attach a special attribute to it in the decorator.)

In fact, we can even allow externally defining decomposition without monkeypatching just by having a special decorator that registers an extractor as the type-named extractor for a type, maybe @unapply_default(Spam). That could create Spam.__match__ as a wrapper around the function, but it could just as easily store it in a registry that we look up if isinstance(case_value, Spam) but Spam.__match__ doesn't exist. (The latter allows us to avoid monkeypatching existing classes--which is obviously important for classes like int that can't be patched... but then do you want coders to be able to create a new pattern named int? An extractor that can be applied to int, sure, but one actually called int?)
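The registry variant might look something like this (a sketch under the proposal's names; unapply_default, match_components, and the registry dict are all hypothetical):

```python
class MatchError(AttributeError):
    """The proposal's failed-match signal."""

_default_extractors = {}  # hypothetical registry: type -> extractor

def unapply_default(cls):
    """Register func as the type-named extractor for cls, without
    monkeypatching cls itself."""
    def decorate(func):
        _default_extractors[cls] = func
        return func
    return decorate

def match_components(value):
    """What the case machinery would do: prefer __match__ on the type,
    fall back to the registry, then fail."""
    match_method = getattr(type(value), '__match__', None)
    if match_method is not None:
        return match_method(value)
    for cls, func in _default_extractors.items():
        if isinstance(value, cls):
            return func(value)
    raise MatchError(value)

# Works even for builtins that can't be patched:
@unapply_default(complex)
def _complex_parts(z):
    return z.real, z.imag
```

So match_components(3+4j) yields (3.0, 4.0) without ever touching the complex type itself.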

Other features of Scala pattern matching


Features already covered


There are some Scala features I didn't mention here because they're equivalent to features I already covered in my previous post:
  • Scala has as clauses with the same semantics (though they're spelled NEWNAME @ PATTERN instead of PATTERN as NEWNAME, which I think is less readable).
  • Scala has if guard clauses with the same semantics (including being able to use values bound by the decomposition and the as clause), and almost the same syntax.
  • Scala has value definition and variable definition pattern syntax, with the same semantics and nearly the same syntax as my pattern-matching assignment (not surprising, since their syntax was inspired by Python's sequence unpacking assignment, and mine is an extension of the same).

Pattern matching expressions


Scala has a match expression, rather than a statement.

That shouldn't be too surprising. As I mentioned last time, most languages that make heavy use of pattern matching use an expression. But that's because they're expression-based languages, rather than statement/expression-divide languages. (Traditionally, functional languages are expression-based either because they encourage pure-functional style, or because they're heavily based on Lisp. Scala isn't being traditional, but it's expression-based following the same trend as JavaScript and Ruby.)

In Python (unlike traditional functional languages, Scala, or Ruby), if is a statement--but then Python also has a more limited form of if that can be used in expressions. Could the same idea apply here? Maybe. But consider that Python had an if statement for many years before the stripped-down expression was added, and the statement is still used far more often.

I think we'll want pattern-matching statements for anything non-trivial, and we won't know what (if anything) we'd want from a stripped-down limited pattern-matching expression until we get some experience with pattern-matching statements.

Predicate extractors


Scala allows extractors that return a boolean instead of an optional tuple. This allows you to match without deconstructing anything:

object PremiumCandidate {
  def unapply(user: FreeUser): Boolean = user.upgradeProbability > 0.75
}

user match {
  case PremiumCandidate() => initiateSpamProgram(user)
  case _ => sendRegularNewsletter(user)
}

This is obviously easy to add to Python:

@unapply
def premium_candidate(user: FreeUser):
    return user.upgrade_probability > 0.75

case user:
    of premium_candidate:
        initiate_spam_program(user)
    else:
        send_regular_newsletter(user)

In Scala, the different logic is driven off the static return type, but we can just as easily use the syntactic difference: If we get an unapply_function rather than a (syntactic) call of an unapply_function, we call it and if it returns something truthy it's a match; if it returns something falsey or raises AttributeError it doesn't.

Scala also lets you use predicate extractors together with its equivalent of the as clause as a way of type-casting the value if it passes a type-check. For example, if initiateSpamProgram requires a FreeUser, rather than just a User, you could write:

user match {
  case freeUser @ PremiumCandidate() => initiateSpamProgram(freeUser)
}

We can't do that in Python, but then we don't have any need to. At runtime, initiate_spam_program will probably just duck-type; if not, it'll isinstance or similar, and it will see that user is indeed a FreeUser. For mypy or other static type checkers, there's no reason the checker can't infer that if user passes premium_candidate(user: FreeUser), it's a FreeUser; not being able to do that is just a limitation of Scala's typer (user can't be defined as a User but then be inferred as a FreeUser within the same expression). We don't need to copy that limitation just so we can try to find a workaround for it.

Extractors with apply


You'll notice that I mentioned apply way back at the top, but I only implemented unapply in the above examples. So, let's show an example of apply:

object Twice {
  def apply(x: Int): Int = x * 2
  def unapply(x: Int): Option[Int] = if (x%2 == 0) Some(x/2) else None
}

val i = Twice(21)

Or, maybe a better example:
object Email {
  def apply(user: String, domain: String): String = user + "@" + domain
  def unapply(email: String): Option[(String, String)] = {
    val parts = email split "@"
    if (parts.length == 2) Some((parts(0), parts(1))) else None
  }
}

val email = Email("joe.schmoe", "example.com")

email match {
  case Email(user, domain) => if (domain == our_domain) message(user) else forward(email, domain)
  case _ => log_unknown_email(email)
}

As you can see, the apply method gives us a constructor whose syntax perfectly matches the case pattern used for decomposition. (Just as we get in my original proposal when your __init__ and __match__ match.) That's nice, but not a huge deal in many cases.

Anyway, this one is easy to do in Python:

@apply
def twice(x):
    return x * 2

case twice(21):
    of twice(n): print(n)
    else: print(-1)

But I don't think this is necessary. It doesn't seem to be used much in Scala code, except in the special case of an extractor whose name matches the class--which we'd handle by having __match__, with the already-existing __init__ for a constructor. (Of course there's nothing forcing you to make your __match__ and __init__ parallel. But then there's nothing forcing apply and unapply to be parallel in Scala either. Usually, you'll want them to go together; in the rare cases where you want them to be different, there's nothing stopping you.)

Plus, we already have ways to write external factory functions for types when you really want to; there's no reason to add and standardize on a new way.

But, if we do want apply (and, again, I don't think we do), is sometimes having two functions instead of one a good argument for putting apply and unapply together in a class? I don't think so. Compare:

@apply
def twice(x):
    return x * 2
@unapply
def twice(x):
    if x % 2 == 0: return (x // 2,)

@extractor
class twice:
    def apply(x):
        return x * 2
    def unapply(x):
        if x % 2 == 0: return (x // 2,)

Sure, the latter does group the two together in an indented block, but I think it looks like heavier boilerplate, not lighter. And in the very common case where you don't have apply that will be much more true. Also, notice that @extractor has to be a bit magical, auto-singleton-ing the class and/or converting the methods into @staticmethods, which makes it harder to understand. And, if anything, it's less clear what twice(21) or of twice(n): are going to do this way; sure, once you internalize the extractor protocol you'll be able to figure it out, but that's not trivial. (Plenty of Scala developers don't seem to get how it works in their language; they just treat it as magic.)
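For what it's worth, the magic in @extractor is only a few lines; here's a sketch of what it would have to do (hypothetical, of course: convert the methods to staticmethods and replace the class with a singleton instance):

```python
def extractor(cls):
    """Hypothetical decorator: lift apply/unapply out of the class as
    staticmethods and return a singleton instance in its place."""
    ns = {name: staticmethod(func)
          for name, func in vars(cls).items()
          if name in ('apply', 'unapply')}
    return type(cls.__name__, (), ns)()

@extractor
class twice:
    def apply(x):
        return x * 2
    def unapply(x):
        if x % 2 == 0:
            return (x // 2,)
```

After decoration, twice is an instance, not a class, so twice.apply(21) is 42 and twice.unapply(42) is (21,); short, but not obvious, which is part of the objection above.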

Sequence extractors


Scala lets you write sequence extractors, with another special method, unapplySeq. This returns an optional sequence instead of an optional tuple (or an optional tuple whose last element is a sequence), and there's special syntax that allows you to match the decomposed sequence with a * wildcard, like this:

xs match {
  case List(a, b, _*) => a * b
}

It's pretty obvious how to do this in Python, and a lot more simply: this is just the normal use of * for iterable unpacking assignment, *args parameters, and splatting arguments. I already covered how this works with __match__ in my earlier post; it's the same for extractors.
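For reference, the star behavior we'd be reusing already works in today's Python, in both directions (the names here are just illustrative):

```python
xs = [3, 7, 1, 4]

# Scala's List(a, b, _*) pattern is ordinary starred unpacking here:
a, b, *rest = xs
assert (a, b, rest) == (3, 7, [1, 4])

# And the same star splats arguments in the other direction:
def product(x, y):
    return x * y

assert product(*xs[:2]) == 21
```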

Infix extractors


Scala also allows you to use infix extractors, to match infix patterns. For example, you could match 2 * n directly by using an extractor named * with an appropriate unapply.

More typically, you match head :: tail to deconstruct cons lists, using an extractor that looks like this:

object :: {
  def unapply[A](xs: List[A]): Option[(A, List[A])] =
    if (xs.isEmpty) None else Some((xs.head, xs.tail))
}

If Python doesn't need arbitrary infix operators, or the ability to use existing operators as prefix functions or vice-versa, or other infix-operator-related features like sectioning, it doesn't need infix patterns either.

Especially since the primary use for this, deconstructing cons lists, is something you're rarely going to need in Python. (If you want cons lists, you have to write them yourself, and you won't get any syntactic sugar for constructing them, so why would you be surprised that you don't get syntactic sugar for deconstructing them? You can use the same Cons(head, tail) for a pattern as you use for a constructor.)

Patterns in other places


My proposal suggested explicit case statements, and generalizing sequence unpacking in assignments to pattern decomposition, but I didn't think about all the other places Python binds variables besides assignments. Scala's designers did, and they made it work in as many places as possible.

So, looking at all the places Python does bindings:

  • Iteration variables in for statements and comprehensions: We can already sequence-unpack into the iteration variables; why not pattern-match? So, for Point3D(x, y, z) in vertices:.
  • as-clause variables in various statements. Python already allows sequence unpacking here, although it requires extra parentheses:
    • with: I don't think Scala has an equivalent, but I could definitely imagine wanting it here. Although often you're going to want to bind the whole thing as well as decomposing it, and doubling the as (with spam.open() as TextFile(filename) as f:) seems pretty horrible.
    • except: Scala's catch basically has an implicit ex match, which you write normal case clauses inside. This could actually be more useful in Python than in Scala, since Python exceptions are full of good stuff, and you could, e.g., pull the filename out of a FileNotFoundError. But it would be clumsy, because you've already matched the type: except FileNotFoundError as FileNotFoundError(filename):. So... maybe in a new language, but probably not useful in Python.
    • import: It seems like it would always be more readable to import and then case. Maybe someone can come up with a counterexample, but I doubt it.
  • For completeness, if we're looking at import ... as, then we should also look at from ... import, and maybe even plain import. And I think it looks just as bad in the former statement, and a thousand times worse in the latter.
  • Function parameters in def and/or lambda. In my previous post, I suggested ML-style pattern overloads for def, and assumed that the fact that this wouldn't work for lambda wouldn't matter because why would you want it there? But in Scala, single-case lambdas can be written without all the match syntax, and this is used pretty extensively in idiomatic Scala. (Of course in Scala, pattern matching is an expression, not a statement, and idiomatic Scala uses lambdas in many places where idiomatic Python uses named functions.) There's a bit of ambiguity in the fact that a parameter list is actually a list of target-lists that have to be single targets or parenthesized--and remember that Python 3.0 removed that ambiguity by killing even basic iterable unpacking for parameters. Maybe someone could come up with a syntax that allows def line_to(Point3D(x, y, z)): or similar without that ambiguity, but even if you could, does that really look useful in a language without static typing and overloads? (Again, I think if we want this, we want a way to define overloads by pattern.)
  • Conditions are not currently a place to do bindings in Python, and that's a good thing. But occasionally, something like C's if ((c = getch()) != EOF) is handy. Could a stripped-down single-case pattern-matching binding in conditions be a way to get most of the benefits, without the problems? (Similar things certainly work in a variety of languages from Swift to Haskell, but in those languages, an if body has its own scope, which will either shadow an outer-scope name or raise an error complaining about shadowing, so I don't know if that means anything.) Anyway, I haven't thought much about this.

Partial functions


Scala makes extensive use of partial functions. Not partially-applied functions (like functools.partial), but partial functions in the mathematical sense: functions that are only defined on part of the implied domain.

This is actually a weird hybrid of static and dynamic functionality, but it turns out to be useful. At compile time, Scala can tell that your cases don't exhaust all of the possibilities. In many functional languages, this would mean that you get a compiler error. In Scala, it means you get a partial function instead of a normal function. If you call it on an out-of-domain value, you get a MatchError at runtime. But you can also LBYL it by calling its isDefinedAt function. And code that always checks before calling can be statically determined to be safe.

And, idiomatically, partial functions--usually constructed with the special anonymous-function-with-implicit-match syntax mentioned above--are what you often pass to things like collect. The collect function does the equivalent of map and filter at the same time, building a new sequence with the results of the partial function applied to the members of a source sequence, while skipping the ones that aren't defined. Like this, in Python terms:

def collect(func, iterable):
    for value in iterable:
        try:
            yield func(value)
        except MatchError:
            pass

Of course that demonstrates that a dynamic language that loves exceptions and EAFP really doesn't need an explicit concept of partial functions. A partial function is just a function that can raise (or that can raise a specific exception), and using partial functions is something we trivially do all the time without thinking about it. So, this is a neat way to get familiar Pythonic functionality into a very non-Python-like language, but Python has nothing to learn there. (Although maybe mypy does?)
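To make that concrete, here's the same collect together with a partial function, runnable today (MatchError is a stand-in for the proposal's exception; half is just an illustrative partial function, only defined on even numbers):

```python
class MatchError(AttributeError):
    """Stand-in for the proposal's failed-match exception."""

def collect(func, iterable):
    # map and filter in one pass: skip values where func isn't defined
    for value in iterable:
        try:
            yield func(value)
        except MatchError:
            pass

def half(x):
    # a partial function in the mathematical sense
    if x % 2:
        raise MatchError(x)
    return x // 2

print(list(collect(half, [1, 2, 3, 4, 5, 6])))  # [1, 2, 3]
```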

Other languages


ML


First, traditional ML patterns have a feature that we haven't discussed: "either patterns", where two (or more) different patterns (with "|" or "or" between them) are treated as a single case. In Python terms:

case meal:
    of Breakfast(spam, *_) or Lunch(spam, *_):
        print('Are {} servings of spam really enough?'.format(spam))

In ML, the composed patterns have to bind the same set of variables, with the same types (as they are in the example above). There have been attempts to expand that to wrap variables bound only on one branch in optional types, and even to allow variables with different types on each branch to be bound as choice types, but that turns out to be painful to use (you end up with a mess of pattern-matching expressions inside pattern-matching expressions that would be a lot more readable if you just flattened them out).

In object-oriented ML variants, it makes sense to allow the different branches to match different class types, as long as they share a common supertype, in which case the bound variable is of that supertype. (I don't think either base OCaml or F# actually does that, but it makes sense...)

But in a duck-typed language like Python, of course, you could dispense with all requirements: variables matched to different types in different branches are fine, as long as all the types duck-type successfully for whatever operations you use with them, and even variables only matched in one branch and not the other are fine, as long as you can handle the possible UnboundLocalError.
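A dynamic sketch of how duck-typed either-patterns could be driven (match_either, the toy extractors, and the toy classes are all hypothetical; the point is just that the first branch to succeed provides the bindings):

```python
class MatchError(AttributeError):
    """The proposal's failed-match signal."""

def match_either(value, *extractors):
    """Try each extractor in order; the first that succeeds provides
    the bindings, as an or-pattern would."""
    for extract in extractors:
        try:
            return extract(value)
        except AttributeError:  # MatchError is an AttributeError
            continue
    raise MatchError(value)

# Toy extractors: each branch duck-types against a different shape.
def breakfast(meal):
    return (meal.spam_servings,)

def lunch(meal):
    return (meal.spam,)

class Breakfast:
    def __init__(self, spam_servings):
        self.spam_servings = spam_servings

class Lunch:
    def __init__(self, spam):
        self.spam = spam

(spam,) = match_either(Lunch(3), breakfast, lunch)
```

Either shape binds spam; anything matching neither raises MatchError, falling through to the next case.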

Some ML offshoots have experimented with the dual idea of "both patterns", where two (or more) different patterns (with "&" or "and" between them) are treated as a single case. Here, obviously, they have to either not bind the same variables, or bind them to the same values. (The Python rule would presumably be that they can bind the same variables, but if they bind them to different values, one of them wins arbitrarily. Or maybe it would be useful to say the last one always wins, or the first truthy one, or some other rule? Without experimenting with it, it's hard to guess whether any of them would be particularly helpful...)

With only literally deconstructing patterns, both patterns aren't very useful (what's the point of matching Breakfast(spam, _, _) and Breakfast(_, 0, eggs) instead of just Breakfast(spam, 0, eggs)?), but with unapply functions, they make a lot more sense. For example, if you had any two of the coordinate systems for 3D points, you could match the others--e.g., cartesian(_, _, z) and spherical(r, theta, _) gives you the cylindrical coordinates.

I'm not sure how useful either of these features would be in Python, but they're not too hard to define. If someone wants to write a serious proposal to add pattern matching to Python, they should at least provide a rationale for not including them.

F#


F# builds on OCaml with the specific intent of using .NET libraries, and in particular using the C#-centric, OO-heavy .NET framework for most of its stdlib. So, they obviously needed a way to decompose arbitrary classes. Since Scala, Swift, Rust, etc. hadn't been invented yet, they had to figure it out for themselves. They took it as a specific case of the more general idea of matching abstract data structures, and wrote a paper on their design, Extensible Pattern Matching Via a Lightweight Language Extension.

In simplest form, F#'s "active patterns" are pretty similar to Scala's extractors, and even more similar to the variation I suggested above: instead of defining a singleton object with unapply and apply methods, you just define a plain (unapply) function with a special marker (F# uses "banana brackets" around the name, like (|Complex|), rather than a function decorator, which becomes important later) to declare it as an active pattern; you can then use it in any (single, partial-multiple, or total-multiple) match expression the same way you'd use a record type.

But you can go farther. For example, instead of just returning a tuple, you can return what looks like a record constructor with the function name as the record type name and the tuple as its arguments. And you can then define a "banana split" pattern, with multiple names inside the banana brackets, which can return what looks like a record constructor for any of those names, which can then be pattern-matched as if they were the multiple constructors of an actual sum type. (In fact, they are constructors of an actual sum type; it's just anonymous.) The example in the docs deconstructs an abstract list with (|Cons|Nil|) by calling its non-empty, head, and tail methods. In Python terms, it might look something like this:

@unapply_split('Cons', 'Nil')
def cons_nil(lst: LinkedList): # LinkedList is an ABC
    if lst: return Cons(lst.head(), lst.tail())
    else: return Nil()

And you can then use Cons and Nil in patterns to match list instances. The (inferred) type of that function in F# is something horrible that you don't want to ever read involving the parametric choice type, so the fact that it would probably not be represented in the type system at all in Python isn't terrible. Anyway, while that's a nice feature, I can't imagine how to make it fit in nicely with Python syntax, and defining two separate partial unapply functions is not that much heavier, and probably a lot easier to understand:

@unapply
def Cons(lst: LinkedList):
    if lst: return lst.head(), lst.tail()

@unapply
def Nil(lst: LinkedList):
    if not lst: return ()

However, it's a lot easier to prove that a pattern matching both Cons and Nil is a total pattern (or detect whether it's a partial pattern, for use in building automatic partial functions with is_defined support) on LinkedList with the "banana split". That's important for F# (or Scala), but probably not so much for Python. (Of course this particular example is pretty stupid for Python in the first place--just use the iterable protocol. But it obviously generalizes to less trivial examples that aren't as silly in Python.)

F# also allows for matching with independent active patterns, with the usual rule that the patterns are just dynamically tried in order and there must be a default case, which allows you to use partial matching functions and still prove total matching. Anyway, that feature is exactly equivalent to the separate unapply functions just described, except that in F# you have to explicitly define each active pattern as partial (by banana-splitting it with _), and of course it has to be represented somehow in the type system (by wrapping the single type or choice of types in an option that's normally handled for you by the syntactic sugar of the multiple-match expression), while in Python, you'd take it for granted that patterns without default can't be proven total and that the values are duck typed.

Finally, you can extend this to wrap almost any function as a pattern match. For example, you can write a parse_re active pattern that takes a regex from the pattern, and additional arguments from the match expression, and returns its groups if successful. This is really only tricky with static typing, because inferring the type of that parse_re function is a nightmare. Also, in a dynamic language like Python, it's pretty much unnecessary: you'd either add __match__ to the SRE_Match type, or write an unapply function for it, and then just use re.compile(regex).match(x, y) as your pattern, right?
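
In Python terms, that last idea is just another partial unapply function. A hedged sketch (regex_unapply is an invented name; no such protocol exists):

```python
import re

def regex_unapply(pattern, string):
    # Active-pattern-style helper: return the match's groups on success,
    # None on failure, so a matcher could treat it like any other
    # partial unapply function.
    m = re.match(pattern, string)
    if m is not None:
        return m.groups()

parts = regex_unapply(r'(\d{4})-(\d{2})-(\d{2})', '2015-06-15')
# parts is ('2015', '06', '15'); a non-matching string yields None
```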

Beyond all the ways active patterns expand on ML-style patterns, the fact that they're ultimately just functions means you can use them with higher-order functions. Of course the Scala extractor objects or the proposed unapply functions for Python have the same benefit.

Finally, F#, unlike most ML offshoots, has a notion of mutable records. With a bit of clumsy syntax, you can actually bind to a mutable attribute in the match and then mutate that attribute in the resulting code. You'd think that, if this were useful, there would be a way to do the same with abstract types, but (short of building a bunch of proxy objects) there isn't. So, presumably that implies that we don't need binding to "reference cells" or anything like that in Python.

C#


More recently, the designers of C# looked at F# and Scala pattern matching in practice, and designed pattern matching for C#.

It's worth noting that they decided that, even though their pattern matching has to handle arbitrary OO classes, it's still worth adding record types or more general ADTs if they're going to add pattern matching. That seems like more evidence for adding automatic matching for the record-like types that already exist in Python (like namedtuples) and/or adding actual ADTs.

But the most interesting part of C#'s design is that they start out with a single-clause pattern-matching expression, EXPR is PATTERN, and build everything on that. The expression is true if the pattern matches, and any bindings in the pattern are bound in the enclosing scope. (Actually, their rule for deciding the appropriate scope is pretty complicated, and weird in some cases, but the Python equivalent would be easy--the enclosing function or class body is always the appropriate scope.) That's general enough to let you do things like if (expr is List x) do_list_stuff(x); or if (x?.y?.z is Int i) do_int_stuff(i); without needing special syntax like Swift's if let. And then a multi-pattern switch statement or a multi-pattern match expression--or, if you're the C# team, why not both?--is pretty simple syntactic sugar for that (but syntactic sugar that the optimizer can recognize and maybe take advantage of). Destructuring assignment seems to be the only single-pattern use that couldn't be directly built on is, so they added special syntactic sugar for that.
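
The single-clause form is easy to sketch in Python without any new syntax, using a helper plus the walrus operator for the binding. Here as_type is an invented name, and this only covers the match-a-type case, not destructuring:

```python
def as_type(value, typ):
    # Return the value itself when it has the right type, None otherwise,
    # so the caller can bind and test in one expression, roughly like
    # C#'s "if (expr is List x)".
    return value if isinstance(value, typ) else None

def describe(obj):
    if (x := as_type(obj, list)) is not None:
        return f"list of {len(x)}"
    if (i := as_type(obj, int)) is not None:
        return f"int {i}"
    return "something else"
```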

So, how do they handle deconstructing arbitrary classes? The default behavior only works for destructuring record/ADT types (and for simply matching values to types without any destructuring), but the is operator is overloadable, like any other operator in C#. At first glance, that sounds similar to the __match__ protocol (for normal-method operator overloads) or to the extractor protocol (for static-method overloads), but it's actually more complicated: is is treated like a conversion operator, so you can specify how any class destructures to or from any other class. The example in the docs is a Polar point class that deconstructs a separate Cartesian point class, so you can write var p = Cartesian(3, 4) and then later match it with is Polar(var r, var theta). So, Scala's extractors are really just a special case of C#'s overloaded is. The question is, are there any other cases that are useful? The only examples I see all use a static class with a static is overload and no other methods at all, which means they're just unapply functions with extra boilerplate...
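
For comparison, here's what that Polar/Cartesian example might reduce to in Python. Cartesian and polar_unapply are invented names, and this is just the extractor special case, without the full conversion-operator generality:

```python
import math

class Cartesian:
    def __init__(self, x, y):
        self.x, self.y = x, y

def polar_unapply(p):
    # Destructure a Cartesian point into polar coordinates, the way
    # C#'s overloaded "is" lets one class deconstruct another.
    if isinstance(p, Cartesian):
        return math.hypot(p.x, p.y), math.atan2(p.y, p.x)

r, theta = polar_unapply(Cartesian(3, 4))
# r is 5.0; theta is atan2(4, 3), about 0.927 radians
```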

Mathematica


I may have mentioned this in my previous post, but... In Mathematica, everything is a primitive value or a tree. And this means pattern-matching in Mathematica is designed around trees, rather than flat structures. So, you can match something like a[b[_], _], which is like first matching a[X, _] and then matching X against b[_]. You can also match against just the second level, or against just a leaf at whatever level it appears, etc. That's all pretty cool, but I don't think it's needed in any language where everything isn't a tree.
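
Just to make the idea concrete, a toy tree matcher over nested tuples is only a few lines of Python (ANY and tree_match are invented names for illustration):

```python
# A tree is (head, *children); ANY is a wildcard that matches any subtree.
ANY = object()

def tree_match(pattern, value):
    if pattern is ANY:
        return True
    if isinstance(pattern, tuple):
        return (isinstance(value, tuple)
                and len(pattern) == len(value)
                and all(tree_match(p, v) for p, v in zip(pattern, value)))
    return pattern == value

# Mathematica's a[b[_], _] in tuple terms:
pat = ('a', ('b', ANY), ANY)
assert tree_match(pat, ('a', ('b', 1), 2))
assert not tree_match(pat, ('a', ('c', 1), 2))
```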

Swift, Rust, etc.


I've already mentioned Swift multiple times. The other obvious modern language to look at is Rust. I don't think there are any radical new ideas from either, but if I'm wrong, I'll come back and edit this section.

Dynamic languages


Pattern matching comes out of statically-typed functional programming. Scala, C#, Swift, Rust, etc. are all primarily attempts to bring ideas from static FP to static OOP (in ways that will hopefully be familiar to traditional OO programmers), and secondarily attempts to find ways to simulate some of the benefits of duck typing on top of static typing, so it's not too surprising that they all came up with similar ways to import the idea of pattern matching.

But, besides asking whether the lack of static typing could be a problem for pattern matching in Python (e.g., the fact that we can't detect partial patterns at compile time), we should also ask whether duck typing could offer more flexibility or other benefits for pattern matching. I already mentioned how either and both patterns could be more useful in Python than in OCaml. And the fact that Python doesn't have to infer horrible choice types that no human will ever be able to debug if the inference goes wrong might be an advantage.

But if there are any good examples in other dynamic languages, it would definitely be worth looking at them.

Of course many dynamic languages have "pattern matching" in Perl terms: regex matching on strings, or some generalization of string-matching ideas to other types. I know that generalization is supposed to be one of the key ideas in Perl 6, but, having not dug into Perl 6 beyond reading the apocalypses, I don't know whether it shares any interesting common ground with ML-style pattern matching, or whether it's just SNOBOL string patterns with a more powerful grammar.

Also, of course, people have implemented pattern matching in Perl, Ruby, etc., just as they have in Python, and those languages with more flexible syntax allow them to make it look more like ML--but that just makes it look less like the native language, doesn't help at all in designing a way to integrate it as a real native language feature, and, most importantly, generally does nothing to solve the problem of decomposing objects, which is the key problem.

There's gotta be something on LtU...


I've only done a cursory search at Lambda the Ultimate to see if anyone's published any papers on integrating pattern matching into dynamic OO languages. I found and skimmed some posts about the problem of decomposing objects, but all in terms of static OO (that's where I found the F# paper, which led me to the C# work in progress...). Some of those might have interesting ideas that aren't incorporated in any of the languages I've looked at, but I don't yet know. And it seems like there must be something on dynamic object decomposition out there, even if I didn't find it in a couple minutes.

How useful is pattern matching in idiomatic Scala?


Scala, like many modern functional languages (and, increasingly, even some primarily-imperative languages), builds its failure handling around an Option type, basically equivalent to Haskell's Maybe but with some extra stuff piled on top (like being able to construct an Option from a nullable-typed Java object).

In most such functional languages, pattern-matching is the idiomatic way to deal with optional values. But in Scala, many guides (like part 5 of Westheide's) point out that it's pretty verbose in Scala, and it's often better to use the extras piled on top of the type instead. Compare:

println("Name: " + (user.name match {
  case Some(name) => name
  case None => "<unknown>"
}))

println("Name: " + user.name.getOrElse("<unknown>"))
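
The same tradeoff is easy to see in Python, even without an Option type; here user is a hypothetical dict rather than an object with an optional attribute:

```python
user = {'id': 42}  # note: no 'name' key

# match-like version: explicit test and unwrap
name = user['name'] if 'name' in user else '<unknown>'

# getOrElse-like version
name = user.get('name', '<unknown>')
print("Name: " + name)   # Name: <unknown>
```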

Honestly, I'm not qualified to say whether Westheide is right here (although other guides agree with him--and the fact that Scala has the getOrElse method is telling), or whether this extends to other paradigm examples of pattern matching or is specific to the fact that Option, and what you usually do with its values, is so simple. But still, it's worrying: if idiomatic Scala usually doesn't use pattern matching even for one of the paradigm examples of pattern matching, how often would idiomatic Python use it?

Comparing to Swift


Swift made extensive use of Optional from the first beta, but the designers originally didn't feel the need to add general pattern matching. Instead, the general solution is to test for nil, and then unwrap inside the tested code. There's special support for that all over the place:

  • There's a single-character type operator for optional types: Int? is the same as Optional<Int>.
  • Unwrapping is a single-character postfix operator: var!.
  • The language's truthiness notion is based on non-nil (except for actual boolean values), so you can write if var { println(var!) }. (Yes, in theory, this means optional booleans are a bit harder to deal with, but that rarely if ever comes up--practicality beats purity and all that.)
  • if let lets you combine the test and binding into one: if let v: Int = var { println(v) } binds the value from the optional Int var to the non-optional Int v, so you can use it in the "then" scope.
  • guard is like if, but lets you bind a variable in the enclosing scope (by enforcing that the failing case must exit that scope), so you can write guard let v: Int = var else { return } and then use v in the rest of the function.
  • The nil-coalescing operator ?? serves most uses of an "unwrap-or-else-use-default" method: 2 + (v ?? 0).
  • The typing engine, including inference as well as checking, understands all of this, so you can just write if let v = var without specifying a type, the compiler can skip any warning or runtime type check inside the body of an if or after a guard, etc.
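
For what it's worth, everyday Python approximates if let with a None-returning lookup plus a test. There's no optional type, so config here is just a hypothetical dict standing in for one:

```python
config = {'port': 8080}

port = config.get('port')   # None if absent, like a failed unwrap
if port is not None:
    message = f"listening on {port}"
else:
    message = "no port configured"
# message is "listening on 8080"
```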

However, in later versions of Swift, they felt the need to add additional sugar to pattern matching for optional values anyway. In general, you match an Enum (ADT) constructor (in switch, in a single-pattern if, and a few other places) with case .Kind(val1, val2) = var, and any bindings you want have to be explicitly signaled, with case .Kind(val1, let var2). But when you're matching an optional value, instead of if case .Some(let var1) = var, you can just write if case let var1? = var. So, maybe this implies that pattern matching is important, even when you've done everything you can to special-case away the need for it. Usually, idiomatic Swift code gets away with avoiding any pattern matching for optionals by using if let and ??, but the fact that the designers felt the need to add case let var? implies that they couldn't get rid of every need for it.

Naming


From a quick survey of other languages:

ML uses case ... of to introduce the multi-case pattern-matching construct, and | to separate cases; a function definition can also be defined in terms of cases on its parameters, without needing the case bit. Some ML descendants change the | to a different symbol, or change it to the keyword of (and remove that from the outer clause). But newer descendants mostly strip it down even further. Erlang makes every binding a single-case pattern match, so you don't need any syntax. Haskell embeds the multi-case match into various bits of syntax, like binding, and replaces the | with whitespace, so there's basically no syntax anywhere.

Meanwhile, to people coming from the imperative world, C's switch and case are familiar. Whether that's a good thing, or an attractive nuisance, seems to be up for debate. Swift deliberately uses both of the same words. Many other languages (Scala, Rust, draft ES7) use match instead of switch, but some of them still use case for the individual cases. Others avoid both C keywords.

One advantage of using the word case this way is that, unlike of or a symbol or whitespace, it reads well in single-case stripped-down constructs, like Scala's single-case simple anon functions ({ case(a, b) => a+b }) or Swift's if case and similar statements (if case .Point(let x, let y) = point where x == y { print("diagonal at \(x)") }).

Anyway, my use of case and of probably isn't actually familiar to anyone who isn't currently in a late-80s university class, and it collides with C just enough to be misleading but not enough to be comforting. So, I think either match/case or switch/case would fit Python better. (And, if C# plans to set the precedent that switch means statement and match means expression--coinciding with Swift's switch statement and Scala's match expression--maybe it's not worth fighting that.)

Use cases


The main thing stopping anyone (or at least me) from writing a serious proposal for adding pattern matching to Python is the lack of good use cases. Everyone who uses any language built around pattern matching knows that pattern matching is useful... but coming up with actual examples that make sense in Python is harder than it should be. And maybe that means something.

Part of the problem is that most of the examples that aren't just toys tend to be things that can be better done in other ways in Python:

  • Option, Error, etc. types are an alternative to exceptions, and Python uses exceptions pervasively instead. (Plus, as Scala implies, pattern matching may not even be the best way to handle these most of the time.)
  • Decomposing records in general tends to fight against OO encapsulation. While abstract matching solves that problem, most examples, even in languages like Scala and F# that have abstract matching, seem to be borrowed from languages that don't (and written around case classes, records, or whatever they call them). While such cases might arise in Python, the fact that we don't generally use namedtuples all over the place (and they're buried in collections rather than builtins) and we don't have more general ADT-like features implies otherwise.
  • Simple switching functions like factorial (whether with a match expression, or as pattern-matched overloads) only really look simpler with pattern matching when you define them recursively. But in Python, you generally wouldn't define them recursively.

  • Recursively decomposing (cons) lists (and generalizing that to, e.g., abstract list-like objects, as in the F# example above) has the same problem, and the further problem that you rarely use cons lists in Python, and the further problem that, even if you did, you'd almost certainly want to use the iterator protocol on them.

  • Recursively decomposing more complex recursive structures that don't fit the iterator protocol seems more promising, but again, you'd usually want to process them iteratively rather than recursively if they're large enough to be a problem. Plus, you don't often run into deeply recursive data structures anyway--or, if you do, they tend to be implicit in JSON or similar, and pattern matching is more verbose there, not less; the benefit of using patterns there is static type checking, which you aren't going to get in Python.

  • Many examples would be more idiomatically written as method dispatch in an OO language. Even the more complex related problems (like operator double dispatch) already have better idiomatic solutions in Python.

  • The equivalent of a C switch statement is not very exciting, even when extended to matching enum constants or strings instead of just integers, and that's already readable enough as an elif chain. Or, in many cases, you'd just use a dict.
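
That last point is easy to illustrate; handle_elif and handle_dict are invented examples:

```python
def handle_elif(command):
    # The elif-chain version of a C-style switch over strings.
    if command == 'start':
        return 'starting'
    elif command == 'stop':
        return 'stopping'
    else:
        return 'unknown command'

HANDLERS = {'start': 'starting', 'stop': 'stopping'}

def handle_dict(command):
    # The dict version: a lookup with a default instead of a switch.
    return HANDLERS.get(command, 'unknown command')
```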

There must be examples in my code or other code in other languages that would still look like idiomatic Python-plus-patterns when translated... but I haven't found them yet.

Conclusions (or lack thereof)


A year and a half ago, I concluded that, even though I'd come up with a reasonable and implementable proposal for pattern matching in Python, it wasn't a feature Python needed, or should get, at least for now.

The fact that Scala's designers came up with a similar solution to what I proposed, and it's very heavily used in idiomatic Scala, even by people who are mostly coming to the language from Java rather than ML or Haskell or something, is pretty heartening.

There are definitely ways that Scala's feature is more flexible than the one I proposed--and, for the most part, we can pretty easily add that flexibility to my proposal, making it even better.

On the other hand, the fact that Scala seems to encourage alternatives to pattern matching in some of the places I'd use it in Haskell or ML makes me think it might turn out to be underused in Python as well. Or maybe not--after all, maybe you sometimes have to fall back to pattern matching (or get much more verbose) even though the alternatives are there, as is presumably the case in Swift?

So... now I'm less against the idea of adding pattern matching to Python. But I'm still not sufficiently excited about it to build an implementation, write a PEP, try and convince the core developers, etc. And if switching between Haskell and Python didn't make me ache for pattern matching in Python, I doubt that switching between Scala and Python will either. But maybe I'm wrong; time will tell.