1. The documentation does a great job explaining how things normally get looked up, and how you can hook them.

    But to understand how the hooking works, you need to go under the covers to see how that normal lookup actually happens.

    When I say "Python" below, I'm mostly talking about CPython 3.5. Other implementations vary to differing degrees in how they implement things; they just have to result in the same rules being followed.

    Variables: LEGB


    Python has simple lexical scoping, like most other languages. (Unlike many languages, only module, function, and class definitions create new scopes, not every suite, but that's not too important here. Also, remember that lambda expressions and comprehensions are function definitions, and thus create new scopes, despite not having suites.)

    The specific rule Python uses for lookup is LEGB: Local, Enclosing, Global, Builtin. In other words, when you write x, Python looks for a local variable named x; then (if you're in a nested function or class) it goes through all the enclosing scopes' local variables; then it looks at the module's globals; and then it looks at builtins.

    Meanwhile, because Python has implicit declarations (any assignment may be declaring a new variable), it needs a rule for that: any assignment creates a local variable (shadowing any enclosing, global, or builtin of the same name), unless you have a nonlocal or global declaration in the same scope.

    The way Python implements these rules is not as simple as you'd think. Primarily, this is for reasons of optimization.

    tl;dr


    The short version is:

    * Locals (except when shared with a closure) are loaded and stored in an array on the stack frame.
    * Nonlocals (and locals shared with a closure) are basically the same but with an extra dereference.
    * Globals are loaded and stored in a dict.
    * Builtins are loaded via two or three dict lookups (and are never directly stored).

    Fast locals


    Local variables are stored in an array on the stack frame, and loads and stores are compiled into array-indexing operations on that array. This is basically the same behavior as C--and it's a lot faster than going through a dictionary.

    Exceptions to this rule:

    * Modules don't have fast locals; they use the globals dict as their locals, so everything works the same as for the globals section below.
    * Local variables that are shared with a nested closure are slightly more complicated; see the next section.

    So, how does Python know how to compile your locals to array indices? And how does it turn that back into the local variable names that you can see in tracebacks, locals(), inspect and dis functions, etc.?

    When compiling the body of your function, Python keeps track of all of your variable assignments. (Assignments to attribute references or subscriptions don't count; those are just method calls on the object being attributed or subscripted.) Function parameters are local, and any variable that you assign to is local (unless you have a global or nonlocal statement for it); anything else is not.

    So, it creates a tuple of all those local names, and stores it in the co_varnames of the code object. It also stores the total count in co_nlocals. Then, when compiling your code to bytecode, every load or save to co_varnames[3] becomes a LOAD_FAST 3 or STORE_FAST 3.

    When a function is called, the stack frame reserves co_nlocals extra space at the end, called f_localsplus, and when the interpreter sees a LOAD_FAST 3, it just reads from frame.f_localsplus[3].

    For example, in this code:

    def spam(eggs):
        cheese = 0
        return eggs + cheese
    

    ... eggs is local because it's a parameter, and cheese is local because you assign to it, so that last line will get compiled to, in effect, LOAD_FAST(0) + LOAD_FAST(1), not LOAD_NAME('eggs') + LOAD_NAME('cheese').
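
    You can check all of this from Python (the exact disassembly varies a bit by version; this is roughly what CPython 3.5 gives you):

    import dis

    def spam(eggs):
        cheese = 0
        return eggs + cheese

    print(spam.__code__.co_varnames)  # ('eggs', 'cheese')
    print(spam.__code__.co_nlocals)   # 2
    dis.dis(spam)                     # LOAD_FAST 0 (eggs), LOAD_FAST 1 (cheese),
                                      # BINARY_ADD, RETURN_VALUE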

    Now you can probably understand why these optimizations work:

    def spam(eggs=eggs):
        local_cheese = cheese
        for _ in range(1000000000):
            eggs(local_cheese())
    

    You've replaced a billion dictionary lookups for the globals eggs and cheese with one dictionary lookup for each plus a billion array lookups for f_localsplus[0] and f_localsplus[1], which is obviously faster. (Dictionary lookups are, of course, constant-time, just like array lookups. But with a larger constant multiplier--large enough to make a significant difference here.)

    But how does Python get the names back out? Well, as mentioned, they're stored in co_varnames. And when a function is called, the stack frame gets a pointer to the function's code object in f_code. So, if it needs to build a traceback or disassembly or whatever for fast local #1, it just gives you frame.f_code.co_varnames[1].

    What about locals()? Well, there actually is a locals dict on the stack frame, called f_locals. But it doesn't get created unless you ask for it (e.g., by calling locals(), or just by asking for the Python version of the frame object with, e.g., sys._getframe()). This calls a function PyFrame_FastToLocals, which effectively does frame.f_locals = dict(zip(frame.f_code.co_varnames, frame.f_localsplus)), and then it returns that f_locals to you. It should be obvious why, as the docs say, modifying locals() doesn't affect the function's actual locals: it's just a snapshot of your locals. (There is actually a function PyFrame_LocalsToFast, but you can only call that from C, not Python.)
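
    A quick demonstration that locals() is just a snapshot:

    def demo():
        x = 1
        snapshot = locals()
        snapshot['x'] = 99  # mutates the snapshot dict only
        print(x)            # still 1; the fast-local slot is untouched

    demo()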

    What about exec()? Again, as the docs say, this function by default works on (something equivalent to) its caller's globals() and locals(), so, again, it can't change your local variables.

    I used Python-like pseudocode for all the stuff that happens in C under the covers. But most of that actually is exposed to Python. For example, if you call sys._getframe(), the frame object you get back has an f_code member. Or, if you just look at a function, it has a __code__ member. And either way, that object has a co_varnames. And so on. The f_localsplus member, and the PyFrame_FastToLocals and PyFrame_LocalsToFast are the only things mentioned here that aren't exposed. And even there, you can always do this:

    import ctypes, sys

    ctypes.pythonapi.PyFrame_LocalsToFast.argtypes = [ctypes.py_object, ctypes.c_int]
    ctypes.pythonapi.PyFrame_LocalsToFast.restype = None
    ctypes.pythonapi.PyFrame_LocalsToFast(sys._getframe(), 0)
    

    Have fun with that.

    Free and cell variables


    Variables from an enclosing scope are almost the same as locals--they're handled by looking them up in an array--but with an extra dereference. But that requires a bit of extra work to set up.

    So, how does Python know how to compile your nonlocals to array indices?

    As we saw above, when Python sees a variable, it knows whether that variable is local to your function. If it isn't, Python steps through the outer enclosing functions to see if it's local to any of them. If so, it's a free variable. (Of course a global statement means it's a global variable, so it doesn't look through enclosing scopes, and a nonlocal means it's a free variable even if there's a local assignment.)

    Notice that the module's global scope is not counted as a scope for these purposes--anything found in the global scope is always global, not enclosing.

    So anyway, the compiler creates a tuple of all those free names, and stores it in co_freevars. Then, when compiling your code to bytecode, every load or save to co_freevars[2] becomes a LOAD_DEREF 2 or STORE_DEREF 2.

    Now, when a function is called, the stack frame doesn't just reserve co_nlocals space, but co_nlocals + len(co_freevars) extra space at the end (I'm still lying to you here; see a few paragraphs down for the truth) in f_localsplus, and when the interpreter sees a LOAD_DEREF 2, it just reads from frame.f_localsplus[frame.f_code.co_nlocals + 2].

    Well, not quite. That obviously wouldn't allow you to reassign closure variables. (You could call mutating methods, but assignment would just break the connection, just like after a = b = [1, 2, 3], a.append(4) affects b, but a = [4] breaks the connection.) And that would defeat the entire purpose of nonlocal.

    So, what you actually have in that slot isn't the object itself, but a cell object that has a member named cell_contents that's the actual object. And what the outer scope has in its fast locals slot is the same cell object, not the thing it points to. So, if either the inner function or the outer function assigns to the variable, they're not assigning to one of their fast local slots, but assigning to the cell_contents of a cell in one of their fast local slots, so they can each see what the other one does.

    That's why local variables that are referenced by a nested scope work like nonlocal variables. They're stored in co_cellvars instead of co_freevars. Your frame's actual space is co_nlocals + len(co_cellvars) + len(co_freevars). The name that goes with a LOAD_DEREF 2 is equivalent to (co_cellvars+co_freevars)[2]. At runtime, that LOAD_DEREF 2 actually does a f_localsplus[co_nlocals+2].cell_contents. When you construct a nested function at runtime, there's zero or more LOAD_CLOSURE bytecodes to push the current frame's cells onto the stack, then a MAKE_CLOSURE instead of MAKE_FUNCTION, which does the extra step of popping those cells off and stashing them in the function's __closure__ attribute. And when you call a function, its __closure__ cells get copied into the f_localsplus array. Meanwhile, even if your function doesn't access any variables from an enclosing scope, if a function nested within your function does so, they're free variables for you, and have to get put in your __closure__ so they can end up in your stack frame so they can get to your nested function's __closure__ and stack frame.
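
    You can verify most of that from Python:

    def outer():
        x = 1
        def inner():
            return x
        return inner

    f = outer()
    print(outer.__code__.co_cellvars)      # ('x',) -- shared with a nested scope
    print(f.__code__.co_freevars)          # ('x',) -- free in the inner function
    print(f.__closure__[0].cell_contents)  # 1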

    And I believe that's all of the actual details correct.

    But you don't have to memorize all this stuff, or go dig it out of the C code, or work it out by trial and error, or plow your way through that horrible paragraph ever again; the docs for the dis module describe what each opcode does, and have improved tremendously over the past few versions of Python. If you can't remember exactly what LOAD_DEREF looks up and where, just look at the docs: "Loads the cell contained in slot i of the cell and free variable storage. Pushes a reference to the object the cell contains on the stack." That might not have explained everything to you if you'd never read anything before, but it should be more than enough to refresh your memory once you have.

    Also, again, most of this is exposed to Python, so the Python-esque pseudocode above is almost all real code. However, from Python, you can't write to a cell through its cell_contents, and you can't construct new cell instances. (If you want to have fun with ctypes.pythonapi, you can probably figure it out from here.)

    Finally, locals(), in addition to giving you a snapshot of your locals rather than a live way to change them, also flattens out any cells, giving you their cell_contents instead.

    At first glance, it seems like it would be simpler to just have each function's stack frame have a pointer to its outer function's stack frame. You could still optimize things just as much by having instructions like LOAD_OUTER 3, 2 to go 3 steps out and get the fast-local 2 from that frame, right? In fact, some languages do implement closures that way. But that means that a function's stack frame can't go away--and, therefore, neither can anything else it refers to--until all closures referring to any of its variables, or any of its outer scopes, go away. And that "anything else it refers to" includes the stack frame of its caller. You can see why this could be a problem. There are ways to optimize that (although you still run into problems with coroutines), but it's actually simpler, not more complicated, to just reference the individual values pulled off the frame, as Python does.

    Globals and builtins


    Globals are easy. If the compiler sees a name that isn't assigned in the current function or an outer nested function, or that has a global statement, it's global. There's no special operation for builtins; it's just a fallback for every global lookup.

    The compiler does still use an array here, but only to fetch the name from a constant tuple on the code object (co_names) to look up with LOAD_GLOBAL and STORE_GLOBAL at runtime. They look something like this:

    def LOAD_GLOBAL(index):
        # Pseudocode: 'frame' is the current stack frame, and ModuleType
        # is types.ModuleType.
        name = frame.f_code.co_names[index]
        try:
            return frame.f_globals[name]
        except KeyError as e:
            try:
                builtins = frame.f_globals['__builtins__']
                if isinstance(builtins, ModuleType):
                    builtins = builtins.__dict__
                return builtins[name]
            except KeyError:
                raise e
    
    def STORE_GLOBAL(index, value):
        name = frame.f_code.co_names[index]
        frame.f_globals[name] = value
    

    That f_globals will normally be the actual globals for the module the current (module-level, function, or class) code was defined in, but it could be something else. (For example, exec lets you pass in any mapping for globals; if you want a sanitized and quasi-sandboxed environment with your own under-the-hood functions and builtins like exec and __import__ stripped out, that's easy.)
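
    For example, here's a minimal sketch of that kind of quasi-sandboxing (env is just a name I made up, and a whitelist like this is not a real security boundary):

    env = {'__builtins__': {'print': print, 'len': len}}
    exec("print(len('spam'))", env)   # works: prints 4
    exec("open('/etc/passwd')", env)  # NameError: name 'open' is not defined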

    Much simpler than closure variables, right? But it can be slower. So, why have a special case for globals instead of just making it the last stop on the closure chain (or making builtins the last stop instead of a special case within the globals special case)?

    Well, for one thing, not having fast locals or cells and doing everything through the globals dict allows for more dynamic stuff, like creating new variables in exec and using them in the same scope. For another, not having to make a list of all of the names in the scope up front, simply adding them to the dict as they're created, means that the global scope can be executed incrementally, statement by statement (so it works the same as in the REPL--in fact, both the REPL and the top-level script work by just incrementally building a module named __main__). In particular, you can define a function that uses a global, and start passing it around, before that global has been created. For example:
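
    def greet():
        print(greeting)  # never assigned in greet, so it's compiled as a global

    greeting = "hello"   # created after the def; that's fine
    greet()              # prints 'hello': the dict lookup happens at call time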

    More importantly, lookup speed shouldn't really matter at global level. If you're doing any heavy computation that involves millions of name lookups at module scope, move it into a function and call that function. And if you're using those globals millions of times from another function, get them out of global scope too. Those are "structured programming 101" anyway; why try to optimize things that people shouldn't be doing?

    Well, except for the fact that your module's top-level functions and classes are globals, and it's perfectly reasonable to use them millions of times. In less dynamic languages, they're only looked up as globals at compile time, so they aren't like runtime global variables--but in Python, functions, types, and everything else are first-class values, and they're looked up the same way as any other global variables. In practice, this usually isn't a bottleneck for most programs; when Python is too dynamic and slow for the inner code of some project, there are usually other factors that matter much more than this dict lookup. But occasionally, it is. So, this is one of those places people keep looking for new optimizations. At the user level, you can always just copy the global to a local or a default parameter value, as mentioned earlier. Or you can use an ambitious optimizer that tries to automate the same benefit (like FAT Python, which creates an extra version of each function's __code__ with globals copied into the co_consts, and wraps the function in a guard that checks whether the cached globals have changed since the function was compiled and, if they have, calls the original "slow" function instead of the fast version).

    What about classes?


    A class definition is a scope, just like a function definition. There are a couple of minor differences (the special LOAD_CLASSDEREF opcode, the magic __class__ cell that's created in class scope so methods can access it directly or via super, etc.). But the really big difference is that the class code isn't executed every time the class is called; it's executed once to produce the dict that gets passed to type (or another metaclass), and then it, and its scope, goes away. (Of course the body of its __new__ and/or __init__ will get executed when the class is called.)
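
    In other words, a class statement works roughly like this sketch (the real machinery goes through __build_class__ and handles metaclasses, __prepare__, etc.):

    body = "x = 1\ndef method(self): return self.x"
    namespace = {}
    exec(body, globals(), namespace)           # run the class body, once
    Spam = type('Spam', (object,), namespace)  # build the class from that dict
    print(Spam().method())                     # 1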

    I don't think there's anything important to learn about classes nested in functions or classes (or functions nested in classes, unless you want to know how super works) that isn't obvious once you play with it.

    Attributes


    Attribute lookup seems really simple at first: Every attribute lookup is compiled to LOAD_ATTR, which uses a string taken from the same co_names array as LOAD_GLOBAL, and looks it up in the object's __dict__. Or, if it's not found there, in the object's type's dict. Or, if not there, then each type on that type's MRO. If that all fails, __getattr__ is looked up (but using special-method lookup, not normal attribute lookup) and called. Oh, unless __getattribute__ was overridden, in which case whatever it does happens instead. Also, somehow functions get magically turned into bound methods. And @property, and __slots__ and... and how does it find the __dict__ in the first place? And what about C-API types? Maybe it's not so simple after all...

    It's actually pretty simple: the only rule in the interpreter is special-method lookup, and LOAD_ATTR only ever uses that rule to look up and then call __getattribute__. (Other special methods are handled internally by their opcodes or other special builtin or extension functions written in C.) Everything else happens inside __getattribute__; we'll get to that later.

    Special method lookup


    Special method lookup is only somewhat vaguely documented, in the data model reference chapter. See call_method and its helpers in the source for details.

    The basic idea of special method lookup is that it ignores the instance dictionary, and most kinds of dynamic customization.
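
    For example:

    class Spam:
        pass

    s = Spam()
    s.__len__ = lambda: 3  # a per-instance "special method"
    len(s)                 # TypeError: object of type 'Spam' has no len()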

    You may wonder what the point of special method lookup is. Obviously __getattribute__ lookup can't be hooked by __getattribute__, but why should it also skip the instance, or treat metaclasses differently? And why should other special methods like __add__ follow the same rules? (If you wanted to add per-instance methods to a value, __add__ or __lt__ would probably be the first things you'd think of.)

    In fact, the original "new-style class" design for Python 2.3 didn't have special method lookup--but (long before the first beta), Guido discovered a problem: Some methods make sense both on normal objects, and on types. For example, hash(int) makes just as much sense as hash(3). By normal lookup rules, both call int.__hash__. By special method lookup rules the former skips int's dict and instead calls type.__hash__. And a few other methods have the same problem, like str and repr. And what would happen if a type had both __call__ and __new__? And, even those that don't make sense for types--int + str, abs(Counter), or -socket--might make sense for some subclass of type (e.g., the new optional static types use subscripting, like Iterable[int], for generics).
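
    You can see the difference with a metaclass:

    class Meta(type):
        def __hash__(cls):
            return 42

    class Spam(metaclass=Meta):
        def __hash__(self):
            return 23

    print(hash(Spam()))  # 23: looked up on type(instance), i.e., Spam
    print(hash(Spam))    # 42: skips Spam's own dict, goes to type(Spam), i.e., Meta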

    So, that's why we have special method lookup, and it obviously solves the __getattribute__ problem.

    How do we find the __dict__ without finding the __dict__ first?


    There's still a circular definition problem here. How do we find an attribute? By calling the __getattribute__ attribute on the object's type and walking its MRO looking in the __dict__ of each type. So... how do we get that __dict__? Certainly not by searching __dict__s. And the process also involves a variety of other attribute lookups--you get an object's type via its __class__, a type's MRO via its __mro__, and so on.

    The secret is that none of these are actual attribute lookups. In the C struct representing each object, there's an ob_type pointing to the type. In the C struct representing each type object, there's a tp_mro pointing to the MRO, and a tp_getattro pointing to the __getattribute__, and a tp_dict pointing to the __dict__, and so on.

    But you can set __class__ or __dict__ or __getattribute__ from Python. How does that work? In general, when you overwrite one of these special values, the interpreter updates the corresponding C struct slot. (There are a few cases where it does different things, like using a slot as a cache and searching the dict if it's NULL, but that works too.)

    So, in the pseudocode below, when I write type(obj).__mro__, that's really following the ob_type slot from the object, then the tp_mro slot from the type, and so on.

    Finding __getattribute__


    The basic process to lookup and call __getattribute__ is, in pseudocode:

    def find_special(obj, name):
        for t in type(obj).__mro__:
            if name in t.__dict__:
                it = t.__dict__[name]
                __get__ = find_special(it, '__get__')
                if __get__:
                    it = __get__(obj, type(obj))
                return it
    
    def LOAD_ATTR(obj, index):
        name = frame.f_code.co_names[index]
        __getattribute__ = find_special(obj, '__getattribute__')
        if not __getattribute__:
            raise AttributeError
        try:
            return __getattribute__(name)
        except AttributeError:
            __getattr__ = find_special(obj, '__getattr__')
            if not __getattr__:
                raise
            return __getattr__(name)
    

    If you don't understand the __get__, read the Descriptor HowTo Guide.

    (It shouldn't be possible to ever fail to find a __getattribute__, but there's code for it anyway--if you play around with the C API, you can try to call a method on a class before setting up its MRO, and you get an AttributeError rather than a segfault or something. On the other hand, __getattr__ can easily be missing, because object doesn't implement that, nor do most other builtin/stdlib types.)

    I've skipped over a few things (like the code to handle types that aren't fully constructed yet without crashing, etc.), but that's the basic idea. And it's also how other special-method lookups, like the nb_add (__add__) lookup inside BINARY_ADD, work.

    Notice that a __getattribute__ defined in the instance, or dynamically (by __getattribute__?), or a __getattribute__, __mro__, __class__, or __dict__ pushed onto the type by its metatype, all get ignored here.

    But if you've overridden __getattribute__ on your object's type, or any of its supertypes, or done the equivalent in C, your code will get called (as a bound method--that's the point of the descriptor get bit).

    One thing I skipped that may be worth noting is the method cache. See the macros at the top of typeobject.c, but the basic idea is that 4096 methods (that aren't special-cased away or given unreasonable names) get cached, so they can skip all the MRO-walking and dict lookups and jump right to the descriptor call.

    Inside the default __getattribute__


    Assuming you got to object.__getattribute__, what it does is pretty similar to special-method lookup all over again, but with some minor differences. The short version is that it allows instances, metaclass __getattribute__, and __getattr__ to get involved. In full detail, it's something like:

    • There's no type slot equivalent to "general name known only by a string passed in at runtime", so the per-type step does the tp_dict[name] bit.
    • Four-stage lookup instead of just the per-type MRO walk:
      1. Look on the type and its MRO for a data descriptor (that is, a descriptor with __set__). If so, return its __get__(obj, type(obj)).
      2. Look in the object itself only: if tp_dict (__dict__) and tp_dict[name] exist, you're done--do not call its __get__ method, just return it.
      3. Look on the type and its MRO for a non-data descriptor or non-descriptor. Return its __get__(obj, type(obj)) (or, if it's not a descriptor, just return it).
      4. Special-method lookup __getattr__, call that, and do the __get__(obj, type(obj)) on the result if present.

    So, in pseudocode:

    def __getattribute__(self, name):
        # Step 1: walk the MRO; remember the first match, and return
        # right away if it's a data descriptor.
        for t in type(self).__mro__:
            try:
                desc = t.__dict__[name]
                if ('__set__' in type(desc).__dict__
                        and '__get__' in type(desc).__dict__):
                    return find_special(desc, '__get__')(self, type(self))
                break
            except KeyError:
                pass
        # Step 2: the instance's own dict beats non-data descriptors.
        if name in self.__dict__:
            return self.__dict__[name]
        # Step 3: fall back to whatever the MRO walk found, if anything.
        try:
            __get__ = find_special(desc, '__get__')
            if __get__:
                return __get__(self, type(self))
            return desc
        except UnboundLocalError:
            # Step 4: nothing found anywhere; try __getattr__.
            __getattr__ = find_special(self, '__getattr__')
            if __getattr__:
                return __getattr__(name)
            raise AttributeError(name)
    

    I believe the point of that multi-stage lookup is to make it easy to shadow methods (which are non-data descriptors) but hard to accidentally shadow properties and similar things (which are data descriptors).
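
    A quick demonstration (NonData and Data are made-up names, of course):

    class NonData:
        def __get__(self, obj, typ):
            return 'descriptor'

    class Data(NonData):
        def __set__(self, obj, value):
            raise AttributeError

    class Spam:
        nondata = NonData()
        data = Data()

    s = Spam()
    s.__dict__.update(nondata='instance', data='instance')
    print(s.nondata)  # 'instance': the instance dict shadows a non-data descriptor
    print(s.data)     # 'descriptor': a data descriptor beats the instance dict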

    Notice that if the descriptor __get__ raises AttributeError, that will trigger the fallback to __getattr__ from the previous process.

    What about classes?


    If you look something up on a class, the class is the object, and the metaclass is the type. And if your metaclass doesn't override __getattribute__, type does (and it never raises or supers).

    The only difference between type.__getattribute__ and object.__getattribute__ is at the descriptor steps, it calls __get__(None, cls). So, everything works the same, except for the descriptor get bit. (If you work things through, you should see how this, e.g., lets @classmethods work the same whether called on an instance or on the class, and how it lets metaclass methods act like classmethods.)
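
    For example:

    class Spam:
        @classmethod
        def eggs(cls):
            return cls.__name__

    print(Spam.eggs())    # 'Spam': the descriptor's __get__(None, Spam) binds cls
    print(Spam().eggs())  # 'Spam': same result looked up via the instance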

    Other tricks


    What else is left?

    • __slots__ works by creating a hidden array, and a set of descriptors that read and write to that array, and preventing a tp_dict from being created on the instance.
    • Methods don't do anything special--spam.eggs() is the same as tmp = spam.eggs; tmp() (unlike many other OO languages). They work because they're stored as plain-old functions in the class dict, and plain-old functions are descriptors that build bound methods; there's a sketch just after this list. See How methods work if that isn't clear.
    • Everything else--@property, @staticmethod, etc.--is just a descriptor; object.__getattribute__ (or type.__getattribute__) finds it on the class, calls its __get__ method, and gives you the result. (It may help to work through how @classmethod produces a bound class method whether looked up on the type Spam.eggs or on the instance spam.eggs.)
    • There are various C API mechanisms to expose struct members, get/set function pairs, etc. as if they were Python attributes. But it's pretty obvious how these work unless you want the gory details, in which case they're in the C API docs.
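
    Here's the sketch promised above, showing that a plain function really is a descriptor that builds bound methods:

    def func(self):
        return self

    class Spam:
        pass

    Spam.f = func
    s = Spam()
    print(s.f)                    # <bound method func of <__main__.Spam object ...>>
    print(func.__get__(s, Spam))  # the same kind of bound method, built by hand
    print(s.f() is s)             # True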

    What about super?


    It uses four different tricks to be that super, thanks for asking:
    • In every class definition, __class__ is a local referring to the class currently being defined.
    • The compiler rewrites super() to super(__class__, co_varnames[0]). (The first local is the first parameter, so this handles self in normal methods, cls in classmethods and __new__ and normal metaclass methods, mcls in metaclass __new__ and metaclass classmethods, even if you decide to call them something confusing like s or this.)
    • Each method defined in a class gets __class__ as a free variable whenever it refers to super, not just whenever it directly refers to __class__.
    • super.__getattribute__ overrides the normal behavior from object to jump ahead on the MRO chain after the current __class__.
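
    You can see most of this machinery from Python:

    class Spam:
        def method(self):
            return __class__  # referring to super or __class__ creates a freevar

    print(Spam().method() is Spam)           # True
    print(Spam.method.__code__.co_freevars)  # ('__class__',)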


  2. In Python (I'm mostly talking about CPython here, but other implementations do similar things), when you write the following:
        def spam(x):
            return x+1
        spam(3)
    
    What happens?

    Really, it's not that complicated, but there's no documentation anywhere that puts it all together. Anyone who's tried hacking on the eval loop understands it, but explaining it to someone else is very difficult. In fact, the original version of this was some notes to myself, which I tried to show to someone who I'm pretty sure is at least as smart as me, and he ended up completely lost. So I reorganized it so that, at least hopefully, you can start each section and then skip the rest of the section when you get in over your head (and maybe skip the last few sections entirely) and still learn something useful. If I've failed, please let me know.

    Compilation

    To make things simpler to explore, let's put all of that inside an outer function for the moment. This does change a few things (notably, spam becomes a local within that outer function rather than a global), but it means the code that creates and calls the function sticks around for us to look at. (Top-level module code is executed and discarded at module import time; anything you type at the REPL is compiled, executed, and discarded as soon as the statement is finished; scripts work similar to the REPL.)

    So, there are two top-level statements here.

    The def statement creates a function object out of a name and a body and then stores the result in a local variable corresponding to that name. In pseudocode:
        store_local("spam", make_function("spam", spam_body))
    The call expression statement takes a function object and a list of arguments (and an empty dict of keyword arguments) and calls the function (and then throws away the result). In pseudocode:
        call_function(load_local("spam"), [3], {})
    So, the only tricky bit here is, what is that spam_body? Well, the compiler has to work recursively: it sees a def statement, and it knows that's going to compile into a make_function that takes some kind of body-code object, so it compiles the body suite of the def into that object, then stashes the result as a constant. Just like the number 3 and the string "spam" are constants in the outer function, so is the code object spam_body.

    So, what does the body do? It's pretty simple:
        return(add(load_local("x"), 1))
    Notice that parameters like x are just local variables, the same as any you define inside the function; the only difference is that they get an initial value from the caller's arguments, as we'll see later.

    Obviously real Python bytecode is a bit different from this pseudocode, but we'll get back to that at the very end. There are a few easier questions to get through first, starting with: what kind of thing is a code object?

    Code objects

    You need more than just compiled bytecode to store and execute a function body, because the bytecode references things like constants, which have to go somewhere, and because that call_function is going to need some information about its parameters to know how to map the first argument 3 to the local variable x.

    The object with all this information is a code object. The inspect module docs give some detail of what's in a code object. But some of the key members are:
    • co_consts, a tuple of constant values. In spam this just has the number 1. In the top-level function, it has the number 3, the string "spam", and the code object for the spam body.
    • co_argcount and related values that specify the calling convention. For spam, there's an argcount of 1, meaning it's expecting 1 positional parameter, and also meaning its first 1 local variables in co_varnames are parameters. (This is why you can call it with spam(x=3) instead of spam(3).) The full details of how arguments get mapped to parameters are pretty complicated (and I think I've written a whole blog post about it).
    • co_code, which holds the actual compiled bytecode, which we'll get to later.
    There's also a bunch of stuff there that's only needed for tracebacks, reflection, and debugging, like the filename and line number the source code came from.
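
    You can poke at all of this yourself (the exact contents and ordering of co_consts vary by version; this is typical of CPython 3.5):
        def outer():
            def spam(x):
                return x + 1
            spam(3)

        print(outer.__code__.co_consts)
        # (None, <code object spam at 0x...>, 'outer.<locals>.spam', 3)
        print(outer.__code__.co_consts[1].co_consts)  # (None, 1)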

    So, the compiler, after recursively compiling the inner function body into bytecode, then builds the code object around it. You can even do this yourself in Python, although it's a bit of a pain—type help(types.CodeType) and you'll see the 15-parameter constructor. (Some of those parameters are related to closures, which I'll get to later.)

    Function objects

    The make_function pseudocode above just takes a code object and a name and builds a function object out of it.

    Why do we need a separate function object? Why can't we just execute code objects? Well, we can (via exec). But function objects add a few things.

    First, function objects store an environment along with the code, which is how you get closures. If you have 16 checkboxes, and an on_check function that captures the checkbox as a nonlocal variable, you have 16 function objects, but there's no need for 16 separate code objects.

    In the next few sections, we'll ignore closures entirely, and come back to them later, because they make things more complicated (but more interesting).

    Function objects also store default values, which get used to fill in parameters with missing arguments at call time. The fact that these are created at def time is useful in multiple ways (although it also leads to the common unexpected-mutable-default bug).

    If you look at the inspect module, you'll see that the key attributes of a function are just the code object in __code__, the default values in __defaults__ and __kwdefaults__, and the closure environment in __closure__ (for nonlocals, which I'll explain later) and __globals__ (for globals—these aren't captured individually; instead, an entire globals environment is). And, of course, a bunch of stuff to aid tracebacks, debugging, reflection, and static type-checking.

    You can do the same thing as that make_function pseudocode instruction from Python—try help(types.FunctionType) to see the parameters to the constructor. And now you know enough to do some simple hackery, like turning spam from a function that adds 1 to a function that adds 2:
        import types
        c = spam.__code__
        consts = tuple(2 if const == 1 else const
                       for const in c.co_consts)
        nc = types.CodeType(
            c.co_argcount, c.co_kwonlyargcount, c.co_nlocals,
            c.co_stacksize, c.co_flags, c.co_code, consts,
            c.co_names, c.co_varnames, c.co_filename, c.co_name,
            c.co_firstlineno, c.co_lnotab, c.co_freevars,
            c.co_cellvars)
        spam = types.FunctionType(
            nc, spam.__globals__, spam.__name__,
            spam.__defaults__, spam.__closure__)
    
    There are a few limits, but most things you can imagine are doable, and work the way you'd expect. Of course if you want to get fancy, you should consider using a library like byteplay instead of doing it manually.

    Fast Locals

    In reality, Python doesn't do load_local("x"), looking up x in a dict and returning the value, except in special cases. (It does do that for globals, however.)

    At compile time, the compiler makes a list of all the locals (just x here) and stores their names in the code object's co_varnames, and then turns every load_local("x") into load_fast(0). Conceptually, this means to do a load_local(co_varnames[0]), but of course that would be slower, not faster. So what actually happens is that the local variables are stored in an array that gets created at the start of the function call, and load_fast(0) just reads from slot #0 of that array.

    Looking things up by name (a short string whose hash value is almost always cached) in a hash table is pretty fast, but looking things up by index in an array is even faster. This is why the def foo(*, min=min) microoptimization works—turning load_global("min") into load_local("min") might help a little (because the local environment is usually smaller, not to mention that builtins require an extra step over normal globals), but turning it into load_fast(0) helps a lot more.

    But it does mean that if you call locals(), Python has to build a dict on the fly—and that dict won't be updated with any further changes, nor will any changes you make to the dict have any effect on the actual locals. (Usually. If, say, you're executing top-level code, the locals and globals are the same environment, so changing locals does work.)

    That's also why you usually can't usefully exec('x = 2')—that again just creates a dict on the fly with a copy of the locals, then executes the assignment against that dict. (In Python 2, exec was a statement, and it would create a dict on the fly, execute the code, and then try to copy changed values back into the array. Sometimes that worked, but there were tons of edge cases. For example, if you never had a non-exec assignment to x, the compiler couldn't know to put x in co_varnames in the first place.)

    Finally, that means closures can't be implemented as just a stack of dicts (as in many Lisp implementations), but Python has a different optimization for them anyway, as we'll see later.

    Calling

    OK, so now we know everything about how the def statement works (except for the actual bytecode, which I'll get to later, and closure-related stuff, which I'll also get to later). What about the call expression?

    To call a function, all you need to supply is the function object and a list of positional arguments (and a dict of keyword arguments, and, depending on the Python version, *args and/or **kw may be part of the call rather than splatting them into the list/dict in-place… but let's keep it simple for now). What happens inside the interpreter when you do this?

    Well, first, calling a function actually just looks up and calls the function object's __call__ method. That's how you can call class objects, or write classes whose instances are callable. Near the end, we'll get into that. But here, we're talking about calling an actual function object, and the code for that works as follows.

    The first thing it needs to do is create a new stack frame to run the function's code in. As with codes and functions, you can see the members of a frame object in the inspect docs, and you can explore them in the interactive interpreter, and you can find the type as types.FrameType (although, unlike code and function objects, you can't construct one of these from Python). But the basic idea is pretty simple:

    A frame is a code object, an environment, an instruction pointer, and a back-pointer to the calling frame. The environment is a globals dict and a fast locals array, as described earlier. That's it.

    You may wonder how the compiler builds that list of co_varnames in the first place. While it's parsing your code, it can see all the assignments you make. The list of local variables is just the list of parameters, plus the list of names you assign to somewhere in the function body. Anything you access that isn't assigned to anywhere (ignoring the case of closures, which we'll get to later) is a global; it goes into co_names, and gets looked up as a global (or builtin) at runtime.
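
    For example:
        def f(a, b):
            c = a + b
            print(d)  # print and d are never assigned, so they're globals
            return c

        print(f.__code__.co_varnames)  # ('a', 'b', 'c')
        print(f.__code__.co_names)     # ('print', 'd')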

    To construct a frame, in pseudocode:
        code = func.__code__
        locals = [_unbound for _ in range(code.co_nlocals)]
        do_fancy_arg_to_param_mapping(locals, args, kwargs)
        frame = Frame()
        frame.f_back = current_frame
        frame.f_lasti = 0
        frame.f_code = code
        frame.f_locals = locals
    
    And then all the interpreter has to do is recursively call interpreter(frame), which runs along until the interpreter hits a return, at which point it just returns to the recursive caller.

    The interpreter doesn't actually need to be recursive; a function call could just be the same as any other instruction in the loop except that it sets current_frame = frame, and then returning would also be the same as any other instruction except that it sets current_frame = current_frame.f_back. That lets you do deep recursion in Python without piling up on the C stack. It makes it easier to just pass frame objects around and use them to represent coroutines, which is basically what Stackless Python is all about. But mainline Python can, and does, already handle the latter just by wrapping frames up in a very simple generator object. Again, see inspect to see what's in a generator, but it's basically just a frame and a couple extra pieces of information needed to handle yield from and exhausted generators.

    Notice the special _unbound value I used in the pseudocode. In fact, at the C level, this is just a null pointer, although it could just as easily be a real sentinel object. (It could even be exposed to the language, like JavaScript's undefined, although in JS, and most other such languages, that seems to cause a lot more problems than it solves.) If you try to access a local variable before you've assigned to it, the interpreter sees that it's unbound and raises an UnboundLocalError. (And if you del a local, it gets reset to _unbound, with the same effect.)
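
    The classic way to see an unbound local:
        def f():
            print(x)  # x is local (it's assigned below), but its slot is unbound here
            x = 1

        f()  # UnboundLocalError: local variable 'x' referenced before assignment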

    Defaults

    I glossed over the issues with default values above. Let's look at them now.

    Let's say we defined def spam(x=0):. What would that change?

    First, the body of spam doesn't change at all, so neither does its code object. It still just has an x parameter, which it expects to be filled in by the function-calling machinery in the interpreter, and it doesn't care how. You can dis it and explore its members and nothing has changed.
    Its function object does change, however—__defaults__ now has a value in it.

    If you look at the outer function, its code changes. It has to store the 0 as a constant, and then load that constant to pass to make_function. So the first line of pseudocode goes from this:
        store_local("spam", make_function("spam", spam_body))
    … to this:
        store_local("spam", 
            make_function("spam", spam_body, defaults=(0,)))
    Inside the interpreter's make_function implementation, it just stores that tuple in the created function's __defaults__ attribute.

    At call time, the function-calling machinery is a bit more complicated. In our case, we passed a value for x, so nothing much changes, but what if we didn't? Inside call_function, if the function expects 1 positional parameter, and we passed 0 positional arguments, and we didn't provide a keyword argument matching the name of the function's code's first positional parameter, then it uses the first value from the function's __defaults__ instead, and puts that in the frame's locals array the same way it would for a value we passed in explicitly.

    (If we hadn't set a default value, and then called without any arguments, it would try to use the first value from __defaults__, find there isn't one, and raise a TypeError to complain.)

    This explains why mutable default values work the way they do. At def time, the value gets constructed and stored in the function's __defaults__. Every time the function is called without that argument, nobody constructs a new value, it just copies in the same one from the same tuple every time.
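
    The classic example:
        def append_to(item, lst=[]):  # the list is built once, at def time
            lst.append(item)
            return lst

        print(append_to(1))            # [1]
        print(append_to(2))            # [1, 2]: the same list from __defaults__
        print(append_to.__defaults__)  # ([1, 2],)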

    Closures

    As soon as you have nonlocal variables, things get more fun. So let's start with the most trivial example:
        def eggs():
            y = 1
            def spam(x):
                return x + y
            spam(3)
    
    Our inner function has a free variable, y, which it captures from the outer scope. Its code doesn't change much:
        return(add(load_local("x"),
                   load_local_cell("y").cell_contents))
    But the outer function changes a bit more. In pseudocode:
        load_local_cell("y").set_cell_contents(1)
        store_local("spam", make_function("spam", spam_body, 
            closure=[load_local_cell("y")]))
        call_function(load_local("spam"), [], {})
    The first thing to notice is that y is no longer a normal local variable. In both functions, we're doing a different call, load_local_cell to look it up. And what we get is a cell object, which we have to look inside to get or set the actual value.

    Also notice that when we're calling make_function, we pass the cell itself, not its contents. This is how the inner function ends up with the same cell as the outer one. Which means if either function changes the cell's contents, the other one sees it.

    The only hard part is how the compiler knows that y is a cellvar (a local variable you share with a nested inner function) in eggs and a freevar in spam (a local variable that an outer function is sharing with you, or with a deeper nested function).

    Remember that the compiler scans the code for assignments to figure out what's local. If it finds something that isn't local, then it walks the scope outward to see if it's local to any containing function. If so, that variable becomes a cellvar in the outer function, and a freevar in the inner function (and any functions in between). If not, it becomes a global. (Unless there's an explicit nonlocal or global declaration, of course.) Then the compiler knows which code to emit (local, cell, or global operations) for each variable. Meanwhile, it stores the list of cellvars and freevars for each code object in co_cellvars and co_freevars. When compiling a def statement, the compiler also looks at the co_freevars, inserts that closure=[load_local_cell("y")], and passes it along to the make_function.

    Inside call_function, if the code object has any cellvars, the interpreter creates an empty (_unbound) cell for each one. If it has any freevars, the interpreter copies the cells out of the function object's __closure__.

    And that's basically all there is to closures, except for the fast local optimization.

    For historical reasons, the way things get numbered is a little odd. The f_locals array holds the normal locals first, then the cellvars, then the freevars. But cellvars and freevars are numbered starting from the first cellvar, not from the start of the array. So if you have 3 normal locals, 2 cellvars, and 2 freevars, freevar #2 matches slot 0 in co_freevars, and slot 5 in f_locals. Confused? It's probably easier to understand in pseudocode than English. But first…

    For an additional optimization, instead of just one load_fast_cell function, Python has a load_closure that just loads the cell, load_deref that loads the cell's cell_contents in a single step (without having to load the cell object itself onto the stack), and store_deref that stores into the cell's cell_contents in a single step.

    So, the pseudocode to construct a frame looks like this:
        code = func.__code__
        locals = [_unbound for _ in range(code.co_nlocals)]
        cells = [cell() for _ in range(len(code.co_cellvars))]
        do_fancy_arg_to_param_mapping(locals, args, kwargs)
        frame = Frame()
        frame.f_back = current_frame
        frame.f_lasti = 0
        frame.f_code = code
        frame.f_locals = locals + cells + list(func.__closure__)
    
    And the pseudocode for load_closure, load_deref, and store_deref, respectively:
        frame.f_locals[frame.f_code.co_nlocals + i]
        frame.f_locals[frame.f_code.co_nlocals + i].cell_contents
        frame.f_locals[frame.f_code.co_nlocals + i].set_cell_contents(value)
    
    These cells are real Python objects. You can look at a function's __closure__ list, pick out one of the cells, and see its cell_contents. (You can't modify it, however.)

    Example

    It may be worth working through an example that actually relies on closure cells changing:
    def counter():
        i = 0
        def count():
            nonlocal i
            i += 1
            return i
        return count
    count = counter()
    count()
    count()
    
    If you want to go through all the details, you can easily dis both functions (see the section on bytecode below), look at their attributes and their code objects' attributes, poke at the cell object, etc. But the short version is pretty simple.

    The inner count function has a freevar named i. This time, it's explicitly marked nonlocal (which is necessary if you want to rebind it). So, it's going to have i in its co_freevars, and some parent up the chain has to have a cellvar i or the compiler will reject the code; in our case, of course, counter has a local variable i that it can convert into a cellvar.

    So, count is just going to load_deref the freevar, increment it, store_deref it, load_deref it again, and return it.

    At the top level, when we call counter, the interpreter sets up a frame with no locals and one cellvar, so f_locals has one empty cell in it. The i = 0 does a store_deref to set the cell's value. The def does a load_closure to load the cell object, then passes it to make_function to make sure it ends up in the defined function's __closure__, and then it just returns that function.

    When we call the returned function, the interpreter sets up a frame with no locals and one freevar, and copies the first cell out of __closure__ into the freevar slot. So, when it runs, it updates the 0 in the cell to 1, and returns 1. When we call it again, same thing, so now it updates the 1 in the cell to 2, and returns 2.

    Other callable types

    As mentioned earlier, calling a function is really just looking up and calling the function's __call__ method. Of course if calling anything means looking up its __call__ method and calling that, things have to bottom out somewhere, or that would just be an infinite recursion. So how does anything work?

    First, if you call a special method on a builtin object, that also gets short-circuited into a C call. There's a list of slots that a builtin type can provide, corresponding to the special dunder methods in Python. If a type has a function pointer in its nb_add slot, and you call __add__ on an instance of that type, the interpreter doesn't have to look up __add__ in the dict, find a wrapper around a builtin function, and call it through that wrapper; it can just find the function pointer in the slot for the object's type and call it.

    One of those slots is tp_call, which is used for the __call__ method.

    Of course the function type defines tp_call with a C function that does the whole "match the args to params, set up the frame, and call the interpreter recursively on the frame" thing described earlier. (There's a bit of extra indirection so this can share code with eval and friends, and some optimized special cases, and so on, but this is the basic idea.)

    What if you write a class with a __call__ method and then call an instance of it? Well, spam.__call__ will be a bound method object, which is a simple builtin type that wraps up a self and a function. So, when you try to call that bound method, the interpreter looks for its __call__ by calling its tp_call slot, which just calls the underlying function with the self argument crammed in. Since that underlying function will be a normal Python function object (the one you defined with def __call__), its tp_call does the whole match-frame-eval thing, and your __call__ method's code gets run and does whatever it does.

    Finally, most builtin functions (like min) don't create a whole type with a tp_call slot, they're just instances of a shared builtin-function type that just holds a C function pointer along with a name, docstring, etc. so its tp_call just calls that C function. And similarly for methods of builtin types (the ones that aren't already taken care of by slots, and have to get looked up by dict). These builtin function and method implementations get a self (or module), list of args, and dict of kwargs, and have to use C API functions like PyArg_ParseTupleAndKeywords to do the equivalent of argument-to-parameter matching. (Some functions use argclinic to automate most of the annoying bits, and the goal is for most of the core and stdlib to do so.) Beyond that, the C code just does whatever it wants, returning any Python object at the end, and the interpreter then puts that return value on the stack, just as if it came from a normal Python function.

    You can see much of this from Python. For example, if f is a function, f.__call__ is a <method-wrapper '__call__' of function object>, f.__call__.__call__ is a <method-wrapper '__call__' of method-wrapper object>, and if you pile on more .__call__ you just get method-wrappers around method-wrappers with the same method. Similarly, min is a builtin function, min.__call__ is a <method-wrapper '__call__' of builtin_function_or_method object>, and beyond that it's method-wrappers around method-wrappers. But if you just call f or min, it doesn't generate all these wrappers; it just calls the C function that the first method-wrapper is wrapping.

    Actual bytecode

    Real Python bytecode is the machine language for a stack machine with no registers (except for an instruction pointer and frame pointer). Does that sound scary? Well, that's the reason I've avoided it so far—I think there are plenty of people who could understand all the details of how function calls work, including closures, but would be scared off by bytecode.

    But the reality is, it's a very high-level stack machine, and if you've made it this far, you can probably get through the rest. In fact, I'm going to go out of my way to scare you more at the start, and you'll get through that just fine.

    Let's go back to our original trivial example:
        def spam(x):
            return x+1
        spam(3)
    
    Don't put this inside a function, just paste it at the top level of your REPL. That'll get us a global function named spam, and we can look at what's in spam.__code__.co_code:
        b'|\x00\x00d\x01\x00\x17S'
    Well, that's a bit ugly. We can make it a little nicer by mapping from bytes to hex:
        7c 00 00 64 01 00 17 53
    But what does that mess mean?

    Bytecode is just a sequence of instructions. Each instruction has a 1-byte number; if it's 90 (decimal) or higher, it's followed by a 2-byte (little-endian) operand value. So, we can look up 0x7c in a table and see that it's LOAD_FAST, and the 2-byte operand is just 0, just like our pseudocode load_fast(0). So, this is taking the frame's f_locals[0] (which we know is x, because co_varnames[0] is 'x'), and pushing it on the stack.

    Fortunately, we don't have to do all this work; the dis module does it for us. Just call dis.dis(spam) and you get this:
         0 LOAD_FAST 0 (x)
         3 LOAD_CONST 1 (1)
         6 BINARY_ADD
         7 RETURN_VALUE
    
    The dis docs also explain what each op actually does, so we can figure out how this function works: It pushes local #0 (the value of x) onto the stack, then pushes constant #1 (the number 1) onto the stack. Then it pops the top two values, adds them (which in general is pretty complicated—it needs to do the whole deal with looking up __add__ and __radd__ methods on both objects and deciding which one to try, as explained in the data model docs), and puts the result on the stack. Then it returns the top stack value.

    Brief aside: If operands are only 16 bits, what if you needed to look up the 65536th constant? Or, slightly more plausibly, you needed to jump to instruction #65536? That won't fit in a 16-bit operand. So there's a special opcode EXTENDED_ARG (with number 0x90) that sets its 16-bit operand as an extra (high) 16 bits for the next opcode. So to load the 65536th constant, you'd do EXTENDED_ARG 1 followed by LOAD_CONST 0, and this means LOAD_CONST 65536.

    Anyway, now let's add the outer function back in, and compare the pseudocode to the bytecode:
        def outer():
            def spam(x):
                return x+1
            spam(3)
    
    In pseudocode:
        store_fast(0, make_function("spam", spam_body))
        call_function(load_fast(0), [3], {})
    In real bytecode:
         0 LOAD_CONST 1 (<code object spam at 0x12345678>)
         3 LOAD_CONST 2 ('spam')
         6 MAKE_FUNCTION 0
         9 STORE_FAST 0 (spam)
        12 LOAD_FAST 0 (spam)
        15 LOAD_CONST 3 (3)
        18 CALL_FUNCTION 1
        21 POP_TOP
        22 LOAD_CONST 0 (None)
        25 RETURN_VALUE
    
    So, 0-9 map to the first line of pseudocode: push constant #1, the code object (what we called spam_body), and constant #2, the name, onto the stack. Make a function out of them, and store the result in local #0, the spam variable.

    MAKE_FUNCTION can get pretty complicated, and it tends to change pretty often from one Python version to another, so read the dis docs for your version. Fortunately, when you have no default values, annotations, closures, etc.--so the pseudocode is just make_function(name, code)--you just push the code and name and do MAKE_FUNCTION 0.

    Lines 12-18 map to the second line of pseudocode: push local #0 (spam) and constant #3 (the number 3) onto the stack, and call the function with one positional argument.

    Again, CALL_FUNCTION can get pretty complicated, and change from version to version, but in our case it's dead simple: we're passing nothing but one positional argument, so we just put the function and the argument on the stack and do CALL_FUNCTION 1.

    The interpreter's function-call machinery then has to create the frame out of the function object and its code object, pop the appropriate values off the stack, figure out which parameter to match our argument to and copy the value into the appropriate locals array slot, and recursively interpret the frame. We saw above how that function runs. When it returns, that leaves the return value on the stack.

    Line 21 just throws away the top value on the stack, since we don't want to do anything with the return value from spam.

    Lines 22-25 don't map to anything in our pseudocode, or in our source code. Remember that in Python, if a function falls off the end without returning anything, it actually returns None. Maybe this could be handled magically by the function-call machinery, but it's not; instead, the compiler stores None in the code object's constants, then adds explicit bytecode to push that constant on the stack and return it.

    By the way, you may have noticed that the bytecode does some silly things like storing a value into slot 0 just to load the same value from slot 0 and then never use it again. (You may have noticed that some of my pseudocode does the same thing.) Of course it would be simpler and faster to just not do that, but Python's limited peephole optimizer can't be sure that we're never going to load from slot 0 anywhere else. It could still dup the value before storing so it doesn't need to reload, but nobody's bothered to implement that. There have been more detailed bytecode optimizer projects, but none of them have gotten very far—probably because if you're serious about optimizing Python, you probably want to do something much more drastic—see PyPy, Unladen Swallow, ShedSkin, etc., which all make it so we rarely or never have to interpret this high-level bytecode instruction by instruction in the first place.

    MAKE_FUNCTION and CALL_FUNCTION

    As mentioned above, these two ops can get pretty complicated. As also mentioned, they're two of the most unstable ops, changing from version to version because of new features or optimizations. (Optimizing function calls is pretty important to overall Python performance.) So, if you want to know how they work, you definitely need to read the dis docs for your particular Python version. But if you want an example (for 3.5 and earlier, where MAKE_CLOSURE still exists), here goes.

    Take this pseudocode:
        make_function("spam", spam_body,
            defaults=(1,), kw_defaults={'x': 2},
            closure=(load_closure(0),))
    
    The bytecode is:
         0 LOAD_CONST 1 (1)
         3 LOAD_CONST 2 ('x')
         6 LOAD_CONST 3 (2)
         9 LOAD_CLOSURE 0 (y)
        12 BUILD_TUPLE 1
        15 LOAD_CONST 4 (<code object spam at 0x12345678>)
        18 LOAD_CONST 5 ('spam')
        21 MAKE_CLOSURE 257
    
    Yuck. Notice that we're doing MAKE_CLOSURE rather than MAKE_FUNCTION, because, in addition to passing a name and code, we're also passing a closure (a tuple of cells). And then we're passing 257 as the operand instead of 0. This breaks down into 1 | (1<<8) | (0<<16), meaning 1 positional default, 1 keyword-only default, and 0 annotations, respectively. And of course we have to push all that stuff on the stack in the appropriate format and order.

    If we'd had an annotation on that x parameter, that would change the operand to something like 1 | (1<<8) | (1<<16), meaning we'd need EXTENDED_ARG, and pushing the annotation (plus the tuple of annotated parameter names) takes a few more instructions, but that's really about as complicated as it ever gets.
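    If you just want to see what your own version emits, the simplest thing is to disassemble a function that uses all the features. A quick sketch (the output will vary by version, which is the point):

        import dis

        def outer():
            y = 1
            def spam(a=1, *, x=2):
                return y
            return spam

        dis.dis(outer)  # MAKE_CLOSURE on 3.5; MAKE_FUNCTION with flag bits on newer versions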

    More info

    If you're still reading, you probably want to know where to look for more detail. Besides the inspect and dis docs, there's the C API documentation for the Object Protocol (PyObject_Call and friends), the Function and Code concrete objects, PyCFunction, and maybe Type objects. Then there's the implementations of all those types in the Objects directory in the source. And of course the main interpreter loop in Python/ceval.c.

  3. I've seen a number of people ask why, if you can have arbitrary-sized integers that do everything exactly, you can't do the same thing with floats, avoiding all the rounding problems that they keep running into.

    If we compare how the two behave under the same operations, the answer should become pretty obvious along the way.

    Integers


    Let's start with two integers, a=42 and b=2323, and add them. How many digits do I need? Think about how you add numbers: you line up the columns, and at worst carry one extra column. So, the answer can be as long as the bigger one, plus one more digit for carry. In other words, max(len(a), len(b)) + 1.

    What if I multiply them? Again, think about how you multiply numbers: you line up the smaller number with each column of the larger number, then there's that extra digit of carry again. So, the answer can be as long as the sum of the two lengths, plus one more. In other words, len(a) + len(b) + 1.

    What if I exponentiate them? Here, things get a bit tricky, but there's still a well-defined, easy-to-compute answer if you think about it: asking how many digits are in a**b is just solving for x in 10**(x-1) = a**b and then rounding up. So, take log10 of both sides: x - 1 = log10(a**b) = log10(a) * b. Plug in your values, and log10(42) * 2323 ~= 3770.8, which rounds up to x = 3771. Try len(str(42**2323)) and you'll get 3771.

    You can come up with other fancy operations to apply to integers--factorials, gcds, whatever--but the number of digits required for the answer is always a simple, easy-to-compute function of the number of digits in the operands.*

    * Except when the answer is infinite, of course. In that case, you easily compute that the answer can't be stored in any finite number of digits and use that fact appropriately--raise an exception, return a special infinity value, whatever.
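    Those rules are easy to sanity-check in Python (a quick sketch; in principle the float log10 could be off by one right at a boundary, but not here):

        from math import floor, log10

        def digits(n):
            return len(str(n))

        a, b = 42, 2323
        assert digits(a + b) <= max(digits(a), digits(b)) + 1
        assert digits(a * b) <= digits(a) + digits(b) + 1
        assert digits(a ** b) == floor(b * log10(a)) + 1  # 3771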

    Decimals


    Now, let's start with two decimals, a=42 and b=.2323, and add them. How many digits do I need? Well, how many digits do the originals have? It kind of depends on how you count. But the naive way of counting says 2 and 4, and the result, 42.2323, has 6 digits. As you'd suspect, len(a) + len(b) + 1 is the answer here.

    What if I multiply them? At first glance, it seems like it should be easy--our example gives us 9.7566, which has 5 digits; multiplying a by itself is the same as for integers, and b by itself gives 0.05396329, which is just adding 4 decimal digits to 4 decimal digits; so it's still len(a) + len(b) + 1.

    What if I exponentiate them? Well, now things get not tricky, but impossible. 42**.2323 is an irrational number. That means it takes an infinite number of digits (in binary, or decimal, or any other integer base) to store. (It also takes an infinite amount of time to compute, unless you have an infinite-sized lookup table to help you.) In fact, most fractional powers of most numbers are irrational--2**0.5, the square root of 2, is the most famous example.

    And it's not just exponentiation; most of the things you want to do with real numbers--take the sine, multiply by pi, etc.--give you irrational answers. Unless you stick to nothing but addition, subtraction, multiplication, and division, you can't have exact math.

    Even if all you want is addition, subtraction, multiplication, and division: a=1, b=3. How many digits do I need to divide them? Start doing some long division: 1 is smaller than 3, so that's 0. 10 has three 3s in it, so that's 0.3, with 1 left over. 10 has three 3s in it, so that's 0.3 with 1 left over. That's obviously going to continue on forever: there is no way to represent 1 / 3 in decimal without an infinite number of digits. Of course you could switch bases. For example, in base 9, 1 / 3 is 0.3. But then you need infinite digits for all kinds of things that are simple in base 10.

    Fractions


    If what you actually want is just addition, subtraction, multiplication, and division, you're dealing with fractions, not decimals. Python's fractions.Fraction type does all of these operations with infinite precision. Of course when you go to print out the results as decimals, they may have to get truncated (otherwise, 1/3 or 1/7 would take forever to print), but that's the only limitation.

    Of course if you try to throw exponentiation or sine at a Fraction, or multiply it by a float, you lose that exactness and just end up with a float.
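    A quick demo of both the exactness and its limits:

        from fractions import Fraction

        third = Fraction(1, 3)
        print(third + third + third)              # 1 -- exact, no rounding anywhere
        print(Fraction(1, 10) + Fraction(2, 10))  # 3/10 -- no 0.30000000000000004 surprise
        print(Fraction(2) ** Fraction(1, 2))      # 1.4142135623730951 -- back to float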

    Aren't decimals just a kind of fraction, where the denominator is 10**d, and d is the number of digits after the decimal point? Yes, they are. But as soon as you, say, divide by 3, the exact result needs a denominator that no power of 10 can represent, as we saw above, so that doesn't do you any good. If you need exact rational arithmetic, you need fractions with arbitrary denominators.

    Accuracy


    In real life, very few values are exact in the first place.* Your table isn't exactly 2 meters long, it's 2.00 +/- 0.005 meters.** Doing "exact" math on that 2 isn't going to do you any good. Doing error-propagating math on that 2.00, however, might.

    Also, notice that a bigger number isn't necessarily more accurate than a smaller one (in fact, usually the opposite), but the simple decimal notation means it has more precision: 1300000000 has 10 digits in it, and if we want to let people know that only the first 3 are accurate, we have to write something like 1300000000 +/- 5000000. And even with commas, like 1,300,000,000 +/- 5,000,000, it's still pretty hard to see how many digits are accurate. In words, we solve that by decoupling the precision from the magnitude: 1300 million, plus or minus 5 million, puts most of the magnitude into the word "million", and lets us see the precision reasonably clearly in "1300 +/- 5". Of course at 13 billion plus or minus 5 million it falls down a bit, but it's still better than staring at the decimal representation and counting up commas and zeroes.

    Scientific notation is an even better way of decoupling the precision from the magnitude. 1.3*10**10 +/- 5*10**6 obviously has magnitude around 10**10, and precision of 3-4 digits.*** And going to 1.3*10**11 +/- 5*10**6 is just as readable. And floating-point numbers give us the same benefit.

    In fact, when the measurement or rounding error is exactly half a digit, it gets even simpler: just write 1.30*10**10, and it's clear that we have 3 digits of precision, and the same for 1.30*10**11. And, while the float type doesn't give us this simplification, the decimal.Decimal type does. In addition to being a decimal fraction rather than a binary fraction, so you can think in powers of 10 instead of 2, it also lets you store 1.3e10 and 1.30e10 differently, to directly keep track of how much precision you want to store. It can also give you the most digits you can get out of the operation when possible--so 2*2 is 4, but 2.00*2.00 is 4.0000. That's almost always more than you want (depending on why you were multiplying 2.00 by 2.00, you probably want either 4.0 or 4.00), but you can keep the 4.0000 around as an intermediate value, which guarantees that you aren't adding any further rounding error from intermediate storage. When you perform an operation that doesn't allow that, like 2.00 ** 0.5, you have to work out for yourself how much precision you want to carry around in the intermediate value, which means you need to know how to do error propagation--but if you can work it out, decimal can let you store it.

    * Actually, there are values that can be defined exactly: the counting numbers, e, pi, etc. But notice that most of the ones that aren't integers are irrational, so that doesn't help us here. But look at the symbols section for more...

    ** If you're going to suggest that maybe it's exactly 2.0001790308123812082 meters long: which molecule is the last molecule of the table? How do you account for the fact that even within a solid, molecules move around slowly? And what's the edge of a molecule? And, given that molecules' arms vibrate, the edge at what point in time? And how do you even pick a specific time that's exactly the same across the entire table, when relativity makes that impossible? And, even if you could pick a specific molecule at a specific time, its edge is a fuzzy cloud of electron position potential that fades out to 0.

    *** The powers are 10 and 6, so it's at worst off by 4 digits. But the smaller one has a 5, while the bigger one has a 1, so it's obviously a lot less than 4 digits. To work out exactly how many digits it's off, do the logarithm-and-round trick again.
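    Getting back to decimal, here's what that precision tracking looks like in code (a quick illustration):

        from decimal import Decimal

        print(Decimal('1.3e10'), Decimal('1.30e10'))  # 1.3E+10 1.30E+10 -- stored distinctly
        print(Decimal('2') * Decimal('2'))            # 4
        print(Decimal('2.00') * Decimal('2.00'))      # 4.0000 -- every digit the inputs justify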

    Money


    Some values inherently have a precision cutoff. For example, with money, you can't have less than one cent.* In other words, they're fixed-point, rather than floating-point, values.

    The decimal module can handle these for you as well. In fact, money is a major reason there's a decimal standard, and implementations of that standard in many languages' standard libraries.***

    * Yes, American gas stations give prices in tenths-of-a-cent per gallon, and banks transact money in fractional cents, but unless you want to end up in Superman 3,** you can ignore that.

    ** And yes, I referenced Superman 3 instead of Office Space. If you're working in software in 2015 and haven't seen Office Space, I don't know what I could say that can help.

    *** For some reason, people are willing to invest money in solving problems that help deal with money.
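    In code, the key tool is Decimal.quantize, which rounds to a fixed exponent. A minimal sketch (the price, the 8.25% tax rate, and the rounding mode are all made-up examples):

        from decimal import Decimal, ROUND_HALF_UP

        price = Decimal('19.99')
        tax = (price * Decimal('0.0825')).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
        print(tax)           # 1.65 -- never a fraction of a cent
        print(price + tax)   # 21.64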

    Symbols


    So, how do mathematicians deal with all of this in real life? They don't. They do math symbolically, rather than numerically. The square root of 2 is just the square root of 2. And you carry it around that way throughout the entire operation. Multiply 3 * sqrt(2) and the answer is 3 * sqrt(2). But multiply sqrt(2) * sqrt(2) and you get 2, and multiply sqrt(2) * sqrt(3) and you get sqrt(6), and so on. There are simplification rules that give you exact equivalents, and you can apply these as you go along, and/or at the end, to try to get things as simple as possible. But in the end, the answer may end up being irrational, and you're just stuck with sqrt(6).

    Sometimes you need a rough idea of how big that sqrt(6) is. When that happens, you know how rough you want it, and you can calculate it to that precision. To three digits, more than enough to give you a sense of scale, it's 2.45. If you need a pixel-precise graph, you can calculate it to +/- half a pixel. But the actual answer is sqrt(6), and that's what you're going to keep around (and use for further calculation).

    In fact, let's think about that graph in more detail. For something simple and concrete, let's say you're graphing radii vs. circumferences of circles, measured in inches, on a 1:1 scale, to display on a 96 pixels-per-inch screen. So, a circle with radius 3" has a circumference of 18.850" +/- half a pixel. Or, if you prefer, 1810 pixels. But now, let's say your graph is interactive, and the user can zoom in on it. If you just scale that 1810 pixels up at 10:1, you get 18100 pixels. But if you stored 6*pi and recalculate it at the new zoom level, you get 18096 pixels. A difference of 4 pixels may not sound like much, but it's enough to make things look blocky and jagged. Zoom in too much more, and you're looking at the equivalent of face-censored video from Cops.

    Python doesn't have anything built-in for symbolic math, but there are some great third-party libraries like SymPy that you can use.
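    For example (assuming you have SymPy installed):

        import sympy

        x = sympy.sqrt(2) * sympy.sqrt(3)
        print(x)                              # sqrt(6) -- kept symbolic
        print(sympy.sqrt(2) * sympy.sqrt(2))  # 2 -- simplification rules are exact
        print(x.evalf(3))                     # 2.45 -- numeric only when you ask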

  4. In a recent thread on python-ideas, Stephan Sahm suggested, in effect, changing the method resolution order (MRO) from C3-linearization to a simple depth-first search a la old-school Python or C++. I don't think he realized that's what he was proposing at first, but the key idea is that he wants Mixin to override A in B as well as overriding object in A in this code:

    class Mixin: pass
    class A(Mixin): pass
    class B(Mixin, A): pass
    

    In other words, the MRO should be B-Mixin-A-Mixin-object. (Why not B-Mixin-object-A-Mixin-object? I think he just didn't think that through, but let's not worry about that.) After all, why would he put Mixin before A if he didn't want it to override A in B? And why would he attach Mixin to A if he didn't want it to override object in A?

    Well, that doesn't actually work. The whole point of linearization is that each class appears only once in the MRO, and many features of Python--including super, which he wanted to make extensive use of--depend on that. For example, with his MRO, inside Mixin.spam, super().spam() is going to call A.spam, and its super().spam() is going to call Mixin.spam(), and you've obviously got a RecursionError on your hands.

    I think ultimately, his problem is that what he wants isn't really a mixin class (in typical Python terms--in general OO programming terms, it's one of the most overloaded words you can find...). For example, a wrapper class factory could do exactly what he wants, like this:

    def Wrapper(cls):
        class Wrapped(cls):
            pass  # the mixin behavior goes here
        return Wrapped
    class A(Wrapper(object)): pass
    class B(Wrapper(A)): pass
    

    And there are other ways to get where he wants.

    Anyway, changing the default MRO in Python this way is a non-starter. But if he wants to make that change manually, how hard is it? And could he build a function that lets his classes cooperate using that function instead of super?

    Customizing MRO


    The first step is to build the custom MRO. This is pretty easy. He wants a depth-first search of all bases, so:

    [cls] + list(itertools.chain.from_iterable(base.__mro__ for base in cls.__bases__))
    

    Or, if leaving the extra object out was intentional, that's almost as easy:

    [cls] + list(itertools.chain.from_iterable(base.__mro__[:-1] for base in cls.__bases__)) + [object]
    

    But now, how do we get that into the class's __mro__ attribute?

    It's a read-only property; you can't just set it. And, even if you could, you need type.__new__ to actually return something for you to modify--but if you give it a non-linearizable inheritance tree, it'll raise an exception. And finally, even if you could get it set the way you want, every time __bases__ is changed, __mro__ is automatically rebuilt.

    So, we need to hook the way type builds __mro__.

    I'm not sure if this is anywhere in the reference documentation or not, but the answer is pretty easy: the way type builds __mro__ is by calling its mro method. This is treated as a special method (that part definitely isn't documented anywhere), meaning it's looked up on the class (that is, the metaclass of the class being built) rather than the instance (the class being built), doesn't go through __getattribute__, etc., so we have to build a metaclass.

    But once you know that, it's all trivial:

    import itertools

    class MyMRO(type):
        def mro(cls):
            return ([cls] +
                    list(itertools.chain.from_iterable(base.__mro__[:-1] for base in cls.__bases__)) +
                    [object])
    
    class Mixin(metaclass=MyMRO): pass
    class A(Mixin): pass
    class B(Mixin, A): pass
    

    And now, B.__mro__ is B-Mixin-A-Mixin-object, exactly as desired.

    For normal method calls, this does what the OP wanted: Mixin gets to override A.
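    A quick demo, using the MyMRO metaclass from above (the whoami methods are just for illustration):

    class Mixin(metaclass=MyMRO):
        def whoami(self): return 'Mixin'
    class A(Mixin):
        def whoami(self): return 'A'
    class B(Mixin, A): pass

    print(B.__mro__)     # (B, Mixin, A, Mixin, object)
    print(B().whoami())  # Mixin -- Mixin overrides A, as the OP wanted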

    But, as mentioned earlier, it obviously won't enable the kind of super he wants, and there's no way it could. So, we'll have to build our own replacement.

    Bizarro Super


    If you want to learn how super works, I think the documentation in Invoking Descriptors is complete, but maybe a little terse to serve as a tutorial. I know there's a great tutorial out there, but I don't have a link, so... google it.

    Anyway, how super works isn't important; what's important is to define what we want here. Once we actually know exactly what we want, anything is possible as long as you believe--that's what science is all about.

    Since we're defining something very different from super but still sort of similar, the obvious name is bizarro.

    Now, we want a call to bizarro().spam() inside B.spam to call Mixin.spam, a call inside Mixin.spam to call A.spam, a call inside A.spam to call Mixin.spam, and a call inside Mixin.spam to call object.spam.

    The first problem is that calling object.spam is just going to raise an AttributeError. Multiple inheritance uses of super are all about cooperative class hierarchies, and part of that cooperation is usually that the root of your tree knows not to call super. Here, Mixin is the root of our tree, but it also appears in other places in the tree, so that isn't going to work.

    Well, since we're designing our own super replacement, there's no reason it can't also cooperate with the classes, instead of trying to be fully general. Just make it return a do-nothing function if the next class is object, or if the next class doesn't have the method, or if the next class has a different metaclass, etc. Pick whatever rule makes sense for your specific use case. Since I don't have a specific use case, and don't know what the OP's was (he wanted to create a "Liftable" mixin that helps convert instances of a base class into instances of a subclass, but he didn't explain how he wanted all of the edge cases to work, and didn't explain enough about why he wanted such a thing for me to try to guess on his behalf), I'll go with the "doesn't have the method".

    While we're at it, we can skip over any non-cooperating classes that end up in the MRO (which would obviously be important if we didn't block object from appearing multiple times--but even with the MRO rule above, you'll have the same problem if your root doesn't directly inherit object).

    The next problem--the one that's at the root of everything we're trying to work around here--is that we want two different things to happen "inside Mixin.spam", depending on whether it's the first time we're inside or the second. How are we going to do that?

    Well, obviously, we need to keep track of whether it's the first time or the second time. One obvious way is to keep track of the index: instead of "A.spam if we're in Mixin.spam, or object.spam if we're in Mixin.spam", it's "B.__mro__[2] if we're in B.__mro__[1], and B.__mro__[4] if we're in B.__mro__[3]". (After first coding this up, I realized that an iterator might be a lot nicer than an index, so if you actually need to implement this yourself, try it that way. But I don't want to change everything right now.)

    How can we keep track of anything? Make the classes cooperate. Part of the protocol for calling bizarro is that you take a bizarro_index parameter and pass it into the bizarro call. Let's make it an optional parameter with a default value of 0, so your users don't have to worry about it, and make it keyword-only, so it doesn't interfere with *args or anything. So:

    class Mixin(metaclass=MyMRO):
        def doit(self, *, bizarro_index=0):
            print('Mixin')
            bizarro(Mixin, self, bizarro_index).doit()
    
    class A(Mixin):
        def doit(self, *, bizarro_index=0):
            print('A')
            bizarro(A, self, bizarro_index).doit()
    
    class B(Mixin, A):
        def doit(self, *, bizarro_index=0):
            print('B')
            bizarro(B, self, bizarro_index).doit()
    

    And now, we just have to write bizarro.

    The key to writing something like super is that it returns a proxy object whose __getattribute__ looks in the next class on the MRO. If you found that nice tutorial on how super works, you can start with the code from there. We then have to make some changes:

    1. The way we pick the next class has to be based on the index.
    2. Our proxy has to wrap the function up to pass the index along.
    3. Whatever logic we wanted for dealing with non-cooperating classes has to go in there somewhere.

    Nothing particularly hard. So:

    def bizarro(cls, inst, idx):
        class proxy:
            def __getattribute__(self, name):
                for superidx, supercls in enumerate(type(inst).__mro__[idx+1:], idx+1):
                    try:
                        attr = getattr(supercls, name)
                    except AttributeError:
                        continue
                    # bind descriptors (like functions) to the instance; plain
                    # data attributes don't have __get__, so use them as-is
                    try:
                        method = attr.__get__(inst)
                    except AttributeError:
                        method = attr
                    if not callable(method):
                        return method # so bizarro works for data attributes
                    @functools.wraps(method)
                    def wrapper(*args, **kw):
                        return method(*args, bizarro_index=superidx, **kw)
                    return wrapper
                return lambda *args, **kw: None
        return proxy()
    

    And now, we're done.

    Bizarro am very beautiful


    In Python 3, super(Mixin, self) was turned into super(). This uses a bit of magic, and we can use the same magic here.

    Every method that refers to __class__ (or to super) gets a closure cell named __class__ that tells you which class it's defined in. And every method takes its self as the first parameter. So, if we just peek into the caller's frame, we can get those easily. And while we're peeking into the frames, since we know the index has to be the bizarro_index parameter to any function that's going to participate in bizarro super-ing, we can grab that too:

    def bizarro():
        f = sys._getframe(1)
        cls = f.f_locals['__class__']  # only present if the method mentions __class__ or super
        inst = f.f_locals[f.f_code.co_varnames[0]]
        idx = f.f_locals['bizarro_index']
        # everything else is the same as above
    

    This is cheating a bit; if you read PEP 3135, the super function doesn't actually peek into the parent frame; instead, the compiler recognizes references to super and arranges to pass the two values in. I'm not sure that's actually less hacky--but it is certainly more portable, because other Python implementations aren't required to provide CPython-style frames and code objects. Also, leaving the magic up to the compiler means that, e.g., PyPy can still apply its no-frames-unless-needed optimization, trading a tiny bit of compile-time work for a small savings in every call.

    If you want to do the same here, you can write an import hook that AST-transforms bizarro calls in the same way. But I'm going to stick with the frame hack.

    Either way, now you can write this:

    class Mixin(metaclass=MyMRO):
        def doit(self, *, bizarro_index=0):
            print('Mixin')
            bizarro().doit()
    
    class A(Mixin):
        def doit(self, *, bizarro_index=0):
            print('A')
            bizarro().doit()
    
    class B(Mixin, A):
        def doit(self, *, bizarro_index=0):
            print('B')
            bizarro().doit()
    

    Meanwhile, notice that we don't actually use cls anywhere anyway, so... half a hack is only 90% as bad, right?

    But still, that bizarro_index=0 bit. All that typing. All that reading. There's gotta be a better way.

    Well, now you can!

    We've already got a metaclass. We're already peeking under the covers. We're already wrapping functions. So, let's have our metaclass peek under the covers of all of our functions and automatically wrap anything that uses bizarro to take that bizarro_index parameter. The only problem is that the value will now be in the calling frame's parent frame (that is, the wrapper), but that's easy to fix too: just look in f_back.f_locals instead of f_locals.

    import functools
    import itertools
    import sys
    
    class BizarroMeta(type):
        def mro(cls):
            return ([cls] +
                    list(itertools.chain.from_iterable(base.__mro__[:-1] for base in cls.__bases__)) +
                    [object])
        def __new__(mcls, name, bases, attrs):
            def _fix(attr):
                if callable(attr) and 'bizarro' in attr.__code__.co_names:
                    @functools.wraps(attr)
                    def wrapper(*args, bizarro_index=0, **kw):
                        return attr(*args, **kw)
                    return wrapper
                return attr
            attrs = {k: _fix(v) for k, v in attrs.items()}
            return super().__new__(mcls, name, bases, attrs)
    
    def bizarro():
        f = sys._getframe(1)
        inst = f.f_locals[f.f_code.co_varnames[0]]
        idx = f.f_back.f_locals['bizarro_index']
        class proxy:
            def __getattribute__(self, name):
                for superidx, supercls in enumerate(type(inst).__mro__[idx+1:], idx+1):
                    try:
                        attr = getattr(supercls, name)
                    except AttributeError:
                        continue
                    # bind descriptors (like functions) to the instance; plain
                    # data attributes don't have __get__, so use them as-is
                    try:
                        method = attr.__get__(inst)
                    except AttributeError:
                        method = attr
                    if not callable(method):
                        return method # so bizarro works for data attributes
                    @functools.wraps(method)
                    def wrapper(*args, **kw):
                        return method(*args, bizarro_index=superidx, **kw)
                    return wrapper
                return lambda *args, **kw: None
        return proxy()
    
    class Mixin(metaclass=BizarroMeta):
        def doit(self):
            print('Mixin')
            bizarro().doit()
    
    class A(Mixin):
        def doit(self):
            print('A')
            bizarro().doit()
    
    class B(Mixin, A):
        def doit(self):
            print('B')
            bizarro().doit()
    
    B().doit()
    

    Run this, and it'll print B, then Mixin, then A, then Mixin.

    Unless I made a minor typo somewhere, in which case it'll probably crash in some way you can't possibly debug. So you'll probably want to add a bit of error handling in various places. For example, it's perfectly legal for something to be callable but not have a __code__ member--a class, a C extension function, an instance of a class with a custom __call__ method... Whether you want to warn that you can't tell whether Spam.eggs uses bizarro or not because you can't find the code, assume it doesn't and skip it, assume it does and raise a readable exception, or something else, I don't know, but you probably don't want to raise an exception saying that type objects don't have __code__ attributes, or whatever comes out of this mess by default.

    Anyway, the implementation is pretty small, and not that complicated once you understand all the things we're dealing with, and the API for using it is about as nice as you could want.

    I still don't know why you'd ever want to do this, but if you do, go for it.

  5. Note: This post doesn't talk about Python that much, except as a point of comparison for JavaScript.

    Most object-oriented languages out there, including Python, are class-based. But JavaScript is instead prototype-based. Over the years, this has led to a lot of confusion, and even more attempts to resolve that confusion, either by faking classes, or by explaining why prototypes are good. Unfortunately, most of the latter attempts are completely wrong, and only add to the confusion.

    A mess?

    I'm going to pick on Kyle Simpson's JS Objects: Inherited a Mess (and its two followups)—not because it's particularly bad, but for the opposite reason: it's one of the best-written attempts at clearing things up (and it's frequently linked to).

    Simpson's main point is that, ultimately, prototype inheritance is more like delegation than like inheritance, because the inheritance "arrows" go in the opposite direction.

    Let's take a language like C++, and write a class Foo, a subclass Bar, and a Bar instance bar1. There's a superclass-to-subclass arrow pointing from Foo to Bar, which represents Bar's vtable being made by copying Foo's, and a class-to-instance arrow pointing from Bar to bar1, representing bar1's vptr being copied from Bar.

    In JavaScript, on the other hand, there's no such copying; instead, there are live links. Methods are looked up on the object; if not found, the link is followed to look up the method on its prototype, and so on up the chain. So, he draws an arrow from bar1 to Bar.prototype, representing its prototype link, and an arrow from Bar.prototype to Foo.prototype, representing the same thing.

    But he's drawing entirely the wrong distinction. When you notice that C++, Java, and C# all do things one way, but JavaScript does things differently, it's a decent guess that the difference is prototypes vs. classes. But as soon as you toss in a few other languages, it's obvious that the guess is wrong. For example, Python and Smalltalk do things the same way as JavaScript, but they're very definitely class-based languages.

    Simpson brings up some other differences between JS and other languages in the rest of the series--e.g., the fact that a JS object's constructor doesn't automatically call its prototype's constructor, while a C++ class's constructor does automatically call its base classes' constructors. But here again, Python does the same thing as JS. At any rate, the direction of the arrows is what he focuses on, so let's stick with that.

    Dynamic vs. semi-dynamic lookup

    If this difference isn't about prototypes vs. classes, what is it?

    In Python, method lookup is dynamic. If your source code has a call to foo.spam, the compiler turns that into code that does the following, until one of them works:*
    • Ask the object to handle "spam".
    • Ask the object's __class__ to handle "spam".
    • Walk the class's superclass tree (in C3-linearized order), asking each one in turn to handle "spam".
    • Raise an AttributeError.
    In JavaScript, method lookup is dynamic in exactly the same way. The compiler turns a call to foo.spam into this:
    • Ask the object to handle "spam".
    • Walk the object's prototype chain, asking each one in turn to handle "spam".
    • Raise a TypeError.
    The only difference is that Python has two kinds of links--instance-to-class and class-to-superclass--that JavaScript collapses into one--object-to-prototype. (And that Python has a linearized multiple inheritance tree, but let's ignore that here.)

    In C++, method lookup is mostly static, with a hook for a very specific kind of dynamism. The compiler turns a call to foo.spam into this:**
    • Look up foo.vptr[2] and call it.
    The compiler has to know that foo is (a reference to) an instance of Foo or some subclass thereof, and it has to have compiled Foo already so that it knows that spam is its third method. The only way anyone outside of Foo can affect what happens here is that a subclass like Bar can override the spam method, so its vtable ends up with a different function pointer in the third slot. If so, if foo is a Bar instance at runtime, then foo.vptr[2] will point to Bar.spam instead of Foo.spam. That's all the late binding you get.

    The downsides to C++-style virtual methods instead of fully dynamic lookup are pretty obvious. In JavaScript or Python, you can set a new foo.spam at any time, overriding the behavior of Foo. Or you can set a new Foo.spam, which will affect all existing instances of Foo or its descendants that don't have any overriding implementation. This can be pretty handy for, e.g., building up transparent proxies based on runtime information, or mocking objects for unit tests.
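    In Python terms, that flexibility looks like this (a trivial sketch):

        class Foo:
            def spam(self):
                return 'original'

        foo = Foo()
        Foo.spam = lambda self: 'patched'   # rebind on the class...
        print(foo.spam())                   # patched -- existing instances see it
        foo.spam = lambda: 'just this one'  # ...or shadow it on a single instance
        print(foo.spam())                   # just this one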

    But you don't do those things that often. Pretty often, C++'s dynamic behavior is all the dynamic behavior you need. And it's obviously simpler, and a lot faster, to just follow a pointer, index an array, and follow another pointer than to walk through a linked chain of objects asking each one to do a dictionary lookup for you. And it even opens the door to further optimizations--e.g., if the compiler knows that foo must be exactly a Foo rather than a subclass (e.g., because it's a local allocated on the stack), or it knows that no subclass of Foo overrides the method (e.g., because it's got feedback from a whole-program analyzer), it can just insert a call directly to the Foo.spam implementation.

    The problem is that something that works "pretty often" but has no escape hatch only works pretty often. And just as often, the speed of the lookups doesn't matter anyway. As Alan Kay pointed out back in the 70s, usually you're either doing all your work inside some inner loop, so it hardly matters how fast you get to that loop, or spending all your time waiting on a user to interact with the GUI/a client to interact with the reactor/etc., so it hardly matters how fast you do anything. Better to give people features they might need than performance they probably don't. So, that's what Smalltalk did. And Objective-C, Python, Ruby, etc.

    * Python offers a wide variety of ways to hook this process, as do many other dynamic OO languages, so this is only the default behavior. But let's not worry about that right now.
    ** I'm ignoring non-virtual functions here, but they don't add any complexity. I'm also ignoring the screwy way that C++ does multiple inheritance. Be glad. I'm also hiding reference/pointer stuff, which means this is closer to C# or Java than C++, but that doesn't really matter here.

    The best of both worlds?

    A Smalltalk offshoot called Self was built around the idea that a just-in-time (JIT) optimizer can eliminate most of the costs of dynamic lookup, while preserving all the benefits. It's pretty uncommon that you change out a method at runtime, and very rare that you do so more than once or twice in the run of a program. So, if you cache the lookup and recompile any function that needs the lookup to use the cached value, and insert some guards to invalidate the cache on the rare occasions when you change something, you may end up with millions or billions of faster-than-C++ lookups for every one recompile and slow lookup.

    It's not coincidental that Self is also the language that popularized prototype-based inheritance. One major thing classes add in dynamic inheritance is that making that first link special makes it easier to optimize. Also, classes of objects tend to almost always override the same set of methods in the same way, so you might as well label those classes to help the optimizer. And if the language encourages you to think in terms of classes of objects, you'll usually make the optimizer's job easier. But with a smart-enough JIT, that isn't necessary. At any rate, the dynamic lookup was the main reason for the JIT, and that reason applies to dynamic class-based languages just as much as to prototype-based ones.

    Plus, of course, it turns out that the idea of a JIT worked pretty well on static code too; HotSpot running compiled Java bytecode is generally faster than PyPy or V8 running Python or JavaScript. But for most uses, PyPy and V8 are more than fast enough. (In fact, for many uses, CPython and SpiderMonkey are more than fast enough.)

    What's in a name?

    Simpson wants to insist that JavaScript's behavior is not dynamic inheritance, it's delegation.

    But, if so, Python, Smalltalk, and many other class-based languages aren't doing dynamic inheritance either. So, the difference between class-based and prototype-based still has nothing to do with delegation vs. inheritance, even if you use his terms.

    In fact, what he's calling "delegation" really just means dynamic inheritance. Which makes it obvious why claiming that JavaScript doesn't do dynamic inheritance, it does delegation, is wrong.

    While we're on the subject of names, you could argue that the confusion about prototypes was caused from the start by other things JS got wrong. Other prototype-based languages, from Self to newer languages like Io, don't use constructor functions and a Java-like new operator, so they don't make prototyping/cloning look misleadingly like instantiation. Maybe if JS didn't do that, people wouldn't have been misled as badly in the first place?

    Prototypes

    So then, what is the difference between classes and prototypes?

    Of course there's the minor difference noted above, that JavaScript only has one kind of link instead of two.* And, if you read the footnotes, there could be a performance difference. But... so what?

    Well, getting rid of classes does lose something: In JavaScript, your prototype chain doesn't affect your type. In Python, because the first step of that chain is special, the language can define its notion of type in terms of that link. Of course 95% of your code is duck-typed and EAFP-driven and this doesn't matter. But for that last 5% (and for debugging), you have a coherent and useful notion of type, which can help.

    Also, making multiple inheritance work with classes is not that hard (even though C++ gets it wrong, Java punts on it, etc.); making it work with prototypes is harder.

    The biggest downside is that cooperative inheritance through a super-like mechanism is hard with prototypes. In particular, composing mixins that don't know about each other is a serious problem. Simpson gets into this in his "Polymorphism redux" section in his third part. But I believe this is actually only a problem if you use the constructor idiom instead of using prototype-chained object literals, so a prototype-based language that didn't encourage the former and make the latter more verbose than necessary may not have this problem.

    Anyway, what do you gain in return for these negatives? One thing (although not the motivation behind Self**) is flexible object literals. In JavaScript, creating an object is just like creating a dictionary. If some of the members are callable, they're methods. Sure, this is just syntactic sugar, but it's syntactic sugar around something you often want.

    At first glance, that sounds horrible. We have 30000 objects, all anonymous, all locally created in different calls to the same function and therefore closing over different scopes… so much duplication! But again, you're thinking about ahead-of-time optimization; there's no reason the interpreter can't realize that all of those closures have the same body, and guardably-almost-always the same behavior, so with a JIT, you're wasting very little space and even less time. And, even if that weren't true, it probably wouldn't matter anyway. Meanwhile, if you don't want that duplication, you can always move some of those functions into a prototype—exactly the same way as you'd move them into a class in Python.

    At second glance, it sounds horrible again. Everyone knows that closures and objects are dual—one is the functional way of encapsulating state, the other is the OO way. So, who wants closures inside objects? But here... well, a Python developer shouldn't be surprised that sometimes practicality beats purity, and adding boilerplate can hurt readability. For a locally-created object that just has a bunch of simple functions that all refer to the local variable username, does it actually make things more readable to create and call a constructor that copies username to self.username and make all those functions call self.username, or is that just boilerplate?
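    To put that in Python terms (a sketch, with types.SimpleNamespace standing in for a JS object literal):

        from types import SimpleNamespace

        # the class way: copy username onto self, route everything through self
        class Greeter:
            def __init__(self, username):
                self.username = username
            def greet(self):
                return 'hi ' + self.username

        # the object-literal way: the method just closes over username
        def make_greeter(username):
            return SimpleNamespace(greet=lambda: 'hi ' + username)

        print(Greeter('guido').greet())       # hi guido
        print(make_greeter('guido').greet())  # hi guido

    The closure version has no self boilerplate at all--which is exactly the readability win the object-literal style is claiming.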

    (In fact, you can go farther and explicitly link up multiple closures through manual delegation chains. But I haven't seen many uses of inheritance chains beyond 2-deep in JavaScript, and nearly all of those that I've seen are equivalent to what you'd do in a class-based language. So, let's just stick with the simple case here.)

    Really, it's similar to def vs. lambda: for most functions, def is clearly better, but that doesn't mean there's never a good use for lambda. Nobody needs a separate statement, and an explicit name, for the function lambda x: x+1 or lambda cb: cb(result). And similarly, for most objects, using methods in the class or prototype to encapsulate the object's state is clearly better, but that doesn't mean there's never a good use for object literals with object-specific methods.

    The big difference is that in JavaScript, both ways are easy; in Python, both ways are possible, but only one is easy—there's no clean and simple way to put the behavior in each object.

    Well, that's not quite true; JS's horrible scoping rules and its quirky this mean both ways are actually not easy. But that's not because JS is prototype-based; it's because JS got some other things wrong.

    Anyway, the differences in this section are what really matter—and they're the ones that are down to JS being prototype-based.

    * Or, if you count metaclass-class as different from class-instance, one instead of three. Simulating metaclasses in a prototype language usually means making use of the distinction between calling the prototype vs. just referencing its .prototype, in JS terms, and there's nothing to protect you from mixing up your "metaprototypes" and prototypes. But then Python has nothing to protect you from inheriting from a metaclass as if it were a normal class either, and that never causes problems in real-life code.
    ** The inspiration behind prototypes in Self was that it's often hard to define a class hierarchy in advance, and too hard to modify a class hierarchy after you've already written a bunch of code around it. Slightly oversimplifying, prototypes were mainly an attempt to solve the fragile base class problem.

    The best of both... wait, deja-vu

    Some newer class-based languages, like Scala, have an object-literal and/or singleton-defining syntax. (And it's not that hard to add that to Python, if you want.) If you combine this with a few other features, you can get objects that act like instances and clones at the same time (and hopefully in the ways you want, not the ways you don't).

    On the other hand, JS (ES6) is adding classes for when you want them, but object literals will still be anonymously-typed clones of an (optionally specified) prototype rather than instances of a class.

    Are we meeting in the middle? I'm not sure.

  6. About a year and a half ago, I wrote a blog post on the idea of adding pattern matching to Python.

    I finally got around to playing with Scala semi-seriously, and I realized that they pretty much solved the same problem, in a pretty similar way to my straw man proposal, and it works great. Which makes me think that it's time to revisit this idea. For one thing, using Scala terminology that people may be familiar with (and can look up) can't hurt. For another, we can see whether any of the differences between Python and Scala will make the solution less workable for Python.

    Extractors


    In my post, I described the solution as a matching protocol. A class that wants to be decomposable by pattern matching has to implement a method __match__, which returns something like a tuple of values, or maybe an argspec of values and optional values, that the pattern-matching code can then match against and/or bind to the components of a pattern.

    For simple cases, this could be automated. Both namedtuple and Namespace could define a __match__, and we could write decorators to examine the class in different ways--__slots__, propertys, copy/pickle, or even runtime introspection--and build the __match__ for you. But when none of these simple cases apply, you have to implement it yourself.

    In Scala, the equivalent is the extractor protocol. A class that wants to be decomposable by pattern matching supplies an associated extractor (normally a singleton object, but that isn't important here), which has an unapply method that returns an optional value or tuple of values, as well as an optional apply method that constructs instances of the class from the same arguments. A case class (a simple kind of type that's basically just a record) automatically gets an extractor built for it; otherwise, you have to build it yourself.

    In the simplest case, this is no different from just using __match__ for unapply and, of course, the existing __init__ for apply. But it's actually a lot more flexible.

    • You can write multiple extractors that all work on the same type.
    • You can add an extractor for an existing type, without having to monkeypatch the type. For example, you could extract the integer and fraction parts of a float.
    • You can write an extractor that doesn't match all values of the type.

    Example


    Daniel Westheide's great Neophyte's Guide to Scala shows some of this in action. I'll borrow one of his examples, with minor variations:

    trait User {
      def name: String
      def score: Int
    }
    class FreeUser(val name: String, val score: Int, val upgradeProbability: Double)
      extends User
    class PremiumUser(val name: String, val score: Int) extends User
    
    object FreeUser {
      def unapply(user: FreeUser): Option[(String, Int, Double)] =
        Some((user.name, user.score, user.upgradeProbability))
    }
    object PremiumUser {
      def unapply(user: PremiumUser): Option[(String, Int)] = Some((user.name, user.score))
    }
    
    val user: User = new FreeUser("Daniel", 3000, 0.7d)
    
    user match {
      case FreeUser(name, _, p) =>
        if (p > 0.75) name + ", what can we do for you today?" else "Hello " + name
      case PremiumUser(name, _) => "Welcome back, dear " + name
    }
    

    If you're not used to Scala's syntax, this might be a bit hard to read, but it's not too hard to muddle through. A trait is like an ABC, a class is like a normal class, and an object defines a singleton class and the one instance of it. So:

    First, we define a class hierarchy: FreeUser and PremiumUser are both subclasses of the abstract base class User, and FreeUser adds a new attribute.

    Next, we define two extractors. The fact that they're named FreeUser and PremiumUser, just like the classes, is convenient for readability, but it's the type annotations on their unapply methods that make them actually work on those respective types.

    Then we construct a User, who happens to be a free user with an 0.7 probability of upgrading.

    Then we pattern-match that user, first as a free user, then as a premium user if that fails. Here, the syntax works like my proposal, except for minor spelling differences (match for case, case for of, braces for indents, => for :). But, instead of checking isinstance(user, FreeUser) and then calling instance.__match__ and binding to name, _, p, Scala tries to call FreeUser.unapply(user), which works, and binds to name, _, p. The result is the same, but the way it gets there is a little more flexible (as we'll see).

    And then, inside the free user case, we just do an if-else, so the result is "Hello Daniel".

    Another example


    In case it isn't obvious what the optional-value bit is for, here's another example that makes use of that, as well as applying an extractor to a builtin type without having to monkeypatch it. This is a pretty silly example, but it's in the official Scala tutorial, so...

    object Twice {
        def unapply(x: Int): Option[Int] = if (x%2 == 0) Some(x/2) else None
    }
    
    val i: Int = 42
    i match {
      case Twice(n) => n
      case _ => -1
    }
    

    Here, Twice.unapply(42) returns Some(21), so the case Twice(n) will match, binding n to 21, which we'd return.

    But if we tried the same thing with 23, then Twice.unapply(23) returns None, so the case Twice(n) would not match, so we'd hit the default case, so we'd return -1.

    As for the optional values, I'll get to that below.

    Can we do this in Python?


    Some of the details obviously aren't going to fit into Python as-is.

    We could easily add a @singleton class decorator that makes the class anonymous and returns a singleton instance. But we still can't use the same name as the class without a conflict in Python (binding the instance would just replace the binding for the class). And really, do we need to define a class here just to wrap up a function in a method (which is, or might as well be, a @staticmethod, since it doesn't touch self, which has no state to touch anyway)? I think it would be more pythonic to just have decorated functions. (If you disagree, it's pretty trivial to change.)

    Python normally uses dynamic (per-instance) attributes on classes. You can use __slots__ or @property, and you can even define an ABC that makes those abstract, matching Scala's traits, but I think it's more Pythonic to just skip that here. (I'm pretty sure it's orthogonal to everything related to pattern matching and extracting; it just means our most Pythonic equivalent example isn't exactly like the Scala example.)

    Python doesn't have optional types. It does have "implicit nullable types" by just returning None, but since None is often a valid value, that doesn't do any good. Of course the answer here is exceptions; everywhere that Scala returns Option[T], Python returns T or raises, so why should this be any different?

    In my previous proposal, we used AttributeError (or a new subclass of it, MatchError) to signal a failed match (partly because you get that automatically if __match__ doesn't exist), so let's stay consistent here. We can even make a special unapply_simple decorator that lets us return None and have it turned into a MatchError when we know None isn't a valid value.

    Finally, Python doesn't have static typing. Is that necessary here? Let's try it and see.

    So, using my earlier case syntax:

    class User:
        pass
    
    class FreeUser(User):
        def __init__(self, name: str, score: int, upgrade_probability: float):
            self.name, self.score, self.upgrade_probability = name, score, upgrade_probability
    
    class PremiumUser(User):
        def __init__(self, name: str, score: int):
            self.name, self.score = name, score
    
    @unapply
    def free_user(user):
        return user.name, user.score, user.upgrade_probability
    
    @unapply
    def premium_user(user):
        return user.name, user.score
    
    user = FreeUser("Daniel", 3000, 0.7)
    
    case user:
        of free_user(name, _, p):
            if p > 0.75:
                return '{}, what can we do for you today?'.format(name)
            else:
                return 'Hello {}'.format(name)
        of premium_user(name, _):
            return 'Welcome back, dear {}'.format(name)
    
    @unapply_simple
    def twice(x):
        if x%2 == 0: return (x//2,)
    
    i = 42
    case i:
        of twice(n): print(n)
        else: print(-1)
    

    There are a few problems here, but I think they're easily solvable.

    My earlier proposal matched Breakfast(0, eggs, _) by first checking isinstance(breakfast, Breakfast), and then calling Breakfast.__match__; here, there's no type-checking at all.

    As it turns out, that free_user will fail to match a PremiumUser because user.upgrade_probability will raise an AttributeError--but the other way around would match just fine. And we don't want it to. But with nothing but duck typing, there's no way that Python can know that premium_user shouldn't match a FreeUser. (The only reason a human knows that is that it's implied by the names.)

    So, is the lack of static typing a problem here?

    Well, we can do the exact same isinstance check inside the unapply decorator; just use @unapply(FreeUser) instead of just @unapply. This seems perfectly reasonable. And it's no more verbose than what you have to do in Scala.

    Or we could even have @unapply check for annotations and test them. It's even closer to what you do in Scala (and in most other FP/hybrid languages), and, if we're going to encourage type annotations in general, why not here? On the downside, PEP 484 specifically says that its annotations don't affect anything at runtime. Also, not every PEP 484/mypy static annotation will work dynamically (e.g., Iterable[int] can only check for Iterable). But I don't think that's a problem. The decorator version seems more conservative, but either one seems reasonably Pythonic if we wanted to go with it.
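    None of this machinery exists, of course, but here's a rough sketch of what those decorators might look like--the names unapply, unapply_simple, unapply_function, and MatchError come from the proposal, but every implementation detail here is just my guess:

    import functools

    class MatchError(AttributeError):
        pass

    class unapply_function:
        # a distinct type, so the case machinery can tell extractors from classes
        def __init__(self, func, cls=None):
            functools.update_wrapper(self, func)
            self.func, self.cls = func, cls
        def __call__(self, value):
            if self.cls is not None and not isinstance(value, self.cls):
                raise MatchError(value, self.cls)
            return self.func(value)

    def unapply(cls_or_func):
        # used bare (@unapply) or with a type to check (@unapply(FreeUser))
        if isinstance(cls_or_func, type):
            return lambda func: unapply_function(func, cls_or_func)
        return unapply_function(cls_or_func)

    def unapply_simple(func):
        # treat a None return as a failed match
        @functools.wraps(func)
        def wrapper(value):
            result = func(value)
            if result is None:
                raise MatchError(value)
            return result
        return unapply_function(wrapper)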

    Can the two proposals be merged?


    It's pretty nice that the earlier proposal uses the type name as the pattern name, especially for simple types like namedtuple.

    But it's also pretty nice that the Scala-influenced proposal allows you to have multiple pattern extractors for the same type.

    Scala gets the best of both worlds by letting you give one of the extractors the same name as the type (and by giving you one automatically for case classes).

    We could get the best of both worlds by just implementing both. The relevant bit of case-statement logic from my previous proposal did this to match of Breakfast(0, _, eggs)::

    if isinstance(case_value, Breakfast):
        try:
            _0, _, eggs = case_value.__match__()
            if _0 != 0: raise MatchError(_0, 0)
        except AttributeError:
            pass # fall through to next case
        else:
            case body here
    

    To extend this to handle extractors:

    if isinstance(Breakfast, type) and isinstance(case_value, Breakfast):
        # ... code above
    elif isinstance(Breakfast, unapply_function):
        try:
            _0, _, eggs = Breakfast(case_value)
        # ... rest of the code is the same as above
    

    We need some way to avoid calling Breakfast when it's a class (again, consider the FileIO problem from my previous post). That's what the isinstance(Breakfast, unapply_function) check is for--and the isinstance(Breakfast, type) guard keeps the first test from raising a TypeError when Breakfast is a function rather than a class. That unapply_function type is what the @unapply decorator (and any related decorators) returns. (It's a shame that it can't inherit from types.FunctionType, because, as of Python 3.6, that's still not acceptable as a base type. But it can implement __call__ and otherwise duck-type as a function. Or, alternatively, just use a function and attach a special attribute to it in the decorator.)

    In fact, we can even allow externally defining decomposition without monkeypatching just by having a special decorator that registers an extractor as the type-named extractor for a type, maybe @unapply_default(Spam). That could create Spam.__match__ as a wrapper around the function, but it could just as easily store it in a registry that we look up if isinstance(case_value, Spam) but Spam.__match__ doesn't exist. (The latter allows us to avoid monkeypatching existing classes--which is obviously important for classes like int that can't be patched... but then do you want coders to be able to create a new pattern named int? An extractor that can be applied to int, sure, but one actually called int?)
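
    A sketch of that registry approach (unapply_default and the lookup helper are invented names):

    _extractor_registry = {}
    
    def unapply_default(cls):
        """Register the decorated function as the type-named extractor for cls."""
        def decorate(func):
            _extractor_registry[cls] = unapply_function(func)
            return _extractor_registry[cls]
        return decorate
    
    def _decompose(case_value, cls):
        """Roughly what an of-clause named for a type might do."""
        match = getattr(case_value, '__match__', None)
        if match is not None:
            return match()
        return _extractor_registry[cls](case_value)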

    Other features of Scala pattern matching


    Features already covered


    There are some Scala features I didn't mention here because they're equivalent to features I already covered in my previous post:
    • Scala has as clauses with the same semantics. (they're spelled NEWNAME @ PATTERN instead of PATTERN as NEWNAME, which I think is less readable).
    • Scala has if guard clauses with the same semantics (including being able to use values bound by the decomposition and the as clause), and almost the same syntax.
    • Scala has value definition and variable definition pattern syntax, with the same semantics and nearly the same syntax as my pattern-matching assignment (not surprising, since their syntax was inspired by Python's sequence unpacking assignment, and mine is an extension of the same).

    Pattern matching expressions


    Scala has a match expression, rather than a statement.

    That shouldn't be too surprising. As I mentioned last time, most languages that make heavy use of pattern matching use an expression. But that's because they're expression-based languages, rather than statement/expression-divide languages. (Traditionally, functional languages are expression-based either because they encourage pure-functional style, or because they're heavily based on Lisp. Scala isn't being traditional, but it's expression-based following the same trend as JavaScript and Ruby.)

    In Python (unlike traditional functional languages, Scala, or Ruby), if is a statement--but then Python also has a more limited form of if that can be used in expressions. Could the same idea apply here? Maybe. But consider that Python had an if statement for many years before the stripped-down expression was added, and the statement is still used far more often.

    I think we'll want pattern-matching statements for anything non-trivial, and we won't know what (if anything) we'd want from a stripped-down limited pattern-matching expression until we get some experience with pattern-matching statements.

    Predicate extractors


    Scala allows extractors that return a boolean instead of an optional tuple. This allows you to match without deconstructing anything:

    object PremiumCandidate {
      def unapply(user: FreeUser): Boolean = user.upgradeProbability > 0.75
    }
    
    user match {
      case PremiumCandidate() => initiateSpamProgram(user)
      case _ => sendRegularNewsletter(user)
    }
    

    This is obviously easy to add to Python:

    @unapply
    def premium_candidate(user: FreeUser):
        return user.upgrade_probability > 0.75
    
    case user:
        of premium_candidate:
            initiate_spam_program(user)
        else:
            send_regular_newsletter(user)
    

    In Scala, the different logic is driven off the static return type, but we can just as easily use the syntactic difference: If we get an unapply_function rather than a (syntactic) call of an unapply_function, we call it and if it returns something truthy it's a match; if it returns something falsey or raises AttributeError it doesn't.
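
    In terms of the earlier case-statement expansions, of premium_candidate: might compile to something like:

    try:
        matched = premium_candidate(case_value)
    except AttributeError:  # including MatchError
        matched = False
    if matched:
        case body here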

    Scala also lets you use predicate extractors together with its equivalent of the as clause as a way of type-casting the value if it passes a type-check. For example, if initiateSpamProgram requires a FreeUser, rather than just a User, you could write:

    user match {
      case freeUser @ PremiumCandidate() => initiateSpamProgram(freeUser)
    }
    

    We can't do that in Python, but then we don't have any need to. At runtime, initiate_spam_program will probably just duck-type; if not, it'll isinstance or similar, and it will see that user is indeed a FreeUser. For mypy or other static type checkers, there's no reason the checker can't infer that if user passes premium_candidate(user: FreeUser), it's a FreeUser; not being able to do that is just a limitation of Scala's typer (user can't be defined as a User but then be inferred as a FreeUser within the same expression). We don't need to copy that limitation just so we can try to find a workaround for it.

    Extractors with apply


    You'll notice that I mentioned apply way back at the top, but I only implemented unapply in the above examples. So, let's show an example of apply:

    object Twice {
      def apply(x: Int): Int = x * 2
      def unapply(x: Int): Option[Int] = if (x%2 == 0) Some(x/2) else None
    }
    
    val i = Twice(21)
    

    Or, maybe a better example:
    object Email {
      def apply(user: String, domain: String): String = user + "@" + domain
      def unapply(email: String): Option[(String, String)] = {
        val parts = email split "@"
        if (parts.length == 2) Some((parts(0), parts(1))) else None
      }
    }
    
    val email = Email("joe.schmoe", "example.com")
    
    email match {
      case Email(user, domain) => if (domain == our_domain) message(user) else forward(email, domain)
      case _ => log_unknown_email(email)
    }
    

    As you can see, the apply method gives us a constructor whose syntax perfectly matches the case pattern used for decomposition. (Just as we get in my original proposal when your __init__ and __match__ match.) That's nice, but not a huge deal in many cases.

    Anyway, this one is easy to do in Python:

    @apply
    def twice(x):
        return x * 2
    
    case twice(21):
        of twice(n): print(n)
        else: print(-1)
    

    But I don't think this is necessary. It doesn't seem to be used much in Scala code, except in the special case of an extractor whose name matches the class--which we'd handle by having __match__, with the already-existing __init__ for a constructor. (Of course there's nothing forcing you to make your __match__ and __init__ parallel. But then there's nothing forcing apply and unapply to be parallel in Scala either. Usually, you'll want them to go together; in the rare cases where you want them to be different, there's nothing stopping you.)

    Plus, we already have ways to write external factory functions for types when you really want to; there's no reason to add and standardize on a new way.

    But, if we do want apply (and, again, I don't think we do), is sometimes having two functions instead of one a good argument for putting apply and unapply together in a class? I don't think so. Compare:

    @apply
    def twice(x):
        return x * 2
    @unapply
    def twice(x):
        if x%2 == 0: return (x/2,)
    
    @extractor
    class twice:
        def apply(x):
            return x * 2
        def unapply(x):
            if x%2 == 0: return (x/2,)
    

    Sure, the latter does group the two together in an indented block, but I think it looks like heavier boilerplate, not lighter. And in the very common case where you don't have apply, that will be much more true. Also, notice that @extractor has to be a bit magical, auto-singleton-ing the class and/or converting the methods into @staticmethods, which makes it harder to understand. And, if anything, it's less clear what twice(21) or of twice(n): are going to do this way; sure, once you internalize the extractor protocol you'll be able to figure it out, but that's not trivial. (Plenty of Scala developers don't seem to get how it works in their language; they just treat it as magic.)
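
    For reference, here's roughly what that magic would have to do--a sketch only; it treats apply and unapply as if they were @staticmethods and replaces the class with a singleton, but glosses over how the case machinery would tell the construction and decomposition roles apart:

    def extractor(cls):
        apply_func = vars(cls).get('apply')
        unapply_func = vars(cls)['unapply']
        ns = {'unapply': staticmethod(unapply_func)}
        if apply_func is not None:
            # Make twice(21)-style construction work on the singleton.
            ns['__call__'] = staticmethod(apply_func)
        return type(cls.__name__, (), ns)()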

    Sequence extractors


    Scala lets you write sequence extractors, with another special method, unapplySeq. This returns an optional sequence instead of an optional tuple (or an optional tuple whose last element is a sequence), and there's special syntax that allows you to match the decomposed sequence with a * wildcard, like this:

    xs match {
      case List(a, b, _*) => a * b
    }
    

    It's pretty obvious how to do this in Python, and a lot more simply: this is just the normal use of * for iterable unpacking assignment, *args parameters, and splatting arguments. I already covered how this works with __match__ in my earlier post; it's the same for extractors.
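
    For example, a (hypothetical) extractor that returns an arbitrary-length sequence decomposes with the ordinary star syntax:

    @unapply
    def path_parts(path: str):
        return path.split('/')
    
    # of path_parts(root, *rest): would expand to roughly
    root, *rest = path_parts(case_value)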

    Infix extractors


    Scala also allows you to use infix extractors, to match infix patterns. For example, you could match 2 * n directly by using an extractor named * with an appropriate unapply.

    More typically, you match head :: tail to deconstruct cons lists, using an extractor that looks like this:

    object :: {
      def unapply[A](xs: List[A]): Option[(A, List[A])] =
        if (xs.isEmpty) None else Some((xs.head, xs.tail))
    }
    

    If Python doesn't need arbitrary infix operators, or the ability to use existing operators as prefix functions or vice-versa, or other infix-operator-related features like sectioning, it doesn't need infix patterns either.

    Especially since the primary use for this, deconstructing cons lists, is something you're rarely going to need in Python. (If you want cons lists, you have to write them yourself, and you won't get any syntactic sugar for constructing them, so why would you be surprised that you don't get syntactic sugar for deconstructing them? You can use the same Cons(head, tail) for a pattern as you use for a constructor.)

    Patterns in other places


    My proposal suggested explicit case statements, and generalizing sequence unpacking in assignments to pattern decomposition, but I didn't think about all the other places Python binds variables besides assignments. Scala's designers did, and they made it work in as many places as possible.

    So, looking at all the places Python does bindings:

    • Iteration variables in for statements and comprehensions: We can already sequence-unpack into the iteration variables; why not pattern-match? So, for Point3D(x, y, z) in vertices:.
    • as-clause variables in various statements. Python already allows sequence unpacking here, although it requires extra parentheses:
      • with: I don't think Scala has an equivalent, but I could definitely imagine wanting it here. Although often you're going to want to bind the whole thing as well as decomposing it, and doubling the as (with spam.open() as TextFile(filename) as f:) seems pretty horrible.
      • except: Scala's catch basically has an implicit ex match, which you write normal case clauses inside. This could actually be more useful in Python than in Scala, because our exceptions are full of good stuff--you could, e.g., pull the filename out of a FileNotFoundError. But it would be clumsy, because you've already matched the type: except FileNotFoundError as FileNotFoundError(filename):. So... maybe in a new language, but probably not useful in Python.
      • import: It seems like it would always be more readable to import and then case. Maybe someone can come up with a counterexample, but I doubt it.
    • For completeness, if we're looking at import ... as, then we should also look at from ... import, and maybe even plain import. And I think it looks just as bad in the former statement, and a thousand times worse in the latter.
    • Function parameters in def and/or lambda. In my previous post, I suggested ML-style pattern overloads for def, and assumed that the fact that this wouldn't work for lambda wouldn't matter because why would you want it there? But in Scala, single-case lambdas can be written without all the match syntax, and this is used pretty extensively in idiomatic Scala. (Of course in Scala, pattern matching is an expression, not a statement, and idiomatic Scala uses lambdas in many places where idiomatic Python uses named functions.) There's a bit of ambiguity in the fact that a parameter list is actually a list of target-lists that have to be single targets or parenthesized--and remember that Python 3.0 removed that ambiguity by killing even basic iterable unpacking for parameters. Maybe someone could come up with a syntax that allows def line_to(Point3D(x, y, z)): or similar without that ambiguity, but even if you could, does that really look useful in a language without static typing and overloads? (Again, I think if we want this, we want a way to define overloads by pattern.)
    • Conditions are not currently a place to do bindings in Python, and that's a good thing. But occasionally, something like C's if ((c = getch()) != EOF) is handy. Could a stripped-down single-case pattern-matching binding in conditions be a way to get most of the benefits, without the problems? (Similar things certainly work in a variety of languages from Swift to Haskell, but in those languages, an if body has its own scope, which will either shadow an outer-scope name or raise an error complaining about shadowing, so I don't know if that means anything.) Anyway, I haven't thought much about this.

    Partial functions


    Scala makes extensive use of partial functions. Not partially-applied functions (like functools.partial), but partial functions in the mathematical sense: functions that are only defined on part of the implied domain.

    This is actually a weird hybrid of static and dynamic functionality, but it turns out to be useful. At compile time, Scala can tell that your cases don't exhaust all of the possibilities. In many functional languages, this would mean that you get a compiler error. In Scala, it means you get a partial function instead of a normal function. If you call it on an out-of-domain value, you get a MatchError at runtime. But you can also LBYL it by calling its isDefinedAt function. And code that always checks before calling can be statically determined to be safe.

    And, idiomatically, partial functions--usually constructed with the special anonymous-function-with-implicit-match syntax mentioned above--are what you often pass to things like collect. The collect function does the equivalent of map and filter at the same time, building a new sequence with the results of the partial function applied to the members of a source sequence, while skipping the ones that aren't defined. Like this, in Python terms:

    def collect(func, iterable):
        for value in iterable:
            try:
                yield func(value)
            except MatchError:
                pass
    

    Of course that demonstrates that a dynamic language that loves exceptions and EAFP really doesn't need an explicit concept of partial functions. A partial function is just a function that can raise (or that can raise a specific exception), and using partial functions is something we trivially do all the time without thinking about it. So, this is a neat way to get familiar Pythonic functionality into a very non-Python-like language, but Python has nothing to learn there. (Although maybe mypy does?)
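
    For example, with the twice extractor from earlier (remembering that it returns a 1-tuple):

    list(collect(twice, [1, 2, 3, 4]))
    # --> [(1.0,), (2.0,)]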

    Other languages


    ML


    First, traditional ML patterns have a feature that we haven't discussed: "either patterns", where two (or more) different patterns (with "|" or "or" between them) are treated as a single case. In Python terms:

    case meal:
        of Breakfast(spam, *_) or Lunch(spam, *_):
            print('Are {} servings of spam really enough?'.format(spam))
    

    In ML, the composed patterns have to bind the same set of variables, with the same types (as they are in the example above). There have been attempts to expand that to wrap variables bound only on one branch in optional types, and even to allow variables with different types on each branch to be bound as choice types, but that turns out to be painful to use (you end up with a mess of pattern-matching expressions inside pattern-matching expressions that would be a lot more readable if you just flattened them out).

    In object-oriented ML variants, it makes sense to allow the different branches to match different class types, as long as they share a common supertype, in which case the bound variable is of that supertype. (I don't think either base OCaml or F# actually does that, but it makes sense...)

    But in a duck-typed language like Python, of course, you could dispense with all requirements: variables matched to different types in different branches are fine, as long as all the types duck-type successfully for whatever operations you use with them, and even variables only matched in one branch and not the other are fine, as long as you can handle the possible UnboundLocalError.

    Some ML offshoots have experimented with the dual idea of "both patterns", where two (or more) different patterns (with "&" or "and" between them) are treated as a single case. Here, obviously, they have to either not bind the same variables, or bind them to the same values. (The Python rule would presumably be that they can bind the same variables, but if they bind them to different values, one of them wins arbitrarily. Or maybe it would be useful to say the last one always wins, or the first truthy one, or some other rule? Without experimenting with it, it's hard to guess whether any of them would be particularly helpful...)

    With only literally deconstructing patterns, both patterns aren't very useful (what's the point of matching Breakfast(spam, _, _) and Breakfast(_, 0, eggs) instead of just Breakfast(spam, 0, eggs)?), but with unapply functions, they make a lot more sense. For example, if you had any two of the coordinate systems for 3D points, you could match the others--e.g., cartesian(_, _, z) and spherical(r, theta, _) gives you the cylindrical coordinates.

    I'm not sure how useful either of these features would be in Python, but they're not too hard to define. If someone wants to write a serious proposal to add pattern matching to Python, they should at least provide a rationale for not including them.

    F#


    F# builds on OCaml with the specific intent of using .NET libraries, and in particular using the C#-centric, OO-heavy .NET framework for most of its stdlib. So, they obviously needed a way to decompose arbitrary classes. Since Scala, Swift, Rust, etc. hadn't been invented yet, they had to figure it out for themselves. They took it as a specific case of the more general idea of matching abstract data structures, and wrote a paper on their design, Extensible Pattern Matching Via a Lightweight Language Extension.

    In simplest form, F#'s "active patterns" are pretty similar to Scala's extractors, and even more similar to the variation I suggested above: instead of defining a singleton object with unapply and apply methods, you just define a plain (unapply) function with a special marker (F# uses "banana brackets" around the name, like (|Complex|), rather than a function decorator, which becomes important later) to declare it as an active pattern; you can then use it in any (single, partial-multiple, or total-multiple) match expression the same way you'd use a record type.

    But you can go farther. For example, instead of just returning a tuple, you can return what looks like a record constructor with the function name as the record type name and the tuple as its arguments. And you can then define a "banana split" pattern, with multiple names inside the banana brackets, which can return what looks like a record constructor for any of those names, which can then be pattern-matched as if they were the multiple constructors of an actual sum type. (In fact, they are constructors of an actual sum type; it's just anonymous.) The example in the docs deconstructs an abstract list with (|Cons|Nil|) by calling its non-empty, head, and tail methods. In Python terms, it might look something like this:

    @unapply_split('Cons', 'Nil')
    def cons_nil(lst: LinkedList): # LinkedList is an ABC
        if lst: return Cons(lst.head(), lst.tail())
        else: return Nil()
    

    And you can then use Cons and Nil in patterns to match list instances. The (inferred) type of that function in F# is something horrible involving the parametric choice type that you never want to read, so the fact that it would probably not be represented in the type system at all in Python isn't terrible. Anyway, while that's a nice feature, I can't imagine how to make it fit in nicely with Python syntax, and defining two separate partial unapply functions is not that much heavier, and probably a lot easier to understand:

    @unapply
    def Cons(lst: LinkedList):
        if lst: return lst.head(), lst.tail()
    
    @unapply
    def Nil(lst: LinkedList):
        if not lst: return ()
    

    However, it's a lot easier to prove that a pattern matching both Cons and Nil is a total pattern (or detect whether it's a partial pattern, for use in building automatic partial functions with is_defined support) on LinkedList with the "banana split". That's important for F# (or Scala), but probably not so much for Python. (Of course this particular example is pretty stupid for Python in the first place--just use the iterable protocol. But it obviously generalizes to less trivial examples that aren't as silly in Python.)

    F# also allows for matching with independent active patterns, with the usual rule that the patterns are just dynamically tried in order and there must be a default case, which allows you to use partial matching functions and still prove total matching. Anyway, that feature is exactly equivalent to the separate unapply functions just described, except that in F# you have to explicitly define each active pattern as partial (by banana-splitting it with _), and of course it has to be represented somehow in the type system (by wrapping the single type or choice of types in an option that's normally handled for you by the syntactic sugar of the multiple-match expression), while in Python, you'd take it for granted that patterns without default can't be proven total and that the values are duck typed.

    Finally, you can extend this to wrap almost any function as a pattern match. For example, you can write a parse_re active pattern that takes a regex from the pattern, and additional arguments from the match expression, and returns its groups if successful. This is really only tricky with static typing, because inferring the type of that parse_re function is a nightmare. Also, in a dynamic language like Python, it's pretty much unnecessary: you'd either add __match__ to the SRE_Match type, or write an unapply function for it, and then just use re.compile(regex).match(x, y) as your pattern, right?
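
    In sketch form, building on the unapply decorator above (all names here are invented):

    import re
    
    EMAIL_RE = re.compile(r'([^@]+)@([^@]+)')
    
    @unapply
    def email(s):
        m = EMAIL_RE.fullmatch(s)
        if m is None:
            raise MatchError(s, EMAIL_RE.pattern)
        return m.groups()
    
    # case addr:
    #     of email(user, domain): ...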

    Beyond all the ways active patterns expand on ML-style patterns, the fact that they're ultimately just functions means you can use them with higher-order functions. Of course the Scala extractor objects or the proposed unapply functions for Python have the same benefit.

    Finally, F#, unlike most ML offshoots, has a notion of mutable records. With a bit of clumsy syntax, you can actually bind to a mutable attribute in the match and then mutate that attribute in the resulting code. You'd think that, if this were useful, there would be a way to do the same with abstract types, but (short of building a bunch of proxy objects) there isn't. So, presumably that implies that we don't need binding to "reference cells" or anything like that in Python.

    C#


    More recently, the designers of C# looked at F# and Scala pattern matching in practice, and designed pattern matching for C#.

    It's worth noting that they decided that, even though their pattern matching has to handle arbitrary OO classes, it's still worth adding record types or more general ADTs if they're going to add pattern matching. That seems like more evidence for adding automatic matching for the record-like types that already exist in Python (like namedtuples) and/or adding actual ADTs.

    But the most interesting part of C#'s design is that they start out with a single-clause pattern-matching expression, EXPR is PATTERN, and build everything on that. The expression is true if the pattern matches, and any bindings in the pattern are bound in the enclosing scope. (Actually, their rule for deciding the appropriate scope is pretty complicated, and weird in some cases, but the Python equivalent would be easy--the enclosing function or class body is always the appropriate scope.) That's general enough to let you do things like if (expr is List x) do_list_stuff(x); or if (x?.y?.z is Int i) do_int_stuff(i); without needing special syntax like Swift's if let. And then a multi-pattern switch statement or a multi-pattern match expression--or, if you're the C# team, why not both?--is pretty simple syntactic sugar for that (but syntactic sugar that the optimizer can recognize and maybe take advantage of). Destructuring assignment seems to be the only single-pattern use that couldn't be directly built on is, so they added special syntactic sugar for that.

    So, how do they handle deconstructing arbitrary classes? The default behavior only works for destructuring record/ADT types (and for simply matching values to types without any destructuring), but the is operator is overloadable, like any other operator in C#. At first glance, that sounds similar to the __match__ protocol (for normal-method operator overloads) or to the extractor protocol (for static-method overloads), but it's actually more complicated: is is treated like a conversion operator, so you can specify how any class destructures to or from any other class. The example in the docs is a Polar point class that deconstructs a separate Cartesian point class, so you can write var p = Cartesian(3, 4) and then later match it with is Polar(var r, var theta). So, Scala's extractors are really just a special case of C#'s overloaded is. The question is, are there any other cases that are useful? The only examples I see all use a static class with a static is overload and no other methods at all, which means they're just unapply functions with extra boilerplate...

    Mathematica


    I may have mentioned this in my previous post, but... In Mathematica, everything is a primitive value or a tree. And this means pattern-matching in Mathematica is designed around trees, rather than flat structures. So, you can match something like a[b[_], _], which is like first matching a[X, _] and then matching X against b[_]. You can also match against just the second level, or against just a leaf at whatever level it appears, etc. That's all pretty cool, but I don't think it's needed in any language where everything isn't a tree.

    Swift, Rust, etc.


    I've already mentioned Swift multiple times. The other obvious modern language to look at is Rust. I don't think there are any radical new ideas from either, but if I'm wrong, I'll come back and edit this section.

    Dynamic languages


    Pattern matching comes out of statically-typed functional programming. Scala, C#, Swift, Rust, etc. are all primarily attempts to bring ideas from static FP to static OOP (in ways that will hopefully be familiar to traditional OO programmers), and secondarily attempts to find ways to simulate some of the benefits of duck typing on top of static typing, so it's not too surprising that they all came up with similar ways to import the idea of pattern matching.

    But, besides asking whether the lack of static typing could be a problem for pattern matching in Python (e.g., the fact that we can't detect partial patterns at compile time), we should also ask whether duck typing could offer more flexibility or other benefits for pattern matching. I already mentioned how either and both patterns could be more useful in Python than in OCaml. And the fact that Python doesn't have to infer horrible choice types that no human will ever be able to debug if the inference goes wrong might be an advantage.

    But if there are any good examples in other dynamic languages, it would definitely be worth looking at them.

    Of course many dynamic languages have "pattern matching" in Perl terms: regex matching on strings, or some generalization of string-matching ideas to other types. I know that generalization is supposed to be one of the key ideas in Perl 6, but, having not dug into Perl 6 beyond reading the apocalypses, I don't know whether it shares any interesting common ground with ML-style pattern matching, or if it's just SNOBOL string patterns with a more powerful grammar.

    Also, of course, people have implemented pattern matching in Perl, Ruby, etc., just as they have in Python, and those languages with more flexible syntax allow them to make it look more like ML--but that just makes it look less like the native language, doesn't help at all in designing a way to integrate it as a real native language feature, and, most importantly, generally does nothing to solve the problem of decomposing objects, which is the key problem.

    There's gotta be something on LtU...


    I've only done a cursory search at Lambda the Ultimate to see if anyone's published any papers on integrating pattern matching into dynamic OO languages. I found and skimmed some posts about the problem of decomposing objects, but all in terms of static OO (that's where I found the F# paper, which led me to the C# work in progress...). Some of those might have interesting ideas that aren't incorporated in any of the languages I've looked at, but I don't yet know. And it seems like there must be something on dynamic object decomposition out there, even if I didn't find it in a couple minutes.

    How useful is pattern matching in idiomatic Scala?


    Scala, like many modern functional languages (and, increasingly, even some primarily-imperative languages), builds its failure handling around an Option type, basically equivalent to Haskell's Maybe but with some extra stuff piled on top (like being able to construct an Option from a nullable-typed Java object).

    In most such functional languages, pattern-matching is the idiomatic way to deal with optional values. But in Scala, many guides (like part 5 of Westheide's) point out that it's pretty verbose in Scala, and it's often better to use the extras piled on top of the type instead. Compare:

    println("Name: " + user.name match {
      case Some(name) => name
      case None => ""
    })
    
    println("Name: " + user.name.getOrElse("<unknown>"))
    

    Honestly, I'm not qualified to say whether Westheide is right here (although other guides agree with him--and the fact that Scala has the getOrElse method is telling), or whether this extends to other paradigm examples of pattern matching or is specific to the fact that Option, and what you usually do with its values, is so simple. But still, it's worrying: if idiomatic Scala usually doesn't use pattern matching even for one of the paradigm examples of pattern matching, how often would idiomatic Python use it?

    Comparing to Swift


    Swift made extensive use of Optional from the first beta, but the designers originally didn't feel the need to add general pattern matching. Instead, the general solution is to test for nil, and then unwrap inside the tested code. There's special support for that all over the place:

    • There's a single-character type operator for optional types: int? is the same as Optional<int>.
    • Unwrapping is a single-character postfix operator: var!.
    • The language's truthiness notion is based on non-nil (except for actual boolean values), so you can write if var { println(var!) }. (Yes, in theory, this means optional booleans are a bit harder to deal with, but that rarely if ever comes up--practicality beats purity and all that.)
    • if let lets you combine the test and binding into one: if let v: int = var { println(v) } binds the value from the optional int var to the non-optional int v, so you can use it in the "then" scope.
    • guard is like if, but lets you bind a variable in the enclosing scope (by enforcing that the failing case must exit that scope), so you can write guard let v: int = var else { return } and then use v in the rest of the function.
    • The nil-coalescing operator ?? serves most uses of an "unwrap-or-else-use-default" method: 2 + (v ?? 0).
    • The typing engine, including inference as well as checking, understands all of this, so you can just write if let v = var without specifying a type, the compiler can skip any warning or runtime type check inside the body of an if or after a guard, etc.

    However, in later versions of Swift, they felt the need to add additional sugar to pattern matching for optional values anyway. In general, you match an Enum (ADT) constructor (in switch, in a single-pattern if, and a few other places) with case .Kind(val1, val2) = var, and any bindings you want have to be explicitly signaled, with case .Kind(val1, let var2). But when you're matching an optional value, instead of if case .Some(let var1) = var, you can just write case let var1? = var. So, maybe this implies that pattern matching is important, even when you've done everything you can to special-case away the need for it. Usually, idiomatic Swift code gets away with avoiding any pattern matching for optionals by using if let and ??, but the fact that the designers felt the need to add case let var? implies that they couldn't get rid of every need for it.

    Naming


    From a quick survey of other languages:

    ML uses case ... of to introduce the multi-case pattern-matching construct, and | to separate cases; a function definition can also be defined in terms of cases on its parameters, without needing the case bit. Some ML descendants change the | to a different symbol, or change it to the keyword of (and remove that from the outer clause). But newer descendants mostly strip it down even further. Erlang makes every binding a single-case pattern match, so you don't need any syntax. Haskell embeds the multi-case match into various bits of syntax, like binding, and replaces the | with whitespace, so there's basically no syntax anywhere.

    Meanwhile, to people coming from the imperative world, C's switch and case are familiar. Whether that's a good thing, or an attractive nuisance, seems to be up for debate. Swift deliberately uses both of the same words. Many other languages (Scala, Rust, draft ES7) use match instead of switch, but some of them still use case for the individual cases. Others avoid both C keywords.

    One advantage of using the word case this way is that, unlike of or a symbol or whitespace, it reads well in single-case stripped-down constructs, like Scala's single-case simple anon functions ({ case(a, b) => a+b }) or Swift's if case and similar statements (if case .Point(let x, let y) = point where x == y { print("diagonal at \(x)") }).

    Anyway, my use of case and of probably isn't actually familiar to anyone who isn't currently in a late-80s university class, and it collides with C just enough to be misleading but not enough to be comforting. So, I think either match/case or switch/case would fit Python better. (And, if C# plans to set the precedent that switch means statement and match means expression--coinciding with Swift's switch statement and Scala's match expression--maybe it's not worth fighting that.)

    Use cases


    The main thing stopping anyone (or at least me) from writing a serious proposal for adding pattern matching to Python is the lack of good use cases. Everyone who uses any language built around pattern matching knows that pattern matching is useful... but coming up with actual examples that make sense in Python is harder than it should be. And maybe that means something.

    Part of the problem is that most of the examples that aren't just toys tend to be things that can be better done in other ways in Python:

    • Option, Error, etc. types are an alternative to exceptions, and Python uses exceptions pervasively instead. (Plus, as Scala implies, pattern matching may not even be the best way to handle these most of the time.)
    • Decomposing records in general tends to fight against OO encapsulation. While abstract matching solves that problem, most examples, even in languages like Scala and F# that have abstract matching, seem to be borrowed from languages that don't (and written around case classes, records, or whatever they call them). While such cases might arise in Python, the fact that we don't generally use namedtuples all over the place (and they're buried in collections rather than builtins) and we don't have more general ADT-like features implies otherwise.
    • Simple switching functions like factorial (whether with a match expression, or as pattern-matched overloads) only really look simpler with pattern matching when you define them recursively. But in Python, you generally wouldn't define them recursively.

    • Recursively decomposing (cons) lists (and generalizing that to, e.g., abstract list-like objects, as in the F# example above) has the same problem, and the further problem that you rarely use cons lists in Python, and the further problem that, even if you did, you'd almost certainly want to use the iterator protocol on them.

    • Recursively decomposing more complex recursive structures that don't fit the iterator protocol seems more promising, but again, you'd usually want to process them iteratively rather than recursively if they're large enough to be a problem. Plus, you don't often run into deeply recursive data structures anyway--or, if you do, they tend to be implicit in JSON or similar, and pattern matching is more verbose there, not less; the benefit of using patterns there is static type checking, which you aren't going to get in Python.

    • Many examples would be more idiomatically written as method dispatch in an OO language. Even the more complex related problems (like operator double dispatch) already have better idiomatic solutions in Python.

    • The equivalent of a C switch statement is not very exciting, even when extended to matching enum constants or strings instead of just integers, and that's already readable enough as an elif chain. Or, in many cases, you'd just use a dict.

    There must be examples in my code or other code in other languages that would still look like idiomatic Python-plus-patterns when translated... but I haven't found them yet.

    Conclusions (or lack thereof)


    A year and a half ago, I concluded that, even though I'd come up with a reasonable and implementable proposal for pattern matching in Python, it wasn't a feature Python needed, or should get, at least not yet.

    The fact that Scala's designers came up with a similar solution to what I proposed, and it's very heavily used in idiomatic Scala, even by people who are mostly coming to the language from Java rather than ML or Haskell or something, is pretty heartening.

    There are definitely ways that Scala's feature is more flexible than the one I proposed--and, for the most part, we can pretty easily add that flexibility to my proposal, making it even better.

    On the other hand, the fact that Scala seems to encourage alternatives to pattern matching in some of the places I'd use it in Haskell or ML makes me think it might turn out to be underused in Python as well. Or maybe not--after all, maybe you sometimes have to fall back to pattern matching (or get much more verbose) even though the alternatives are there, as is presumably the case in Swift?

    So... now I'm less against the idea of adding pattern matching to Python. But I'm still not sufficiently excited about it to build an implementation, write a PEP, try and convince the core developers, etc. And if switching between Haskell and Python didn't make me ache for pattern matching in Python, I doubt that switching between Scala and Python will either. But maybe I'm wrong; time will tell.