1. The documentation does a great job explaining how things normally get looked up, and how you can hook them.

    But to understand how the hooking works, you need to go under the covers to see how that normal lookup actually happens.

    When I say "Python" below, I'm mostly talking about CPython 3.5. Other implementations vary to differing degrees in how they implement things; they just have to result in the same rules being followed.

    Variables: LEGB


    Python has simple lexical scoping, like most other languages. (Unlike many languages, only module, function, and class definitions create new scopes, not every suite, but that's not too important here. Also, remember that lambda expressions and comprehensions are function definitions, and thus create new scopes, despite not having suites.)

The specific rule Python uses for lookup is LEGB: Local, Enclosing, Global, Builtin. In other words, when you write x, Python looks for a local variable named x, then (if you're in a nested function or class) it goes through all the enclosing scopes' local variables, then it looks at the module's globals, and then it looks at builtins.

    Meanwhile, because Python has implicit declarations (any assignment may be declaring a new variable), it needs a rule for that: any assignment creates a local variable (shadowing any enclosing, global, or builtin of the same name), unless you have a nonlocal or global declaration in the same scope.

    The way Python implements these rules is not as simple as you'd think. Primarily, this is for reasons of optimization.

    tl;dr


    The short version is:

    * Locals (except when shared with a closure) are loaded and stored in an array on the stack frame.
    * Nonlocals (and locals shared with a closure) are basically the same but with an extra dereference.
    * Globals are loaded and stored in a dict.
    * Builtins are loaded via two or three dict lookups (and are never directly stored).

    Fast locals


    Local variables are stored in an array on the stack frame, and loads and stores are compiled into array-indexing operations on that array. This is basically the same behavior as C--and it's a lot faster than going through a dictionary.

    Exceptions to this rule:

    * Modules don't have fast locals; they use the globals dict as their locals, so everything works the same as for the globals section below.
    * Local variables that are shared with a nested closure are slightly more complicated; see the next section.

    So, how does Python know how to compile your locals to array indices? And how does it turn that back into the local variable names that you can see in tracebacks, locals(), inspect and dis functions, etc.?

When compiling the body of your function, Python keeps track of all of your variable assignments. (Assignments to attributes or subscriptions don't count; those are just calls on the attributed or subscripted object.) Function parameters are local, any variable that you assign to is local (unless you have a global or nonlocal statement for it); anything else is not.

    So, it creates a tuple of all those local names, and stores it in the co_varnames of the code object. It also stores the total count in co_nlocals. Then, when compiling your code to bytecode, every load or save to co_varnames[3] becomes a LOAD_FAST 3 or STORE_FAST 3.

When a function is called, the stack frame reserves co_nlocals extra space at the end, called f_localsplus, and when the interpreter sees a LOAD_FAST 3, it just reads from frame.f_localsplus[3].

    For example, in this code:

    def spam(eggs):
        cheese = 0
        return eggs + cheese
    

    ... eggs is local because it's a parameter, and cheese is local because you assign to it, so that last line will get compiled to, in effect, LOAD_FAST(0) + LOAD_FAST(1), not LOAD_NAME('eggs') + LOAD_NAME('cheese').
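You can watch the compiler do this with the dis module. Here's a quick sketch (the listing is roughly what CPython 3.5 prints; details vary slightly by version):

    import dis

    def spam(eggs):
        cheese = 0
        return eggs + cheese

    dis.dis(spam)
    #   LOAD_CONST   1 (0)        <- cheese = 0
    #   STORE_FAST   1 (cheese)
    #   LOAD_FAST    0 (eggs)     <- return eggs + cheese
    #   LOAD_FAST    1 (cheese)
    #   BINARY_ADD
    #   RETURN_VALUE
    print(spam.__code__.co_varnames)   # ('eggs', 'cheese')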

    Now you can probably understand why these optimizations work:

    def spam(eggs=eggs):
        local_cheese = cheese
        for _ in range(1000000000):
            eggs(local_cheese())
    

    You've replaced a billion dictionary lookups for the globals eggs and cheese with one dictionary lookup for each plus a billion array lookups for f_localsplus[0] and f_localsplus[1], which is obviously faster. (Dictionary lookups are, of course, constant-time, just like array lookups. But with a larger constant multiplier--large enough to make a significant difference here.)

But how does Python get the names back out? Well, as mentioned, they're stored in co_varnames. And when a function is called, the stack frame gets a pointer to the function's code object in f_code. So, if it needs to build a traceback or disassembly or whatever for fast local #1, it just gives you frame.f_code.co_varnames[1].

    What about locals()? Well, there actually is a locals dict on the stack frame, called f_locals. But it doesn't get created unless you ask for it (e.g., by calling locals(), or just by asking for the Python version of the frame object with, e.g., sys._getframe()). This calls a function PyFrame_FastToLocals, which effectively does frame.f_locals = dict(zip(frame.f_code.co_varnames, frame.f_localsplus)), and then it returns that f_locals to you. It should be obvious why, as the docs say, modifying locals() doesn't affect the function's actual locals: it's just a snapshot of your locals. (There is actually a function PyFrame_LocalsToFast, but you can only call that from C, not Python.)
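To make the snapshot behavior concrete, here's a minimal sketch (demo is just an illustrative name):

    def demo():
        x = 1
        snap = locals()    # PyFrame_FastToLocals builds this dict from the fast array
        snap['x'] = 99     # modifies only the snapshot dict...
        return x           # ...so the real fast local is unchanged

    assert demo() == 1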

What about exec()? Again, as the docs say, this function by default works on (something equivalent to) its caller's globals() and locals(), so, again, it can't change your local variables.

    I used Python-like pseudocode for all the stuff that happens in C under the covers. But most of that actually is exposed to Python. For example, if you call sys._getframe(), the frame object you get back has an f_code member. Or, if you just look at a function, it has a __code__ member. And either way, that object has a co_varnames. And so on. The f_localsplus member, and the PyFrame_FastToLocals and PyFrame_LocalsToFast are the only things mentioned here that aren't exposed. And even there, you can always do this:

    import ctypes, sys

    # PyFrame_LocalsToFast(frame, clear) copies f_locals back into the fast array
    ctypes.pythonapi.PyFrame_LocalsToFast.argtypes = [ctypes.py_object, ctypes.c_int]
    ctypes.pythonapi.PyFrame_LocalsToFast.restype = None
    ctypes.pythonapi.PyFrame_LocalsToFast(sys._getframe(), 0)
    

    Have fun with that.

    Free and cell variables


Variables from an enclosing scope are almost the same as locals--they're handled by looking them up in an array--but with an extra dereference. But that requires a bit of extra work to set up.

    So, how does Python know how to compile your nonlocals to array indices?

    As we saw above, when Python sees a variable, it knows whether that variable is local to your function. If it isn't, Python steps through the outer enclosing functions to see if it's local to any of them. If so, it's a free variable. (Of course a global statement means it's a global variable, so it doesn't look through enclosing scopes, and a nonlocal means it's a free variable even if there's a local assignment.)

    Notice that the module's global scope is not counted as a scope for these purposes--anything found in the global scope is always global, not enclosing.

    So anyway, the compiler creates a tuple of all those free names, and stores it in co_freevars. Then, when compiling your code to bytecode, every load or save to co_freevars[2] becomes a LOAD_DEREF 2 or STORE_DEREF 2.

    Now, when a function is called, the stack frame doesn't just reserve co_nlocals space, but co_nlocals + len(co_freevars) extra space at the end (I'm still lying to you here; see a few paragraphs down for the truth) in f_localsplus, and when the interpreter sees a LOAD_DEREF 2, it just reads from frame.f_localsplus[frame.f_code.co_nlocals + 2].

    Well, not quite. That obviously wouldn't allow you to reassign closure variables. (You could call mutating methods, but assignment would just break the connection, just like after a = b = [1, 2, 3], a.append(4) affects b, but a = [4] breaks the connection.) And that would defeat the entire purpose of nonlocal.

    So, what you actually have in that slot isn't the object itself, but a cell object that has a member named cell_contents that's the actual object. And what the outer scope has in its fast locals slot is the same cell object, not the thing it points to. So, if either the inner function or the outer function assigns to the variable, they're not assigning to one of their fast local slots, but assigning to the cell_contents of a cell in one of their fast local slots, so they can each see what the other one does.

That's why local variables that are referenced by a nested scope work like nonlocal variables. They're stored in co_cellvars instead of co_freevars. Your frame's actual space is co_nlocals + len(co_cellvars) + len(co_freevars). The name that goes with a LOAD_DEREF 2 is equivalent to (co_cellvars+co_freevars)[2]. At runtime, that LOAD_DEREF 2 actually does a f_localsplus[co_nlocals+2].cell_contents. When you construct a nested function at runtime, there are zero or more LOAD_CLOSURE bytecodes to push the current frame's cells onto the stack, then a MAKE_CLOSURE instead of MAKE_FUNCTION, which does the extra step of popping those cells off and stashing them in the function's __closure__ attribute. And when you call a function, its __closure__ cells get copied into the f_localsplus array. Meanwhile, even if your function doesn't access any variables from an enclosing scope, if a function nested within your function does so, they're free variables for you, and have to get put in your __closure__ so they can end up in your stack frame so they can get to your nested function's __closure__ and stack frame.

And I believe that gets all of the actual details correct.
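To make that concrete, here's a small sketch you can run (counter and bump are illustrative names):

    def counter():
        n = 0              # n is a cellvar in counter: shared with bump
        def bump():
            nonlocal n     # ...and a freevar in bump
            n += 1
            return n
        return bump

    b = counter()
    b(); b()
    print(counter.__code__.co_cellvars)    # ('n',)
    print(b.__code__.co_freevars)          # ('n',)
    print(b.__closure__[0].cell_contents)  # 2 -- the shared cell's current value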

    But you don't have to memorize all this stuff, or go dig it out of the C code, or work it out by trial and error, or plow your way through that horrible paragraph ever again; the docs for the dis module describe what each opcode does, and have improved tremendously over the past few versions of Python. If you can't remember exactly what LOAD_DEREF looks up and where, just look at the docs: "Loads the cell contained in slot i of the cell and free variable storage. Pushes a reference to the object the cell contains on the stack." That might not have explained everything to you if you'd never read anything before, but it should be more than enough to refresh your memory once you have.

    Also, again, most of this is exposed to Python, so the Python-esque pseudocode above is almost all real code. However, from Python, you can't write to a cell through its cell_contents, and you can't construct new cell instances. (If you want to have fun with ctypes.pythonapi, you can probably figure it out from here.)

    Finally, locals(), in addition to giving you a snapshot of your locals rather than a live way to change them, also flattens out any cells, giving you their cell_contents instead.

At first glance, it seems like it would be simpler to just have each function's stack frame have a pointer to its outer function's stack frame. You could still optimize things just as much by having instructions like LOAD_OUTER 3, 2 to go 3 steps out and get the fast-local 2 from that frame, right? In fact, some languages do implement closures that way. But that means that a function's stack frame can't go away--and, therefore, neither can anything else it refers to--until all closures referring to any of its variables, or any of its outer scopes, go away. And that "anything else it refers to" includes the stack frame of its caller. You can see why this could be a problem. There are ways to optimize that (although you still run into problems with coroutines), but it's actually simpler, not more complicated, to just reference the individual values pulled off the frame, as Python does.

    Globals and builtins


    Globals are easy. If the compiler sees a name that isn't assigned in the current function or an outer nested function, or that has a global statement, it's global. There's no special operation for builtins; it's just a fallback for every global lookup.

    The compiler does still use an array here, but only to fetch the name from a constant tuple on the code object (co_names) to look up with LOAD_GLOBAL and STORE_GLOBAL at runtime. They look something like this:

    def LOAD_GLOBAL(index):
        name = frame.f_code.co_names[index]
        try:
            return frame.f_globals[name]
        except KeyError as e:
            try:
                builtins = frame.f_globals['__builtins__']
                if isinstance(builtins, ModuleType): builtins = builtins.__dict__
                return builtins[name]
            except KeyError as f:
                raise e
    
    def STORE_GLOBAL(index, value):
        name = frame.f_code.co_names[index]
        frame.f_globals[name] = value
    

    That f_globals will normally be the actual globals for the module the current (module-level, function, or class) code was defined in, but it could be something else. (For example, exec lets you pass in any mapping for globals; if you want a sanitized and quasi-sandboxed environment with your own under-the-hood functions and builtins like exec and __import__ stripped out, that's easy.)
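For example, here's a minimal sketch of that quasi-sandboxing idea (env is just an illustrative name, and this is nowhere near a real sandbox):

    env = {'__builtins__': {'print': print}}   # whitelist only print
    exec("print('hello')", env)                # works: print is found via __builtins__
    try:
        exec("open('spam.txt')", env)          # open was stripped out
    except NameError as e:
        print(e)                               # name 'open' is not defined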

    Much simpler than closure variables, right? But it can be slower. So, why have a special case for globals instead of just making it the last stop on the closure chain (or making builtins the last stop instead of a special case within the globals special case)?

Well, for one thing, not having fast locals or cells and doing everything through the globals dict allows for more dynamic stuff, like creating new variables in exec and using them in the same scope. For another, not having to build the full list of names in the scope up front, but simply adding them to the dict as they're created, means that the global scope can be executed incrementally, statement by statement (so it works the same as in the REPL--in fact, both the REPL and the top-level script work by just incrementally building a module named __main__). In particular, you can define a function that uses a global, and start passing it around, before that global has been created.

    More importantly, lookup speed shouldn't really matter at global level. If you're doing any heavy computation that involves millions of name lookups at module scope, move it into a function and call that function. And if you're using those globals millions of times from another function, get them out of global scope too. Those are "structured programming 101" anyway; why try to optimize things that people shouldn't be doing?

Well, except for the fact that your module's top-level functions and classes are globals, and it's perfectly reasonable to use them millions of times. In less dynamic languages, they're only looked up as globals at compile time, so they aren't like runtime global variables--but in Python, functions, types, and everything else are first-class values, and they're looked up the same way as any other global variables. In practice, this usually isn't a bottleneck for most programs; when Python is too dynamic and slow for the inner code of some project, there are usually other factors that matter much more than this dict lookup. But occasionally, it is. So, this is one of those places people keep looking for new optimizations. At the user level, you can always just copy the global to a local or a default parameter value, as mentioned earlier. Or you can use an ambitious optimizer that tries to automate the same benefit (like FAT Python, which creates an extra version of each function's __code__ with globals copied into the co_consts, and wraps the function in a guard that checks whether the cached globals have changed since the function was compiled and, if they have, falls back to the original "slow" function instead of the fast version).

    What about classes?


    A class definition is a scope, just like a function definition. There are a couple of minor differences (the special LOAD_CLASSDEREF opcode, the magic __class__ cell that's created in class scope so methods can access it directly or via super, etc.). But the really big difference is that the class code isn't executed every time the class is called; it's executed once to produce the dict that gets passed to type (or another metaclass), and then it, and its scope, goes away. (Of course the body of its __new__ and/or __init__ will get executed when the class is called.)
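A quick sketch to make that concrete (make_class is an illustrative name):

    def make_class():
        class Spam:
            print('class body running')   # executes once, at definition time
            x = 1
        return Spam

    C = make_class()     # prints 'class body running'
    c1, c2 = C(), C()    # calling the class does not re-run the body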

I don't think there's anything important to learn about functions nested in classes (or classes nested in functions, unless you want to know how super works) that isn't obvious once you play with it.

    Attributes


    Attribute lookup seems really simple at first: Every attribute lookup is compiled to LOAD_ATTR, which uses a string taken from the same co_names array as LOAD_GLOBAL, and looks it up in the object's __dict__. Or, if it's not found there, in the object's type's dict. Or, if not there, then each type on that type's MRO. If that all fails, __getattr__ is looked up (but using special-method lookup, not normal attribute lookup) and called. Oh, unless __getattribute__ was overridden, in which case whatever it does happens instead. Also, somehow functions get magically turned into bound methods. And @property, and __slots__ and... and how does it find the __dict__ in the first place? And what about C-API types? Maybe it's not so simple after all...

    It's actually pretty simple: the only rule in the interpreter is special-method lookup, and LOAD_ATTR only ever uses that rule to look up and then call __getattribute__. (Other special methods are handled internally by their opcodes or other special builtin or extension functions written in C.) Everything else happens inside __getattribute__; we'll get to that later.

    Special method lookup


Special method lookup is documented only vaguely, in the data model reference chapter. See call_method and its helpers in the source for details.

    The basic idea of special method lookup is that it ignores the instance dictionary, and most kinds of dynamic customization.

    You may wonder what the point of special method lookup is. Obviously __getattribute__ lookup can't be hooked by __getattribute__, but why should it also skip the instance, or treat metaclasses differently? And why should other special methods like __add__ follow the same rules? (If you wanted to add per-instance methods to a value, __add__ or __lt__ would probably be the first things you'd think of.)

    In fact, the original "new-style class" design for Python 2.3 didn't have special method lookup--but (long before the first beta), Guido discovered a problem: Some methods make sense both on normal objects, and on types. For example, hash(int) makes just as much sense as hash(3). By normal lookup rules, both call int.__hash__. By special method lookup rules the former skips int's dict and instead calls type.__hash__. And a few other methods have the same problem, like str and repr. And what would happen if a type had both __call__ and __new__? And, even those that don't make sense for types--int + str, abs(Counter), or -socket--might make sense for some subclass of type (e.g., the new optional static types use subscripting, like Iterable[int], for generics).

    So, that's why we have special method lookup, and it obviously solves the __getattribute__ problem.

    How do we find the __dict__ without finding the __dict__ first?


    There's still a circular definition problem here. How do we find an attribute? By calling the __getattribute__ attribute on the object's type and walking its MRO looking in the __dict__ of each type. So... how do we get that __dict__? Certainly not by searching __dict__s. And the process also involves a variety of other attribute lookups--you get an object's type via its __class__, a type's MRO via its __mro__, and so on.

    The secret is that none of these are actual attribute lookups. In the C struct representing each object, there's an ob_type pointing to the type. In the C struct representing each type object, there's a tp_mro pointing to the MRO, and a tp_getattro pointing to the __getattribute__, and a tp_dict pointing to the __dict__, and so on.

    But you can set __class__ or __dict__ or __getattribute__ from Python. How does that work? In general, when you overwrite one of these special values, the interpreter updates the corresponding C struct slot. (There are a few cases where it does different things, like using a slot as a cache and searching the dict if it's NULL, but that works too.)

    So, in the pseudocode below, when I write type(obj).__mro__, that's really following the ob_type slot from the object, then the tp_mro slot from the type, and so on.

    Finding __getattribute__


    The basic process to lookup and call __getattribute__ is, in pseudocode:

    def find_special(obj, name):
        for t in type(obj).__mro__:
            if name in t.__dict__:
                it = t.__dict__[name]
                __get__ = find_special(it, '__get__')
                if __get__:
                    it = __get__(obj, type(obj))
                return it
    
    def LOAD_ATTR(obj, index):
        name = frame.f_code.co_names[index]
        __getattribute__ = find_special(obj, '__getattribute__')
        if not __getattribute__:
            raise AttributeError
        try:
            return __getattribute__(name)
        except AttributeError:
            __getattr__ = find_special(obj, '__getattr__')
            if not __getattr__:
                raise
            return __getattr__(name)
    

    If you don't understand the __get__, read the Descriptor HowTo Guide.

    (It shouldn't be possible to ever fail to find a __getattribute__, but there's code for it anyway--if you play around with the C API, you can try to call a method on a class before setting up its MRO, and you get an AttributeError rather than a segfault or something. On the other hand, __getattr__ can easily be missing, because object doesn't implement that, nor do most other builtin/stdlib types.)

    I've skipped over a few things (like the code to handle types that aren't fully constructed yet without crashing, etc.), but that's the basic idea. And it's also how other special-method lookups, like the nb_add (__add__) lookup inside BINARY_ADD, work.

Notice that a __getattribute__ defined on the instance, or provided dynamically (say, by the metaclass's __getattribute__), or a __getattribute__, __mro__, __class__, or __dict__ pushed onto the type by its metatype, all get ignored here.

    But if you've overridden __getattribute__ on your object's type, or any of its supertypes, or done the equivalent in C, your code will get called (as a bound method--that's the point of the descriptor get bit).

    One thing I skipped that may be worth noting is the method cache. See the macros at the top of typeobject.c, but the basic idea is that 4096 methods (that aren't special-cased away or given unreasonable names) get cached, so they can skip all the MRO-walking and dict lookups and jump right to the descriptor call.

    Inside the default __getattribute__


    Assuming you got to object.__getattribute__, what it does is pretty similar to special-method lookup all over again, but with some minor differences. The short version is that it allows instances, metaclass __getattribute__, and __getattr__ to get involved. In full detail, it's something like:

    • There's no type slot equivalent to "general name known only by a string passed in at runtime", so the per-type step does the tp_dict[name] bit.
    • Four-stage lookup instead of just the per-type MRO walk:
      1. Look on the type and its MRO for a data descriptor (that is, a descriptor with __set__). If so, return its __get__(obj, type(obj)).
      2. Look in the object itself only: if tp_dict (__dict__) and tp_dict[name] exist, you're done--do not call its __get__ method, just return it.
      3. Look on the type and its MRO for a non-data descriptor or non-descriptor. Return its __get__(obj, type(obj)) (or, if it's not a descriptor, just return it).
      4. Special-method lookup __getattr__, call that, and do the __get__(obj, type(obj)) on the result if present.

    So, in pseudocode:

    def __getattribute__(self, name):
        # Stage 1: look for a data descriptor on the type and its MRO.
        for t in type(self).__mro__:
            try:
                desc = t.__dict__[name]
                if '__get__' in type(desc).__dict__ and '__set__' in type(desc).__dict__:
                    return type(desc).__dict__['__get__'](desc, self, type(self))
                break
            except KeyError:
                pass
        # Stage 2: look in the instance's own dict; don't call __get__ on it.
        if name in self.__dict__:
            return self.__dict__[name]
        # Stage 3: fall back to the non-data descriptor (or plain value) found above.
        try:
            if '__get__' in type(desc).__dict__:
                return type(desc).__dict__['__get__'](desc, self, type(self))
            return desc
        except UnboundLocalError:
            # Stage 4: nothing found; fall back to special-method __getattr__.
            __getattr__ = special_method(self, '__getattr__')
            if __getattr__:
                return __getattr__(name)
            raise AttributeError(name)
    

    I believe the point of that multi-stage lookup is to make it easy to shadow methods (which are non-data descriptors) but hard to accidentally shadow properties and similar things (which are data descriptors).
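Here's a minimal sketch of that difference (all names are illustrative):

    class Spam:
        @property
        def prop(self):     # data descriptor: its type has __get__ and __set__
            return 'class'
        def method(self):   # plain function: non-data descriptor, __get__ only
            return 'class'

    s = Spam()
    s.__dict__['method'] = lambda: 'instance'
    print(s.method())       # 'instance' -- the instance dict shadows the method
    s.__dict__['prop'] = 'instance'
    print(s.prop)           # 'class' -- the data descriptor still wins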

    Notice that if the descriptor __get__ raises AttributeError, that will trigger the fallback to __getattr__ from the previous process.

    What about classes?


    If you look something up on a class, the class is the object, and the metaclass is the type. And if your metaclass doesn't override __getattribute__, type does (and it never raises or supers).

    The only difference between type.__getattribute__ and object.__getattribute__ is at the descriptor steps, it calls __get__(None, cls). So, everything works the same, except for the descriptor get bit. (If you work things through, you should see how this, e.g., lets @classmethods work the same whether called on an instance or on the class, and how it lets metaclass methods act like classmethods.)
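For example, here's the classmethod case worked through (Spam and who are illustrative names):

    class Spam:
        @classmethod
        def who(cls):
            return cls.__name__

    print(Spam.who())    # 'Spam' -- type.__getattribute__ does __get__(None, Spam)
    print(Spam().who())  # 'Spam' -- object.__getattribute__ does __get__(s, Spam),
                         # and classmethod binds the class either way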

    Other tricks


    What else is left?

    • __slots__ works by creating a hidden array, and a set of descriptors that read and write to that array, and preventing a tp_dict from being created on the instance.
    • Methods don't do anything special--spam.eggs() is the same as tmp = spam.eggs; tmp() (unlike many other OO languages). They work because they're stored as plain-old functions in the class dict, and plain-old functions are descriptors that build bound methods. See How methods work if that isn't clear, or the sketch just after this list.
    • Everything else--@property, @staticmethod, etc.--is just a descriptor; object.__getattribute__ (or type.__getattribute__) finds it on the class, calls its __get__ method, and gives you the result. (It may help to work through how @classmethod produces a bound class method whether looked up on the type Spam.eggs or on the instance spam.eggs.)
    • There are various C API mechanisms to expose struct members, get/set function pairs, etc. as if they were Python attributes. But it's pretty obvious how these work unless you want the gory details, in which case they're in the C API docs.
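To see the bound-method bullet in action, here's a sketch that calls a function's __get__ by hand (eggs and Spam are illustrative names):

    def eggs(self):
        return self

    class Spam:
        pass

    s = Spam()
    bound = eggs.__get__(s, Spam)   # plain functions are descriptors...
    print(bound())                  # ...and __get__ builds a bound method
    print(bound() is s)             # True: the method has s baked in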

    What about super?


    It uses four different tricks to be that super, thanks for asking:
    • In every class definition, __class__ is a local referring to the class currently being defined.
    • The compiler rewrites super() to super(__class__, co_varnames[0]). (The first local is the first parameter, so this handles self in normal methods, cls in classmethods and __new__ and normal metaclass methods, mcls in metaclass __new__ and metaclass classmethods, even if you decide to call them something confusing like s or this.)
    • Each method defined in a class gets __class__ as a free variable whenever it refers to super, not just whenever it directly refers to __class__.
    • super.__getattribute__ overrides the normal behavior from object to jump ahead on the MRO chain after the current __class__.
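You can see the first and third tricks directly (a sketch; Spam is an illustrative name):

    class Spam:
        def method(self):
            return __class__   # the compiler turns this into a freevar

    print(Spam().method() is Spam)           # True
    print(Spam.method.__code__.co_freevars)  # ('__class__',)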

  2. In Python (I'm mostly talking about CPython here, but other implementations do similar things), when you write the following:
        def spam(x):
            return x+1
        spam(3)
    
    What happens?

Really, it's not that complicated, but there's no documentation anywhere that puts it all together. Anyone who's tried hacking on the eval loop understands it, but explaining it to someone else is very difficult. In fact, the original version of this was some notes to myself, which I tried to show to someone who I'm pretty sure is at least as smart as me, and he ended up completely lost. So I reorganized it so that, at least hopefully, you can start each section and then skip the rest of the section when you get over your head (and maybe skip the last few sections entirely) and still learn something useful. If I've failed, please let me know.

    Compilation

To make things simpler to explore, let's put all of that inside an outer function for the moment. This does change a few things (notably, spam becomes a local within that outer function rather than a global), but it means the code that creates and calls the function sticks around for us to look at. (Top-level module code is executed and discarded at module import time; anything you type at the REPL is compiled, executed, and discarded as soon as the statement is finished; scripts work similarly to the REPL.)

    So, there are two top-level statements here.

    The def statement creates a function object out of a name and a body and then stores the result in a local variable corresponding to that name. In pseudocode:
        store_local("spam", make_function("spam", spam_body))
    The call expression statement takes a function object and a list of arguments (and an empty dict of keyword arguments) and calls the function (and then throws away the result). In pseudocode:
        call_function(load_local("spam"), [3], {})
    So, the only tricky bit here is, what is that spam_body? Well, the compiler has to work recursively: it sees a def statement, and it knows that's going to compile into a make_function that takes some kind of body-code object, so it compiles the body suite of the def into that object, then stashes the result as a constant. Just like the number 3 and the string "spam" are constants in the outer function, so is the code object spam_body.

    So, what does the body do? It's pretty simple:
        return(add(load_local("x"), 1))
    Notice that parameters like x are just local variables, the same as any you define inside the function; the only difference is that they get an initial value from the caller's arguments, as we'll see later.

    Obviously real Python bytecode is a bit different from this pseudocode, but we'll get back to that at the very end. There are a few easier questions to get through first, starting with: what kind of thing is a code object?

    Code objects

    You need more than just compiled bytecode to store and execute a function body, because the bytecode references things like constants, which have to go somewhere, and because that call_function is going to need some information about its parameters to know how to map the first argument 3 to the local variable x.

    The object with all this information is a code object. The inspect module docs give some detail of what's in a code object. But some of the key members are:
• co_consts, a tuple of constant values. In spam this has just the number 1 (plus None, which every function keeps around for its implicit return). In the top-level function, it has the number 3, the string "spam", and the code object for the spam body.
• co_argcount and related values that specify the calling convention. For spam, there's an argcount of 1, meaning it's expecting 1 positional parameter, and also meaning its first 1 local variables in co_varnames are parameters. (This is why you can call it with spam(x=3) instead of spam(3).) The full details of how arguments get mapped to parameters are pretty complicated (and I think I've written a whole blog post about it).
    • co_code, which holds the actual compiled bytecode, which we'll get to later.
    There's also a bunch of stuff there that's only needed for tracebacks, reflection, and debugging, like the filename and line number the source code came from.

    So, the compiler, after recursively compiling the inner function body into bytecode, then builds the code object around it. You can even do this yourself in Python, although it's a bit of a pain—type help(types.CodeType) and you'll see the 15-parameter constructor. (Some of those parameters are related to closures, which I'll get to later.)
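Before constructing one, it's worth just poking at an existing code object. A quick sketch:
        def spam(x):
            return x + 1

        c = spam.__code__
        print(c.co_argcount)   # 1
        print(c.co_varnames)   # ('x',)
        print(c.co_consts)     # (None, 1) -- None is there for the implicit return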

    Function objects

    The make_function pseudocode above just takes a code object and a name and builds a function object out of it.

    Why do we need a separate function object? Why can't we just execute code objects? Well, we can (via exec). But function objects add a few things.

    First, function objects store an environment along with the code, which is how you get closures. If you have 16 checkboxes, and an on_check function that captures the checkbox as a nonlocal variable, you have 16 function objects, but there's no need for 16 separate code objects.

    In the next few sections, we'll ignore closures entirely, and come back to them later, because they make things more complicated (but more interesting).

    Function objects also store default values, which get used to fill in parameters with missing arguments at call time. The fact that these are created at def time is useful in multiple ways (although it also leads to the common unexpected-mutable-default bug).

    If you look at the inspect module, you'll see that the key attributes of a function are just the code object in __code__, the default values in __defaults__ and __kwdefaults__, and the closure environment in __closure__ (for nonlocals, which I'll explain later) and __globals__ (for globals—these aren't captured individually; instead, an entire globals environment is). And, of course, a bunch of stuff to aid tracebacks, debugging, reflection, and static type-checking.

    You can do the same thing as that make_function pseudocode instruction from Python—try help(types.FunctionType) to see the parameters to the constructor. And now you know enough to do some simple hackery, like turning spam from a function that adds 1 to a function that adds 2:
        import types

        c = spam.__code__
        consts = tuple(2 if const == 1 else const
                       for const in c.co_consts)
        nc = types.CodeType(
            c.co_argcount, c.co_kwonlyargcount, c.co_nlocals,
            c.co_stacksize, c.co_flags, c.co_code, consts,
            c.co_names, c.co_varnames, c.co_filename, c.co_name,
            c.co_firstlineno, c.co_lnotab, c.co_freevars,
            c.co_cellvars)
        spam = types.FunctionType(
            nc, spam.__globals__, spam.__name__,
            spam.__defaults__, spam.__closure__)
    
    There are a few limits, but most things you can imagine are doable, and work the way you'd expect. Of course if you want to get fancy, you should consider using a library like byteplay instead of doing it manually.

    Fast Locals

    In reality, Python doesn't do load_local("x"), looking up x in a dict and returning the value, except in special cases. (It does do that for globals, however.)

At compile time, the compiler makes a list of all the locals (just x here) and stores their names in the code object's co_varnames, and then turns every load_local("x") into load_fast(0). Conceptually, this means to do a load_local(co_varnames[0]), but of course that would be slower, not faster. So what actually happens is that the local variables are stored in an array that gets created at the start of the function call, and load_fast(0) just reads from slot #0 of that array.

Looking things up by name (a short string whose hash value is almost always cached) in a hash table is pretty fast, but looking things up by index in an array is even faster. This is why the def foo(*, min=min) microoptimization works—turning load_global("min") into load_local("min") might help a little (because the local environment is usually smaller, not to mention that builtins require an extra step over normal globals), but turning it into load_fast(0) helps a lot more.

    But it does mean that if you call locals(), Python has to build a dict on the fly—and that dict won't be updated with any further changes, nor will any changes you make to the dict have any effect on the actual locals. (Usually. If, say, you're executing top-level code, the locals and globals are the same environment, so changing locals does work.)

    That's also why you usually can't usefully exec('x = 2')—that again just creates a dict on the fly with a copy of the locals, then executes the assignment against that dict. (In Python 2, exec was a statement, and it would create a dict on the fly, execute the code, and then try to copy changed values back into the array. Sometimes that worked, but there were tons of edge cases. For example, if you never had a non-exec assignment to x, the compiler couldn't know to put x in co_varnames in the first place.)
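A minimal sketch of that (demo is an illustrative name):
        def demo():
            x = 1
            exec('x = 2')   # assigns into a throwaway copy of the locals
            return x

        assert demo() == 1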

Finally, that means closures can't be implemented as just a stack of dicts (as in many Lisp implementations), but Python has a different optimization for them anyway, as we'll see later.

    Calling

    OK, so now we know everything about how the def statement works (except for the actual bytecode, which I'll get to later, and closure-related stuff, which I'll also get to later). What about the call expression?

    To call a function, all you need to supply is the function object and a list of positional arguments (and a dict of keyword arguments, and, depending on the Python version, *args and/or **kw may be part of the call rather than splatting them into the list/dict in-place… but let's keep it simple for now). What happens inside the interpreter when you do this?

    Well, first, calling a function actually just looks up and calls the function object's __call__ method. That's how you can call class objects, or write classes whose instances are callable. Near the end, we'll get into that. But here, we're talking about calling an actual function object, and the code for that works as follows.

    The first thing it needs to do is create a new stack frame to run the function's code in. As with codes and functions, you can see the members of a frame object in the inspect docs, and you can explore them in the interactive interpreter, and you can find the type as types.FrameType (although, unlike code and function objects, you can't construct one of these from Python). But the basic idea is pretty simple:

    A frame is a code object, an environment, an instruction pointer, and a back-pointer to the calling frame. The environment is a globals dict and a fast locals array, as described earlier. That's it.

You may wonder how the compiler builds that list of co_varnames in the first place. While it's parsing your code, it can see all the assignments you make. The list of local variables is just the list of parameters, plus the list of names you assign to somewhere in the function body. Anything you access that isn't assigned to anywhere (ignoring the case of closures, which we'll get to later) is a global; it goes into co_names, and gets looked up as a global (or builtin) at runtime.

    To construct a frame, in pseudocode:
        code = func.__code__
        locals = [_unbound for _ in range(code.co_nlocals)]
        do_fancy_arg_to_param_mapping(locals, args, kwargs)
        frame = Frame()
        frame.f_back = current_frame
        frame.f_lasti = 0
        frame.f_code = code
        frame.f_locals = locals
    
    And then all the interpreter has to do is recursively call interpreter(frame), which runs along until the interpreter hits a return, at which point it just returns to the recursive caller.

    The interpreter doesn't actually need to be recursive; a function call could just be the same as any other instruction in the loop except that it sets current_frame = frame, and then returning would also be the same as any other instruction except that it sets current_frame = current_frame.f_back. That lets you do deep recursion in Python without piling up on the C stack. It makes it easier to just pass frame objects around and use them to represent coroutines, which is basically what Stackless Python is all about. But mainline Python can, and does, already handle the latter just by wrapping frames up in a very simple generator object. Again, see inspect to see what's in a generator, but it's basically just a frame and a couple extra pieces of information needed to handle yield from and exhausted generators.

    Notice the special _unbound value I used in the pseudocode. In fact, at the C level, this is just a null pointer, although it could just as easily be a real sentinel object. (It could even be exposed to the language, like JavaScript's undefined, although in JS, and most other such languages, that seems to cause a lot more problems than it solves.) If you try to access a local variable before you've assigned to it, the interpreter sees that it's unbound and raises an UnboundLocalError. (And if you del a local, it gets reset to _unbound, with the same effect.)
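You can trigger that easily (a sketch; demo is an illustrative name):
        def demo():
            print(x)   # x is in co_varnames, but its slot is still _unbound here
            x = 1

        try:
            demo()
        except UnboundLocalError as e:
            print(e)   # local variable 'x' referenced before assignment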

    Defaults

    I glossed over the issues with default values above. Let's look at them now.

    Let's say we defined def spam(x=0):. What would that change?

    First, the body of spam doesn't change at all, so neither does its code object. It still just has an x parameter, which it expects to be filled in by the function-calling machinery in the interpreter, and it doesn't care how. You can dis it and explore its members and nothing has changed.
    Its function object does change, however—__defaults__ now has a value in it.

    If you look at the outer function, its code changes. It has to store the 0 as a constant, and then load that constant to pass to make_function. So the first line of pseudocode goes from this:
        store_local("spam", make_function("spam", spam_body))
    … to this:
        store_local("spam", 
            make_function("spam", spam_body, defaults=(0,)))
    Inside the interpreter's make_function implementation, it just stores that tuple in the created function's __defaults__ attribute.

    At call time, the function-calling machinery is a bit more complicated. In our case, we passed a value for x, so nothing much changes, but what if we didn't? Inside call_function, if the function expects 1 positional parameter, and we passed 0 positional arguments, and we didn't provide a keyword argument matching the name of the function's code's first positional parameter, then it uses the first value from the function's __defaults__ instead, and puts that in the frame's locals array the same way it would for a value we passed in explicitly.

    (If we hadn't set a default value, and then called without any arguments, it would try to use the first value from __defaults__, find there isn't one, and raise a TypeError to complain.)

    This explains why mutable default values work the way they do. At def time, the value gets constructed and stored in the function's __defaults__. Every time the function is called without that argument, nobody constructs a new value, it just copies in the same one from the same tuple every time.
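That's the whole unexpected-mutable-default bug in a nutshell (append_to is an illustrative name):
        def append_to(item, bucket=[]):   # the list is built once, at def time
            bucket.append(item)
            return bucket

        print(append_to(1))            # [1]
        print(append_to(2))            # [1, 2] -- the same list, from __defaults__
        print(append_to.__defaults__)  # ([1, 2],)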

    Closures

    As soon as you have nonlocal variables, things get more fun. So let's start with the most trivial example:
        def eggs():
            y = 1
            def spam(x):
                return x + y
            spam(3)
    
    Our inner function has a free variable, y, which it captures from the outer scope. Its code doesn't change much:
        return(add(load_local("x"),
                   load_local_cell("y").cell_contents))
    But the outer function changes a bit more. In pseudocode:
        load_local_cell("y").set_cell_contents(1)
        store_local("spam", make_function("spam", spam_body, 
            closure=[load_local_cell("y")]))
        call_function(load_local("spam"), [], {})
    The first thing to notice is that y is no longer a normal local variable. In both functions, we're doing a different call, load_local_cell to look it up. And what we get is a cell object, which we have to look inside to get or set the actual value.

    Also notice that when we're calling make_function, we pass the cell itself, not its contents. This is how the inner function ends up with the same cell as the outer one. Which means if either function changes the cell's contents, the other one sees it.

    The only hard part is how the compiler knows that y is a cellvar (a local variable you share with a nested inner function) in eggs and a freevar in spam (a local variable that an outer function is sharing with you, or with a deeper nested function).

Remember that the compiler scans the code for assignments to figure out what's local. If it finds something that isn't local, then it walks the scope outward to see if it's local to any containing function. If so, that variable becomes a cellvar in the outer function, and a freevar in the inner function (and any functions in between). If not, it becomes a global. (Unless there's an explicit nonlocal or global declaration, of course.) Then the compiler knows which code to emit (local, cell, or global operations) for each variable. Meanwhile, it stores the list of cellvars and freevars for each code object in co_cellvars and co_freevars. When compiling a def statement, the compiler also looks at the co_freevars and inserts that closure=[load_local_cell("y")] argument to pass along to the make_function.

    Inside call_function, if the code object has any cellvars, the interpreter creates an empty (_unbound) cell for each one. If it has any freevars, the interpreter copies the cells out of the function object's __closure__.

    And that's basically all there is to closures, except for the fast local optimization.

    For historical reasons, the way things get numbered is a little odd. The f_locals array holds the normal locals first, then the cellvars, then the freevars. But cellvars and freevars are numbered starting from the first cellvar, not from the start of the array. So if you have 3 normal locals, 2 cellvars, and 2 freevars, freevar #2 matches slot 0 in co_freevars, and slot 5 in f_locals. Confused? It's probably easier to understand in pseudocode than English. But first…

For an additional optimization, instead of just one load_local_cell function, Python has a load_closure that just loads the cell, a load_deref that loads the cell's cell_contents in a single step (without having to load the cell object itself onto the stack), and a store_deref that stores into the cell's cell_contents in a single step.

    So, the pseudocode to construct a frame looks like this:
        code = func.__code__
        locals = [_unbound for _ in range(code.co_nlocals)]
        cells = [cell() for _ in range(len(code.co_cellvars))]
        do_fancy_arg_to_param_mapping(locals, args, kwargs)
        frame = Frame()
        frame.f_back = current_frame
        frame.f_lasti = 0
        frame.f_code = code
        frame.f_locals = locals + cells + list(func.__closure__)
    
    And the pseudocode for load_closure, load_deref, and store_deref, respectively:
        frame.f_locals[frame.f_code.co_nlocals + i]
        frame.f_locals[frame.f_code.co_nlocals + i].cell_contents
        frame.f_locals[frame.f_code.co_nlocals + i].set_cell_contents(value)
    
    These cells are real Python objects. You can look at a function's __closure__ list, pick out one of the cells, and see its cell_contents. (You can't modify it, however.)

    Example

    It may be worth working through an example that actually relies on closure cells changing:
    def counter():
        i = 0
        def count():
            nonlocal i
            i += 1
            return i
        return count
    count = counter()
    count()
    count()
    
    If you want to go through all the details, you can easily dis both functions (see the section on bytecode below), look at their attributes and their code objects' attributes, poke at the cell object, etc. But the short version is pretty simple.

    The inner count function has a freevar named i. This time, it's explicitly marked nonlocal (which is necessary if you want to rebind it). So, it's going to have i in its co_freevars, and some parent up the chain has to have a cellvar i or the compiler will reject the code; in our case, of course, counter has a local variable i that it can convert into a cellvar.

    So, count is just going to load_deref the freevar, increment it, store_deref it, load_deref it again, and return it.

    At the top level, when we call counter, the interpreter sets up a frame with no locals and one cellvar, so f_locals has one empty cell in it. The i = 0 does a store_deref to set the cell's value. The def does a load_closure to load the cell object, then passes it to make_function to make sure it ends up in the defined function's __closure__, and then it just returns that function.

    When we call the returned function, the interpreter sets up a frame with no locals and one freevar, and copies the first cell out of __closure__ into the freevar slot. So, when it runs, it updates the 0 in the cell to 1, and returns 1. When we call it again, same thing, so now it updates the 1 in the cell to 2, and returns 2.

    Other callable types

    As mentioned earlier, calling a function is really just looking up and calling the function's __call__ method. Of course if calling anything means looking up its __call__ method and calling that, things have to bottom out somewhere, or that would just be an infinite recursion. So how does anything work?

    First, if you call a special method on a builtin object, that also gets short-circuited into a C call. There's a list of slots that a builtin type can provide, corresponding to the special dunder methods in Python. If a type has a function pointer in its nb_add slot, and you call __add__ on an instance of that type, the interpreter doesn't have to look up __add__ in the dict, find a wrapper around a builtin function, and call it through that wrapper; it can just find the function pointer in the slot for the object's type and call it.

    One of those slots is tp_call, which is used for the __call__ method.

    Of course the function type defines tp_call with a C function that does the whole "match the args to params, set up the frame, and call the interpreter recursively on the frame" thing described earlier. (There's a bit of extra indirection so this can share code with eval and friends, and some optimized special cases, and so on, but this is the basic idea.)

    What if you write a class with a __call__ method and then call an instance of it? Well, spam.__call__ will be a bound method object, which is a simple builtin type that wraps up a self and a function. So, when you try to call that bound method, the interpreter looks for its __call__ by calling its tp_call slot, which just calls the underlying function with the self argument crammed in. Since that underlying function will be a normal Python function object (the one you defined with def __call__), its tp_call does the whole match-frame-eval thing, and your __call__ method's code gets run and does whatever it does.

    Finally, most builtin functions (like min) don't create a whole type with a tp_call slot, they're just instances of a shared builtin-function type that just holds a C function pointer along with a name, docstring, etc. so its tp_call just calls that C function. And similarly for methods of builtin types (the ones that aren't already taken care of by slots, and have to get looked up by dict). These builtin function and method implementations get a self (or module), list of args, and dict of kwargs, and have to use C API functions like PyArg_ParseTupleAndKeywords to do the equivalent of argument-to-parameter matching. (Some functions use argclinic to automate most of the annoying bits, and the goal is for most of the core and stdlib to do so.) Beyond that, the C code just does whatever it wants, returning any Python object at the end, and the interpreter then puts that return value on the stack, just as if it came from a normal Python function.

You can see much of this from Python. For example, if f is a function, f.__call__ is a <method-wrapper '__call__' of function object>, f.__call__.__call__ is a <method-wrapper '__call__' of method-wrapper object>, and if you pile on more .__call__ you just get method-wrappers around method-wrappers with the same method. Similarly, min is a builtin function, min.__call__ is a <method-wrapper '__call__' of builtin_function_or_method object>, and beyond that it's method-wrappers around method-wrappers. But if you just call f or min, it doesn't generate all these wrappers; it just calls the C function that the first method-wrapper is wrapping.

    Actual bytecode

    Real Python bytecode is the machine language for a stack machine with no registers (except for an instruction pointer and frame pointer). Does that sound scary? Well, that's the reason I've avoided it so far—I think there are plenty of people who could understand all the details of how function calls work, including closures, but would be scared off by bytecode.

    But the reality is, it's a very high-level stack machine, and if you've made it this far, you can probably get through the rest. In fact, I'm going to go out of my way to scare you more at the start, and you'll get through that just fine.

    Let's go back to our original trivial example:
        def spam(x):
            return x+1
        spam(3)
    
    Don't put this inside a function, just paste it at the top level of your REPL. That'll get us a global function named spam, and we can look at what's in spam.__code__.co_code:
        b'|\x00\x00d\x01\x00\x17S'
    Well, that's a bit ugly. We can make it a little nicer by mapping from bytes to hex:
        7c 00 00 64 01 00 17 53
    But what does that mess mean?

Bytecode is just a sequence of instructions. Each instruction has a 1-byte number; if it's 90 (decimal) or higher, it's followed by a 2-byte (little-endian) operand value. So, we can look up 0x7c in a table and see that it's LOAD_FAST, and the 2-byte operand is just 0, just like our pseudocode load_fast(0). So, this is taking the frame's f_locals[0] (which we know is x, because co_varnames[0] is 'x'), and pushing it on the stack.

Fortunately, we don't have to do all this work; the dis module does it for us. Just call dis.dis(spam) and you get this:
         0 LOAD_FAST 0 (x)
         3 LOAD_CONST 1 (1)
         6 BINARY_ADD
         7 RETURN_VALUE
    
    The dis docs also explain what each op actually does, so we can figure out how this function works: It pushes local #0 (the value of x) onto the stack, then pushes constant #1 (the number 1) onto the stack. Then it pops the top two values, adds them (which in general is pretty complicated—it needs to do the whole deal with looking up __add__ and __radd__ methods on both objects and deciding which one to try, as explained in the data model docs), and puts the result on the stack. Then it returns the top stack value.

Brief aside: If operands are only 16 bits, what if you needed to look up the 65536th constant? Or, slightly more plausibly, you needed to jump to instruction #65536? That won't fit in a 16-bit operand. So there's a special opcode EXTENDED_ARG (with number 0x90) that sets its 16-bit operand as an extra (high) 16 bits for the next opcode. So to load the 65536th constant, you'd do EXTENDED_ARG 1 followed by LOAD_CONST 0, and this means LOAD_CONST 65536.

    Anyway, now let's add the outer function back in, and compare the pseudocode to the bytecode:
        def outer():
            def spam(x):
                return x+1
            spam(3)
    
    In pseudocode:
        store_fast(0, make_function("spam", spam_body))
        call_function(load_fast(0), [3], {})
    In real bytecode:
         0 LOAD_CONST 1 (<code object spam at 0x12345678>)
         3 LOAD_CONST 2 ('spam')
         6 MAKE_FUNCTION 0
         9 STORE_FAST 0 (spam)
        12 LOAD_FAST 0 (spam)
        15 LOAD_CONST 3 (3)
        18 CALL_FUNCTION 1
        21 POP_TOP
        22 LOAD_CONST 0 (None)
        25 RETURN_VALUE
    
    So, 0-9 map to the first line of pseudocode: push constant #1, the code object (what we called spam_body), and constant #2, the name, onto the stack. Make a function out of them, and store the result in local #0, the spam variable.

    MAKE_FUNCTION can get pretty complicated, and it tends to change pretty often from one Python version to another, so read the dis docs for your version. Fortunately, when you have no default values, annotations, closures, etc., the pseudocode is just make_function(name, code): you just push the name and code and do MAKE_FUNCTION 0.

    Lines 12-18 map to the second line of pseudocode: push local #0 (spam) and constant #3 (the number 3) onto the stack, and call a function.

    Again, CALL_FUNCTION can get pretty complicated, and change from version to version, but in our case it's dead simple: we're passing nothing but one positional argument, so we just put the function and the argument on the stack and do CALL_FUNCTION 1.

    The interpreter's function-call machinery then has to create the frame out of the function object and its code object, pop the appropriate values off the stack, figure out which parameter to match our argument to and copy the value into the appropriate locals array slot, and recursively interpret the frame. We saw above how that function runs. When it returns, that leaves the return value on the stack.
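
    In fact, the core of that recursive interpretation is simple enough to sketch. Here's a toy version that can run spam--assuming 3.5-style bytecode, implementing only the four opcodes spam needs, and reducing the "frame" to just a locals array and a value stack:

        import dis

        def interpret(code, args):
            # bind arguments to the first slots of the locals array
            f_localsplus = list(args) + [None] * (code.co_nlocals - len(args))
            stack = []
            i = 0
            while True:
                op = code.co_code[i]
                if op >= dis.HAVE_ARGUMENT:
                    arg = code.co_code[i+1] | (code.co_code[i+2] << 8)
                    i += 3
                else:
                    arg = None
                    i += 1
                name = dis.opname[op]
                if name == 'LOAD_FAST':
                    stack.append(f_localsplus[arg])
                elif name == 'LOAD_CONST':
                    stack.append(code.co_consts[arg])
                elif name == 'BINARY_ADD':
                    right, left = stack.pop(), stack.pop()
                    stack.append(left + right)
                elif name == 'RETURN_VALUE':
                    return stack.pop()
                else:
                    raise NotImplementedError(name)

        def spam(x):
            return x + 1

        print(interpret(spam.__code__, [3]))   # 4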

    Line 21 just throws away the top value on the stack, since we don't want to do anything with the return value from spam.

    Lines 22-25 don't map to anything in our pseudocode, or in our source code. Remember that in Python, if a function falls off the end without returning anything, it actually returns None. Maybe this could be handled magically by the function-call machinery, but it's not; instead, the compiler stores None in the code object's constants, then adds explicit bytecode to push that constant on the stack and return it.

    By the way, you may have noticed that the bytecode does some silly things like storing a value into slot 0 just to load the same value from slot 0 and then never use it again. (You may have noticed that some of my pseudocode does the same thing.) Of course it would be simpler and faster to just not do that, but Python's limited peephole optimizer can't be sure that we're never going to load from slot 0 anywhere else. It could still dup the value before storing so it doesn't need to reload, but nobody's bothered to implement that. There have been more detailed bytecode optimizer projects, but none of them have gotten very far—probably because if you're serious about optimizing Python, you probably want to do something much more drastic—see PyPy, Unladen Swallow, ShedSkin, etc., which all make it so we rarely or never have to interpret this high-level bytecode instruction by instruction in the first place.

    MAKE_FUNCTION and CALL_FUNCTION

    As mentioned above, these two ops can get pretty complicated. As also mentioned, they're two of the most unstable ops, changing from version to version because of new features or optimizations. (Optimizing function calls is pretty important to overall Python performance.) So, if you want to know how they work, you definitely need to read the dis docs for your particular Python version. But if you want an example (from 3.5--this all gets reshuffled again in later versions), here goes.

    Take this pseudocode:
        make_function("spam", spam_body,
            defaults=(1,), kw_defaults={'x': 2},
            closure=(load_closure(0),))
    
    The bytecode is:
         0 LOAD_CONST 1 (1)
         3 LOAD_CONST 2 ('x')
         6 LOAD_CONST 3 (2)
         9 LOAD_CLOSURE 0 (y)
        12 BUILD_TUPLE 1
        15 LOAD_CONST 4 (<code object spam at 0x12345678>)
        18 LOAD_CONST 5 ('spam')
        21 MAKE_CLOSURE 257
    
    Yuck. Notice that we're doing MAKE_CLOSURE rather than MAKE_FUNCTION, because, in addition to passing a name and code, we're also passing a closure (a tuple of cells). And then we're passing 257 as the operand instead of 0. This breaks down into 1 | (1<<8) | (0<<16), meaning 1 positional default, 1 keyword-only default, and 0 annotations, respectively. (Also notice that the keyword default goes on the stack as a name/value pair; the dict in the pseudocode is just an abstraction.) And of course we have to push all that stuff on the stack in appropriate format and order.

    If we'd had an annotation on that x parameter, the annotation count in the third byte (plus a tuple of the annotated parameter names on the stack) would push the operand past 16 bits, meaning we'd need EXTENDED_ARG, and pushing the annotations is a few more lines, but that's really about as complicated as it ever gets.
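
    For reference, a definition along these lines--a hypothetical reconstruction, since I didn't show the source--compiles to roughly that bytecode:

        def outer():
            y = 0
            def spam(a=1, *, x=2):
                return y    # referencing y makes it a closure variable
            return spam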

    More info

    If you're still reading, you probably want to know where to look for more detail. Besides the inspect and dis docs, there's the C API documentation for Object Protocol (PyObject_Call and friends) and Function and Code concrete objects, PyCFunction, and maybe Type objects. Then there's the implementations of all those types in the Objects directory in the source. And of course the main interpreter loop in Python/ceval.c.

  3. I've seen a number of people ask why, if you can have arbitrary-sized integers that do everything exactly, you can't do the same thing with floats, avoiding all the rounding problems that they keep running into.

    If we compare the two, the reason should become pretty obvious somewhere along the way.

    Integers


    Let's start with two integers, a=42 and b=2323, and add them. How many digits do I need? Think about how you add numbers: you line up the columns, and at worst carry one extra column. So, the answer can be as long as the bigger one, plus one more digit for carry. In other words, max(len(a), len(b)) + 1.

    What if I multiply them? Again, think about how you multiply numbers: you line up the smaller number with each column of the larger number and add up the partial products, carries and all. So, the answer can be as long as the two lengths put together. In other words, len(a) + len(b).

    What if I exponentiate them? Here, things get a bit tricky, but there's still a well-defined, easy-to-compute answer if you think about it: asking how many digits are in a**b is just solving 10**(x-1) <= a**b < 10**x for x. So, take log10, and x = floor(log10(a**b)) + 1 = floor(b * log10(a)) + 1. Fill in your variables, and 2323 * log10(42) ~= 3770.808, so x = 3771. Try len(str(42**2323)) and you'll get 3771.
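
    You can sanity-check that formula against the real thing:

        import math

        print(math.floor(2323 * math.log10(42)) + 1)   # 3771
        print(len(str(42 ** 2323)))                    # 3771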

    You can come up with other fancy operations to apply to integers--factorials, gcds, whatever--but the number of digits required for the answer is always a simple, easy-to-compute function of the number of digits in the operands.*

    * Except when the answer is infinite, of course. In that case, you easily compute that the answer can't be stored in any finite number of digits and use that fact appropriately--raise an exception, return a special infinity value, whatever.

    Decimals


    Now, let's start with two decimals, a=42 and b=.2323, and add them. How many digits do I need? Well, how many digits do the originals have? It kind of depends on how you count. But the naive way of counting says 2 and 4, and the result, 42.2323, has 6 digits. As you'd suspect, len(a) + len(b) + 1 is the answer here.

    What if I multiply them? At first glance, it seems like it should be easy--our example gives us 9.7566, which has 5 digits; multiplying a by itself is the same as for integers, and b by itself gives 0.05396329, which is just adding 4 decimal digits to 4 decimal digits. So it's still len(a) + len(b).

    What if I exponentiate them? Well, now things get not tricky, but impossible. 42**.2323 is an irrational number. That means it takes an infinite number of digits (in binary, or decimal, or any other integer base) to store. (It also takes an infinite amount of time to compute, unless you have an infinite-sized lookup table to help you.) In fact, most fractional powers of most numbers are irrational--2**0.5, the square root of 2, is the most famous example.

    And it's not just exponentiation; most of the things you want to do with real numbers--take the sine, multiply by pi, etc.--give you irrational answers. Unless you stick to nothing but addition, subtraction, multiplication, and division, you can't have exact math.

    Even if all you want is addition, subtraction, multiplication, and division: a=1, b=3. How many digits do I need to divide them? Start doing some long division: 1 is smaller than 3, so that's 0. 10 has three 3s in it, so that's 0.3, with 1 left over. 10 has three 3s in it, so that's 0.3 with 1 left over. That's obviously going to continue on forever: there is no way to represent 1 / 3 in decimal without an infinite number of digits. Of course you could switch bases. For example, in base 9, 1 / 3 is 0.3. But then you need infinite digits for all kinds of things that are simple in base 10.

    Fractions


    If all you actually want is addition, subtraction, multiplication, and division, you're dealing with fractions, not decimals. Python's fractions.Fraction type does all of these operations with infinite precision. Of course when you go to print out the results as decimals, they may have to get truncated (otherwise, 1/3 or 1/7 would take forever to print), but that's the only limitation.
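
    For instance:

        from fractions import Fraction

        third, seventh = Fraction(1, 3), Fraction(1, 7)
        print(third + seventh)    # 10/21, exact
        print(third * 3)          # 1, exact
        print(float(third))       # 0.3333333333333333, rounded only for display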

    Of course if you try to throw exponentiation or sine at a Fraction, or multiply it by a float, you lose that exactness and just end up with a float.

    Aren't decimals just a kind of fraction, where the denominator is 10**d, where d is the number of digits after the decimal point? Yes, they are. But as soon as you, say, divide by 3, the result can no longer be written over a power-of-10 denominator--as we saw above, it would take infinitely many decimal digits--so that doesn't do you any good. If you need exact rational arithmetic, you need fractions with arbitrary denominators.

    Accuracy


    In real life, very few values are exact in the first place.* Your table isn't exactly 2 meters long, it's 2.00 +/- 0.005 meters.** Doing "exact" math on that 2 isn't going to do you any good. Doing error-propagating math on that 2.00, however, might.

    Also, notice that a bigger number isn't necessarily more accurate than a smaller one (in fact, usually the opposite), but the simple decimal notation means it has more precision: 1300000000 has 10 digits in it, and if we want to let people know that only the first 3 are accurate, we have to write something like 1300000000 +/- 5000000. And even with commas, like 1,300,000,000 +/- 5,000,000, it's still pretty hard to see how many digits are accurate. In words, we solve that by decoupling the precision from the magnitude: 1300 million, plus or minus 5 million, puts most of the magnitude into the word "million", and lets us see the precision reasonably clearly in "1300 +/- 5". Of course at 13 billion plus or minus 5 million it falls down a bit, but it's still better than staring at the decimal representation and counting up commas and zeroes.

    Scientific notation is an even better way of decoupling the precision from the magnitude. 1.3e10 +/- 5e6 obviously has magnitude around 10**10, and precision of 3-4 digits.*** And going to 1.3e11 +/- 5e6 is just as readable. And floating-point numbers give us the same benefit.

    In fact, when the measurement or rounding error is exactly half a digit, it gets even simpler: just write 1.30e10, and it's clear that we have 3 digits of precision, and the same for 1.30e11. And, while the float type doesn't give us this simplification, the decimal.Decimal type does. In addition to being a decimal fraction rather than a binary fraction, so you can think in powers of 10 instead of 2, it also lets you store 1.3e10 and 1.30e10 differently, to directly keep track of how much precision you want to store. It can also give you the most digits you can get out of the operation when possible--so 2*2 is 4, but 2.00*2.00 is 4.0000. That's almost always more than you want (depending on why you were multiplying 2.00 by 2.00, you probably want either 4.0 or 4.00), but you can keep the 4.0000 around as an intermediate value, which guarantees that you aren't adding any further rounding error from intermediate storage. When you perform an operation that doesn't allow that, like 2.00 ** 0.5, you have to work out for yourself how much precision you want to carry around in the intermediate value, which means you need to know how to do error propagation--but if you can work it out, decimal can let you store it.
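
    A quick demonstration of both points:

        from decimal import Decimal

        print(Decimal('1.3e10') == Decimal('1.30e10'))   # True: same value...
        print(Decimal('1.3e10'), Decimal('1.30e10'))     # 1.3E+10 1.30E+10: ...different precision
        print(Decimal('2.00') * Decimal('2.00'))         # 4.0000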

    * Actually, there are values that can be defined exactly: the counting numbers, e, pi, etc. But notice that most of the ones that aren't integers are irrational, so that doesn't help us here. But look at the symbols section for more...

    ** If you're going to suggest that maybe it's exactly 2.0001790308123812082 meters long: which molecule is the last molecule of the table? How do you account for the fact that even within a solid, molecules move around slowly? And what's the edge of a molecule? And, given that molecules' arms vibrate, the edge at what point in time? And how do you even pick a specific time that's exactly the same across the entire table, when relativity makes that impossible? And, even if you could pick a specific molecule at a specific time, its edge is a fuzzy cloud of electron position potential that fades out to 0.

    *** The powers are 10 and 6, so it's at worst off by 4 digits. But the smaller one has a 5, while the bigger one has a 1, so it's obviously a lot less than 4 digits. To work out exactly how many digits it's off, do the logarithm-and-round trick again.

    Money


    Some values inherently have a precision cutoff. For example, with money, you can't have less than one cent.* In other words, they're fixed-point, rather than floating-point, values.

    The decimal module can handle these for you as well. In fact, money is a major reason there's a decimal standard, and implementations of that standard in many languages' standard libraries.***
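
    For example, a made-up line item (the prices and the 8.5% tax rate here are just for illustration), quantized back to whole cents:

        from decimal import Decimal, ROUND_HALF_UP

        price = Decimal('19.99')
        tax_rate = Decimal('0.085')
        tax = (price * tax_rate).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
        print(tax)   # 1.70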

    * Yes, American gas stations give prices in tenths-of-a-cent per gallon, and banks transact money in fractional cents, but unless you want to end up in Superman 3,** you can ignore that.

    ** And yes, I referenced Superman 3 instead of Office Space. If you're working in software in 2015 and haven't seen Office Space, I don't know what I could say that can help.

    *** For some reason, people are willing to invest money in solving problems that help deal with money.

    Symbols


    So, how do mathematicians deal with all of this in real life? They don't. They do math symbolically, rather than numerically. The square root of 2 is just the square root of 2. And you carry it around that way throughout the entire operation. Multiply 3 * sqrt(2) and the answer is 3 * sqrt(2). But multiply sqrt(2) * sqrt(2) and you get 2, and multiply sqrt(2) * sqrt(3) and you get sqrt(6), and so on. There are simplification rules that give you exact equivalents, and you can apply these as you go along, and/or at the end, to try to get things as simple as possible. But in the end, the answer may end up being irrational, and you're just stuck with sqrt(6).

    Sometimes you need a rough idea of how big that sqrt(6) is. When that happens, you know how rough you want it, and you can calculate it to that precision. To three digits, more than enough to give you a sense of scale, it's 2.45. If you need a pixel-precise graph, you can calculate it to +/- half a pixel. But the actual answer is sqrt(6), and that's what you're going to keep around (and use for further calculation).

    In fact, let's think about that graph in more detail. For something simple and concrete, let's say you're graphing radii vs. circumferences of circles, measured in inches, on a 1:1 scale, to display on a 96 pixels-per-inch screen. So, a circle with radius 3" has a circumference of 18.850" +/- half a pixel. Or, if you prefer, 1810 pixels. But now, let's say your graph is interactive, and the user can zoom in on it. If you just scale that 1810 pixels up at 10:1, you get 18100 pixels. But if you stored 6*pi and recalculate it at the new zoom level, you get 18096 pixels. A difference of 4 pixels may not sound like much, but it's enough to make things look blocky and jagged. Zoom in too much more, and you're looking at the equivalent of face-censored video from Cops.

    Python doesn't have anything built-in for symbolic math, but there are some great third-party libraries like SymPy that you can use.
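
    A taste of what that looks like with SymPy:

        from sympy import sqrt

        print(sqrt(2) * sqrt(3))              # sqrt(6): kept exact and symbolic
        print(sqrt(2) * sqrt(2))              # 2: simplified exactly
        print((sqrt(2) * sqrt(3)).evalf(3))   # 2.45: digits only when you ask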

  4. In a recent thread on python-ideas, Stephan Sahm suggested, in effect, changing the method resolution order (MRO) from C3-linearization to a simple depth-first search a la old-school Python or C++. I don't think he realized that's what he was proposing at first, but the key idea is that he wants Mixin to override A in B as well as overriding object in A in this code:

    class Mixin: pass
    class A(Mixin): pass
    class B(Mixin, A): pass
    

    In other words, the MRO should be B-Mixin-A-Mixin-object. (Why not B-Mixin-object-A-Mixin-object? I think he just didn't think that through, but let's not worry about that.) After all, why would he put Mixin before A if he didn't want it to override A in B? And why would he attach Mixin to A if he didn't want it to override object in A?

    Well, that doesn't actually work. The whole point of linearization is that each class appears only once in the MRO, and many features of Python--including super, which he wanted to make extensive use of--depend on that. For example, with his MRO, inside Mixin.spam, super().spam() is going to call A.spam, and its super().spam() is going to call Mixin.spam(), and you've obviously got a RecursionError on your hands.

    I think ultimately, his problem is that what he wants isn't really a mixin class (in typical Python terms--in general OO programming terms, it's one of the most overloaded words you can find...). For example, a wrapper class factory could do exactly what he wants, like this:

    def Wrapper(cls):
        class Wrapped(cls):
            pass  # the mixin-ish behavior would go here, overriding cls
        return Wrapped

    class A(Wrapper(object)): pass
    class B(Wrapper(A)): pass
    

    And there are other ways to get where he wants.

    Anyway, changing the default MRO in Python this way is a non-starter. But if he wants to make that change manually, how hard is it? And could he build a function that lets his classes cooperate using that function instead of super?

    Customizing MRO


    The first step is to build the custom MRO. This is pretty easy. He wants a depth-first search of all bases, so:

    [cls] + list(itertools.chain.from_iterable(base.__mro__ for base in cls.__bases__))
    

    Or, if leaving the extra object out was intentional, that's almost as easy:

    [cls] + list(itertools.chain.from_iterable(base.__mro__[:-1] for base in cls.__bases__)) + [object]
    

    But now, how do we get that into the class's __mro__ attribute?

    It's a read-only property; you can't just set it. And, even if you could, you need type.__new__ to actually return something for you to modify--but if you give it a non-linearizable inheritance tree, it'll raise an exception. And finally, even if you could get it set the way you want, every time __bases__ is changed, __mro__ is automatically rebuilt.

    So, we need to hook the way type builds __mro__.

    I'm not sure if this is anywhere in the reference documentation or not, but the answer is pretty easy: the way type builds __mro__ is by calling its mro method. This is treated as a special method (that part definitely isn't documented anywhere), meaning it's looked up on the class (that is, the metaclass of the class being built) rather than the instance (the class being built), doesn't go through __getattribute__, etc., so we have to build a metaclass.

    But once you know that, it's all trivial:

    import itertools

    class MyMRO(type):
        def mro(cls):
            return ([cls] +
                    list(itertools.chain.from_iterable(base.__mro__[:-1] for base in cls.__bases__)) +
                    [object])
    
    class Mixin(metaclass=MyMRO): pass
    class A(Mixin): pass
    class B(Mixin, A): pass
    

    And now, B.__mro__ is B-Mixin-A-Mixin-object, exactly as desired.
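
    You can verify that in the REPL; in a fresh module, it should look something like this:

    >>> B.__mro__
    (<class '__main__.B'>, <class '__main__.Mixin'>, <class '__main__.A'>,
     <class '__main__.Mixin'>, <class 'object'>)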

    For normal method calls, this does what the OP wanted: Mixin gets to override A.

    But, as mentioned earlier, it obviously won't enable the kind of super he wants, and there's no way it could. So, we'll have to build our own replacement.

    Bizarro Super


    If you want to learn how super works, I think the documentation in Invoking Descriptors is complete, but maybe a little terse to serve as a tutorial. I know there's a great tutorial out there, but I don't have a link, so... google it.

    Anyway, how super works isn't important; what's important is to define what we want here. Once we actually know exactly what we want, anything is possible as long as you believe; that's what science is all about.

    Since we're defining something very different from super but still sort of similar, the obvious name is bizarro.

    Now, we want a call to bizarro().spam() inside B.spam to call Mixin.spam, a call inside Mixin.spam to call A.spam, a call inside A.spam to call Mixin.spam, and a call inside Mixin.spam to call object.spam.

    The first problem is that calling object.spam is just going to raise an AttributeError. Multiple inheritance uses of super are all about cooperative class hierarchies, and part of that cooperation is usually that the root of your tree knows not to call super. But here, Mixin is the root of our tree, but it also appears in other places on the tree, so that isn't going to work.

    Well, since we're designing our own super replacement, there's no reason it can't also cooperate with the classes, instead of trying to be fully general. Just make it return a do-nothing function if the next class is object, or if the next class doesn't have the method, or if the next class has a different metaclass, etc. Pick whatever rule makes sense for your specific use case. Since I don't have a specific use case, and don't know what the OP's was (he wanted to create a "Liftable" mixin that helps convert instances of a base class into instances of a subclass, but he didn't explain how he wanted all of the edge cases to work, and didn't explain enough about why he wanted such a thing for me to try to guess on his behalf), I'll go with the "doesn't have the method".

    While we're at it, we can skip over any non-cooperating classes that end up in the MRO (which would obviously be important if we didn't block object from appearing multiple times--but even with the MRO rule above, you'll have the same problem if your root doesn't directly inherit object).

    The next problem--the one that's at the root of everything we're trying to work around here--is that we want two different things to happen "inside Mixin.spam", depending on whether it's the first time we're inside or the second. How are we going to do that?

    Well, obviously, we need to keep track of whether it's the first time or the second time. One obvious way is to keep track of the index, so it's not A.spam if we're in Mixin.spam or object.spam if we're in Mixin.spam, it's B.__mro__[2] if we're in B.__mro__[1], and B.__mro__[4] if we're in B.__mro__[3]. (After first coding this up, I realized that an iterator might be a lot nicer than an index, so if you actually need to implement this yourself, try it that way. But I don't want to change everything right now.)

    How can we keep track of anything? Make the classes cooperate. Part of the protocol for calling bizarro is that you take a bizarro_index parameter and pass it into the bizarro call. Let's make it an optional parameter with a default value of 0, so your users don't have to worry about it, and make it keyword-only, so it doesn't interfere with *args or anything. So:

    class Mixin(metaclass=MyMRO):
        def doit(self, *, bizarro_index=0):
            print('Mixin')
            bizarro(Mixin, self, bizarro_index).doit()
    
    class A(Mixin):
        def doit(self, *, bizarro_index=0):
            print('A')
            bizarro(A, self, bizarro_index).doit()
    
    class B(Mixin, A):
        def doit(self, *, bizarro_index=0):
            print('B')
            bizarro(B, self, bizarro_index).doit()
    

    And now, we just have to write bizarro.

    The key to writing something like super is that it returns a proxy object whose __getattribute__ looks in the next class on the MRO. If you found that nice tutorial on how super works, you can start with the code from there. We then have to make some changes:

    1. The way we pick the next class has to be based on the index.
    2. Our proxy has to wrap the function up to pass the index along.
    3. Whatever logic we wanted for dealing with non-cooperating classes has to go in there somewhere.

    Nothing particularly hard. So:

    import functools

    def bizarro(cls, inst, idx):
        class proxy: 
            def __getattribute__(self, name): 
                for superidx, supercls in enumerate(type(inst).__mro__[idx+1:], idx+1): 
                    try:
                        method = getattr(supercls, name).__get__(inst) 
                    except AttributeError: 
                        continue 
                    if not callable(method):
                        return method # so bizarro works for data attributes
                    @functools.wraps(method) 
                    def wrapper(*args, **kw):
                        return method(*args, bizarro_index=superidx, **kw)
                    return wrapper 
                return lambda *args, **kw: None 
        return proxy() 
    

    And now, we're done.
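
    A quick sanity check, which should print each class on the way up our doubled MRO:

    >>> B().doit()
    B
    Mixin
    A
    Mixin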

    Bizarro am very beautiful


    In Python 3, super(Mixin, self) was turned into super(). This uses a bit of magic, and we can use the same magic here.

    Every method has a cell named __class__ that tells you which class it's defined in. (Strictly speaking, the compiler only creates that cell for functions that mention __class__ or super--but as we'll see in a moment, we don't actually need it.) And every method takes its self as the first parameter. So, if we just peek into the caller's frame, we can get those easily. And while we're peeking into the frames, since we know the index has to be the bizarro_index parameter to any function that's going to participate in bizarro super-ing, we can grab that too:

    def bizarro():
        f = sys._getframe(1)
        cls = f.f_locals['__class__']
        inst = f.f_locals[f.f_code.co_varnames[0]]
        idx = f.f_locals['bizarro_index']
        # everything else is the same as above
    

    This is cheating a bit; if you read PEP 3135, user code never has to do frame hacks to make super() work; instead, the compiler recognizes functions that mention super, gives them a __class__ cell, and the super builtin picks up __class__ and the first argument from the caller for you. I'm not sure that's actually less hacky--but it is certainly more portable, because other Python implementations aren't required to provide CPython-style frames and code objects. Also, leaving the magic up to the compiler means that, e.g., PyPy can still apply its no-frames-unless-needed optimization, trading a tiny bit of compile-time work for a small savings in every call.

    If you want to do the same here, you can write an import hook that AST-transforms bizarro calls in the same way. But I'm going to stick with the frame hack.

    Either way, now you can write this:

    class Mixin(metaclass=MyMRO):
        def doit(self, *, bizarro_index=0):
            print('Mixin')
            bizarro().doit()
    
    class A(Mixin):
        def doit(self, *, bizarro_index=0):
            print('A')
            bizarro().doit()
    
    class B(Mixin, A):
        def doit(self, *, bizarro_index=0):
            print('B')
            bizarro().doit()
    

    Meanwhile, notice that we don't actually use cls anywhere anyway, so... half a hack is only 90% as bad, right?

    But still, that bizarro_index=0 bit. All that typing. All that reading. There's gotta be a better way.

    Well, now you can!

    We've already got a metaclass. We're already peeking under the covers. We're already wrapping functions. So, let's have our metaclass peek under the covers of all of our functions and automatically wrap anything that uses bizarro to take that bizarro_index parameter. The only problem is that the value will now be in the calling frame's parent frame (that is, the wrapper), but that's easy to fix too: just look in f_back.f_locals instead of f_locals.

    import functools
    import itertools
    import sys
    
    class BizarroMeta(type):
        def mro(cls):
            return ([cls] +
                    list(itertools.chain.from_iterable(base.__mro__[:-1] for base in cls.__bases__)) +
                    [object])
        def __new__(mcls, name, bases, attrs):
            def _fix(attr):
                if callable(attr) and 'bizarro' in attr.__code__.co_names:
                    @functools.wraps(attr)
                    def wrapper(*args, bizarro_index=0, **kw):
                        return attr(*args, **kw)
                    return wrapper
                return attr
            attrs = {k: _fix(v) for k, v in attrs.items()}
            return super().__new__(mcls, name, bases, attrs)
    
    def bizarro():
        f = sys._getframe(1)
        inst = f.f_locals[f.f_code.co_varnames[0]]
        idx = f.f_back.f_locals['bizarro_index']
        class proxy: 
            def __getattribute__(self, name): 
                for superidx, supercls in enumerate(type(inst).__mro__[idx+1:], idx+1):
                    try:
                        method = getattr(supercls, name).__get__(inst)
                    except AttributeError: 
                        continue 
                    if not callable(method):
                        return method # so bizarro works for data attributes
                    @functools.wraps(method) 
                    def wrapper(*args, **kw): 
                        return method(*args, bizarro_index=superidx, **kw)
                    return wrapper 
                return lambda *args, **kw: None 
        return proxy() 
    
    class Mixin(metaclass=BizarroMeta):
        def doit(self):
            print('Mixin')
            bizarro().doit()
    
    class A(Mixin):
        def doit(self):
            print('A')
            bizarro().doit()
    
    class B(Mixin, A):
        def doit(self):
            print('B')
            bizarro().doit()
    
    B().doit()
    

    Run this, and it'll print B, then Mixin, then A, then Mixin.

    Unless I made a minor typo somewhere, in which case it'll probably crash in some way you can't possibly debug. So you'll probably want to add a bit of error handling in various places. For example, it's perfectly legal for something to be callable but not have a __code__ member--a class, a C extension function, an instance of a class with a custom __call__ method... Whether you want to warn that you can't tell whether Spam.eggs uses bizarro or not because you can't find the code, assume it doesn't and skip it, assume it does and raise a readable exception, or something else, I don't know, but you probably don't want to raise an exception saying that type objects don't have __code__ attributes, or whatever comes out of this mess by default.

    Anyway, the implementation is pretty small, and not that complicated once you understand all the things we're dealing with, and the API for using it is about as nice as you could want.

    I still don't know why you'd ever want to do this, but if you do, go for it.

  5. Note: This post doesn't talk about Python that much, except as a point of comparison for JavaScript.

    Most object-oriented languages out there, including Python, are class-based. But JavaScript is instead prototype-based. Over the years, this has led to a lot of confusion, and even more attempts to resolve that confusion, either by faking classes, or by explaining why prototypes are good. Unfortunately, most of the latter attempts are completely wrong, and only add to the confusion.

    A mess?

    I'm going to pick on Kyle Simpson's JS Objects: Inherited a Mess (and its two followups)—not because it's particularly bad, but for the opposite reason: it's one of the best-written attempts at clearing things up (and it's frequently linked to).

    Simpson's main point is that, ultimately, prototype inheritance is more like delegation than like inheritance, because the inheritance "arrows" go in the opposite direction.

    Let's take a language like C++, and write a class Foo, a subclass Bar, and a Bar instance bar1. There's a superclass-to-subclass arrow pointing from Foo to Bar, which represents Bar's vtable being made by copying Foo's, and a class-to-instance arrow pointing from Bar to bar1, representing bar1's vptr being copied from Bar.

    In JavaScript, on the other hand, there's no such copying; instead, there are live links. Methods are looked up on the object; if not found, the link is followed to look up the method on its prototype, and so on up the chain. So, he draws an arrow from bar1 to Bar.prototype, representing its prototype link, and an arrow from Bar.prototype to Foo.prototype, representing the same thing.

    But he's drawing entirely the wrong distinction. When you notice that C++, Java, and C# all do things one way, but JavaScript does things differently, it's a decent guess that the difference is prototypes vs. classes. But as soon as you toss in a few other languages, it's obvious that the guess is wrong. For example, Python and Smalltalk do things the same way as JavaScript, but they're very definitely class-based languages.

    Simpson brings up some other differences between JS and other languages in the rest of the series--e.g., the fact that a JS object's constructor doesn't automatically call its prototype's constructor, while a C++ class's constructor does automatically call its base classes' constructors. But here again, Python does the same thing as JS. At any rate, the direction of the arrows is what he focuses on, so let's stick with that.

    Dynamic vs. semi-dynamic lookup

    If this difference isn't about prototypes vs. classes, what is it?

    In Python, method lookup is dynamic. If your source code has a call to foo.spam, the compiler turns that into code that does the following, until one of them works:*
    • Ask the object to handle "spam".
    • Ask the object's __class__ to handle "spam".
    • Walk the class's superclass tree (in C3-linearized order), asking each one in turn to handle "spam".
    • Raise an AttributeError.
    In JavaScript, method lookup is dynamic in exactly the same way. The compiler turns a call to foo.spam into this:
    • Ask the object to handle "spam".
    • Walk the object's prototype list, asking each one in turn to handle "spam".
    • Raise a TypeError.
    The only difference is that Python has two kinds of links--instance-to-class and class-to-superclass--that JavaScript collapses into one--object-to-prototype. (And that Python has a linearized multiple inheritance tree, but let's ignore that here.)
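
    In Python terms, the default rule is roughly this sketch (ignoring descriptors, __getattr__, and the other hooks mentioned in the footnote):

    def lookup(obj, name):
        if name in vars(obj):              # ask the object itself
            return vars(obj)[name]
        for cls in type(obj).__mro__:      # its class, then the linearized supers
            if name in vars(cls):
                return vars(cls)[name]
        raise AttributeError(name)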

    In C++, method lookup is mostly static, with a hook for a very specific kind of dynamism. The compiler turns a call to foo.spam into this:**
    • Look up foo.vptr[2] and call it.
    The compiler has to know that foo is (a reference to) an instance of Foo or some subclass thereof, and it has to have compiled Foo already so that it knows that spam is its third method. The only way anyone outside of Foo can affect what happens here is that a subclass like Bar can override the spam method, so its vtable ends up with a different function pointer in the third slot. If so, if foo is a Bar instance at runtime, then foo.vptr[2] will point to Bar.spam instead of Foo.spam. That's all the late binding you get.

    The downsides to C++-style virtual methods instead of fully dynamic lookup are pretty obvious. In JavaScript or Python, you can set a new foo.spam at any time, overriding the behavior of Foo. Or you can set a new Foo.spam, which will affect all existing instances of Foo or its descendants that don't have any overriding implementation. This can be pretty handy for, e.g., building up transparent proxies based on runtime information, or mocking objects for unit tests.
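
    For example:

    class Foo:
        def spam(self):
            return 'original'

    foo = Foo()
    Foo.spam = lambda self: 'patched'   # rebind on the class...
    print(foo.spam())                   # 'patched': ...existing instances see it
    foo.spam = lambda: 'mocked'         # or override just this one instance
    print(foo.spam())                   # 'mocked'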

    But you don't do those things that often. Pretty often, C++'s dynamic behavior is all the dynamic behavior you need. And it's obviously simpler, and a lot faster, to just follow a pointer, index an array, and follow another pointer than to walk through a linked chain of objects asking each one to do a dictionary lookup for you. And it even opens the door to further optimizations--e.g., if the compiler knows that foo must be exactly a Foo rather than a subclass (e.g., because it's a local allocated on the stack), or it knows that no subclass of Foo overrides the method (e.g., because it's got feedback from a whole-program analyzer), it can just insert a call directly to the Foo.spam implementation.

    The problem is that something that works "pretty often" but has no escape hatch only works pretty often. And just as often, the speed of the lookups doesn't matter anyway. As Alan Kay pointed out back in the 70s, usually you're either doing all your work inside some inner loop so it hardly matters how fast you get to that loop, or spending all your time waiting on a user to interact with the GUI/a client to interact with the reactor/etc. so it hardly matters how fast you do anything. Better to give people features they might need than performance they probably don't. So, that's what Smalltalk did. And Objective-C, Python, Ruby, etc.

    * Python offers a wide variety of ways to hook this process, as do many other dynamic OO languages, so this is only the default behavior. But let's not worry about that right now.
    ** I'm ignoring non-virtual functions here, but they don't add any complexity. I'm also ignoring the screwy way that C++ does multiple inheritance. Be glad. I'm also hiding reference/pointer stuff, which means this is closer to C# or Java than C++, but that doesn't really matter here.

    The best of both worlds?

    A Smalltalk offshoot called Self was built around the idea that a just-in-time (JIT) optimizer can eliminate most of the costs of dynamic lookup, while preserving all the benefits. It's pretty uncommon that you change out a method at runtime, and very rare that you do so more than once or twice in the run of a program. So, if you cache the lookup and recompile any function that needs the lookup to use the cached value, and insert some guards to invalidate the cache on the rare occasions when you change something, you may end up with millions or billions of faster-than-C++ lookups for every one recompile and slow lookup.

    It's not coincidental that Self is also the language that popularized prototype-based inheritance. One major thing classes add to dynamic inheritance is that making that first link special makes it easier to optimize. Also, classes of objects tend to almost always override the same set of methods in the same way, so you might as well label those classes to help the optimizer. And if the language encourages you to think in terms of classes of objects, you'll usually make the optimizer's job easier. But with a smart-enough JIT, that isn't necessary. At any rate, the dynamic lookup was the main reason for the JIT, and that reason applies to dynamic class-based languages just as much as to prototype-based ones.

    Plus, of course, it turns out that the idea of a JIT worked pretty well on static code too; HotSpot running compiled Java bytecode is generally faster than PyPy or V8 running Python or JavaScript. But for most uses, PyPy and V8 are more than fast enough. (In fact, for many uses, CPython and Spidermonkey are more than fast enough.)

    What's in a name?

    Simpson wants to insist that JavaScript's behavior is not dynamic inheritance, it's delegation.

    But, if so, Python, Smalltalk, and many other class-based languages aren't doing dynamic inheritance either. So, the difference between class-based and prototype-based still has nothing to do with delegation vs. inheritance, even if you use his terms.

    In fact, what he's calling "delegation" really just means dynamic inheritance. Which makes it obvious why claiming that JavaScript doesn't do dynamic inheritance, it does delegation, is wrong.

    While we're on the subject of names, you could argue that the confusion about prototypes was caused from the start by other things JS got wrong. Other prototype-based languages, from Self to newer languages like Io, don't use constructor functions and a Java-like new operator, so they don't make prototyping/cloning look misleadingly like instantiation. Maybe if JS didn't do that, people wouldn't have been misled as badly in the first place?

    Prototypes

    So then, what is the difference between classes and prototypes?

    Of course there's the minor difference noted above, that JavaScript only has one kind of link instead of two.* And, if you read the footnotes, there could be a performance difference. But... so what?

    Well, getting rid of classes does lose something: In JavaScript, your prototype chain doesn't affect your type. In Python, because the first step of that chain is special, the language can define its notion of type in terms of that link. Of course 95% of your code is duck-typed and EAFP-driven and this doesn't matter. But for that last 5% (and for debugging), you have a coherent and useful notion of type, which can help.

    Also, making multiple inheritance work with classes is not that hard (even though C++ gets it wrong, Java punts on it, etc.); making it work with prototypes is harder.

    The biggest downside is that cooperative inheritance through a super-like mechanism is hard with prototypes. In particular, composing mixins that don't know about each other is a serious problem. Simpson gets into this in his "Polymorphism redux" section in his third part. But I believe this is actually only a problem if you use the constructor idiom instead of using prototype-chained object literals, so a prototype-based language that didn't encourage the former and make the latter more verbose than necessary may not have this problem.

    Anyway, what do you gain in return for these negatives? One thing (although not the motivation behind Self**) is flexible object literals. In JavaScript, creating an object is just like creating a dictionary. If some of the members are callable, they're methods. Sure, this is just syntactic sugar, but it's syntactic sugar around something you often want.

    At first glance, that sounds horrible. We have 30000 objects, all anonymous, all locally created in different calls to the same function and therefore closing over different scopes… so much duplication! But again, you're thinking about ahead-of-time optimization; there's no reason the interpreter can't realize that all of those closures have the same body, and guardably-almost-always the same behavior, so with a JIT, you're wasting very little space and even less time. And, even if that weren't true, it probably wouldn't matter anyway. Meanwhile, if you don't want that duplication, you can always move some of those functions into a prototype—exactly the same way as you'd move them into a class in Python.

    At second glance, it sounds horrible again. Everyone knows that closures and objects are dual—one is the functional way of encapsulating state, the other is the OO way. So, who wants closures inside objects? But here... well, a Python developer shouldn't be surprised that sometimes practicality beats purity, and adding boilerplate can hurt readability. For a locally-created object that just has a bunch of simple functions that all refer to the local variable username, does it actually make things more readable to create and call a constructor that copies username to self.username and make all those functions call self.username, or is that just boilerplate?

    (In fact, you can go farther and explicitly link up multiple closures through manual delegation chains. But I haven't seen many uses of inheritance chains beyond 2-deep in JavaScript, and nearly all of those that I've seen are equivalent to what you'd do in a class-based language. So, let's just stick with the simple case here.)
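
    In Python terms, the object-literal-with-closures style would look something like this sketch, with types.SimpleNamespace standing in for a JS object literal (the names here are made up for illustration):

    import types

    def make_user(username):
        # behavior lives on this one object and closes over username directly;
        # no class, no self.username boilerplate
        def greet():
            return 'Hello, ' + username
        def shout():
            return username.upper() + '!'
        return types.SimpleNamespace(greet=greet, shout=shout)

    user = make_user('guido')
    print(user.greet())   # Hello, guido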

    Really, it's similar to def vs. lambda: for most functions, def is clearly better, but that doesn't mean there's never a good use for lambda. Nobody needs a separate statement, and an explicit name, for the function lambda x: x+1 or lambda cb: cb(result). And similarly, for most objects, using methods in the class or prototype to encapsulate the object's state is clearly better, but that doesn't mean there's never a good use for object literals with object-specific methods.

    The big difference is that in JavaScript, both ways are easy; in Python, both ways are possible, but only one is easy—there's no clean and simple way to put the behavior in each object.

    Well, that's not quite true; JS's horrible scoping rules and its quirky this mean both ways are actually not easy. But that's not because JS is prototype-based; it's because JS got some other things wrong.

    Anyway, the differences in this section are what really matter—and they're the ones that are down to JS being prototype-based.

    * Or, if you count metaclass-class as different from class-instance, one instead of three. Simulating metaclasses in a prototype language usually means making use of the distinction between calling the prototype vs. just referencing its .prototype, in JS terms, and there's nothing to protect you from mixing up your "metaprototypes" and prototypes. But then Python has nothing to protect you from inheriting from a metaclass as if it were a normal class either, and that never causes problems in real-life code.
    ** The inspiration behind prototypes in Self was that it's often hard to define a class hierarchy in advance, and too hard to modify a class hierarchy after you've already written a bunch of code around it. Slightly oversimplifying, prototypes were mainly an attempt to solve the fragile base class problem.

    The best of both... wait, deja-vu

    Some newer class-based languages, like Scala, have an object-literal and/or singleton-defining syntax. (And it's not that hard to add that to Python, if you want.) If you combine this with a few other features, you can get objects that act like instances and clones at the same time (and hopefully in the ways you want, not the ways you don't).

    On the other hand, JS (ES6) is adding classes for when you want them, but object literals will still be an anonymously-typed clone of an optionally-specified prototype rather than an instance of a class.

    Are we meeting in the middle? I'm not sure.

  6. About a year and a half ago, I wrote a blog post on the idea of adding pattern matching to Python.

    I finally got around to playing with Scala semi-seriously, and I realized that they pretty much solved the same problem, in a pretty similar way to my straw man proposal, and it works great. Which makes me think that it's time to revisit this idea. For one thing, using Scala terminology that people may be familiar with (and can look up) can't hurt. For another, we can see whether any of the differences between Python and Scala will make the solution less workable for Python.

    Extractors


    In my post, I described the solution as a matching protocol. A class that wants to be decomposable by pattern matching has to implement a method __match__, which returns something like a tuple of values, or maybe an argspec of values and optional values, that the pattern-matching code can then match against and/or bind to the components of a pattern.

    For simple cases, this could be automated. Both namedtuple and Namespace could define a __match__, and we could write decorators to examine the class in different ways--__slots__, propertys, copy/pickle, or even runtime introspection--and build the __match__ for you. But when none of these simple cases apply, you have to implement it yourself.

    In Scala, the equivalent is the extractor protocol. A class that wants to be decomposable by pattern matching supplies an associated extractor (normally a singleton object, but that isn't important here), which has an unapply method that returns an optional value or tuple of values, as well as an optional apply method that constructs instances of the class from the same arguments. A case class (a simple kind of type that's basically just a record) automatically gets an extractor built for it; otherwise, you have to build it yourself.

    In the simplest case, this is no different from just using __match__ for unapply and, of course, the existing __init__ for apply. But it's actually a lot more flexible.

    • You can write multiple extractors that all work on the same type.
    • You can add an extractor for an existing type, without having to monkeypatch the type. For example, you could extract the integer and fraction parts of a float.
    • You can write an extractor that doesn't match all values of the type.

    Example


    Daniel Westheide's great Neophyte's Guide to Scala shows some of this in action. I'll borrow one of his examples, with minor variations:

    trait User {
      def name: String
      def score: Int
    }
    class FreeUser(val name: String, val score: Int, val upgradeProbability: Double)
      extends User
    class PremiumUser(val name: String, val score: Int) extends User
    
    object FreeUser {
      def unapply(user: FreeUser): Option[(String, Int, Double)] =
        Some((user.name, user.score, user.upgradeProbability))
    }
    object PremiumUser {
      def unapply(user: PremiumUser): Option[(String, Int)] = Some((user.name, user.score))
    }
    
    val user: User = new FreeUser("Daniel", 3000, 0.7d)
    
    user match {
      case FreeUser(name, _, p) =>
        if (p > 0.75) name + ", what can we do for you today?" else "Hello " + name
      case PremiumUser(name, _) => "Welcome back, dear " + name
    }
    

    If you're not used to Scala's syntax, this might be a bit hard to read, but it's not too hard to muddle through. A trait is like an ABC, a class is like a normal class, and an object defines a singleton class and the one instance of it. So:

    First, we define a class hierarchy: FreeUser and PremiumUser are both subclasses of the abstract base class User, and FreeUser adds a new attribute.

    Next, we define two extractors. The fact that they're named FreeUser and PremiumUser, just like the classes, is convenient for readability, but it's the type annotations on their unapply methods that makes them actually work on those types respectively.

    Then we construct a User, who happens to be a free user with an 0.7 probability of upgrading.

    Then we pattern-match that user, first as a free user, then as a premium user if that fails. Here, the syntax works like my proposal, except for minor spelling differences (match for case, case for of, braces for indents, => for :). But, instead of checking isinstance(user, FreeUser) and then calling instance.__match__ and binding to name, _, p, Scala tries to call FreeUser.unapply(user), which works, and binds to name, _, p. The result is the same, but the way it gets there is a little more flexible (as we'll see).

    And then, inside the free user case, we just do an if-else, so the result is "Hello Daniel".

    Another example


    In case it isn't obvious what the optional-value bit is for, here's another example that makes use of that, as well as applying an extractor to a builtin type without having to monkeypatch it. This is a pretty silly example, but it's in the official Scala tutorial, so...

    object Twice {
        def unapply(x: Int): Option[Int] = if (x%2 == 0) Some(x/2) else None
    }
    
    val i: Int = 42
    i match {
      case Twice(n) => n
      case _ => -1
    }
    

    Here, Twice.unapply(42) returns Some(21), so the case Twice(n) will match, binding n to 21, which we'd return.

    But if we tried the same thing with 23, then Twice.unapply(23) returns None, so the case Twice(n) would not match, so we'd hit the default case, so we'd return -1.

    As for the optional values, I'll get to that below.

    Can we do this in Python?


    Some of the details obviously aren't going to fit into Python as-is.

    We could easily add a @singleton class decorator that makes the class anonymous and returns a singleton instance. But we still can't use the same name as the class without a conflict in Python (binding the instance would just hide the class). And really, do we need to define a class here just to wrap up a function in a method (which is, or might as well be, a @staticmethod, since it doesn't touch self, which has no state to touch anyway)? I think it would be more pythonic to just have decorated functions. (If you disagree, it's pretty trivial to change.)

    Python normally uses dynamic (per-instance) attributes on classes. You can use __slots__ or @property, and you can even define an ABC that makes those abstract, matching Scala's traits, but I think it's more Pythonic to just skip that here. (I'm pretty sure it's orthogonal to everything related to pattern matching and extracting; it just means our most Pythonic equivalent example isn't exactly like the Scala example.)

    Python doesn't have optional types. It does have "implicit nullable types" by just returning None, but since None is often a valid value, that doesn't do any good. Of course the answer here is exceptions; everywhere that Scala returns Option[T], Python returns T or raises, so why should this be any different?

    In my previous proposal, we used AttributeError (or a new subclass of it, MatchError) to signal a failed match (partly because you get that automatically if __match__ doesn't exist), so let's stay consistent here. We can even make a special unapply_simple decorator that lets us return None and have it turned into a MatchError when we know None isn't a valid value.

    Finally, Python doesn't have static typing. Is that necessary here? Let's try it and see.

    So, using my earlier case syntax:

    class User:
        pass
    
    class FreeUser(User):
        def __init__(self, name: str, score: int, upgrade_probability: float):
            self.name, self.score, self.upgrade_probability = name, score, upgrade_probability
    
    class PremiumUser(User):
        def __init__(self, name: str, score: int):
            self.name, self.score = name, score
    
    @unapply
    def free_user(user):
        return user.name, user.score, user.upgrade_probability
    
    @unapply
    def premium_user(user):
        return user.name, user.score
    
    user = FreeUser("Daniel", 3000, 0.7)
    
    case user:
        of free_user(name, _, p):
            if p > 0.75:
                return '{}, what can we do for you today?'.format(name)
            else:
                return 'Hello {}'.format(name)
        of premium_user(name, _):
            return 'Welcome back, dear {}'.format(name)
    
    @unapply_simple
    def twice(x):
        if x%2 == 0: return (x//2,)
    
    i = 42
    case i:
        of twice(n): print(n)
        else: print(-1)
    

    There are a few problems here, but I think they're easily solvable.

    My earlier proposal matched Breakfast(0, _, eggs) by first checking isinstance(breakfast, Breakfast), and then calling Breakfast.__match__; here, there's no type-checking at all.

    As it turns out, that free_user will fail to match a PremiumUser because user.upgrade_probability will raise an AttributeError--but the other way around would match just fine. And we don't want it to. But with nothing but duck typing, there's no way that Python can know that premium_user shouldn't match a FreeUser. (The only reason a human knows that is that it's implied by the names.)

    So, is the lack of static typing a problem here?

    Well, we can do the exact same isinstance check inside the unapply decorator; just use @unapply(FreeUser) instead of just @unapply. This seems perfectly reasonable. And it's no more verbose than what you have to do in Scala.

    Or we could even have @unapply check for annotations and test them. It's even closer to what you do in Scala (and in most other FP/hybrid languages), and, if we're going to encourage type annotations in general, why not here? On the downside, PEP 484 specifically says that its annotations don't affect anything at runtime. Also, not every PEP 484/mypy static annotation will work dynamically (e.g., Iterable[int] can only check for Iterable). But I don't think that's a problem. The decorator version seems more conservative, but either one seems reasonably Pythonic if we wanted to go with it.
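
    To make that concrete, here's a minimal sketch of what @unapply might look like, reusing the MatchError and unapply_function names from this post (everything else here is made up for illustration):

    import functools

    class MatchError(AttributeError):
        pass

    class unapply_function:
        # a callable wrapper, so the case machinery can recognize extractors
        def __init__(self, func, cls=None):
            functools.update_wrapper(self, func)
            self.func, self.cls = func, cls
        def __call__(self, value):
            if self.cls is not None and not isinstance(value, self.cls):
                raise MatchError(value)
            return self.func(value)

    def unapply(cls_or_func):
        # bare @unapply, or @unapply(SomeClass) to add an isinstance check
        if isinstance(cls_or_func, type):
            return lambda func: unapply_function(func, cls_or_func)
        return unapply_function(cls_or_func)

    With that, @unapply(PremiumUser) would refuse to match a FreeUser, which is exactly the check we were missing.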

    Can the two proposals be merged?


    It's pretty nice that the earlier proposal uses the type name as the pattern name, especially for simple types like namedtuple.

    But it's also pretty nice that the Scala-influenced proposal allows you to have multiple pattern extractors for the same type.

    Scala gets the best of both worlds by letting you give one of the extractors the same name as the type (and by giving you one automatically for case classes).

    We could get the best of both worlds by just implementing both. The relevant bit of case-statement logic from my previous proposal expanded of Breakfast(0, _, eggs): into this:

    if isinstance(case_value, Breakfast):
        try:
            _0, _, eggs = case_value.__match__()
            if _0 != 0: raise MatchError(_0, 0)
        except AttributeError:
            pass # fall through to next case
        else:
            ...  # case body here
    

    To extend this to handle extractors:

    if isinstance(case_value, Breakfast):
        # ... code above
    elif isinstance(Breakfast, unapply_function):
        try:
            _0, _, eggs = Breakfast(case_value)
        # ... rest of the code is the same as above
    

    We need some way to avoid calling Breakfast when it's a class (again, consider the FileIO problem from my previous post). That's what the second isinstance check is for. That unapply_function type is what the @unapply decorator (and any related decorators) returns. (It's a shame that it can't inherit from types.FunctionType, because, as of Python 3.6, that's still not acceptable as a base type. But it can implement __call__ and otherwise duck-type as a function. Or, alternatively, just use a function and attach a special attribute to it in the decorator.)

    In fact, we can even allow externally defining decomposition without monkeypatching just by having a special decorator that registers an extractor as the type-named extractor for a type, maybe @unapply_default(Spam). That could create Spam.__match__ as a wrapper around the function, but it could just as easily store it in a registry that we look up if isinstance(case_value, Spam) but Spam.__match__ doesn't exist. (The latter allows us to avoid monkeypatching existing classes--which is obviously important for classes like int that can't be patched... but then do you want coders to be able to create a new pattern named int? An extractor that can be applied to int, sure, but one actually called int?)
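
    A rough sketch of the registry flavor (all names hypothetical):

    _default_extractors = {}
    
    def unapply_default(cls):
        # Register func as the type-named extractor for cls, without
        # monkeypatching cls itself.
        def decorate(func):
            _default_extractors[cls] = func
            return func
        return decorate
    
    def _decompose(case_value):
        # What the case logic might do: prefer __match__, then fall back
        # to the registry, walking the MRO.
        cls = type(case_value)
        try:
            return cls.__match__(case_value)
        except AttributeError:
            for base in cls.__mro__:
                if base in _default_extractors:
                    return _default_extractors[base](case_value)
            raise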

    Other features of Scala pattern matching


    Features already covered


    There are some Scala features I didn't mention here because they're equivalent to features I already covered in my previous post:
    • Scala has as clauses with the same semantics (they're spelled NEWNAME @ PATTERN instead of PATTERN as NEWNAME, which I think is less readable).
    • Scala has if guard clauses with the same semantics (including being able to use values bound by the decomposition and the as clause), and almost the same syntax.
    • Scala has value definition and variable definition pattern syntax, with the same semantics and nearly the same syntax as my pattern-matching assignment (not surprising, since their syntax was inspired by Python's sequence unpacking assignment, and mine is an extension of the same).

    Pattern matching expressions


    Scala has a match expression, rather than a statement.

    That shouldn't be too surprising. As I mentioned last time, most languages that make heavy use of pattern matching use an expression. But that's because they're expression-based languages, rather than statement/expression-divide languages. (Traditionally, functional languages are expression-based either because they encourage pure-functional style, or because they're heavily based on Lisp. Scala isn't being traditional, but it's expression-based following the same trend as JavaScript and Ruby.)

    In Python (unlike traditional functional languages, Scala, or Ruby), if is a statement--but then Python also has a more limited form of if that can be used in expressions. Could the same idea apply here? Maybe. But consider that Python had an if statement for many years before the stripped-down expression was added, and the statement is still used far more often.

    I think we'll want pattern-matching statements for anything non-trivial, and we won't know what (if anything) we'd want from a stripped-down limited pattern-matching expression until we get some experience with pattern-matching statements.

    Predicate extractors


    Scala allows extractors that return a boolean instead of an optional tuple. This allows you to match without deconstructing anything:

    object PremiumCandidate {
      def unapply(user: FreeUser): Boolean = user.upgradeProbability > 0.75
    }
    
    user match {
      case PremiumCandidate() => initiateSpamProgram(user)
      case _ => sendRegularNewsletter(user)
    }
    

    This is obviously easy to add to Python:

    @unapply
    def premium_candidate(user: FreeUser):
        return user.upgrade_probability > 0.75
    
    case user:
        of premium_candidate:
            initiate_spam_program(user)
        else:
            send_regular_newsletter(user)
    

    In Scala, the different logic is driven off the static return type, but we can just as easily use the syntactic difference: if we get an unapply_function itself, rather than a (syntactic) call of an unapply_function, we call it, and if it returns something truthy, it's a match; if it returns something falsey or raises AttributeError, it doesn't.
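
    In other words, the case logic might expand of premium_candidate: to something like this (a sketch, reusing the unapply_function type from above):

    if isinstance(premium_candidate, unapply_function):
        try:
            matched = premium_candidate(case_value)
        except AttributeError:  # including MatchError
            matched = False
        if matched:
            pass  # case body here
        # otherwise, fall through to the next case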

    Scala also lets you use predicate extractors together with its equivalent of the as clause as a way of type-casting the value if it passes a type-check. For example, if initiateSpamProgram requires a FreeUser, rather than just a User, you could write:

    user match {
      case freeUser @ PremiumCandidate() => initiateSpamProgram(freeUser)
    }
    

    We can't do that in Python, but then we don't have any need to. At runtime, initiate_spam_program will probably just duck-type; if not, it'll isinstance or similar, and it will see that user is indeed a FreeUser. For mypy or other static type checkers, there's no reason the checker can't infer that if user passes premium_candidate(user: FreeUser), it's a FreeUser; not being able to do that is just a limitation of Scala's typer (user can't be defined as a User but then be inferred as a FreeUser within the same expression). We don't need to copy that limitation just so we can try to find a workaround for it.

    Extractors with apply


    You'll notice that I mentioned apply way back at the top, but I only implemented unapply in the above examples. So, let's show an example of apply:

    object Twice {
      def apply(x: Int): Int = x * 2
      def unapply(x: Int): Option[Int] = if (x%2 == 0) Some(x/2) else None
    }
    
    val i = Twice(21)
    

    Or, maybe a better example:
    object Email {
      def apply(user: String, domain: String): String = user + "@" + domain
      def unapply(email: String): Option[(String, String)] = {
        val parts = email split "@"
        if (parts.length == 2) Some((parts(0), parts(1))) else None
      }
    }
    
    val email = Email("joe.schmoe", "example.com")
    
    email match {
      case Email(user, domain) => if (domain == our_domain) message(user) else forward(email, domain)
      case _ => log_unknown_email(email)
    }
    

    As you can see, the apply method gives us a constructor whose syntax perfectly matches the case pattern used for decomposition. (Just as we get in my original proposal when your __init__ and __match__ match.) That's nice, but not a huge deal in many cases.

    Anyway, this one is easy to do in Python:

    @apply
    def twice(x):
        return x * 2
    
    case twice(21):
        of twice(n): print(n)
        else: print(-1)
    

    But I don't think this is necessary. It doesn't seem to be used much in Scala code, except in the special case of an extractor whose name matches the class--which we'd handle by having __match__, with the already-existing __init__ for a constructor. (Of course there's nothing forcing you to make your __match__ and __init__ parallel. But then there's nothing forcing apply and unapply to be parallel in Scala either. Usually, you'll want them to go together; in the rare cases where you want them to be different, there's nothing stopping you.)

    Plus, we already have ways to write external factory functions for types when you really want to; there's no reason to add and standardize on a new way.

    But, if we do want apply (and, again, I don't think we do), is sometimes having two functions instead of one a good argument for putting apply and unapply together in a class? I don't think so. Compare:

    @apply
    def twice(x):
        return x * 2
    @unapply
    def twice(x):
        if x%2 == 0: return (x//2,)
    
    @extractor
    class twice:
        def apply(x):
            return x * 2
        def unapply(x):
            if x%2 == 0: return (x//2,)
    

    Sure, the latter does group the two together in an indented block, but I think it looks like heavier boilerplate, not lighter. And in the very common case where you don't have apply, that will be even more true. Also, notice that @extractor has to be a bit magical, auto-singleton-ing the class and/or converting the methods into @staticmethods, which makes it harder to understand. And, if anything, it's less clear what twice(21) or of twice(n): are going to do this way; sure, once you internalize the extractor protocol you'll be able to figure it out, but that's not trivial. (Plenty of Scala developers don't seem to get how it works in their language; they just treat it as magic.)

    Sequence extractors


    Scala lets you write sequence extractors, with another special method, unapplySeq. This returns an optional sequence instead of an optional tuple (or an optional tuple whose last element is a sequence), and there's special syntax that allows you to match the decomposed sequence with a * wildcard, like this:

    xs match {
      case List(a, b, _*) => a * b
    }
    

    It's pretty obvious how to do this in Python, and a lot more simply: this is just the normal use of * for iterable unpacking assignment, *args parameters, and splatting arguments. I already covered how this works with __match__ in my earlier post; it's the same for extractors.
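
    For example, with a hypothetical whole-sequence extractor, the usual * rules would just apply to the decomposed tuple:

    @unapply
    def seq(xs):
        return tuple(xs)
    
    case xs:
        of seq(first, *rest):
            print(first, rest)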

    Infix extractors


    Scala also allows you to use infix extractors, to match infix patterns. For example, you could match 2 * n directly by using an extractor named * with an appropriate unapply.

    More typically, you match head :: tail to deconstruct cons lists, using an extractor that looks like this:

    object :: {
      def unapply[A](xs: List[A]): Option[(A, List[A])] =
        if (xs.isEmpty) None else Some((xs.head, xs.tail))
    }
    

    If Python doesn't need arbitrary infix operators, or the ability to use existing operators as prefix functions or vice-versa, or other infix-operator-related features like sectioning, it doesn't need infix patterns either.

    Especially since the primary use for this, deconstructing cons lists, is something you're rarely going to need in Python. (If you want cons lists, you have to write them yourself, and you won't get any syntactic sugar for constructing them, so why would you be surprised that you don't get syntactic sugar for deconstructing them? You can use the same Cons(head, tail) for a pattern as you use for a constructor.)

    Patterns in other places


    My proposal suggested explicit case statements, and generalizing sequence unpacking in assignments to pattern decomposition, but I didn't think about all the other places Python binds variables besides assignments. Scala's designers did, and they made it work in as many places as possible.

    So, looking at all the places Python does bindings:

    • Iteration variables in for statements and comprehensions: We can already sequence-unpack into the iteration variables; why not pattern-match? So, for Point3D(x, y, z) in vertices:.
    • as-clause variables in various statements. Python already allows sequence unpacking here, although it requires extra parentheses:
      • with: I don't think Scala has an equivalent, but I could definitely imagine wanting it here. Although often you're going to want to bind the whole thing as well as decomposing it, and doubling the as (with spam.open() as TextFile(filename) as f:) seems pretty horrible.
      • except: Scala's catch basically has an implicit ex match, which you write normal case clauses inside. This could actually be more useful in Python than in Scala, because Python's exceptions are full of good stuff: you could, e.g., pull the filename out of a FileNotFoundError. But it would be clumsy, because you've already matched the type: except FileNotFoundError as FileNotFoundError(filename):. So... maybe in a new language, but probably not useful in Python.
      • import: It seems like it would always be more readable to import and then case. Maybe someone can come up with a counterexample, but I doubt it.
    • For completeness, if we're looking at import ... as, then we should also look at from ... import, and maybe even plain import. And I think it looks just as bad in the former statement, and a thousand times worse in the latter.
    • Function parameters in def and/or lambda. In my previous post, I suggested ML-style pattern overloads for def, and assumed that the fact that this wouldn't work for lambda wouldn't matter because why would you want it there? But in Scala, single-case lambdas can be written without all the match syntax, and this is used pretty extensively in idiomatic Scala. (Of course in Scala, pattern matching is an expression, not a statement, and idiomatic Scala uses lambdas in many places where idiomatic Python uses named functions.) There's a bit of ambiguity in the fact that a parameter list is actually a list of target-lists that have to be single targets or parenthesized--and remember that Python 3.0 removed that ambiguity by killing even basic iterable unpacking for parameters. Maybe someone could come up with a syntax that allows def line_to(Point3D(x, y, z)): or similar without that ambiguity, but even if you could, does that really look useful in a language without static typing and overloads? (Again, I think if we want this, we want a way to define overloads by pattern.)
    • Conditions are not currently a place to do bindings in Python, and that's a good thing. But occasionally, something like C's if ((c = getch()) != EOF) is handy. Could a stripped-down single-case pattern-matching binding in conditions be a way to get most of the benefits, without the problems? (Similar things certainly work in a variety of languages from Swift to Haskell, but in those languages, an if body has its own scope, which will either shadow an outer-scope name or raise an error complaining about shadowing, so I don't know if that means anything.) Anyway, I haven't thought much about this.

    Partial functions


    Scala makes extensive use of partial functions. Not partially-applied functions (like functools.partial), but partial functions in the mathematical sense: functions that are only defined on part of the implied domain.

    This is actually a weird hybrid of static and dynamic functionality, but it turns out to be useful. At compile time, Scala can tell that your cases don't exhaust all of the possibilities. In many functional languages, this would mean that you get a compiler error. In Scala, it means you get a partial function instead of a normal function. If you call it on an out-of-domain value, you get a MatchError at runtime. But you can also LBYL it by calling its isDefinedAt function. And code that always checks before calling can be statically determined to be safe.

    And, idiomatically, partial functions--usually constructed with the special anonymous-function-with-implicit-match syntax mentioned above--are what you often pass to things like collect. The collect function does the equivalent of map and filter at the same time, building a new sequence with the results of the partial function applied to the members of a source sequence, while skipping the ones that aren't defined. Like this, in Python terms:

    def collect(func, iterable):
        for value in iterable:
            try:
                yield func(value)
            except MatchError:
                pass
    

    Of course that demonstrates that a dynamic language that loves exceptions and EAFP really doesn't need an explicit concept of partial functions. A partial function is just a function that can raise (or that can raise a specific exception), and using partial functions is something we trivially do all the time without thinking about it. So, this is a neat way to get familiar Pythonic functionality into a very non-Python-like language, but Python has nothing to learn there. (Although maybe mypy does?)
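
    To make that concrete: assuming the MatchError from the earlier sketches, any ordinary raising function is already a "partial function" that collect can consume:

    def half(x):
        # Only defined on even numbers.
        if x % 2:
            raise MatchError(x)
        return x // 2
    
    print(list(collect(half, range(10))))  # [0, 1, 2, 3, 4]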

    Other languages


    ML


    First, traditional ML patterns have a feature that we haven't discussed: "either patterns", where two (or more) different patterns (with "|" or "or" between them) are treated as a single case. In Python terms:

    case meal:
        of Breakfast(spam, *_) or Lunch(spam, *_):
            print('Are {} servings of spam really enough?'.format(spam))
    

    In ML, the composed patterns have to bind the same set of variables, with the same types (as they are in the example above). There have been attempts to expand that to wrap variables bound only on one branch in optional types, and even to allow variables with different types on each branch to be bound as choice types, but that turns out to be painful to use (you end up with a mess of pattern-matching expressions inside pattern-matching expressions that would be a lot more readable if you just flattened them out).

    In object-oriented ML variants, it makes sense to allow the different branches to match different class types, as long as they share a common supertype, in which case the bound variable is of that supertype. (I don't think either base OCaml or F# actually does that, but it makes sense...)

    But in a duck-typed language like Python, of course, you could dispense with all requirements: variables matched to different types in different branches are fine, as long as all the types duck-type successfully for whatever operations you use with them, and even variables only matched in one branch and not the other are fine, as long as you can handle the possible UnboundLocalError.

    Some ML offshoots have experimented with the dual idea of "both patterns", where two (or more) different patterns (with "&" or "and" between them) are treated as a single case. Here, obviously, they have to either not bind the same variables, or bind them to the same values. (The Python rule would presumably be that they can bind the same variables, but if they bind them to different values, one of them wins arbitrarily. Or maybe it would be useful to say the last one always wins, or the first truthy one, or some other rule? Without experimenting with it, it's hard to guess whether any of them would be particularly helpful...)

    With only plain deconstructing patterns, both patterns aren't very useful (what's the point of matching Breakfast(spam, _, _) and Breakfast(_, 0, eggs) instead of just Breakfast(spam, 0, eggs)?), but with unapply functions, they make a lot more sense. For example, if you had extractors for any two of the coordinate systems for 3D points, you could combine them to match the third--e.g., cartesian(_, _, z) and spherical(r, theta, _) gives you the cylindrical coordinates.
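
    In the hypothetical syntax, that might look like:

    case p:
        of cartesian(_, _, z) and spherical(r, theta, _):
            return cylindrical(r, theta, z)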

    I'm not sure how useful either of these features would be in Python, but they're not too hard to define. If someone wants to write a serious proposal to add pattern matching to Python, they should at least provide a rationale for not including them.

    F#


    F# builds on OCaml with the specific intent of using .NET libraries, and in particular using the C#-centric, OO-heavy .NET framework for most of its stdlib. So, they obviously needed a way to decompose arbitrary classes. Since Scala, Swift, Rust, etc. hadn't been invented yet, they had to figure it out for themselves. They took it as a specific case of the more general idea of matching abstract data structures, and wrote a paper on their design, Extensible Pattern Matching Via a Lightweight Language Extension.

    In its simplest form, F#'s "active patterns" are pretty similar to Scala's extractors, and even more similar to the variation I suggested above: instead of defining a singleton object with unapply and apply methods, you just define a plain (unapply) function with a special marker (F# uses "banana brackets" around the name, like (|Complex|), rather than a function decorator, which becomes important later) to declare it as an active pattern; you can then use it in any (single, partial-multiple, or total-multiple) match expression the same way you'd use a record type.

    But you can go farther. For example, instead of just returning a tuple, you can return what looks like a record constructor with the function name as the record type name and the tuple as its arguments. And you can then define a "banana split" pattern, with multiple names inside the banana brackets, which can return what looks like a record constructor for any of those names, which can then be pattern-matched as if they were the multiple constructors of an actual sum type. (In fact, they are constructors of an actual sum type; it's just anonymous.) The example in the docs deconstructs an abstract list with (|Cons|Nil|) by calling its non-empty, head, and tail methods. In Python terms, it might look something like this:

    @unapply_split('Cons', 'Nil')
    def cons_nil(lst: LinkedList): # LinkedList is an ABC
        if lst: return Cons(lst.head(), lst.tail())
        else: return Nil()
    

    And you can then use Cons and Nil in patterns to match list instances. The (inferred) type of that function in F# is something horrible involving the parametric choice type that you never want to read, so the fact that it would probably not be represented in the type system at all in Python isn't terrible. Anyway, while that's a nice feature, I can't imagine how to make it fit in nicely with Python syntax, and defining two separate partial unapply functions is not that much heavier, and probably a lot easier to understand:

    @unapply
    def Cons(lst: LinkedList):
        if lst: return lst.head(), lst.tail()
    
    @unapply
    def Nil(lst: LinkedList):
        if not lst: return ()
    

    However, it's a lot easier to prove that a pattern matching both Cons and Nil is a total pattern (or detect whether it's a partial pattern, for use in building automatic partial functions with is_defined support) on LinkedList with the "banana split". That's important for F# (or Scala), but probably not so much for Python. (Of course this particular example is pretty stupid for Python in the first place--just use the iterable protocol. But it obviously generalizes to less trivial examples that aren't as silly in Python.)

    F# also allows for matching with independent active patterns, with the usual rule that the patterns are just dynamically tried in order and there must be a default case, which allows you to use partial matching functions and still prove total matching. Anyway, that feature is exactly equivalent to the separate unapply functions just described, except that in F# you have to explicitly define each active pattern as partial (by banana-splitting it with _), and of course it has to be represented somehow in the type system (by wrapping the single type or choice of types in an option that's normally handled for you by the syntactic sugar of the multiple-match expression), while in Python, you'd take it for granted that patterns without default can't be proven total and that the values are duck typed.

    Finally, you can extend this to wrap almost any function as a pattern match. For example, you can write a parse_re active pattern that takes a regex from the pattern, and additional arguments from the match expression, and returns its groups if successful. This is really only tricky with static typing, because inferring the type of that parse_re function is a nightmare. Also, in a dynamic language like Python, it's pretty much unnecessary: you'd either add __match__ to the SRE_Match type, or write an unapply function for it, and then just use re.compile(regex).match(x, y) as your pattern, right?

    Beyond all the ways active patterns expand on ML-style patterns, the fact that they're ultimately just functions means you can use them with higher-order functions. Of course the Scala extractor objects or the proposed unapply functions for Python have the same benefit.

    Finally, F#, unlike most ML offshoots, has a notion of mutable records. With a bit of clumsy syntax, you can actually bind to a mutable attribute in the match and then mutate that attribute in the resulting code. You'd think that, if this were useful, there would be a way to do the same with abstract types, but (short of building a bunch of proxy objects) there isn't. So, presumably that implies that we don't need binding to "reference cells" or anything like that in Python.

    C#


    More recently, the designers of C# looked at F# and Scala pattern matching in practice, and designed pattern matching for C#.

    It's worth noting that they decided that, even though their pattern matching has to handle arbitrary OO classes, it's still worth adding record types or more general ADTs if they're going to add pattern matching. That seems like more evidence for adding automatic matching for the record-like types that already exist in Python (like namedtuples) and/or adding actual ADTs.

    But the most interesting part of C#'s design is that they start out with a single-clause pattern-matching expression, EXPR is PATTERN, and build everything on that. The expression is true if the pattern matches, and any bindings in the pattern are bound in the enclosing scope. (Actually, their rule for deciding the appropriate scope is pretty complicated, and weird in some cases, but the Python equivalent would be easy--the enclosing function or class body is always the appropriate scope.) That's general enough to let you do things like if (expr is List x) do_list_stuff(x); or if (x?.y?.z is Int i) do_int_stuff(i); without needing special syntax like Swift's if let. And then a multi-pattern switch statement or a multi-pattern match expression--or, if you're the C# team, why not both?--is pretty simple syntactic sugar for that (but syntactic sugar that the optimizer can recognize and maybe take advantage of). Destructuring assignment seems to be the only single-pattern use that couldn't be directly built on is, so they added special syntactic sugar for that.

    So, how do they handle deconstructing arbitrary classes? The default behavior only works for destructuring record/ADT types (and for simply matching values to types without any destructuring), but the is operator is overloadable, like any other operator in C#. At first glance, that sounds similar to the __match__ protocol (for normal-method operator overloads) or to the extractor protocol (for static-method overloads), but it's actually more complicated: is is treated like a conversion operator, so you can specify how any class destructures to or from any other class. The example in the docs is a Polar point class that deconstructs a separate Cartesian point class, so you can write var p = Cartesian(3, 4) and then later match it with is Polar(var r, var theta). So, Scala's extractors are really just a special case of C#'s overloaded is. The question is, are there any other cases that are useful? The only examples I see all use a static class with a static is overload and no other methods at all, which means they're just unapply functions with extra boilerplate...

    Mathematica


    I may have mentioned this in my previous post, but... In Mathematica, everything is a primitive value or a tree. And this means pattern-matching in Mathematica is designed around trees, rather than flat structures. So, you can match something like a[b[_], _], which is like first matching a[X, _] and then matching X against b[_]. You can also match against just the second level, or against just a leaf at whatever level it appears, etc. That's all pretty cool, but I don't think it's needed in any language where everything isn't a tree.

    Swift, Rust, etc.


    I've already mentioned Swift multiple times. The other obvious modern language to look at is Rust. I don't think there are any radical new ideas from either, but if I'm wrong, I'll come back and edit this section.

    Dynamic languages


    Pattern matching comes out of statically-typed functional programming. Scala, C#, Swift, Rust, etc. are all primarily attempts to bring ideas from static FP to static OOP (in ways that will hopefully be familiar to traditional OO programmers), and secondarily attempts to find ways to simulate some of the benefits of duck typing on top of static typing, so it's not too surprising that they all came up with similar ways to import the idea of pattern matching.

    But, besides asking whether the lack of static typing could be a problem for pattern matching in Python (e.g., the fact that we can't detect partial patterns at compile time), we should also ask whether duck typing could offer more flexibility or other benefits for pattern matching. I already mentioned how either and both patterns could be more useful in Python than in OCaml. And the fact that Python doesn't have to infer horrible choice types that no human will ever be able to debug if the inference goes wrong might be an advantage.

    But if there are any good examples in other dynamic languages, it would definitely be worth looking at them.

    Of course many dynamic languages have "pattern matching" in Perl terms: regex matching on strings, or some generalization of string-matching ideas to other types. I know that generalization is supposed to be one of the key ideas in Perl 6, but, having not dug into Perl 6 beyond reading the apocalypses, I don't know if it meets up with any interesting common ground with ML-style pattern matching, or if it's just SNOBOL string patterns with a more powerful grammar.

    Also, of course, people have implemented pattern matching in Perl, Ruby, etc., just as they have in Python, and those languages with more flexible syntax allow them to make it look more like ML--but that just makes it look less like the native language, doesn't help at all in designing a way to integrate it as a real native language feature, and, most importantly, generally does nothing to solve the problem of decomposing objects, which is the key problem.

    There's gotta be something on LtU...


    I've only done a cursory search at Lambda the Ultimate to see if anyone's published any papers on integrating pattern matching into dynamic OO languages. I found and skimmed some posts about the problem of decomposing objects, but all in terms of static OO (that's where I found the F# paper, which led me to the C# work in progress...). Some of those might have interesting ideas that aren't incorporated in any of the languages I've looked at, but I don't yet know. And it seems like there must be something on dynamic object decomposition out there, even if I didn't find it in a couple minutes.

    How useful is pattern matching in idiomatic Scala?


    Scala, like many modern functional languages (and, increasingly, even some primarily-imperative languages), builds its failure handling around an Option type, basically equivalent to Haskell's Maybe but with some extra stuff piled on top (like being able to construct an Option from a nullable-typed Java object).

    In most such functional languages, pattern-matching is the idiomatic way to deal with optional values. But in Scala, many guides (like part 5 of Westheide's) point out that it's pretty verbose in Scala, and it's often better to use the extras piled on top of the type instead. Compare:

    println("Name: " + (user.name match {
      case Some(name) => name
      case None => "<unknown>"
    }))
    
    println("Name: " + user.name.getOrElse("<unknown>"))
    

    Honestly, I'm not qualified to say whether Westheide is right here (although other guides agree with him--and the fact that Scala has the getOrElse method is telling), or whether this extends to other paradigm examples of pattern matching or is specific to the fact that Option, and what you usually do with its values, is so simple. But still, it's worrying: if idiomatic Scala usually doesn't use pattern matching even for one of the paradigm examples of pattern matching, how often would idiomatic Python use it?

    Comparing to Swift


    Swift made extensive use of Optional from the first beta, but the designers originally didn't feel the need to add general pattern matching. Instead, the general solution is to test for nil, and then unwrap inside the tested code. There's special support for that all over the place:

    • There's a single-character type operator for optional types: int? is the same as Optional<int>.
    • Unwrapping is a single-character postfix operator: var!.
    • The language's truthiness notion is based on non-nil (except for actual boolean values), so you can write if var { println(var!) }. (Yes, in theory, this means optional booleans are a bit harder to deal with, but that rarely if ever comes up--practicality beats purity and all that.)
    • if let lets you combine the test and binding into one: if let v: int = var { println(v) } binds the value from the optional int var to the non-optional int v, so you can use it in the "then" scope.
    • guard is like if, but lets you bind a variable in the enclosing scope (by enforcing that the failing case must exit that scope), so you can write guard let v: int = var else { return } and then use v in the rest of the function.
    • The nil-coalescing operator ?? serves most uses of an "unwrap-or-else-use-default" method: 2 + (v ?? 0).
    • The typing engine, including inference as well as checking, understands all of this, so you can just write if let v = var without specifying a type, the compiler can skip any warning or runtime type check inside the body of an if or after a guard, etc.

    However, in later versions of Swift, they felt the need to add additional sugar to pattern matching for optional values anyway. In general, you match an Enum (ADT) constructor (in switch, in a single-pattern if, and a few other places) with case .Kind(val1, val2) = var, and any bindings you want have to be explicitly signaled, with case .Kind(val1, let var2). But when you're matching an optional value, instead of if case .Some(let var1) = var, you can just write if case let var1? = var. So, maybe this implies that pattern matching is important, even when you've done everything you can to special-case away the need for it. Usually, idiomatic Swift code gets away with avoiding any pattern matching for optionals by using if let and ??, but the fact that the designers felt the need to add case let var? implies that they couldn't get rid of every need for it.

    Naming


    From a quick survey of other languages:

    ML uses case ... of to introduce the multi-case pattern-matching construct, and | to separate cases; a function definition can also be defined in terms of cases on its parameters, without needing the case bit. Some ML descendants change the | to a different symbol, or change it to the keyword of (and remove that from the outer clause). But newer descendants mostly strip it down even further. Erlang makes every binding a single-case pattern match, so you don't need any syntax. Haskell embeds the multi-case match into various bits of syntax, like binding, and replaces the | with whitespace, so there's basically no syntax anywhere.

    Meanwhile, to people coming from the imperative world, C's switch and case are familiar. Whether that's a good thing, or an attractive nuisance, seems to be up for debate. Swift deliberately uses both of the same words. Many other languages (Scala, Rust, draft ES7) use match instead of switch, but some of them still use case for the individual cases. Others avoid both C keywords.

    One advantage of using the word case this way is that, unlike of or a symbol or whitespace, it reads well in single-case stripped-down constructs, like Scala's single-case simple anon functions ({ case(a, b) => a+b }) or Swift's if case and similar statements (if case .Point(let x, let y) = point where x == y { print("diagonal at \(x)") }).

    Anyway, my use of case and of probably isn't actually familiar to anyone who isn't currently in a late-80s university class, and it collides with C just enough to be misleading but not enough to be comforting. So, I think either match/case or switch/case would fit Python better. (And, if C# plans to set the precedent that switch means statement and match means expression--coinciding with Swift's switch statement and Scala's match expression--maybe it's not worth fighting that.)

    Use cases


    The main thing stopping anyone (or at least me) from writing a serious proposal for adding pattern matching to Python is the lack of good use cases. Everyone who uses any language built around pattern matching knows that pattern matching is useful... but coming up with actual examples that make sense in Python is harder than it should be. And maybe that means something.

    Part of the problem is that most of the examples that aren't just toys tend to be things that can be better done in other ways in Python:

    • Option, Error, etc. types are an alternative to exceptions, and Python uses exceptions pervasively instead. (Plus, as Scala implies, pattern matching may not even be the best way to handle these most of the time.)
    • Decomposing records in general tends to fight against OO encapsulation. While abstract matching solves that problem, most examples, even in languages like Scala and F# that have abstract matching, seem to be borrowed from languages that don't (and written around case classes, records, or whatever they call them). While such cases might arise in Python, the fact that we don't generally use namedtuples all over the place (and they're buried in collections rather than builtins) and we don't have more general ADT-like features implies otherwise.
    • Simple switching functions like factorial (whether with a match expression, or as pattern-matched overloads) only really look simpler with pattern matching when you define them recursively. But in Python, you generally wouldn't define them recursively.

    • Recursively decomposing (cons) lists (and generalizing that to, e.g., abstract list-like objects, as in the F# example above) has the same problem, and the further problem that you rarely use cons lists in Python, and the further problem that, even if you did, you'd almost certainly want to use the iterator protocol on them.

    • Recursively decomposing more complex recursive structures that don't fit the iterator protocol seems more promising, but again, you'd usually want to process them iteratively rather than recursively if they're large enough to be a problem. Plus, you don't often run into deeply recursive data structures anyway--or, if you do, they tend to be implicit in JSON or similar, and pattern matching is more verbose there, not less; the benefit of using patterns there is static type checking, which you aren't going to get in Python.

    • Many examples would be more idiomatically written as method dispatch in an OO language. Even the more complex related problems (like operator double dispatch) already have better idiomatic solutions in Python.

    • The equivalent of a C switch statement is not very exciting, even when extended to matching enum constants or strings instead of just integers, and that's already readable enough as an elif chain. Or, in many cases, you'd just use a dict.

    There must be examples in my code or other code in other languages that would still look like idiomatic Python-plus-patterns when translated... but I haven't found them yet.

    Conclusions (or lack thereof)


    A year and a half ago, I concluded that, even though I think I came up with a reasonable and implementable proposal for pattern matching in Python, I don't think it's a feature Python needs, or should get, at least at present.

    The fact that Scala's designers came up with a similar solution to what I proposed, and it's very heavily used in idiomatic Scala, even by people who are mostly coming to the language from Java rather than ML or Haskell or something, is pretty heartening.

    There are definitely ways that Scala's feature is more flexible than the one I proposed--and, for the most part, we can pretty easily add that flexibility to my proposal, making it even better.

    On the other hand, the fact that Scala seems to encourage alternatives to pattern matching in some of the places I'd use it in Haskell or ML makes me think it might turn out to be underused in Python as well. Or maybe not--after all, maybe you sometimes have to fall back to pattern matching (or get much more verbose) even though the alternatives are there, as is presumably the case in Swift?

    So... now I'm less against the idea of adding pattern matching to Python. But I'm still not sufficiently excited about it to build an implementation, write a PEP, try and convince the core developers, etc. And if switching between Haskell and Python didn't make me ache for pattern matching in Python, I doubt that switching between Scala and Python will either. But maybe I'm wrong; time will tell.

  7. About a year ago, Jules Jacobs wrote a series (part 1 and part 2, with part 3 still forthcoming) on the best collections library design.

    Much of what follows is a repeat of ideas from Swift-style map and filter views and other posts like Iterator terminology, but hopefully seeing it from a different viewpoint will make it clearer. (At least writing this response to his post made some of it clearer to me.)

    Of course Jacobs acknowledges that there are tradeoffs, and his final best design isn't perfect. It's basically an extension of Clojure's reducer/transducer design (by splitting transducers into refolders and folders, aka transformers and consumers, you can design everything around composing transformers without running into all of the same problems). He's candid about what's missing.

    But I think he's missed something huge.

    The starting idea is that you want a single, fully-general sequence type, with functions to convert to and from all of your different collection types. This way, you only need to write all of your transformers (map, filter, zip, etc.) once, and, with minimal compiler/runtime support, fromSet.map(f).toSet can be just as efficient as a custom setMap f would be (and likewise for list, array, sorted set, or custom third-party types).

    Of course this isn't a revelation. The fundamental idea behind the STL (which eventually became the nucleus of the C++ standard library) decades ago was that you can split up iteration and algorithms, so if you have N collection types and M algorithms you only need to write N+M functions instead of N*M. And this is built into Python at a fundamental level: map, filter, zip, etc. consume any kind of iterable (which means collections or iterators) and produce an iterator. And most collection types can construct themselves from any iterable. So, the Python equivalent is just set(map(f, x)).

    But newer collection libraries since 1979's original STL design have run into the same N*M issues, and tried to solve them in new ways, and run into problems doing so. For example, in Scala, map and friends try to return the same collection type as their argument, meaning that things like zip or concat have to choose one argument and work out how to convert the other, and have to be able to programmatically figure out how to build a list[T,U] when specialized on list[T], set[U]. Just giving you back a general sequence type, or iterator, or whatever that can be lazily and cheaply converted to list[T,U] (or to any other type you prefer, or just used as-is) by the user avoids all of those problems.

    Back to the future (1979)


    But STL had something that Python doesn't, and neither do any of Jacobs' steps toward a solution: instead of just one kind of iteration, there are four kinds. (I'm ignoring output iterators, and mutating algorithms in general, in what follows, to keep things simple.)

    Python iteration is one-pass, forward-only (which STL calls "input").

    But sometimes this isn't sufficient. For example, if you want to build an average function with one-pass iteration, you need to keep the running total and count in some kind of persistent state. But with (multi-pass) forward iteration, you can just compose the sum and len functions in the obvious way. Jacobs's design takes this into account.

    But sometimes, this isn't sufficient either. For example, if you want to build unique_justseen with forward iteration, you need to keep the previously-seen value in some kind of persistent state. But with bidirectional iteration, you can just look at the previous value directly. Or, more simply, imagine how you'd build a reverse function with only forward iteration. (In Python, of course, there's a special method for producing reversed iterables, as part of the Sequence protocol.) Jacobs's design doesn't take this into account; the only way to reverse his sequences is to stack up all of the values.
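
    (For reference, here's the persistent-state version in today's one-pass Python, adapted from the itertools recipes; the previously-seen key is hidden away inside groupby:)

    from itertools import groupby
    from operator import itemgetter
    
    def unique_justseen(iterable, key=None):
        # List unique elements, preserving order; only the element just
        # seen is remembered.
        return map(next, map(itemgetter(1), groupby(iterable, key)))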

    And sometimes, even this isn't sufficient, because you want to randomly access values. For example, there are a variety of sorting, permuting, etc. algorithms that are obvious with random access and either more difficult or impossible without. Jacobs's design doesn't take this into account either.

    The way STL handles this is simple, if a bit clumsy in practice: instead of dealing with sequences, you deal with iterator ranges, and there are four different iterator types, with the obvious chain of subtype relationships. Some algorithms require random-access iterators; some only require forward iterators, and will therefore also work with bidirectional or random-access iterators; etc. In languages with the right kind of overloading, you can even have an algorithm that requires bidirectional, but has an optimized specialization for random-access (e.g., it can run in linear instead of log-linear time, or constant rather than linear space, with random-access iterators).

    Swift makes things simpler, even if the idea is more confusing at first. Swift collections use indexing for everything. Random-access iteration is indexing with integers, as you're used to. Bidirectional iteration is indexing with special index types, which you can only get by asking a collection for its start or end or by incrementing or decrementing another index. And so on. Forward iteration is the same, except there's no decrement.

    So, in Swift, you define map as a function that takes a sequence, and returns some sequence type (a MapView) that's indexable by whatever indexes worked on the original sequence. When you write m = map(f, x) and then ask for m[idx], you get f(x[idx]). And likewise for zip, and plenty of other functions.
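
    In Python terms, a random-access map view might look something like this minimal sketch (ignoring the weaker index types):

    class MapView:
        # Lazily apply func on each indexing operation, instead of eagerly
        # building a new list.
        def __init__(self, func, seq):
            self.func, self.seq = func, seq
        def __len__(self):
            return len(self.seq)
        def __getitem__(self, idx):
            if isinstance(idx, slice):
                return MapView(self.func, self.seq[idx])
            return self.func(self.seq[idx])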

    Of course filter is a bit of a problem; there's no way to get the 7th element of a filtered list without looking at the first 6 predicate-passing elements and all predicate-failing elements up to the next passing one. So, filter can only give you bidirectional indexing, even if you give it a random-indexable sequence. But that's not all that complicated.

    It should be possible to design a language that lets you write the code for map, filter, etc. only once, and have it automatically give you back the appropriate type (which would be inferable at compile time from the sequence type of the argument).

    How does this fit in with Jacobs's pros and cons?


    The first thing to notice is that as long as you only use the results in one-shot style (even if the input supports more powerful indexing), the semantics are identical to Python's iterators. So you get all the same benefits and limitations.

    But if you, e.g., have a random-access sequence, and you pass it through map to get another random-access sequence, the result is still data-parallelizable in exactly the same way as the input, which is an advantage that Python iterators (and lazy lists, etc.) don't have.

    So, it's the best of all worlds, right?

    Well, not quite; there's a new con that Jacobs's taxonomy didn't anticipate. If you write m = map(f, x) and then ask for m[7] twice, you're going to call f(x[7]) twice. For some algorithms, this could be a big performance hit. (And, if you stop ignoring mutation and side effects, it could even be incorrect or dangerous.)

    Of course you can manually slap memoization onto a lazy sequence, but then you're risking the memory benefits. For example, one-shot iterating over all adjacent pairs of values should only take constant space (two elements), but a naively-memoized map view will use O(N). Sure, you can use an LRU cache with depth 1 for memoization, but then you need to convince yourself that you're never re-calculating anything. If you can do that, you could probably just as easily have written the code that uses explicit persistent state to store the last value and just stick with a one-shot iterator.
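
    Here's what the LRU flavor might look like, as a standalone variant of the MapView sketch above (whether the depth-1 cache actually helps depends entirely on access being strictly sequential):

    from functools import lru_cache
    
    class MemoMapView:
        # Memoize by index with a bounded cache, so sequential pairwise
        # access stays O(1) space while each element is computed only once.
        def __init__(self, func, seq, cachesize=1):
            self.seq = seq
            self._get = lru_cache(maxsize=cachesize)(lambda i: func(seq[i]))
        def __len__(self):
            return len(self.seq)
        def __getitem__(self, i):
            return self._get(i)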

    I don't know whether this is a real problem, and, if so, how often, and how easy it generally is to work around. It may be a complication worth ignoring, or it may make the entire idea into an attractive nuisance.

    Are there really only four kinds of iteration?


    The original goal was to get N+M functions instead of N*M. But if there are K kinds of iteration, we've really got N+M*K, which is just as bad as we started with, right?

    Well, no. N, the number of possible collection types, is huge and open-ended; K, the number of possible iteration types, is small and fixed.

    Maybe you can imagine ways to extend random-access indexing to things like NumPy multi-dimensional slices or Mathematica tree level selections, but if you think through what map should do with those, it's pretty clear that there shouldn't be any new code to write. (There might be some new algorithms that require these stronger kinds of indexes, but that's fine.)

    Also, how many of these can you imagine? People have come up with half a dozen ways to index or iterate over the past half-century, as opposed to dozens of collection types with hundreds of different variations and implementations, with new ones appearing every year.

    The hard part


    So, I said above that it should be possible to build a language that lets you just write map once, and it'll work appropriately with any of the four iterator types (or any new subtypes added to the iterator tower). Of course if you want to write an algorithm with a special optimized implementation for stronger iterators, you'd need to write it twice, but that's pretty rare, and never mandatory.

    And ideally, the static type system should be able to infer that map[a->b, Set[a]] returns the same kind of iterator as Set, but with element type b instead of a. (If filter has to manually specify that it can't handle anything better than bidirectional, instead of the type system inferring it, that's acceptable.)

    So, is that doable?

    Swift doesn't succeed. You have to write a lot of boilerplate, and to repeat yourself a lot more than should be necessary. But is that a limitation of Swift, or of the idea? I'm working this through in my spare time, but I don't have as much of that as I'd like.

    Fold, refold, unfold, spindle, manipulate


    Going back to Jacobs's posts, he gets into a number of things I haven't covered here. Ideally, you want to be able to support push producers as well as pull producers, and take advantage of parallelism at the production side as well as the processing side, and provide some way to keep state persistently within a refold--and doing all of those at the same time is impossible.

    He takes a step toward a solution, by proposing to redefine map, filter, etc. in terms of what chains of these transformers do. I believe the idea is that you can capture all kinds of transformers by writing a single producer-adapter for each kind of input and a single consumer-adapter for each kind of output. Since he never finished part 3, I'm not positive this is what he was going to write, nor am I positive that it would work, but at least it seems like this is where he was going.

    If so, it has a lot in common with replacing one kind of sequence with four kinds with different indexing. It means that not all functions can work with all producers or all consumers--filter can't work with a random-access consumer, and, likewise, zip can't work with a push producer--but hopefully the system can take care of this automatically (or, at worst, require you to specify it with a simple tag), leaving you to just write the basic implementation of filter in the obvious way.

    The big question is, what happens when you put these two together?

    I think his proposal (that I'm imagining) effectively needs an adapter for each combination of each of the binary questions (push vs. pull, parallelizable, etc.), and, ignoring the nonsensical ones, this means something like 8 adapters on each side. And I think the adapters may need to be able to interact differently with different kinds of iteration, rather than being purely orthogonal, which means we've now got 32 adapters on each side. Put another way, N+M*K+K*J isn't too bad if K and J are small and constant, but still, writing 64 functions, almost all of which are semantically empty, still seems pretty ugly. Hopefully, if I'm right about all of this, most or all of the adapters could be trivially auto-generated (which means that a clever enough language wouldn't require you to implement them at all, of course).

    Anyway, that's a lot of ifs toward the end. But unless Jacobs posts part 3, or I get the time to look into this in more detail myself, that's as good as I can do.

  8. In three separate discussions on the Python mailing lists this month, people have objected to some design because it leaks something into the enclosing scope. But "leaks into the enclosing scope" isn't a real problem. It's something that implies the actual problem they're complaining about—but it implies many different things.

    Often, people are just worrying because what they're doing would be a problem in C or Lisp or whatever their first language is. Many of these problems don't exist in Python, or exist only in theory and are rarely relevant to real Python scripts or modules, in which case the solution is to just stop worrying.

    Sometimes, there is a real problem. But it's worth knowing what the real problem is, so you can solve that.

    Background

    For example, consider the usual way of adding optional parameters:
        def spam(eggs=None):
            if eggs is not None:
    
    The problem here is that None is a perfectly good value for many uses, and you may want to distinguish spam(None) from spam().

    One possibility is:
        def spam(*args):
            if args:
                eggs, = args
    
    But this gives you a much less meaningful signature, and takes more code, and provides less useful error messages (e.g., if you pass two arguments, instead of a TypeError telling you that spam takes at most 1 positional argument, you get a ValueError telling you that there were too many values to unpack).

    So, the idiomatic solution is this:
        sentinel = object()
        def spam(eggs=sentinel):
            if eggs is not sentinel:
    
    That solves both problems—None may be a valid value for eggs, but sentinel surely isn't, because you just invented it here, and don't use it for any meaning except "not a valid value for eggs". And the signature makes it pretty clear that "eggs" is an optional parameter, and the errors are exactly what you'd expect, and so on.

    But it means the sentinel is now available in the same scope as the function.

    So what? Well, that's what this post is about.

    It's public

    Often, the scope in question is the global scope of some module. Putting sentinel in the global scope of a module means anyone who uses the module will see it. This means that from module import * imports it. And that the inspect module, and similar tools, will show it as a part of the module's interface—your IDE or REPL may offer it as a possible completion for module. and module.s, your automatic documentation generator may list it in the documentation of the module's public interface, etc.

    But the problem here isn't that it's in the module's global scope; it's that it's public in the module's global scope. This is exactly what the private underscore convention is for: use the name _sentinel instead of sentinel, and it's no longer public. People can still find it if they go looking for it, but most tools will not offer it to them—and, even if they do find it, they'll know they're not supposed to use it.

    Also, any module that you intend people to use with from module import * really should have an __all__, which of course shouldn't include _sentinel.

    But private isn't really private

    Some people object that Python private names aren't really private. A malicious user can still access module._sentinel and, e.g., put it in the middle of a list, causing some function in your module to stop checking the list halfway through so they can sneak some other data through your protection.

    Nothing is private in Python. There's no way to hide anything. If they couldn't access module._sentinel, they could still access module.spam.__defaults__[0]. Or, if you somehow hid it in the constants, module.spam.__code__.co_consts[0].
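
    To make that concrete, here's a quick sketch showing how easily the sentinel can be dug out through reflection:

        _sentinel = object()
        def spam(eggs=_sentinel):
            return eggs is _sentinel

        # Even if _sentinel were renamed or deleted, a user could still get at it:
        smuggled = spam.__defaults__[0]
        print(spam(smuggled))  # True: the "impossible" value was passed in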

    This is inherent to any language that allows reflection. Even in languages that don't, like C++, people can still find the "hidden" values by reinterpret_cast<char *> and similar tricks. Unless your language is specifically designed to protect pieces of code from each other (as Java is), there's nothing you can do about this. If you don't worry about it in every other language, don't worry about it in Python.

    It may collide with something else

    It's exceedingly unlikely that you actually had something different in the module named _sentinel. But what if you use this same idiom twice in the same module? For example:
        _sentinel = object()
        def spam(eggs=_sentinel):
            if eggs is _sentinel:
                ...

        # ...

        _sentinel = object()
        def cheese(ham=_sentinel):
            ...
    
    Now, if you call spam(), the default value is the first _sentinel (because that gets bound at function creation time), but the if statement is checking it against the second one (because that gets looked up at function call time). Oops!

    But really, is this ever a problem? Unless you're writing 10000-line modules, or writing your modules by blind copy-paste, it should never come up. (And if you are, don't do that!) If you need multiple functions with sentinels in the same module, you either use the same one for all of them, or use a different name for each one.

    The one time this can come up is in auto-generated code. You need to make your code generator create a guaranteed-new name each time, in case it gets run twice on the same module.

    It wastes space

    Micro-optimizing space in a Python module is a mug's game. The value is already in the module. So is the name (for reflection, help, etc.). It does add one more entry to a dict that links the two together. That's 24 bytes on most 64-bit platforms. Even an empty dict is 288 bytes, and a module dict usually has at least __name__, __package__, __doc__, __all__, __loader__, __spec__, on top of your public and private names. This isn't a problem worth solving. If it were, you'd want to write a custom module loader that returned a different object that uses __slots__ (or C code) instead of a __dict__.
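
    If you want to check the numbers on your own build (they vary by platform and version), sys.getsizeof will show you:

        import sys
        print(sys.getsizeof({}))        # an empty dict: 288 on a typical 64-bit CPython 3.5
        print(sys.getsizeof(object()))  # the sentinel itself: 16 on the same build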

    It's slow

    Comparing to None is fast, because None is stored as a fast constant embedded in the compiled function, but comparing to _sentinel is slow, because it's stored as a global name lookup, which generally means a dict lookup on the module.

    You *can* solve that:
        _sentinel = object()
        def spam(eggs=_sentinel, _sentinel=_sentinel):
            if eggs is _sentinel:
                ...
    
    Now, you're comparing the first argument to the second argument, and they both default to the same reference stored in the function's fast locals, instead of the second one being a global lookup.

    But is this really a problem that needs to be solved? Yes, global lookups are slow. But so are function calls. And whatever you're actually going to do in the function is almost surely a lot slower. And even if everything were blazingly fast, the branch prediction failure on half the calls to spam would probably be much more costly than an access that almost always hits the cache.

    Accessing a global inside a loop over 1000000 elements may be worth optimizing away, but here?

    This micro-optimization used to be fairly common in early large Python apps—the Zope framework uses it all over the place. People also used to do things like pushing unbound methods as default values and then passing the self explicitly to avoid the attribute lookup and descriptor call. But as people have actually learned to profile Python code over the past couple decades (and as Python has improved, and CPUs have gotten more complicated…), it's become a lot less common, because it almost never provides any measurable benefit. So it probably won't help you here.

    This also solves the collision problem, and allows you to solve the visibility problem by just doing del _sentinel after the function definition. But neither of those is really a problem either, as explained above.

    And the cost is that it's less clear that, while eggs is an optional parameter, _sentinel is a "never pass this" parameter. The underscore helps, but this idiom isn't as widely known by programmers, or as widely supported by tools, as the usual uses of the underscore.

    Conclusion

    When you're worried that a value leaks into the enclosing scope, all of the real problems can be solved by just underscore-prefixing the name and leaving it out of __all__, and the other problems rarely need to be solved.

  9. There's a lot of confusion about the various kinds of things you can iterate over in Python. I'll attempt to collect definitions for all of the relevant terms, and provide examples, here, so I don't have to go over the same discussions in the same circles every time.



    iteration

    Iteration is what a for loop does (whether a for statement, a comprehension, or the C-API equivalent). It's also, of course, what functions like map, zip, list, and islice do under the covers.

    The thing you iterate (loop) over is an iterable.

    The way you iterate over it (usually implicitly, with a for loop) is to call the iter function, which gives you an iterator. You then call next on the iterator until it raises StopIteration. This is called the iteration protocol.

    Iterators are always iterables—if you call iter with an iterator, you just get the same iterator back. The reverse is, of course, not true.

    I'm cheating a bit here, because the iteration protocol is actually defined in terms of the dunder methods __iter__ and __next__ (or the slots tp_iter and tp_next, for C extension types) rather than the functions iter and next, and includes a fallback for iterating over old-style sequence-like things that don't define iterability but do define subscripting with contiguous indices starting from 0. But that rarely comes up nowadays—and the iter function wraps up that fallback. So, you can unread this paragraph.
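
    To make the protocol concrete, here's roughly what a for loop does under the covers (a sketch in Python; the real loop is implemented in C):

        xs = [1, 2, 3]
        # roughly equivalent to: for x in xs: print(x)
        it = iter(xs)                # ask the iterable for an iterator
        while True:
            try:
                x = next(it)         # ask the iterator for the next value
            except StopIteration:    # the iterator signals that it's done
                break
            print(x)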

    abstract base class (ABC)

    Abstract base classes complement duck-typing by providing a way to define interfaces when other techniques like hasattr() would be clumsy or subtly wrong (for example with magic methods). ABCs introduce virtual subclasses, which are classes that don't inherit from a class but are still recognized by isinstance() and issubclass(); see the abc module documentation. Python comes with many built-in ABCs for data structures (in the collections.abc module), numbers (in the numbers module), streams (in the io module), and import finders and loaders (in the importlib.abc module). You can create your own ABCs with the abc module.
    Many of the types described below have an ABC, in the collections.abc module, that defines the type in terms that can be tested from within the language. For example, every iterable should be a subtype of collections.abc.Iterable.

    This isn't always perfect (for example, a type that responds to the iter function but doesn't return an iterator isn't really a valid iterable, even if it's a collections.abc.Iterable subtype as far as issubclass is concerned). That's usually not a problem unless someone is deliberately obfuscating their code, except in one case: many third-party collection types do not subtype the relevant ABCs. For example, a third-party sequence may not be a subtype of collections.abc.Sequence, even though it supports all of the right methods with the right semantics. So, in a discussion, if someone shows "issubclass(Spam, Sequence) == True", that can be taken as proof that Spam is a sequence (unless you have reason to believe it's intentionally or accidentally misleading, as was the case for range in Python 3.1…), but "issubclass(Spam, Sequence) == False" doesn't prove that Spam isn't a sequence.

    iterable

    An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), ...). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop. See also iterator, sequence, and generator.
    Fundamentally, an iterable is something that you can use in a for loop. Everything else described here is an iterable. Functions like map, filter, zip, enumerate, etc. can all take any iterable.

    iterator

    An object representing a stream of data. Repeated calls to the iterator's __next__() method (or passing it to the built-in function next()) return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its __next__() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.
    More information can be found in Iterator Types.
    Iterator is a subtype of iterable—that is, every iterator is an iterable. But other things are iterables too, like sequences. What distinguishes an iterator is that calling iter with an iterator returns the iterator itself. This means that iterators are inherently one-shot: once you iterate an iterator, you've got an empty iterator.

    There is no concrete type "iterator". Files are iterators. So are the results of generator functions. Many built-in iterable types (like list, range, and set) have their own custom iterator types. They all support the Iterator ABC, but they're not related in any other way.

    Iterators are generally lazy—that is, they only produce elements one at a time, on demand. This isn't inherent to the type, it's just that there's very little good reason to ever create a non-lazy iterator; you'd just be wasting time and memory for no benefit.
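
    A quick demonstration of both points (iter returning the iterator itself, and one-shot-ness):

        it = iter([1, 2, 3])
        print(iter(it) is it)  # True: an iterator's iterator is itself
        print(list(it))        # [1, 2, 3]
        print(list(it))        # []: the iterator is now exhausted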

    generator iterator

    An object created by a generator function.
    Each yield temporarily suspends processing, remembering the location execution state (including local variables and pending try-statements). When the generator iterator resumes, it picks-up where it left-off (in contrast to functions which start fresh on every invocation).
    A generator function is a function with a "yield" or "yield from" expression in it. When you call it, what you get back is a generator iterator. You also get generator iterators from generator expressions.

    A generator iterator is, as the name implies, an iterator. Generator iterators also support other methods like send, which aren't relevant here.

    Generator iterators are obviously inherently lazy—although there's nothing stopping you from writing "yield from f.readlines()", which would in practice be non-lazy.

    The word "generator" on its own is ambiguous. According to the glossary and the reference docs for generators, it should refer to generator functions. However, the in-language name "generator", the ABC "collections.abc.Generator", and the function "inspect.isgenerator" all refer to generator iterators (and "inspect.isgeneratorfunction" refers to generator functions). If it isn't clear from the context which one you're talking about, don't use the word on its own.

    sequence

    An iterable which supports efficient element access using integer indices via the __getitem__() special method and defines a __len__() method that returns the length of the sequence. Some built-in sequence types are list, str, tuple, and bytes. Note that dict also supports __getitem__() and __len__(), but is considered a mapping rather than a sequence because the lookups use arbitrary immutable keys rather than integers.
    The collections.abc.Sequence abstract base class defines a much richer interface that goes beyond just __getitem__() and __len__(), adding count(), index(), __contains__(), and __reversed__(). Types that implement this expanded interface can be registered explicitly using register().
    A sequence—something like list, str, or range—is an iterable, but it's not an iterator. In particular, it's always a reusable iterable—you can loop over a list as many times as you want and get the whole list each time.

    Many sequences are non-lazy, like list, but some are lazy, like range.

    Sometimes, the word "sequence" is confusingly used to mean any reusable non-iterator iterable, or even any non-iterator iterable, rather than specifically a sequence.

    Sometimes you'll see the term "almost-sequence", to refer to something which supports indexing, but doesn't support the entire sequence protocol. Before Python 3.2, range was such a type. Even in 3.5, the term is still relevant—an almost-sequence is what you need to get the "fallback" iteration behavior. NumPy arrays are almost-sequences in this sense.
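
    Here's a minimal sketch of such a type; it has no __iter__ at all, but the fallback still iterates it:

        class Almost:
            # no __iter__; just subscripting with contiguous indices from 0
            def __getitem__(self, index):
                if index >= 3:
                    raise IndexError(index)
                return index * 10

        print(list(Almost()))  # [0, 10, 20], via the legacy fallback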

    non-iterator iterable

    There is no official term for something that's an iterable but not an iterator (which may be why the term "sequence" is sometimes confusingly used for it).

    But this is a useful concept to have. It's any type where calling iter on it returns something other than itself. Sequences like list, mappings like dict, views like dict_values, and many other types are all non-iterator iterables.

    Most non-iterator iterables are reusable, but there's nothing in the protocol that guarantees this. Sometimes the two concepts are confusingly conflated.

    reusable iterable

    There is no official term for something that's an iterable, but not an iterator, that can be iterated over repeatedly, yielding the same items each time (and, usually, supporting multiple iterations in parallel).

    But this is a useful concept to have. Sequences like list, mappings like dict, mapping views like dict_values, and many other types are reusable iterables.

    As mentioned above, sometimes this concept is conflated with the previous one, and sometimes the word "sequence" is confusingly used for both of them.

    view

    The objects returned from dict.keys(), dict.values(), and dict.items() are called dictionary views. They are lazy sequences that will see changes in the underlying dictionary. To force the dictionary view to become a full list, use list(dictview). See Dictionary view objects.
    In addition to the specific meaning from the glossary, the word "view" is also used for any non-iterator iterable that similarly provides a view of (possibly some of) the (possibly transformed) elements of some other object.

    For example, if you have some object that supports the buffer protocol, you can create a memoryview object referencing all or part of its buffer. This is, as the name implies, a view into that other object. If you slice a memoryview, you get another view into the same object.

    Similarly, when you slice a NumPy array, what you get back is another NumPy array, which shares the same storage, but accesses it differently.

    Note that not all views are sequences. (In fact, dictionary view objects are not, despite what the glossary says. This is an example of the word "sequence" being used loosely for "non-iterator iterable".)

    lazy

    A lazy iterable creates its elements on demand. For example, range(1, 1000, 3) doesn't create 333 elements up-front and store them all in memory; instead, whenever you try to access one (by indexing, by iteration, or otherwise), it calculates it on the fly.

    Generally, iterators are lazy, as are views, but something can be lazy without being either—like range.

    An iterable that creates each element the first time it's needed, but caches them, is usually still considered lazy. (In fact, this is how lists work in languages built around laziness. But notice that most such languages use "cons lists", which give you a very natural way to release part of the list without releasing the whole thing—so, e.g., you can zip a list and its tail, iterate the result, and it only keeps two elements in memory at the same time. That isn't as easy to do in Python.)

    virtual

    The word "virtual" in this context isn't defined anywhere, but it's used in other definitions. For example, older versions of the documentation defined range as a "virtual sequence". (I'm not sure the term still appears anywhere in the documentation as of 3.5, but it's still sometimes used in discussions.)

    The intended distinction seems to be a lazy iterable that's not a view onto an actual iterable, but instead computes its values directly.

    Looked at another way: a range object is actually providing a view into something—an infinite iterable consisting of all of the integers—but that something doesn't actually exist anywhere in memory—hence "virtual".

    container

    The term "container" isn't defined anywhere, although there is a corresponding ABC.

    The ABC just defines types that support the "in" operator (that is, types that implement __contains__).

    However, the term is often used more loosely, as a term for non-iterator iterables in general, or a vague term meaning "sequences, almost-sequences, mappings, sets, views, and other similar things". There aren't too many non-iterator iterables that aren't "similar things", or that don't support "in" testing, so usually there's no confusion here.

    collection

    The term "collection" isn't defined anywhere. It's often used as a vague synonym for iterable, or an even-vaguer synonym for container.

  10. Python has a whole hierarchy of collection-related abstract types, described in the collections.abc module in the standard library. But there are two key, prototypical kinds. Iterators are one-shot, used for a single forward traversal, and usually lazy, generating each value on the fly as requested. Sequences--like list, tuple, or range--are repeatably iterable (generating a new iterator each time), searchable with the "in" operator, and indexable with subscript expressions (including negative indices and slices).

    Of course there are other kinds of collections that are repeatably-iterable containers but not indexable like sequences (set and dict being obvious examples), and even collections that aren't containers (not to mention strings—the in operator is a substring search, not an element search, so a string contains everything it collects, but not vice-versa…). But when you want one of these, you generally know it.

    When what you need is a bunch of values in a row, the key question is: do you need them repeatedly, or in random-access order? If yes to either, you want a sequence; if not, you want an iterator.

    Most people who get to this point think, "But I don't need the whole sequence interface! Look at how many methods list has!" But if you actually look at what's needed, it's not hard.

    Example: geometric ranges

    To take a concrete example, let's say we need to loop over geometric ranges--like range, but each value is the previous value multiplied by a factor, rather than incremented by a step. So, for example, the range from 10 to 10000 by 3 gives us 10, 30, 90, 270, 810, 2430, 7290.

    Iterators

    If we only need to iterate over this once, we can do it pretty simply:

        def geometric_range(start, stop, factor):
            while start < stop:
                yield start
                start *= factor
    

    There are other ways to do this. For example, you can use itertools.accumulate to repeatedly iterate a function over previous values, and itertools.takewhile to truncate it at a given value. You can build useful building blocks out of those to allow you to write a whole family of geometric-range-like iterators that are all simple one-liners. But here, we only need this one, so we might as well keep it simple.
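
    For instance, here's one possible itertools version (a sketch; factoring out the reusable building blocks is left as an exercise):

        from itertools import accumulate, chain, repeat, takewhile
        from operator import mul

        def geometric_range(start, stop, factor):
            # start, start*factor, start*factor**2, ..., truncated at stop
            values = accumulate(chain([start], repeat(factor)), mul)
            return takewhile(lambda value: value < stop, values)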

    Sequences

    But what if we need to iterate the same range multiple times? Or iterate it in reverse? Or pass it around as an object? Or check whether a value is in the range? Or randomly access r[7]?

    One simple solution is to just return a list of all of the values. In fact, you can just do list(geometric_range(start, stop, factor)) using the generator function above, and you're done. This is what Python 2's range function did.

    That works, but it's not such a great idea if the range is large.

    Virtual sequences

    What you often want is the best of both worlds: something that lazily generates the values as needed, but that's still a sequence, not an iterator. Since what we're doing is very similar to range, just with geometric rather than arithmetic sequences, this shouldn't be surprising, because that's what range is. The docs describe this as a "virtual sequence".

    Where do you compute the values? In the __getitem__ method, of course. And, if you look at the docs, that's almost all you have to implement to be a Sequence; the only other thing you need to write is __len__.

    But how do we compute the values lazily? We've defined this as an iterated function: each value is the previous value times some factor. We obviously don't want r[7138] to have to multiply 7138 times. Fortunately, in this case, there's an easy answer: for integers, exponentiation is the same thing as repeated multiplication. So, the way to avoid multiplying 7138 times is to exponentiate once:

        import collections.abc

        class GeometricRange(collections.abc.Sequence):
            def __init__(self, start, stop, factor):
                self.start, self.stop, self.factor = start, stop, factor
            def __getitem__(self, index):
                return self.start * (self.factor ** index)
            def __len__(self):
                pass # TODO
    

    If you paste this into your Python interpreter, you'll see that it works. You can write r = GeometricRange(10, 10000, 3), and then r[1] is 30.

    Of course this has two obvious problems: We haven't implemented __len__, which means that not only len but also various other methods and functions that depend on it will fail. And we haven't done anything with stop, which means that r[17] gives you 1291401630 instead of raising an IndexError. And it should be pretty obvious that these are related.

    So, what's the length of a geometric range? Consider that __getitem__ for an arithmetic range does start + (step*index), while for a geometric range it's start * (factor**index). And __len__ for an arithmetic range is (stop - start + step - 1) // step--that is, the division rounded up--so we need to invert things in the same way: the ceiling of log(stop/start, factor). Python doesn't have an "integer log" function equivalent to its // integer division. The easy way around this is to just use the floating-point math.log and take math.ceil of the result. If you're going to want huge ranges this may not be a great idea (float rounding can push an exact power to the wrong side of an integer), but we'll stick with it for now. (Note that range can take values too big to fit even into a 64-bit integer, or even a float, although calling len on one will raise an OverflowError, at least as of 3.5.)

    But just implementing __len__ isn't quite sufficient; we also need to use it in __getitem__. So:

        import collections.abc
        import math

        class GeometricRange(collections.abc.Sequence):
            def __init__(self, start, stop, factor):
                self.start, self.stop, self.factor = start, stop, factor
            def __getitem__(self, index):
                if index < 0 or index >= len(self):
                    raise IndexError('{} object index out of range'.format(
                        type(self).__name__))
                return self.start * (self.factor ** index)
            def __len__(self):
                return math.ceil(math.log(self.stop / self.start, self.factor))
    

    And that's a complete sequence. You can call any of the Sequence methods on it, or pass it to functions like reversed, etc. The Sequence class provides mixin implementations for everything necessary. For example, it makes your class iterable by defining an __iter__ method that yields self[0], then self[1], and so on, until it gets an IndexError.[1]
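
    For example, with the class above pasted into an interpreter, all of this works, even though we only wrote __getitem__ and __len__ ourselves:

        r = GeometricRange(10, 10000, 3)
        print(len(r))             # 7
        print(list(r))            # [10, 30, 90, 270, 810, 2430, 7290]
        print(list(reversed(r)))  # [7290, 2430, 810, 270, 90, 30, 10]
        print(810 in r)           # True, via the mixin's linear search
        print(r.index(2430))      # 5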

    Everything below here is nice to have, and often worth doing. But next time you say to yourself "I don't want to build a whole sequence class", look at the above. That's a whole sequence class, and one that implements a virtual sequence. You never have to do any more than this (although often you'll want to).[2]

    [1] That actually isn't strictly necessary, because Python already does that for you automatically if you try to iterate something with a __getitem__ and no __iter__. This is a leftover feature from before Python 2.3, but it's still sometimes useful. The reason the ABC provides a mixin implementation anyway is so that your type can be recognized as an iterable by the Iterable ABC.

    [2] Of course not every iterated function can be converted into an analytic function, and not every function is fully invertible. So, let's say that within the boundaries of what's mathematically possible, this is all you ever need. If you want to turn some fractal recurrence into a sequence--or, say, the line numbers in a file--you'll need a different solution, such as a caching lazy sequence. (See the linecache module in the stdlib for an example.)

    Missing "basic type" features

    There are a few things that aren't part of being a Sequence but that you still probably want. Most types should have a repr (if it makes sense, one that's usable to create the same object in code). We probably don't need a separate str, because you aren't going to be displaying these things to end-users. (Note that range doesn't have one.) Having == work as expected is always nice. And, if you have == and your type is meant to be used as an immutable value, you should add hash (although we haven't quite enforced immutability yet; see later).

    The usual way to implement these methods is just by comparing all of the attributes:

            def __repr__(self):
                return '{}({}, {}, {})'.format(
                    type(self).__name__, self.start, self.stop, self.factor)
    
            def __eq__(self, other):
                return ((self.start, self.stop, self.factor) ==
                        (other.start, other.stop, other.factor))
    
            def __hash__(self):
                return hash((type(self), self.start, self.stop, self.factor))
    

    But that isn't really correct here. Shouldn't GeometricRange(1, 10, 2) == GeometricRange(1, 11, 2) be true? After all, they both start at one, keep doubling, and finish with 8.

    If you look at range, it uses logic like that. The intuitive idea is that two ranges are equal if they contain the same values in the same order, just like two tuples or two lists or any other kind of sequence. Put in mathematical terms so we can implement it without iterating the entire sequence: two ranges are equal if they're empty, or if they start with the same value and have no other values, or if they start with the same value and have the same step and the same length.
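
    You can check that range really does work this way:

        print(range(0) == range(4, 4, 4))          # True: both empty
        print(range(0, 10, 3) == range(0, 12, 3))  # True: both are 0, 3, 6, 9
        print(range(0, 10, 3) == range(0, 10, 4))  # False: 0, 3, 6, 9 vs. 0, 4, 8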

    Notice that this also complicates hashing. Two equal values have to have the same hash. That means you can't include the start value when hashing an empty range, and you can't include the step and length when hashing a length-1 range.

    So:

            def __eq__(self, other):
                if not isinstance(other, GeometricRange):
                    return NotImplemented
                if len(self) != len(other):
                    return False
                if len(self) == 0:
                    return True
                elif len(self) == 1:
                    return self.start == other.start
                else:
                    return (self.start, self.factor) == (other.start, other.factor)
    
            def __hash__(self):
                if len(self) == 0:
                    return hash((type(self), 0, None, None))
                elif len(self) == 1:
                    return hash((type(self), 1, self.start, None))
                else:
                    return hash((type(self), len(self), self.start, self.factor))
    

    While we're at it: often, when you need an unusual __eq__ method, you also need to customize pickling and (deep-)copying. Fortunately, that isn't the case here. But, just for completeness, here's what it looks like:

            def __getnewargs__(self):
                return self.start, self.stop, self.factor

    Optimizing container methods

    Building a sequence automatically makes it a container: the in operator works, as do the index and count methods. But they work by exhaustively searching the sequence, the same way those methods work on list and tuple. range doesn't have to do that; you can write 10**5000 in range(1, 10**10000, 2) and it returns False instantly.

    There are other types where it's worth implementing these yourself. For example, hash-based types like set can test for membership in constant time, and binary-tree-based types like a SortedTuple can do so in logarithmic time.

    Anyway, it's almost always pretty simple. Usually you abstract out the actual search algorithm, then call it in the three ABC methods, like this:

        def _find(self, value):
            return round(math.log(value / self.start, self.factor))

        def index(self, value):
            idx = self._find(value)
            if 0 <= idx < len(self) and self[idx] == value: return idx
            raise ValueError('{} is not in range'.format(value))

        def __contains__(self, value):
            idx = self._find(value)
            return 0 <= idx < len(self) and self[idx] == value
        
        def count(self, value):
            return 1 if value in self else 0
    

    While we're on optimization: For some types, it's also worth implementing __reversed__, but that isn't needed here.

    Also, if you put this class through its paces, you'll see that on some platforms, while iterating an instance, it spends a lot of time calling math.log. Almost all of those calls are in len(self) calls used to bounds-test each index. You can calculate the length at initialization time and store it, then just access self._length everywhere.
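
    Concretely, that tweak looks something like this (a sketch, using the math.ceil-based length from earlier):

        import collections.abc
        import math

        class GeometricRange(collections.abc.Sequence):
            def __init__(self, start, stop, factor):
                self.start, self.stop, self.factor = start, stop, factor
                # compute the length once, instead of on every bounds check
                self._length = math.ceil(math.log(stop / start, factor))
            def __getitem__(self, index):
                if index < 0 or index >= self._length:
                    raise IndexError('{} object index out of range'.format(
                        type(self).__name__))
                return self.start * (self.factor ** index)
            def __len__(self):
                return self._length

    (The slicing code below assumes this cached self._length.)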

    Extending the sequence interface

    While this is enough for a complete, efficient sequence, it doesn't have all of the features of range, tuple, or other built-in sequences, because some of their methods have more complicated interfaces than the ABC requires.

    Negative indices

    One of the first little things many people fall in love with Python for is that you can write lst[-1] instead of lst[len(lst)-1]. But that won't work with our class. If you want negative indices, you have to do it manually. Fortunately, it's pretty easy:

            def __getitem__(self, index):
                if index < 0:
                    index += len(self)
                if index < 0 or index >= len(self):
                    raise IndexError('{} object index out of range'.format(
                        type(self).__name__))
                return self.start * (self.factor ** index)

    Slices

    Python sequences also usually allow slicing. That's again something you have to implement yourself. Many classes can do something as simple as this:

            def __getitem__(self, index):
                if isinstance(index, slice):
                    return [self[i] for i in range(*index.indices(len(self)))]
                # normal non-slice implementation here
    

    But ideally, you want slicing to return the same type as the sliced object. When you write range(1000000000)[::2], you get back another range, not a list of 500 million values. So, it's back to basic math for us:

            def __getitem__(self, index):
                if isinstance(index, slice):
                    start, stop, stride = index.indices(self._length)
                    start = self[start]
                    stop = self[stop] if stop < self._length else self.stop
                    return type(self)(start, stop, self.factor ** stride)
                # rest as above
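
    Putting the pieces together, slicing now stays lazy (these values assume the GeometricRange(10, 10000, 3) example from above):

        r = GeometricRange(10, 10000, 3)
        s = r[1:6:2]                 # another GeometricRange, not a list
        print(type(s).__name__)      # GeometricRange
        print(list(s))               # [30, 270, 2430]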
    

    Bounded index

    The index methods of most built-in sequences take optional start and stop parameters. That's easy:
            def index(self, value, start=None, stop=None):
                idx = self._find(value)
                if start is None: start = 0
                if stop is None: stop = len(self)
                if start <= idx < stop and self[idx] == value: return idx
                raise ValueError('{} is not in range'.format(value))
    

    Immutability

    A tuple or range is immutable. Our class is sort of immutable, in that you can't assign r[7] = 133. But you can assign r.start = 3. That's a problem--especially since, by defining __hash__ we've implied that we're immutable.

    It's very hard to actually enforce immutability in Python, but you don't really have to; you just have to make it clear that the object is supposed to be immutable, and make it hard to mutate by accident. Generally, that means storing the attributes in private attributes behind read-only accessors.

    While we're at it, we might as well store the attributes in __slots__ variables:

        class GeometricRange(collections.abc.Sequence):
            __slots__ = '_start _stop _factor'.split()
            def __init__(self, start, stop, factor):
                self._start, self._stop, self._factor = start, stop, factor
            @property
            def start(self):
                return self._start
            @property
            def stop(self):
                return self._stop
            @property
            def factor(self):
                return self._factor
            def __getitem__(self, idx): ...  # as above
            def __len__(self): ...  # as above
    

    If you really want to be immutable, you sometimes need a __new__ method rather than __init__, especially if you expect users to subclass your type. That isn't necessary here, but just to show that it's not nearly as scary as it sounds:

        class GeometricRange(collections.abc.Sequence):
            __slots__ = '_start _stop _factor'.split()
            def __new__(cls, start, stop, factor):
                obj = super(GeometricRange, cls).__new__(cls)
                obj._start, obj._stop, obj._factor = start, stop, factor
                return obj
    

    If you instead wanted to make the class mutable... Well, it can't be a MutableSequence, because it doesn't have __setitem__, __delitem__, and insert, and there's no reasonable meaning for any of those. (Although I guess you could implement pop at least.) But could it be mutable and a Sequence but not a MutableSequence? Sure; the various SortedList types you can find on PyPI fit into that category (because they are, in effect, a list for immutable operations but a set for mutable ones...). So, you can remove the __hash__ method and not change anything else, and that's legal. But it feels like a strange design--a mutable sequence seems like it ought to allow you to replace, add, or remove elements by index, but this type can only globally change all the elements to a whole different set of elements. At any rate, even if it weren't for that gut feeling, the fact that we're intentionally emulating range, and range is immutable, makes the choice for us.

    Optional constructor arguments

    Constructor arguments aren't actually part of Sequence, or any ABC. (And that's a good thing--otherwise, defaultdict would have to be a factory that returns a dict subclass instead of just being a dict subclass...) But often, you're trying to emulate some specific sequence type, and that means emulating its constructor. The range class actually has a pretty complicated constructor signature. In pseudocode:
         range([start, ]stop[, step])
    
    Or, as shown in the actual docstring for range:
         range(stop)
         range(start, stop[, step])
    
    Python makes it easy to specify optional trailing arguments, but how do you specify optional leading arguments? Or, how do you specify two overloaded signatures for the same name? Well, you can't. If you think about it, in the general case it's inherently ambiguous; it only makes sense here in very special cases. Generally, you're never supposed to do this, and you should treat range as an aberration from the early days of Python rather than something to be emulated. Except that here, we're explicitly writing a range-like class, so we kind of have to emulate it. And there are a few other times when the same kind of thing shows up. So, here are the two ways to do it:
            def __init__(self, start_or_stop, stop=None, step=None):
                if stop is None and step is None:
                    start, stop = 0, start_or_stop
                else:
                    start = start_or_stop
                if step is None:
                    step = 1
                # ... then store start, stop, and step as usual

            def __init__(self, *args):
                if len(args) == 1:
                    start, stop, step = 0, args[0], 1
                elif len(args) == 2:
                    start, stop, step = args[0], args[1], 1
                else:
                    start, stop, step = args
                # ... then store start, stop, and step as usual
    
    The first one is slightly more readable, but it's still ugly. The second one works exactly the same as range.[3] Notice that you could wrap the *args trick up in an overload-dispatcher similar to functools.singledispatch but switching on argument count rather than type. But, since you're only going to do this in very rare special cases, I don't think you want to abstract it out and make it easy. So, just pick whichever one you hate less.

    [3] The first one allows you to pass None as any argument, with the same effect as leaving out the argument; range won't accept that. If you really care, you can always do _sentinel=object() in class scope, then use GeometricRange._sentinel instead of None everywhere, then del it at the end of the class definition. But why would you care?

    Domain checking

    It goes without saying that you need to add error handling, but in this case, it's not clear what that means everywhere. If you pass something to __getitem__ that isn't a slice or an integer (or a slice with non-integral slice values), you'll probably get an exception, but not the nice TypeError you get from range. You'll want to add a check for that. This is common to most sequence types. The three find-related methods will all raise a TypeError or do something else wrong if you pass them non-integral values. You need to check explicitly, then raise a nice ValueError, return False, and return 0, respectively. This only usually comes up with virtual sequence types, or with containers that have some other reason to care about their element types (e.g., a bucket counter for a fixed set of values).

    If you pass a negative or zero factor, everything will appear to work until you try to (directly or indirectly) call __len__, at which point you'll get a ValueError: math domain error for trying to take the log of a nonpositive number. Obviously that's bad, but what's the right thing to do? One answer is just to raise a ValueError('GeometricRange() arg 3 must be positive') (akin to the equivalent range error). But GeometricRange(10, 1000, -3) makes intuitive sense as 10, -30, 90, -270. But then what about GeometricRange(10, 100, -3)? A log-based check will exclude the -270, but a simple less-than check will include it.

     If you pass non-integral numbers to the constructor, or to any of the various methods, many things go right, but some don't. If you want floats to work here, it's not hard to think things through and make everything work (although then you're losing a lot of the analogy with range). If you don't, it's pretty simple to add the tests and raise a TypeError.

    But notice that range is a bit special here. Most functions that want integers mean they want something where isinstance(value, int) is true. A few mean that they want something that returns a native C long from its index slot (if implemented in C), or an int value within native C long range from its __index__ method (if not). Normally, in Python code, you don't have to worry about the latter. But, because range elements are often used as indices into lists, the range type does worry about it. If you want to emulate that, you have to do it manually (basically, just try to call value.__index__() if the isinstance fails).
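
    Here's a sketch of that manual check. The operator.index function does exactly the "isinstance, then __index__" dance described above:

        import operator

        def __getitem__(self, index):
            if isinstance(index, slice):
                ...                            # slice handling as above
            index = operator.index(index)      # accepts int and int-like types,
                                               # raises TypeError for everything else
            ...                                # bounds check and computation as above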

    Also, along with this, range.__len__ raises an OverflowError if the range has more elements than will fit into a C long, but you probably don't want to emulate that.

    Anything else?

    Of course you'll want to write a docstring for the class, and for each public method. And, as implied above, there's probably some trivial error handling still to be added. But otherwise, this is pretty much as complete as you can get; I think I've covered everything you would ever need to touch even in the most esoteric virtual sequence. And it's still not that bad.
