The documentation does a great job explaining how things normally get looked up, and how you can hook them.

But to understand how the hooking works, you need to go under the covers to see how that normal lookup actually happens.

When I say "Python" below, I'm mostly talking about CPython 3.5. Other implementations vary to differing degrees in how they implement things; they just have to result in the same rules being followed.

Variables: LEGB


Python has simple lexical scoping, like most other languages. (Unlike many languages, only module, function, and class definitions create new scopes, not every suite, but that's not too important here. Also, remember that lambda expressions and comprehensions are function definitions, and thus create new scopes, despite not having suites.)

The specific rule Python uses for lookup is LEGB: Local, Enclosing, Global, Builtin. In other words, when you write x, Python looks for a local variable named x, then (if you're in a nested function or class) it goes through all the enclosing scopes' local variables, then it looks at the module's globals, and then it looks at builtins.

Meanwhile, because Python has implicit declarations (any assignment may be declaring a new variable), it needs a rule for that: any assignment creates a local variable (shadowing any enclosing, global, or builtin of the same name), unless you have a nonlocal or global declaration in the same scope.
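That decision is made at compile time for the whole scope, which is why this classic surprise happens:

x = 'global'

def f():
    print(x)     # UnboundLocalError, not 'global': the assignment below
    x = 'local'  # makes x local for the *entire* function, even up here

f()  # raises UnboundLocalError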

The way Python implements these rules is not as simple as you'd think. Primarily, this is for reasons of optimization.

tl;dr


The short version is:

* Locals (except when shared with a closure) are loaded and stored in an array on the stack frame.
* Nonlocals (and locals shared with a closure) are basically the same but with an extra dereference.
* Globals are loaded and stored in a dict.
* Builtins are loaded via two or three dict lookups (and are never directly stored).

Fast locals


Local variables are stored in an array on the stack frame, and loads and stores are compiled into array-indexing operations on that array. This is basically the same behavior as C--and it's a lot faster than going through a dictionary.

Exceptions to this rule:

* Modules don't have fast locals; they use the globals dict as their locals, so everything works the same as for the globals section below.
* Local variables that are shared with a nested closure are slightly more complicated; see the next section.

So, how does Python know how to compile your locals to array indices? And how does it turn that back into the local variable names that you can see in tracebacks, locals(), inspect and dis functions, etc.?

When compiling the body of your function, Python keeps track of all of your variable assignments. (Assignments to attributes or subscriptions don't count; those are just calls on the attributed or subscripted object.) Function parameters are local, any variable that you assign to is local (unless you have a global or nonlocal statement for it); anything else is not.

So, it creates a tuple of all those local names, and stores it in the co_varnames of the code object. It also stores the total count in co_nlocals. Then, when compiling your code to bytecode, every load or store of co_varnames[3] becomes a LOAD_FAST 3 or STORE_FAST 3.

When a function is called, the stack frame reserves co_nlocals extra space at the end, called f_localsplus, and when the interpreter sees a LOAD_FAST 3, it just reads from frame.f_localsplus[3].

For example, in this code:

def spam(eggs):
    cheese = 0
    return eggs + cheese

... eggs is local because it's a parameter, and cheese is local because you assign to it, so that last line will get compiled to, in effect, LOAD_FAST(0) + LOAD_FAST(1), not LOAD_NAME('eggs') + LOAD_NAME('cheese').
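You can check that for yourself with the dis module (the exact offsets vary by version, but the opcodes are the same):

import dis

def spam(eggs):
    cheese = 0
    return eggs + cheese

dis.dis(spam)
# The last line disassembles to something like:
#   LOAD_FAST     0 (eggs)
#   LOAD_FAST     1 (cheese)
#   BINARY_ADD
#   RETURN_VALUE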

Now you can probably understand why these optimizations work:

def spam(eggs=eggs):
    local_cheese = cheese
    for _ in range(1000000000):
        eggs(local_cheese())

You've replaced a billion dictionary lookups for the globals eggs and cheese with one dictionary lookup for each plus a billion array lookups for f_localsplus[0] and f_localsplus[1], which is obviously faster. (Dictionary lookups are, of course, constant-time, just like array lookups. But with a larger constant multiplier--large enough to make a significant difference here.)

But how does Python get the names back out? Well, as mentioned, they're stored in co_varnames. And when a function is called, the stack frame gets a pointer to the function's code object in f_code. So, if it needs to build a traceback or disassembly or whatever for fast local #1, it just gives you frame.f_code.co_varnames[1].

What about locals()? Well, there actually is a locals dict on the stack frame, called f_locals. But it doesn't get created unless you ask for it (e.g., by calling locals(), or just by asking for the Python version of the frame object with, e.g., sys._getframe()). This calls a function PyFrame_FastToLocals, which effectively does frame.f_locals = dict(zip(frame.f_code.co_varnames, frame.f_localsplus)), and then it returns that f_locals to you. It should be obvious why, as the docs say, modifying locals() doesn't affect the function's actual locals: it's just a snapshot of your locals. (There is actually a function PyFrame_LocalsToFast, but you can only call that from C, not Python.)
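A quick demonstration of the snapshot behavior:

def snapshot_demo():
    x = 1
    locals()['x'] = 2   # mutates the snapshot dict, not the fast-locals array
    return x

print(snapshot_demo())  # 1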

What about exec()? Again, as the docs say, this function by default works on (something equivalent to) its caller's globals() and locals(), so, again, it can't change your local variables.

I used Python-like pseudocode for all the stuff that happens in C under the covers. But most of that actually is exposed to Python. For example, if you call sys._getframe(), the frame object you get back has an f_code member. Or, if you just look at a function, it has a __code__ member. And either way, that object has a co_varnames. And so on. The f_localsplus member and the PyFrame_FastToLocals and PyFrame_LocalsToFast functions are the only things mentioned here that aren't exposed. And even there, you can always do this:

import ctypes
import sys

ctypes.pythonapi.PyFrame_LocalsToFast.argtypes = [ctypes.py_object, ctypes.c_int]
ctypes.pythonapi.PyFrame_LocalsToFast.restype = None
ctypes.pythonapi.PyFrame_LocalsToFast(sys._getframe(), 0)

Have fun with that.
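For example, here's the classic (CPython-only, completely unsupported) trick for making a write to locals() actually stick, reusing the prototypes declared above:

def sneaky():
    x = 1
    frame = sys._getframe()
    frame.f_locals['x'] = 2                          # update the snapshot...
    ctypes.pythonapi.PyFrame_LocalsToFast(frame, 0)  # ...then copy it back to the array
    return x

print(sneaky())  # 2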

Free and cell variables


Variables from an enclosing scope are almost the same as locals--they're handled by looking them up in an array--but with an extra dereference. But that requires a bit of extra work to set up.

So, how does Python know how to compile your nonlocals to array indices?

As we saw above, when Python sees a variable, it knows whether that variable is local to your function. If it isn't, Python steps through the outer enclosing functions to see if it's local to any of them. If so, it's a free variable. (Of course a global statement means it's a global variable, so it doesn't look through enclosing scopes, and a nonlocal means it's a free variable even if there's a local assignment.)

Notice that the module's global scope is not counted as a scope for these purposes--anything found in the global scope is always global, not enclosing.

So anyway, the compiler creates a tuple of all those free names, and stores it in co_freevars. Then, when compiling your code to bytecode, every load or store of co_freevars[2] becomes a LOAD_DEREF 2 or STORE_DEREF 2.

Now, when a function is called, the stack frame doesn't just reserve co_nlocals space, but co_nlocals + len(co_freevars) extra space at the end (I'm still lying to you here; see a few paragraphs down for the truth) in f_localsplus, and when the interpreter sees a LOAD_DEREF 2, it just reads from frame.f_localsplus[frame.f_code.co_nlocals + 2].

Well, not quite. That obviously wouldn't allow you to reassign closure variables. (You could call mutating methods, but assignment would just break the connection, just like after a = b = [1, 2, 3], a.append(4) affects b, but a = [4] breaks the connection.) And that would defeat the entire purpose of nonlocal.

So, what you actually have in that slot isn't the object itself, but a cell object that has a member named cell_contents that's the actual object. And what the outer scope has in its fast locals slot is the same cell object, not the thing it points to. So, if either the inner function or the outer function assigns to the variable, they're not assigning to one of their fast local slots, but assigning to the cell_contents of a cell in one of their fast local slots, so they can each see what the other one does.
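You can watch the shared cell in action with plain Python:

def counter():
    n = 0               # n lives in a cell, because inc assigns to it
    def inc():
        nonlocal n
        n += 1          # STORE_DEREF: assigns through the shared cell
        return n
    return inc

inc = counter()
print(inc(), inc(), inc())  # 1 2 3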

That's why local variables that are referenced by a nested scope work like nonlocal variables. They're stored in co_cellvars instead of co_freevars. Your frame's actual space is co_nlocals + len(co_cellvars) + len(co_freevars). The name that goes with a LOAD_DEREF 2 is equivalent to (co_cellvars+co_freevars)[2]. At runtime, that LOAD_DEREF 2 actually does an f_localsplus[co_nlocals+2].cell_contents. When you construct a nested function at runtime, there are zero or more LOAD_CLOSURE bytecodes to push the current frame's cells onto the stack, then a MAKE_CLOSURE instead of MAKE_FUNCTION, which does the extra step of popping those cells off and stashing them in the function's __closure__ attribute. And when you call a function, its __closure__ cells get copied into the f_localsplus array. Meanwhile, even if your function doesn't access any variables from an enclosing scope, if a function nested within your function does so, they're free variables for you, and have to get put in your __closure__ so they can end up in your stack frame so they can get to your nested function's __closure__ and stack frame.
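Most of those pieces are visible from Python:

def outer():
    x = 1
    def inner():
        return x
    return inner

inner = outer()
print(outer.__code__.co_cellvars)          # ('x',): a local shared with a closure
print(inner.__code__.co_freevars)          # ('x',): a free variable
print(inner.__closure__[0].cell_contents)  # 1: the value inside the shared cell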

And I believe that's all of the actual details correct.

But you don't have to memorize all this stuff, or go dig it out of the C code, or work it out by trial and error, or plow your way through that horrible paragraph ever again; the docs for the dis module describe what each opcode does, and have improved tremendously over the past few versions of Python. If you can't remember exactly what LOAD_DEREF looks up and where, just look at the docs: "Loads the cell contained in slot i of the cell and free variable storage. Pushes a reference to the object the cell contains on the stack." That might not explain everything if you'd never read anything about this before, but it should be more than enough to refresh your memory once you have.

Also, again, most of this is exposed to Python, so the Python-esque pseudocode above is almost all real code. However, from Python, you can't write to a cell through its cell_contents, and you can't construct new cell instances. (If you want to have fun with ctypes.pythonapi, you can probably figure it out from here.)

Finally, locals(), in addition to giving you a snapshot of your locals rather than a live way to change them, also flattens out any cells, giving you their cell_contents instead.

At first glance, it seems like it would be simpler to just have each function's stack frame have a pointer to its outer function's stack frame. You could still optimize things just as much by having instructions like LOAD_OUTER 3, 2 to go 3 steps out and get the fast-local 2 from that frame, right? In fact, some languages do implement closures that way. But that means that a function's stack frame can't go away--and, therefore, neither can anything else it refers to--until all closures referring to any of its variables, or any of its outer scopes, go away. And that "anything else it refers to" includes the stack frame of its caller. You can see why this could be a problem. There are ways to optimize that (although you still run into problems with coroutines), but it's actually simpler, not more complicated, to just reference the individual values pulled off the frame, as Python does.

Globals and builtins


Globals are easy. If the compiler sees a name that isn't assigned in the current function or any enclosing function, or one that has a global statement, it's global. There's no special operation for builtins; it's just a fallback for every global lookup.

The compiler does still use an array here, but only to fetch the name from a constant tuple on the code object (co_names) to look up with LOAD_GLOBAL and STORE_GLOBAL at runtime. They look something like this:

from types import ModuleType

def LOAD_GLOBAL(index):
    name = frame.f_code.co_names[index]
    try:
        return frame.f_globals[name]
    except KeyError as e:
        try:
            builtins = frame.f_globals['__builtins__']
            # __builtins__ is the builtins module in __main__, and that
            # module's dict everywhere else
            if isinstance(builtins, ModuleType):
                builtins = builtins.__dict__
            return builtins[name]
        except KeyError:
            raise e

def STORE_GLOBAL(index, value):
    name = frame.f_code.co_names[index]
    frame.f_globals[name] = value

That f_globals will normally be the actual globals for the module the current (module-level, function, or class) code was defined in, but it could be something else. (For example, exec lets you pass in any mapping for globals; if you want a sanitized and quasi-sandboxed environment with your own under-the-hood functions and builtins like exec and __import__ stripped out, that's easy.)
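Here's a minimal sketch of that kind of quasi-sandboxed environment (it shows the mechanism; don't mistake it for real security):

safe_env = {'__builtins__': {'len': len, 'print': print}}
exec("print(len('spam'))", safe_env)   # works: the builtins fallback finds our dict
# exec("open('x')", safe_env) would raise NameError: open is stripped out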

Much simpler than closure variables, right? But it can be slower. So, why have a special case for globals instead of just making it the last stop on the closure chain (or making builtins the last stop instead of a special case within the globals special case)?

Well, for one thing, not having fast locals or cells and doing everything through the globals dict allows for more dynamic stuff, like creating new variables in exec and using them in the same scope. For another, not having to make a list of all of the names in the scope ahead of time, simply adding them to the dict as they're created, means that the global scope can be executed incrementally, statement by statement (so it works the same as in the REPL--in fact, both the REPL and the top-level script work by just incrementally building a module named __main__). In particular, you can define a function that uses a global, and start passing it around, before that global has been created.
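For example:

def get_answer():
    return answer     # compiled as LOAD_GLOBAL; nothing is looked up yet

answer = 42           # created after the function, but before the call
print(get_answer())   # 42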

More importantly, lookup speed shouldn't really matter at global level. If you're doing any heavy computation that involves millions of name lookups at module scope, move it into a function and call that function. And if you're using those globals millions of times from another function, get them out of global scope too. Those are "structured programming 101" anyway; why try to optimize things that people shouldn't be doing?

Well, except for the fact that your module's top-level functions and classes are globals, and it's perfectly reasonable to use them millions of times. In less dynamic languages, they're only looked up as globals at compile time, so they aren't like runtime global variables--but in Python, functions, types, and everything else are first-class values, and they're looked up the same way as any other global variables. In practice, this usually isn't a bottleneck for most programs; when Python is too dynamic and slow for the inner code of some project, there are usually other factors that matter much more than this dict lookup. But occasionally, it is. So, this is one of those places people keep looking for new optimizations. At the user level, you can always just copy the global to a local or a default parameter value, as mentioned earlier. Or you can use an ambitious optimizer that tries to automate the same benefit (like FAT Python, which creates an extra version of each function's __code__ with globals copied into the co_consts, and wrapping the function in a guard that checks that the cached globals haven't changed since the function was compiled and, if so, calling the original "slow" function instead of the fast version).
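The user-level version of that trick looks like this (math.sqrt is just a stand-in for whatever global or builtin your inner loop is hammering on):

import math

def total_distance(points, sqrt=math.sqrt):   # one lookup, at def time
    total = 0.0
    for x, y in points:
        total += sqrt(x*x + y*y)              # LOAD_FAST, not LOAD_GLOBAL + LOAD_ATTR
    return total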

What about classes?


A class definition is a scope, just like a function definition. There are a couple of minor differences (the special LOAD_CLASSDEREF opcode, the magic __class__ cell that's created in class scope so methods can access it directly or via super, etc.). But the really big difference is that the class code isn't executed every time the class is called; it's executed once to produce the dict that gets passed to type (or another metaclass), and then it, and its scope, goes away. (Of course the body of its __new__ and/or __init__ will get executed when the class is called.)
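You can watch the one-time execution happen:

class Spam:
    print('executing class body')   # runs once, at the class statement
    x = 1                           # ends up in the dict passed to type

print(Spam.__dict__['x'])           # 1
Spam()                              # no output: the body doesn't run again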

I don't think there's anything important to learn about classes nested in functions or classes (or functions nested in classes, unless you want to know how super works) that isn't obvious once you play with it.

Attributes


Attribute lookup seems really simple at first: Every attribute lookup is compiled to LOAD_ATTR, which uses a string taken from the same co_names array as LOAD_GLOBAL, and looks it up in the object's __dict__. Or, if it's not found there, in the object's type's dict. Or, if not there, then each type on that type's MRO. If that all fails, __getattr__ is looked up (but using special-method lookup, not normal attribute lookup) and called. Oh, unless __getattribute__ was overridden, in which case whatever it does happens instead. Also, somehow functions get magically turned into bound methods. And @property, and __slots__ and... and how does it find the __dict__ in the first place? And what about C-API types? Maybe it's not so simple after all...

It's actually pretty simple: the only rule in the interpreter is special-method lookup, and LOAD_ATTR only ever uses that rule to look up and then call __getattribute__. (Other special methods are handled internally by their opcodes or other special builtin or extension functions written in C.) Everything else happens inside __getattribute__; we'll get to that later.

Special method lookup


Special method lookup is only somewhat vaguely documented, in the data model reference chapter. See call_method and its helpers in the source for details.

The basic idea of special method lookup is that it ignores the instance dictionary, and most kinds of dynamic customization.
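For example:

class Spam:
    def __len__(self):
        return 3

s = Spam()
s.__len__ = lambda: 42   # a per-instance "override"
print(len(s))            # 3: special method lookup skips the instance dict
print(s.__len__())       # 42: but normal attribute lookup finds it fine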

You may wonder what the point of special method lookup is. Obviously __getattribute__ lookup can't be hooked by __getattribute__, but why should it also skip the instance, or treat metaclasses differently? And why should other special methods like __add__ follow the same rules? (If you wanted to add per-instance methods to a value, __add__ or __lt__ would probably be the first things you'd think of.)

In fact, the original "new-style class" design for Python 2.3 didn't have special method lookup--but (long before the first beta), Guido discovered a problem: Some methods make sense both on normal objects, and on types. For example, hash(int) makes just as much sense as hash(3). By normal lookup rules, both call int.__hash__. By special method lookup rules the former skips int's dict and instead calls type.__hash__. And a few other methods have the same problem, like str and repr. And what would happen if a type had both __call__ and __new__? And, even those that don't make sense for types--int + str, abs(Counter), or -socket--might make sense for some subclass of type (e.g., the new optional static types use subscripting, like Iterable[int], for generics).
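You can see the difference directly with repr:

print(repr(3))     # int.__repr__(3): '3'
print(repr(int))   # type.__repr__(int): "<class 'int'>"
                   # normal lookup rules would have found int.__dict__['__repr__'] instead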

So, that's why we have special method lookup, and it obviously solves the __getattribute__ problem.

How do we find the __dict__ without finding the __dict__ first?


There's still a circular definition problem here. How do we find an attribute? By calling the __getattribute__ attribute on the object's type and walking its MRO looking in the __dict__ of each type. So... how do we get that __dict__? Certainly not by searching __dict__s. And the process also involves a variety of other attribute lookups--you get an object's type via its __class__, a type's MRO via its __mro__, and so on.

The secret is that none of these are actual attribute lookups. In the C struct representing each object, there's an ob_type pointing to the type. In the C struct representing each type object, there's a tp_mro pointing to the MRO, and a tp_getattro pointing to the __getattribute__, and a tp_dict pointing to the __dict__, and so on.

But you can set __class__ or __dict__ or __getattribute__ from Python. How does that work? In general, when you overwrite one of these special values, the interpreter updates the corresponding C struct slot. (There are a few cases where it does different things, like using a slot as a cache and searching the dict if it's NULL, but that works too.)
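For example, assigning __class__ rewrites the ob_type slot (which is why it's only allowed between types with compatible layouts):

class A:
    def hello(self): return 'A'

class B:
    def hello(self): return 'B'

a = A()
a.__class__ = B       # updates ob_type under the covers
print(a.hello())      # 'B'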

So, in the pseudocode below, when I write type(obj).__mro__, that's really following the ob_type slot from the object, then the tp_mro slot from the type, and so on.

Finding __getattribute__


The basic process to look up and call __getattribute__ is, in pseudocode:

def find_special(obj, name):
    for t in type(obj).__mro__:
        if name in t.__dict__:
            it = t.__dict__[name]
            __get__ = find_special(it, '__get__')
            if __get__:
                it = __get__(obj, type(obj))
            return it

def LOAD_ATTR(obj, index):
    name = frame.f_code.co_names[index]
    __getattribute__ = find_special(obj, '__getattribute__')
    if not __getattribute__:
        raise AttributeError
    try:
        return __getattribute__(name)
    except AttributeError:
        __getattr__ = find_special(obj, '__getattr__')
        if not __getattr__:
            raise
        return __getattr__(name)

If you don't understand the __get__, read the Descriptor HowTo Guide.

(It shouldn't be possible to ever fail to find a __getattribute__, but there's code for it anyway--if you play around with the C API, you can try to call a method on a class before setting up its MRO, and you get an AttributeError rather than a segfault or something. On the other hand, __getattr__ can easily be missing, because object doesn't implement that, nor do most other builtin/stdlib types.)

I've skipped over a few things (like the code to handle types that aren't fully constructed yet without crashing, etc.), but that's the basic idea. And it's also how other special-method lookups, like the nb_add (__add__) lookup inside BINARY_ADD, work.

Notice that a __getattribute__ defined in the instance, or dynamically (by __getattribute__?), or a __getattribute__, __mro__, __class__, or __dict__ pushed onto the type by its metatype, all get ignored here.

But if you've overridden __getattribute__ on your object's type, or any of its supertypes, or done the equivalent in C, your code will get called (as a bound method--that's the point of the descriptor get bit).

One thing I skipped that may be worth noting is the method cache. See the macros at the top of typeobject.c, but the basic idea is that 4096 methods (that aren't special-cased away or given unreasonable names) get cached, so they can skip all the MRO-walking and dict lookups and jump right to the descriptor call.

Inside the default __getattribute__


Assuming you got to object.__getattribute__, what it does is pretty similar to special-method lookup all over again, but with some minor differences. The short version is that it allows instances, metaclass __getattribute__, and __getattr__ to get involved. In full detail, it's something like:

  • There's no type slot equivalent to "general name known only by a string passed in at runtime", so the per-type step does the tp_dict[name] bit.
  • Four-stage lookup instead of just the per-type MRO walk:
    1. Look on the type and its MRO for a data descriptor (that is, a descriptor with __set__). If so, return its __get__(obj, type(obj)).
    2. Look in the object itself only: if tp_dict (__dict__) and tp_dict[name] exist, you're done--do not call its __get__ method, just return it.
    3. Look on the type and its MRO for a non-data descriptor or non-descriptor. Return its __get__(obj, type(obj)) (or, if it's not a descriptor, just return it).
    4. Special-method lookup __getattr__, call that, and do the __get__(obj, type(obj)) on the result if present.

So, in pseudocode:

def __getattribute__(self, name):
    for t in type(self).__mro__:
        try:
            desc = t.__dict__[name]
            if '__get__' in type(desc).__dict__ and '__set__' in type(desc).__dict__:
                return type(desc).__dict__['__get__'](desc, self, type(self))
            break
        except KeyError:
            pass
    if name in self.__dict__:
        return self.__dict__[name]
    try:
        if '__get__' in type(desc).__dict__:
            return type(desc).__dict__['__get__'](desc, self, type(self))
        return desc
    except UnboundLocalError:
        __getattr__ = find_special(self, '__getattr__')
        if __getattr__:
            return __getattr__(name)
        raise AttributeError(name)

I believe the point of that multi-stage lookup is to make it easy to shadow methods (which are non-data descriptors) but hard to accidentally shadow properties and similar things (which are data descriptors).
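You can verify that:

class Spam:
    @property
    def eggs(self):          # data descriptor: has both __get__ and __set__
        return 10
    def cheese(self):        # plain function: non-data descriptor
        return 20

s = Spam()
s.__dict__['cheese'] = lambda: 99
print(s.cheese())            # 99: the instance shadows the method
s.__dict__['eggs'] = 99
print(s.eggs)                # 10: the data descriptor wins at stage 1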

Notice that if the descriptor __get__ raises AttributeError, that will trigger the fallback to __getattr__ from the previous process.

What about classes?


If you look something up on a class, the class is the object, and the metaclass is the type. And if your metaclass doesn't override __getattribute__, type does (and it never raises or supers).

The only difference between type.__getattribute__ and object.__getattribute__ is that at the descriptor steps, it calls __get__(None, cls). So, everything works the same, except for the descriptor get bit. (If you work things through, you should see how this, e.g., lets @classmethods work the same whether called on an instance or on the class, and how it lets metaclass methods act like classmethods.)

Other tricks


What else is left?

  • __slots__ works by creating a hidden array, and a set of descriptors that read and write to that array, and preventing a tp_dict from being created on the instance.
  • Methods don't do anything special--spam.eggs() is the same as tmp = spam.eggs; tmp() (unlike many other OO languages). They work because they're stored as plain-old functions in the class dict, and plain-old functions are descriptors that build bound methods; there's a short demonstration after this list. See How methods work if that isn't clear.
  • Everything else--@property, @staticmethod, etc.--is just a descriptor; object.__getattribute__ (or type.__getattribute__) finds it on the class, calls its __get__ method, and gives you the result. (It may help to work through how @classmethod produces a bound class method whether looked up on the type Spam.eggs or on the instance spam.eggs.)
  • There are various C API mechanisms to expose struct members, get/set function pairs, etc. as if they were Python attributes. But it's pretty obvious how these work unless you want the gory details, in which case they're in the C API docs.
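Here's the promised demonstration of the method bullet:

class Spam:
    def eggs(self):
        return 42

s = Spam()
print(type(Spam.__dict__['eggs']))   # <class 'function'>: just a plain function
m = s.eggs                           # function.__get__(s, Spam) builds a bound method
print(m.__self__ is s)               # True
print(m() == s.eggs())               # True: no difference between the two calls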

What about super?


It uses four different tricks to be that super, thanks for asking:
  • In every class definition, __class__ is a local referring to the class currently being defined.
  • The compiler rewrites super() to super(__class__, co_varnames[0]). (The first local is the first parameter, so this handles self in normal methods, cls in classmethods and __new__ and normal metaclass methods, mcls in metaclass __new__ and metaclass classmethods, even if you decide to call them something confusing like s or this.)
  • Each method defined in a class gets __class__ as a free variable whenever it refers to super, not just whenever it directly refers to __class__.
  • super.__getattribute__ overrides the normal behavior from object to jump ahead on the MRO chain after the current __class__.
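Most of this is visible from Python too:

class Spam:
    def which(self):
        return __class__                  # the compiler makes this a free variable

print(Spam().which() is Spam)             # True
print(Spam.which.__code__.co_freevars)    # ('__class__',)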