But to understand how the hooking works, you need to go under the covers to see how that normal lookup actually happens.
When I say "Python" below, I'm mostly talking about CPython 3.5. Other implementations vary to differing degrees in how they implement things; they just have to result in the same rules being followed.
Variables: LEGB
Python has simple lexical scoping, like most other languages. (Unlike many languages, only module, function, and class definitions create new scopes, not every suite, but that's not too important here. Also, remember that lambda expressions and comprehensions are function definitions, and thus create new scopes, despite not having suites.)
The specific rule Python uses for lookup is LEGB: Local, Enclosing, Global, Builtin. In other words, when you write x, Python looks for a local variable named x, then (if you're in a nested function or class) it goes through all the enclosing scopes' local variables, then it looks at the module's globals, and then it looks at builtins.

Meanwhile, because Python has implicit declarations (any assignment may be declaring a new variable), it needs a rule for that too: any assignment creates a local variable (shadowing any enclosing, global, or builtin of the same name), unless you have a nonlocal or global declaration in the same scope.

The way Python implements these rules is not as simple as you'd think. Primarily, this is for reasons of optimization.
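For example, here's a quick demo of both rules (the names are made up for illustration):

```python
x = 'global'

def outer():
    x = 'enclosing'
    def inner():
        x = 'local'       # assignment makes x local to inner
        return x
    return inner(), x

print(outer())  # ('local', 'enclosing')
print(x)        # 'global': nothing outer() did touched this
print(len)      # no local, enclosing, or global len, so: the builtin

def broken():
    print(x)     # raises UnboundLocalError if called: the assignment below
    x = 'local'  # makes x local for the *whole* function body
```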
tl;dr
The short version is:
* Locals (except when shared with a closure) are loaded and stored in an array on the stack frame.
* Nonlocals (and locals shared with a closure) are basically the same but with an extra dereference.
* Globals are loaded and stored in a dict.
* Builtins are loaded via two or three dict lookups (and are never directly stored).
Fast locals
Local variables are stored in an array on the stack frame, and loads and stores are compiled into array-indexing operations on that array. This is basically the same behavior as C--and it's a lot faster than going through a dictionary.
Exceptions to this rule:
* Modules don't have fast locals; they use the globals dict as their locals, so everything works the same as for the globals section below.
* Local variables that are shared with a nested closure are slightly more complicated; see the next section.
So, how does Python know how to compile your locals to array indices? And how does it turn that back into the local variable names that you can see in tracebacks, locals(), inspect and dis functions, etc.?

When compiling the body of your function, Python keeps track of all of your variable assignments. (Assignments to attributes or subscriptions don't count; those are just method calls on the attributed or subscripted object.) Function parameters are local, and any variable that you assign to is local (unless you have a global or nonlocal statement for it); anything else is not.

So, the compiler creates a tuple of all those local names and stores it in the co_varnames of the code object. It also stores the total count in co_nlocals. Then, when compiling your code to bytecode, every load or store of co_varnames[3] becomes a LOAD_FAST 3 or STORE_FAST 3.

When a function is called, the stack frame reserves co_nlocals extra space at the end, called f_localsplus, and when the interpreter sees a LOAD_FAST 3, it just reads from frame.f_localsplus[3].

For example, in this code:
```python
def spam(eggs):
    cheese = 0
    return eggs + cheese
```
eggs is local because it's a parameter, and cheese is local because you assign to it, so that last line will get compiled to, in effect, LOAD_FAST(0) + LOAD_FAST(1), not LOAD_NAME('eggs') + LOAD_NAME('cheese').
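You can verify all of this from Python itself:

```python
import dis

def spam(eggs):
    cheese = 0
    return eggs + cheese

print(spam.__code__.co_varnames)  # ('eggs', 'cheese')
print(spam.__code__.co_nlocals)   # 2
dis.dis(spam)  # the return line shows LOAD_FAST 0 (eggs), LOAD_FAST 1 (cheese)
```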
Now you can probably understand why these optimizations work:
```python
def spam(eggs=eggs):
    local_cheese = cheese
    for _ in range(1000000000):
        eggs(local_cheese())
```
You've replaced a billion dictionary lookups for the globals eggs and cheese with one dictionary lookup for each, plus a billion array lookups for f_localsplus[0] and f_localsplus[1], which is obviously faster. (Dictionary lookups are, of course, constant-time, just like array lookups. But with a larger constant multiplier--large enough to make a significant difference here.)

But how does Python get the names back out? Well, as mentioned, they're stored in co_varnames. And when a function is called, the stack frame gets a pointer to the function's code object in f_code. So, if Python needs to build a traceback or disassembly or whatever for fast local #1, it just gives you frame.f_code.co_varnames[1].
What about locals()? Well, there actually is a locals dict on the stack frame, called f_locals. But it doesn't get created unless you ask for it (e.g., by calling locals(), or just by asking for the Python version of the frame object with, e.g., sys._getframe()). This calls a function PyFrame_FastToLocals, which effectively does frame.f_locals = dict(zip(frame.f_code.co_varnames, frame.f_localsplus)), and then returns that f_locals to you. It should be obvious why, as the docs say, modifying locals() doesn't affect the function's actual locals: it's just a snapshot of your locals. (There is actually a function PyFrame_LocalsToFast, but you can only call that from C, not Python.)
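You can see the snapshot behavior directly:

```python
def demo():
    x = 1
    locals()['x'] = 99  # mutates the snapshot dict, not the fast-locals array
    print(x)            # still 1: this compiles to LOAD_FAST, not a dict lookup

demo()
```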
What about exec()? Again, as the docs say, this function by default works on (something equivalent to) its caller's globals() and locals(), so, again, it can't change your local variables.

I used Python-like pseudocode for all the stuff that happens in C under the covers. But most of that actually is exposed to Python. For example, if you call sys._getframe(), the frame object you get back has an f_code member. Or, if you just look at a function, it has a __code__ member. And either way, that object has a co_varnames. And so on. The f_localsplus member and the PyFrame_FastToLocals and PyFrame_LocalsToFast functions are the only things mentioned here that aren't exposed. And even there, you can always do this:

```python
import ctypes, sys

ctypes.pythonapi.PyFrame_LocalsToFast.argtypes = [ctypes.py_object, ctypes.c_int]
ctypes.pythonapi.PyFrame_LocalsToFast.restype = None
ctypes.pythonapi.PyFrame_LocalsToFast(sys._getframe(), 0)
```
Have fun with that.
Free and cell variables
Variables from an enclosing scope are almost the same as locals--they're handled by looking them up in an array--but with an extra dereference. But that requires a bit of extra work to set up.
So, how does Python know how to compile your nonlocals to array indices?
As we saw above, when Python sees a variable, it knows whether that variable is local to your function. If it isn't, Python steps through the enclosing functions to see if it's local to any of them. If so, it's a free variable. (Of course a global statement means it's a global variable, so Python doesn't look through enclosing scopes at all, and a nonlocal statement means it's a free variable even if there's a local assignment.)

Notice that the module's global scope is not counted as a scope for these purposes--anything found at global scope is always global, not enclosing.
So anyway, the compiler creates a tuple of all those free names and stores it in co_freevars. Then, when compiling your code to bytecode, every load or store of co_freevars[2] becomes a LOAD_DEREF 2 or STORE_DEREF 2.

Now, when a function is called, the stack frame doesn't just reserve co_nlocals space, but co_nlocals + len(co_freevars) extra space at the end (I'm still lying to you here; see a few paragraphs down for the truth) in f_localsplus, and when the interpreter sees a LOAD_DEREF 2, it just reads from frame.f_localsplus[frame.f_code.co_nlocals + 2].
.Well, not quite. That obviously wouldn't allow you to reassign closure variables. (You could call mutating methods, but assignment would just break the connection, just like after
a = b = [1, 2, 3]
, a.append(4)
affects b
, but a = [4]
breaks the connection.) And that would defeat the entire purpose of nonlocal
.So, what you actually have in that slot isn't the object itself, but a
cell
object that has a member named cell_contents
that's the actual object. And what the outer scope has in its fast locals slot is the same cell
object, not the thing it points to. So, if either the inner function or the outer function assigns to the variable, they're not assigning to one of their fast local slots, but assigning to the cell_contents
of a cell in one of their fast local slots, so they can each see what the other one does.That's why local variables that are referenced by a nested scope work like nonlocal variables. They're stored in
co_cellvars
instead of co_freevars
. Your frame's actual space is co_nlocals + len(co_cellvars) + len(co_freevars)
. The name that goes with a LOAD_DEREF 2
is equivalent to (co_cellvars+co_freevars)[2]
. At runtime, that LOAD_DEREF 2
actually does a f_localsplus[co_nlocals+2].cell_contents
. When you construct a nested function at runtime, there's zero or more LOAD_CLOSURE
bytecodes to push the current frame's cells onto the stack, then a MAKE_CLOSURE
instead of MAKE_FUNCTION
, which does the extra step of popping those cells off and stashing them in the function's __closure__
attribute. And when you call a function, its __closure__
cells get copied into the f_localsplus
array. Meanwhile, even if your function doesn't access any variables from an enclosing scope, if a function nested within your function does so, they're free variables for you, and have to get put in your __closure__
so they can end up in your stack frame so they can get to your nested function's __closure__
and stack frame.And I believe that's all of the actual details correct.
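Most of this is visible from Python, too (a quick demo; the names are made up):

```python
import dis

def counter():
    n = 0                # n is a cell variable: shared with the closure
    def inc():
        nonlocal n       # n is a free variable here
        n += 1
        return n
    return inc

inc = counter()
print(counter.__code__.co_cellvars)      # ('n',)
print(inc.__code__.co_freevars)          # ('n',)
print(inc.__closure__[0].cell_contents)  # 0
inc(); inc()
print(inc.__closure__[0].cell_contents)  # 2
dis.dis(inc)  # n += 1 shows up as LOAD_DEREF 0 (n) ... STORE_DEREF 0 (n)
```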
But you don't have to memorize all this stuff, or go dig it out of the C code, or work it out by trial and error, or plow your way through that horrible paragraph ever again; the docs for the dis module describe what each opcode does, and have improved tremendously over the past few versions of Python. If you can't remember exactly what LOAD_DEREF looks up and where, just look at the docs: "Loads the cell contained in slot i of the cell and free variable storage. Pushes a reference to the object the cell contains on the stack." That might not explain everything if you'd never read anything about it before, but it should be more than enough to refresh your memory once you have.

Also, again, most of this is exposed to Python, so the Python-esque pseudocode above is almost all real code. However, from Python, you can't write to a cell through its cell_contents, and you can't construct new cell instances. (If you want to have fun with ctypes.pythonapi, you can probably figure it out from here.)

Finally, locals(), in addition to giving you a snapshot of your locals rather than a live way to change them, also flattens out any cells, giving you their cell_contents instead.

At first glance, it seems like it would be simpler to just have each function's stack frame hold a pointer to its outer function's stack frame. You could still optimize things just as much by having instructions like LOAD_OUTER 3, 2 to go 3 steps out and get fast-local 2 from that frame, right? In fact, some languages do implement closures that way. But that means a function's stack frame can't go away--and, therefore, neither can anything else it refers to--until all closures referring to any of its variables, or any of its outer scopes, go away. And that "anything else it refers to" includes the stack frame of its caller. You can see why this could be a problem. There are ways to optimize that (although you still run into problems with coroutines), but it's actually simpler, not more complicated, to just reference the individual values pulled off the frame, as Python does.

Globals and builtins
Globals are easy. If the compiler sees a name that isn't assigned in the current function or an outer enclosing function, or that has a global statement, it's global. There's no special operation for builtins; builtin lookup is just a fallback for every global lookup.

The compiler does still use an array here, but only to fetch the name from a constant tuple on the code object (co_names) to look up with LOAD_GLOBAL and STORE_GLOBAL at runtime. They look something like this:

```python
from types import ModuleType

# 'frame' is the current stack frame, implicit in this pseudocode.
def LOAD_GLOBAL(index):
    name = frame.f_code.co_names[index]
    try:
        return frame.f_globals[name]
    except KeyError as e:
        try:
            builtins = frame.f_globals['__builtins__']
            if isinstance(builtins, ModuleType):
                builtins = builtins.__dict__
            return builtins[name]
        except KeyError:
            raise e

def STORE_GLOBAL(index, value):
    name = frame.f_code.co_names[index]
    frame.f_globals[name] = value
```
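You can watch this happen with dis (len here is neither local nor enclosing, so it compiles to a global load that falls through to builtins at runtime):

```python
import dis

def f(stuff):
    return len(stuff)

dis.dis(f)
# ...
#   LOAD_GLOBAL   0 (len)
#   LOAD_FAST     0 (stuff)
#   CALL_FUNCTION 1
# ...
print(f.__code__.co_names)  # ('len',)
```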
The f_globals in that pseudocode will normally be the actual globals of the module the current (module-level, function, or class) code was defined in, but it could be something else. (For example, exec lets you pass in any mapping for globals; if you want a sanitized and quasi-sandboxed environment with your own under-the-hood functions, and builtins like exec and __import__ stripped out, that's easy.)

Much simpler than closure variables, right? But it can be slower. So, why have a special case for globals instead of just making it the last stop on the closure chain (or making builtins the last stop instead of a special case within the globals special case)?
Well, for one thing, not having fast locals or cells and doing everything through the globals dict allows for more dynamic stuff, like creating new variables in exec and using them in the same scope. For another, not having to make a list of all of the names in the scope, and simply adding them to the dict as they're created, means that the global scope can be executed incrementally, statement by statement (so it works the same as in the REPL--in fact, both the REPL and the top-level script work by just incrementally building a module named __main__). In particular, you can define a function that uses a global, and start passing it around, before that global has been created.

More importantly, lookup speed shouldn't really matter at global level. If you're doing any heavy computation that involves millions of name lookups at module scope, move it into a function and call that function. And if you're using those globals millions of times from another function, get them out of global scope too. Those are "structured programming 101" anyway; why try to optimize things that people shouldn't be doing?
Well, except for the fact that your module's top-level functions and classes are globals, and it's perfectly reasonable to use them millions of times. In less dynamic languages, they're only looked up as globals at compile time, so they aren't like runtime global variables--but in Python, functions, types, and everything else are first-class values, and they're looked up the same way as any other global variables. In practice, this usually isn't a bottleneck for most programs; when Python is too dynamic and slow for the inner code of some project, there are usually other factors that matter much more than this dict lookup. But occasionally, it is. So, this is one of those places people keep looking for new optimizations. At the user level, you can always just copy the global to a local or a default parameter value, as mentioned earlier (a quick sketch of that trick follows). Or you can use an ambitious optimizer that tries to automate the same benefit (like FAT Python, which creates an extra version of each function's __code__ with globals copied into the co_consts, and wraps the function in a guard that checks whether the cached globals have changed since the function was compiled and, if they have, calls the original "slow" function instead of the fast version).
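Here's a minimal sketch of the manual version (the function names are made up; exact timings will vary):

```python
import timeit

def slow(n=1000000):
    total = 0
    for i in range(n):
        total += len('x')   # LOAD_GLOBAL len (plus builtins fallback) every pass
    return total

def fast(n=1000000, len=len):   # len is now a default parameter: LOAD_FAST
    total = 0
    for i in range(n):
        total += len('x')
    return total

print(timeit.timeit(slow, number=10))
print(timeit.timeit(fast, number=10))   # same result, measurably faster
```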
What about classes?
A class definition is a scope, just like a function definition. There are a couple of minor differences (the special LOAD_CLASSDEREF opcode, the magic __class__ cell that's created in class scope so methods can access it directly or via super, etc.). But the really big difference is that the class body isn't executed every time the class is called; it's executed once, to produce the dict that gets passed to type (or another metaclass), and then it, and its scope, go away. (Of course the body of its __new__ and/or __init__ will get executed when the class is called.)

I don't think there's anything important to learn about classes nested in functions or classes (or classes nested in functions, unless you want to know how super works) that isn't obvious once you play with it.
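A quick demonstration of the run-once behavior:

```python
class Spam:
    print('class body running')   # printed once, at class-definition time
    x = 1

spam = Spam()             # does not print again: only __new__/__init__ run here
print(Spam.__dict__['x']) # 1: from the dict the class body built and passed to type
```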
Attributes
Attribute lookup seems really simple at first: Every attribute lookup is compiled to LOAD_ATTR, which uses a string taken from the same co_names array as LOAD_GLOBAL, and looks it up in the object's __dict__. Or, if it's not found there, in the object's type's dict. Or, if not there, then in each type on that type's MRO. If that all fails, __getattr__ is looked up (but using special-method lookup, not normal attribute lookup) and called. Oh, unless __getattribute__ was overridden, in which case whatever it does happens instead. Also, somehow functions get magically turned into bound methods. And @property, and __slots__, and... and how does it find the __dict__ in the first place? And what about C-API types? Maybe it's not so simple after all...

It's actually pretty simple: the only rule in the interpreter is special-method lookup, and LOAD_ATTR only ever uses that rule to look up and then call __getattribute__. (Other special methods are handled internally by their opcodes or by other special builtin or extension functions written in C.) Everything else happens inside __getattribute__; we'll get to that later.

Special method lookup
Special method lookup is only somewhat vaguely documented, in the data model reference chapter. See call_method and its helpers in the source for details.

The basic idea of special method lookup is that it ignores the instance dictionary, and most kinds of dynamic customization.
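You can see the difference directly--normal attribute lookup finds a per-instance override, special method lookup doesn't:

```python
class Spam:
    def __len__(self):
        return 3

spam = Spam()
spam.__len__ = lambda: 99   # a per-instance "method"

print(spam.__len__())   # 99: normal lookup checks the instance dict first
print(len(spam))        # 3: special method lookup goes straight to the type
```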
You may wonder what the point of special method lookup is. Obviously __getattribute__ lookup can't be hooked by __getattribute__, but why should it also skip the instance, or treat metaclasses differently? And why should other special methods like __add__ follow the same rules? (If you wanted to add per-instance methods to a value, __add__ or __lt__ would probably be the first things you'd think of.)

In fact, the original "new-style class" design for Python 2.3 didn't have special method lookup--but (long before the first beta) Guido discovered a problem: some methods make sense both on normal objects and on types. For example, hash(int) makes just as much sense as hash(3). By normal lookup rules, both call int.__hash__. By special method lookup rules, the former skips int's dict and instead calls type.__hash__. A few other methods have the same problem, like str and repr. And what would happen if a type had both __call__ and __new__? And even the ones that don't make sense for types--int + str, abs(Counter), or -socket--might make sense for some subclass of type (e.g., the new optional static types use subscripting, like Iterable[int], for generics).

So, that's why we have special method lookup, and it obviously solves the __getattribute__ problem.

How do we find the __dict__ without finding the __dict__ first?
There's still a circular definition problem here. How do we find an attribute? By calling the __getattribute__ attribute on the object's type and walking its MRO, looking in the __dict__ of each type. So... how do we get that __dict__? Certainly not by searching __dict__s. And the process also involves a variety of other attribute lookups--you get an object's type via its __class__, a type's MRO via its __mro__, and so on.

The secret is that none of these are actual attribute lookups. In the C struct representing each object, there's an ob_type pointing to the type. In the C struct representing each type object, there's a tp_mro pointing to the MRO, a tp_getattro pointing to the __getattribute__, a tp_dict pointing to the __dict__, and so on.

But you can set __class__ or __dict__ or __getattribute__ from Python. How does that work? In general, when you overwrite one of these special values, the interpreter updates the corresponding C struct slot. (There are a few cases where it does something different, like using a slot as a cache and searching the dict if it's NULL, but that works too.)

So, in the pseudocode below, when I write type(obj).__mro__, that's really following the ob_type slot from the object, then the tp_mro slot from the type, and so on.

Finding __getattribute__
The basic process to look up and call __getattribute__ is, in pseudocode:

```python
def find_special(obj, name):
    # Special method lookup: skip the instance, search only the type's MRO.
    for t in type(obj).__mro__:
        if name in t.__dict__:
            it = t.__dict__[name]
            # If what we found is itself a descriptor, bind it to obj.
            __get__ = find_special(it, '__get__')
            if __get__:
                it = __get__(obj, type(obj))
            return it

def LOAD_ATTR(obj, index):
    name = frame.f_code.co_names[index]
    __getattribute__ = find_special(obj, '__getattribute__')
    if not __getattribute__:
        raise AttributeError(name)
    try:
        return __getattribute__(name)
    except AttributeError:
        __getattr__ = find_special(obj, '__getattr__')
        if not __getattr__:
            raise
        return __getattr__(name)
```
If you don't understand the __get__, read the Descriptor HowTo Guide.

(It shouldn't be possible to ever fail to find a __getattribute__, but there's code for it anyway--if you play around with the C API, you can try to call a method on a class before setting up its MRO, and you get an AttributeError rather than a segfault or something. On the other hand, __getattr__ can easily be missing, because object doesn't implement it, nor do most other builtin/stdlib types.)

I've skipped over a few things (like the code to handle types that aren't fully constructed yet without crashing, etc.), but that's the basic idea. And it's also how other special-method lookups, like the nb_add (__add__) lookup inside BINARY_ADD, work.

Notice that a __getattribute__ defined in the instance, or dynamically (by __getattribute__?), or a __getattribute__, __mro__, __class__, or __dict__ pushed onto the type by its metatype, all get ignored here.

But if you've overridden __getattribute__ on your object's type, or any of its supertypes, or done the equivalent in C, your code will get called (as a bound method--that's the point of the descriptor get bit).

One thing I skipped that may be worth noting is the method cache. See the macros at the top of typeobject.c, but the basic idea is that 4096 methods (that aren't special-cased away or given unreasonable names) get cached, so they can skip all the MRO-walking and dict lookups and jump right to the descriptor call.
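Both halves of that are easy to see from Python (a small demo; the class names are made up):

```python
class Logged:
    def __getattribute__(self, name):
        print('looking up', name)
        return super().__getattribute__(name)

class Adder(Logged):
    def __add__(self, other):
        return 42

obj = Adder()
obj.x = 1
print(obj.x)          # prints 'looking up x', then 1: your override runs
print(Adder() + obj)  # 42, with no 'looking up __add__': special method
                      # lookup bypasses __getattribute__ entirely
```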
Inside the default __getattribute__
Assuming you got to object.__getattribute__, what it does is pretty similar to special-method lookup all over again, but with some minor differences. The short version is that it allows instances, metaclass __getattribute__, and __getattr__ to get involved. In full detail, it's something like:
- There's no type slot equivalent to "general name known only by a string passed in at runtime", so the per-type step does the tp_dict[name] bit.
- Four-stage lookup instead of just the per-type MRO walk:
  - Look on the type and its MRO for a data descriptor (that is, a descriptor with __set__). If found, return its __get__(obj, type(obj)).
  - Look in the object itself only: if tp_dict (__dict__) and tp_dict[name] exist, you're done--do not call its __get__ method, just return it.
  - Look on the type and its MRO for a non-data descriptor or non-descriptor. Return its __get__(obj, type(obj)) (or, if it's not a descriptor, just return it).
  - Use special-method lookup to find __getattr__, call that, and do the __get__(obj, type(obj)) on the result if present.
So, in pseudocode:
```python
def __getattribute__(self, name):
    # Stage 1: walk the type's MRO; a data descriptor (one whose type
    # has both __get__ and __set__) beats everything else.
    for t in type(self).__mro__:
        try:
            desc = t.__dict__[name]
            if '__get__' in type(desc).__dict__ and '__set__' in type(desc).__dict__:
                return desc.__get__(self, type(self))
            break
        except KeyError:
            pass
    # Stage 2: the instance's own dict; returned as-is, with no __get__ call.
    if name in self.__dict__:
        return self.__dict__[name]
    # Stage 3: a non-data descriptor (or plain value) found on the type.
    try:
        if '__get__' in type(desc).__dict__:
            return desc.__get__(self, type(self))
        return desc
    except UnboundLocalError:
        # Stage 4: the MRO walk never assigned desc; fall back to __getattr__.
        __getattr__ = find_special(self, '__getattr__')
        if __getattr__:
            return __getattr__(name)
        raise AttributeError(name)
```
I believe the point of that multi-stage lookup is to make it easy to shadow methods (which are non-data descriptors) but hard to accidentally shadow properties and similar things (which are data descriptors).
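For example (a property is a data descriptor because it has __set__; a plain method is a non-data descriptor):

```python
class Spam:
    @property
    def prop(self):
        return 'from property'
    def method(self):
        return 'from class'

spam = Spam()
spam.__dict__['method'] = lambda: 'from instance'
print(spam.method())   # 'from instance': non-data descriptors are shadowable

spam.__dict__['prop'] = 'from instance'
print(spam.prop)       # 'from property': the data descriptor still wins
```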
Notice that if the descriptor __get__ raises AttributeError, that will trigger the fallback to __getattr__ from the previous process.

What about classes?
If you look something up on a class, the class is the object, and the metaclass is the type. And if your metaclass doesn't override __getattribute__, type does (and it never raises or supers).

The only difference between type.__getattribute__ and object.__getattribute__ is that, at the descriptor steps, it calls __get__(None, cls). So, everything works the same, except for the descriptor get bit. (If you work things through, you should see how this, e.g., lets @classmethods work the same whether called on an instance or on the class, and how it lets metaclass methods act like classmethods.)
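Working through the @classmethod case, for instance:

```python
class Spam:
    @classmethod
    def which(cls):
        return cls.__name__

print(Spam.which())    # 'Spam': type.__getattribute__ did __get__(None, Spam)
print(Spam().which())  # 'Spam': bound to the class either way
```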
Other tricks
What else is left?
- __slots__ works by creating a hidden array, and a set of descriptors that read and write to that array, and preventing a tp_dict from being created on the instance.
- Methods don't do anything special--spam.eggs() is the same as tmp = spam.eggs; tmp() (unlike in many other OO languages). They work because they're stored as plain-old functions in the class dict, and plain-old functions are descriptors that build bound methods (see the sketch after this list). See How methods work if that isn't clear.
- Everything else--@property, @staticmethod, etc.--is just a descriptor; object.__getattribute__ (or type.__getattribute__) finds it on the class, calls its __get__ method, and gives you the result. (It may help to work through how @classmethod produces a bound class method whether looked up on the type, Spam.eggs, or on the instance, spam.eggs.)
- There are various C API mechanisms to expose struct members, get/set function pairs, etc. as if they were Python attributes. But it's pretty obvious how these work, unless you want the gory details, in which case they're in the C API docs.
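Here's the methods bullet in action (names made up for illustration):

```python
class Spam:
    def eggs(self):
        return 'eggs called on ' + repr(self)

spam = Spam()
# The function in the class dict is a non-data descriptor...
func = Spam.__dict__['eggs']
# ...whose __get__ is what builds the bound method:
bound = func.__get__(spam, Spam)
print(bound())      # same as spam.eggs()
print(spam.eggs())  # no special opcode; just LOAD_ATTR, then a call
```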
What about super?
It uses four different tricks to be that super, thanks for asking:
- In every class definition, __class__ is a local referring to the class currently being defined.
- The compiler rewrites super() to super(__class__, co_varnames[0]). (The first local is the first parameter, so this handles self in normal methods, cls in classmethods and __new__ and normal metaclass methods, mcls in metaclass __new__ and metaclass classmethods, even if you decide to call them something confusing like s or this.)
- Each method defined in a class gets __class__ as a free variable whenever it refers to super, not just whenever it directly refers to __class__.
- super.__getattribute__ overrides the normal behavior from object to jump ahead on the MRO chain after the current __class__.
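You can spot the first three tricks from Python (a small demo):

```python
class Base:
    def hello(self):
        return 'Base'

class Child(Base):
    def hello(self):
        # Compiled as super(__class__, self); __class__ is a free variable.
        return super().hello() + ' via Child'

print(Child().hello())                   # 'Base via Child'
print(Child.hello.__code__.co_freevars)  # ('__class__',): trick 3 in action
```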