Core Python behavior
There are some ideas that are central enough to Python that changing them would give us an entirely new language, so we don't want to change them. The tl;dr for most of it is the basic LEGB rule, but there's a bit more to it than that:
- Implicit declarations: assigning to a new name creates a new local variable.
- Explicit nonlocal declarations: after nonlocal spam, assigning to spam does not implicitly declare a new local variable; instead, it references the first variable named spam in an outer scope.
- Explicit global declarations: after global spam, assigning to spam does not implicitly declare a new local variable; instead, it references the variable named spam in the module's global scope. (These first three rules are illustrated in the snippet right after this list.)
- Lexical scoping: free names are bound in lexically outer scopes, not in the dynamic runtime environment.
- Globals and builtins: the global scope of a module may be the outermost lexical scope in the source code, but builtins are treated as another scope even outside that.
- Local scopes are defined by function definitions (including def, lambda, and the hidden definition inside comprehensions) and class definitions only (not by every block/suite).
- Dynamic globals: names at the module global scope are always looked up by name at runtime. This allows us to build modules iteratively (compiling statement by statement--as is done for __main__, the top-level script or interactive REPL), modify their environments with exec calls, and so on. (Note that we're talking about dynamic lookup within a lexical scope, not about dynamic scope. This isn't too important for globals, but it will matter later.)
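For concreteness, here's the standard behavior those first three rules describe:

```python
x = "global"

def outer():
    x = "outer"

    def set_local():
        x = "local"      # implicit declaration: a brand-new local; outer's x is untouched

    def set_nonlocal():
        nonlocal x
        x = "nonlocal"   # rebinds outer()'s x

    def set_global():
        global x
        x = "module"     # rebinds the module-level x

    set_local();    print(x)   # outer
    set_nonlocal(); print(x)   # nonlocal
    set_global();   print(x)   # nonlocal (the module-level x changed, not this one)

outer()
print(x)  # module
```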
Function locals
Whenever execution enters a new function call, a new context gets created. Conceptually, this is a mapping that holds the callee's local variables. You can get the local context at any time by calling locals or vars (and it's implicitly passed as the default value to any call to eval or exec).

However, this isn't necessarily what Python actually uses. In particular, CPython stores the context as an array, by converting all local name lookups into array indices at compile time. So, if you try to change your local context (e.g., locals()["x"] = 3 or exec('x = 3')), it will generally have no effect.

This is different from the case with global variables. Code that accesses global variables does so through the same dict returned by globals(), and module-level code uses the global scope as its local scope.

We can force things to work the same way at the local scope by converting every LOAD_FAST and STORE_FAST into a LOAD_NAME and STORE_NAME, putting the names in co_names instead of co_varnames, and making sure that the locals dict always gets created instead of only on demand (which, in CPython, you can do by making sure the usual CO_NEWLOCALS flag is set but the usual CO_OPTIMIZED flag is not). This is all something we could do in a decorator with some bytecode hacking.
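Concretely, here's the status quo in today's CPython (an illustration of the current behavior, not of the proposed change):

```python
def f():
    x = 1
    locals()["x"] = 3   # writes into a snapshot dict; the fast-locals array is untouched
    exec("x = 3")       # same story: rebinds "x" in that dict, not the real local
    return x

print(f())  # 1

y = 1
globals()["y"] = 3      # module globals really are this dict, so this does work
print(y)    # 3
```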
Closures
As mentioned above, Python compiles local variable references to indexes into an array of locals. So how do closures work? For any local accessed by an inner function, a cell object is stored in place of the value in the array; a cell is just an object with a single member, its value. When that inner function is entered, it gets references to the same cells in its frame. Instead of LOAD_FAST and STORE_FAST you get LOAD_DEREF and STORE_DEREF, which, instead of accessing the cell itself, access the value inside the cell.
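Both the cells and the DEREF opcodes are easy to see from Python (exact opcode names vary a bit across CPython versions, but the DEREF pair is there in all recent ones):

```python
import dis

def counter():
    n = 0
    def bump():
        nonlocal n
        n += 1
        return n
    return bump

b = counter()
b(); b()
cell = b.__closure__[0]    # the cell object shared between counter() and bump()
print(cell.cell_contents)  # 2

dis.dis(b)  # n is accessed via LOAD_DEREF / STORE_DEREF rather than LOAD_FAST / STORE_FAST
```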
When you call locals, if there are any freevars (cells inherited from an outer scope) or cellvars (cells exposed to an inner scope), you don't get the cell objects in the dictionary, but their values. Changing them has no effect even in cases where changing local variables does.

If we changed local lookup to be by name, what would happen with closures? Well, I think the changes above would copy all names inward. That works fine for names that are only read, but if we're trying to rebind names in the outer scope (i.e., we've used nonlocal), or (I think) even if we're trying to pick up rebindings from the outer scope, it breaks.

There might be a way to make this work easily using the LOAD_CLASSDEREF op, which is designed to look in a real locals dict (class suites don't use fastvars) and then fall back to consulting cells. But I'm not sure. Anyway, switching over to the next bit is the fun part, so let's charge forward assuming this isn't workable.

The traditional Lispy solution to this is to make environments into something like a ChainMap instead of a dict. That automatically fixes picking up rebindings from the outer scope, but it doesn't help for pushing rebindings back up. We need to have two separate operations for "add/modify this binding in the topmost dict" vs. "modify this binding in the topmost dict that already had such a binding". Then every LOAD_DEREF turns into a LOAD_NAME just like LOAD_FAST did (it's now a chained lookup automatically by virtue of using ChainMap), but STORE_DEREF calls this new rebind-topmost-existing-binding method instead of the usual set-in-top-dict method used by STORE_NAME.
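Here's a minimal sketch of such an environment. ChainEnv and rebind are made-up names for illustration; nothing like this exists in CPython:

```python
from collections import ChainMap

class ChainEnv(ChainMap):
    # The inherited __setitem__ is the STORE_NAME-style operation:
    # it always adds/modifies the binding in the topmost map.

    def rebind(self, key, value):
        # The STORE_DEREF-style operation: modify the binding in the
        # topmost map that already has one, instead of shadowing it.
        for m in self.maps:
            if key in m:
                m[key] = value
                return
        raise NameError(f"no existing binding for {key!r}")

outer = {"spam": 1}          # an enclosing scope
env = ChainEnv({}, outer)    # a new scope chained onto it

env["eggs"] = 2              # plain store: lands in the topmost dict
env.rebind("spam", 3)        # nonlocal-style store: modifies the outer binding
print(outer["spam"])         # 3
print(env.maps[0])           # {'eggs': 2}
```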
If we wanted to still have cell objects (they are exposed to the Python level, and even mutable in 3.7+), I think they'd have to change from a simple value holder to a reference to a ChainMap together with a name.

In fact, having done that, we could just leave LOAD_DEREF and STORE_DEREF alone, and let the cell variables handle the indirection through the ChainMap for us. Although I think that might be more confusing, not less.

The other closure-related opcode is LOAD_CLOSURE, which loads a cell object so it can be passed farther down. I think nobody will need to access the loaded cell anymore unless someone inspects the closure in a frame object or the like, so we could just strip these out (or change them to load a const--something has to be on the stack...) if we're willing to break that. Alternatively, if we're building chain-map-referencing cell objects, we have to replace LOAD_CLOSURE with code to build them.

There might be other subtle semantic differences from the change to chained environments. On the other hand, it would make it a lot easier to experiment with different semantics from within Python. Creating new cell variables and new closures is very hard; creating a new chained environment to pass to something like exec is dead easy.

One thing you could (I think) conceivably do with this is to generate your own chained environments and pass them into function code instead of using the default ones, which you could use to, e.g., explicitly use dynamic instead of lexical scoping for some particular call.
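That last bit already half-works today, because exec accepts an arbitrary mapping for its locals. A toy illustration of the idea, using a plain ChainMap:

```python
from collections import ChainMap

caller_env = {"spam": 10}
callee_env = ChainMap({}, caller_env)   # the "callee" scope chained onto the caller's

# Reads chain through to the caller's bindings (a dynamic-scoping flavor)...
exec("print(spam)", globals(), callee_env)      # 10
# ...while writes land in the front map, like a fresh local scope.
exec("spam = 20", globals(), callee_env)
print(caller_env["spam"], callee_env["spam"])   # 10 20
```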
Globals
Once we have the same flexibility in locals and closures that we do in globals, is there any need for a special global scope anymore? We could just make it another environment at the bottom of the chain. If we wanted to replace LOAD_GLOBAL and STORE_GLOBAL with code that uses the chain map, we'd need a way to go to the back of the chain rather than the front.

But the simpler solution is to just keep passing around separate globals and locals (in functions and classes) and keep accessing globals the same as always. The fact that the global namespace is just a reference to the same dict at the back of the local namespace's chain wouldn't affect normal code, but it would be true for anyone who wants to play around.
Either way, not only can we now close over globals, we can even (if we decided to implement closure cells as indirections to the chain instead of scrapping them) construct and pass around cells for them. Closing over globals is something that nobody but Lisp fanatics even notices Python can't do today, but it really can't. Of course we might want to preserve that limitation instead of fixing it, but I don't think we need to. The compiler should still barf on using nonlocal for something that's global rather than in an outer function (as it does today; see the snippet below), should still generate LOAD_GLOBAL ops for something that's global and has neither declaration, and so on, so existing code should just work the same way; it would just become possible, with a bit of monkeying, to pick up a global environment, pass it around as if it were a nonlocal one, and have everything magically work.
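The compile-time barf is easy to verify in today's Python:

```python
src = """
x = 1
def f():
    def g():
        nonlocal x   # x is a global, not a binding in an enclosing function
        x = 2
"""
try:
    compile(src, "<test>", "exec")
except SyntaxError as e:
    print(e)  # no binding for nonlocal 'x' found
```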
Builtins
If we monkey with globals, builtins are a bit of a problem. Conceptually, the global namespace already works sort of like a chained environment with globals in front of builtins (and no way to directly store into the outer namespace). However, practically, it's a big pile of hacks. For example, eval just takes locals and globals, no builtins. If the passed-in globals has a __builtins__ member, it's used as the builtin environment; otherwise, the builtins of the calling scope (which that scope's globals normally exposes through its own __builtins__ member, of course) are automatically copied into the passed-in globals first. Trying to make this not break anything could be a lot of fun.
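A quick demonstration of that hack:

```python
g = {}
eval("len([1, 2, 3])", g)     # works even though g started with no __builtins__ ...
print("__builtins__" in g)    # True: eval stuffed it into our dict

g2 = {"__builtins__": {}}     # explicitly supply an (empty) builtin environment
try:
    eval("len([1, 2, 3])", g2)
except NameError as e:
    print(e)                  # name 'len' is not defined
```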
Also, of course, if we move builtins into a real chained environment outside of globals, instead of the sort-of-like-that thing Python does today, then LOAD_GLOBAL and STORE_GLOBAL are no longer going to the end of the chain, but to one level before the end, which is getting a little too special to be readable as a special case. But I don't think we wanted to replace the global ops anyway. And, if not, we can just leave builtins alone.

Classes
Code inside a class suite gets executed the same way as top-level code (as opposed to function code), except that instead of having the enclosing global namespace for both its globals and locals, it has the enclosing globals and a new local namespace of its own. (Also, some extra values are crammed into the locals to allow super to work, set up __module__, etc., but I think the compiler does that just by emitting some explicit STORE_NAME stuff at the top of the code.) When the suite is finished executing, the metaclass gets called with the class name, base classes, and the local environment.

As mentioned above, classes actually use the locals dict, not a fast array on the frame like functions, so there's no need to change anything there. As for using LOAD_CLASSDEREF in place of LOAD_DEREF, changing that to LOAD_NAME with a chained environment should just work.
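You can see that a class suite really does execute against its locals mapping:

```python
class C:
    x = 1
    locals()["y"] = 2   # unlike in a function body, this really lands in the class namespace

print(C.x, C.y)  # 1 2
```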
Summary
Basically, what (I think) we could change without breaking everything is:
- Replace the locals dictionary with a chain of dictionaries (with an extra method to replace rather than insert).
- Make the locals dictionary actually live and mutable even inside function calls.
- Replace all fast locals with existing ops that go to the locals dictionary.
- Replace closure cell loads with the same existing op.
- Replace closure cell stores with calls to the new replace method (or a new opcode that does that).
- Wrap up exec-like functions to chain globals onto the end of locals.
The result is something that:
- Works like normal Python for 99.9% of code.
- Is significantly slower (fast locals and closure cells are an optimization, as the name "fast" implies, and we'd be tossing them).
- Exposes a bit of new functionality that might be fun to play with.
- Is significantly easier to understand, except that nobody could really try to learn it until they already knew how the more complicated normal Python rules work.