Many languages have a for-each loop. In some, like Python, it’s the only kind of for loop:

for i in range(10):
    print(i)

In most languages, the loop variable is only in scope within the code controlled by the for loop,[1] except in languages that don’t have granular scopes at all, like Python.[2]

So, is that i a variable that gets updated each time through the loop or is it a new constant that gets defined each time through the loop?

Almost every language treats it as a reused variable. Swift, and C# since version 5.0, treat it as separate constants. I think Swift is right here (and C# was right to change, despite the backward-compatibility pain), and everyone else is wrong.

However, if there is a major language that should be using the traditional behavior, it’s Python.

Loops and closures

In any language that has lexical closures that capture variables, you’ll run into the foreach-lambda problem. This has been known in Python (and JavaScript, and Ruby, etc.) for decades, and is even covered in Python’s official Programming FAQ, but every new programmer discovers it on his own, and every new language seems to do so as well, and it always takes people by surprise.

Consider this:

powers = [lambda x: x**i for i in range(10)]

This creates 10 functions that return x**0, x**1, … x**9, right?

If for-each works by re-using a variable, as it does in Python (and JavaScript, …) then no, this creates 10 functions that all return x**9 (or, in some languages, x**10). Each function returns x**i, where i is the variable from the scope the function was defined in. At the end of that scope, i has the value 9.
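
You can check this at the interpreter; all ten functions end up computing x**9:

powers = [lambda x: x**i for i in range(10)]
print([f(2) for f in powers])   # [512, 512, 512, 512, 512, 512, 512, 512, 512, 512]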

The problem has nothing to do with lambdas or comprehensions.[3] You’ll get the same behavior this way:

powers = []
for i in range(10):
    def func(x):
        return x**i
    powers.append(func)

Whenever people run into this problem, in any language, they insist that the language “does closures wrong”–but in fact, the language is doing closures perfectly.

So, does that mean all those users are wrong?

Traditional solution

In most languages, you can solve the problem by wrapping the function definition inside another function that you define and call:

for i in range(10):
    def make_func(j):
        def func(x):
            return x**j
        return func
    powers.append(make_func(i))

This works because the parameter j is bound to the value that i has at the moment make_func is called, not to the variable i itself. Then, func closes over that j variable, which is a different variable each time through the loop.

The problem with this solution is that it’s verbose, and somewhat opaque to the reader. It’s a bit less verbose when you can use lambdas, but arguably it’s even less readable that way:

for i in range(10):
    powers.append((lambda j: (lambda x: x**j))(i))

In some languages, including Python, there’s a simpler way to do this:

for i in range(10):
    def func(x, j=i):
        return x**j
    powers.append(func)

That’s because the default value of the parameter j is evaluated once, at definition time, so it captures the value of i in the defining scope, not a closure cell capturing the variable i.

Of course this is a bit “magical”, but people quickly learn to recognize it in Python. In fact, most experienced Python developers will spell it i=i instead of j=i, because (once you recognize the idiom) that makes it clear why we’re using a parameter with a default value here: to get the value of i at the time of definition (or, in some cases, as an optimization–e.g., you can use len=len to turn len into a local variable instead of a builtin within the function, which makes it a bit faster to look up each time you call the function).

The bigger problem is that it makes j a parameter, which means you can override the default value with an argument. For example, powers[3](2, 10) gives you 1024, which is almost certainly not something anyone wanted. You can do tricks like making it keyword-only, prefixing it with an underscore, etc. to try to discourage people from accidentally passing an argument, but it’s still a flaw.
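
For example, one way to make the parameter harder to override by accident is a keyword-only, underscore-prefixed spelling of the same trick; this is just a sketch, and it only discourages misuse rather than preventing it:

powers = []
for i in range(10):
    def func(x, *, _i=i):   # keyword-only and underscore-prefixed to discourage overriding
        return x**_i
    powers.append(func)

print(powers[3](2))    # 8, as intended
# powers[3](2, 10) is now a TypeError instead of silently returning 1024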

Terminology sidebar

In Python, this issue is often described in terms of late vs. early binding. There are actually three places things could get bound: compile time (like constants), function definition time (like default parameter values), or function call time (like free variables captured by a closure). In many discussions of late vs. early binding, the distinction is between run time and compile time, but in this case, it’s between the later and earlier run time cases: free variables are bound late, at call time, while default parameter values are bound early, at definition time. So, capturing a nonlocal is late binding; the “default-value trick” of passing the value as the default value of an extra parameter means using early binding instead.
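
Here’s a minimal illustration of the two run-time binding times (the function names are just for illustration):

i = 2
def late(x):
    return x**i           # free variable: i is looked up each time late() is called

def early(x, i=i):
    return x**i           # default value: this i was evaluated at definition time

i = 5
print(late(3))    # 243 -- late binding sees the current i
print(early(3))   # 9   -- early binding kept the i from definition time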

A different way of solving the same problem is to consider capture by variable vs. capture by value. In C++, variables are lvalues, but lvalues are also things you can pass around and take references to. You can write a function that takes an int, or one that takes a reference to an int, spelled int& (and another that takes a reference to a const int, and one that takes an rvalue-reference to an int, with complicated rules about how you collapse an lvalue reference to an rvalue reference and how overload resolution distinguishes between an int and a reference to const int when the lvalue is constant, and…). And closures are built on top of this: there’s no special “closure cell” or “closure environment”; closures just have (lvalue) references to variables from the outer scope. Of course taking a reference to something doesn’t keep it alive, because that would be too easy, so returning a function that closes over a local variable means you’ve created a dangling reference, just like returning a pointer to a local variable. So, C++ also added capture by value: instead of storing a reference to the outer variable, the closure copies its current value into a new variable of its own. As it turns out, this gives you another way to solve the loop problem: if you capture the loop variable by value, instead of by variable, then of course each closure is capturing a different value. Confused? You won’t be, after this week’s episode of C++.

Anyway, you can now forget about C++. In simpler languages like Java, C#, JavaScript, and Python, people have proposed adding a way to declare capture by value for closures, to solve the loop problem. For example, borrowing C++ syntax (sort of):

for i in range(10):
    def func[i](x):
        return x**i
    powers.append(func)

Or, alternatively, one of these proposals:

for i in range(10):
    def func(x; i):
        return x**i
    powers.append(func)

for i in range(10):
    def func(x) sharedlocal(i):
        return x**i
    powers.append(func)

for i in range(10):
    def func(x):
        sharedlocal i
        return x**i
    powers.append(func)

No matter how you spell it, the idea is that you’re telling func to capture the current value of the i local from the enclosing scope, rather than capturing the actual variable i.

If you think about it, that means that capture by value in a language like Python is exactly equivalent to early binding (in the “binding at definition time” sense). And all of these solutions do the exact same thing as the parameter default-value trick i=i, except that i is no longer visible (and overridable) as a parameter, so it’s no longer really a “trick”. (In fact, we could even hijack the existing machinery and store the value in the function object’s __defaults__, if we just add a new member to the code object to keep a count of captured local values that come after all the parameters.)
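
For comparison, one way to get most of that effect in today’s Python, without new syntax and without exposing i as an overridable parameter, is explicit partial application (just a sketch; the helper name _func is arbitrary):

import functools

powers = []
for i in range(10):
    def _func(i, x):              # i comes first so partial() can bind it positionally
        return x**i
    powers.append(functools.partial(_func, i))

print(powers[3](2))               # 8
# unlike the default-value trick, powers[3](2, 10) is a TypeError here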

While we’re talking terminology, when discussing how parameters get passed in the previous section, is that “pass by value” or “pass by reference”? If you think that’s a good question, or you think the problem could be solved by just coming up with a new name, see this post.

Solutions

Since the traditional solution works, and people obviously can learn it and get used to it, one solution is to just do nothing.

But if we are going to do something, there are two obvious solutions:

  1. Provide a way for closures to use early binding or capture by value. (Again, these options are equivalent in languages like Python.)
  2. Provide a way for for-each loops to define a new variable each time through the loop, instead of reusing the same variable.

Either one would solve the problem of closures capturing loop variables. Either one might also offer other benefits, but it’s hard to see what they might be. Consider that, in decades of people using the default-value trick in Python, it’s rarely used (with locals[4]) for anything but loop variables. And similarly, you can find FAQ sections and blog posts and StackOverflow questions in every language discussing the loop problem, and none of them mention any other cases.

The first one obviously has to be optional: you still want closures to capture some (in fact, most) variables, or they really aren’t closures. And there’s no obvious way to make an automatic distinction, so it has to be something the user spells explicitly. (As shown in the previous section, there are plenty of alternative ways to spell it, each with room for more bikeshedding.)

The second one could be optional–but does it have to be? The C# team carefully considered adding a new form like this (in Python terms):

for new i in range(10):
    def func(x):
        return x**i
    powers.append(func)

But they decided that any existing code that actually depends on the difference between for i and for new i is more likely to be buggy than to be intentionally relying on the old semantics, so, even from a backward-compatibility point of view, it’s still better to change things. Of course that had to be balanced with the fact that some people write code that’s used in both C# 4 and C# 5, and having it do the right thing in one version and the wrong in the other is pretty ugly… but even so, they decided to make the breaking change.

I’m going to examine the arguments for each of the two major alternatives, without considering any backward compatibility issues.

Consistency with C-style for

This one isn’t relevant to languages like Python, which don’t have C-style for loops, but many languages do, and they almost always use the same keyword, or closely related ones, to introduce C-style for loops and for-each loops. So, superficially, it seems like they should be as similar as possible, all else being equal.

In a C-style for loop, the loop variable is clearly a variable whose lifetime lasts for the entire loop, not just a single iteration. That’s obvious from the way you define the loop:

for (int i=0; i != 10; i++)

That third part isn’t generating a value for a new constant (that also happens to be named i); it’s mutating or rebinding a variable named i. That’s what the ++ operator means. If you change i++ to just i + 1, you get an infinite loop where i stays 0 forever.

And this is why it’s usually legal to modify the loop variable inside the loop. It allows you to do things like this:

for (int i=0; i != spam.len(); ++i) {
    if (spam[i].has_arg) {
        process_with_arg(spam[i], spam[i+1]);
        ++i; // skips the next spam; we already used it
    } else {
        process_no_arg(spam[i]);
    }
}

So, shouldn’t for-each loops be consistent with that?

I don’t think so. Most languages already consider it legal and reasonable and only slightly unusual to modify the loop variable of a C-style for, but not to modify the loop variable of a for-each. Why is that? I think it’s because the two loops are semantically different. They’re not the same thing, just because they share a keyword.

So, at first glance this seems like an argument for #1, but in fact, it’s not an argument either way. (And for a new language, or an existing language that doesn’t already have C-style for loops, really, you don’t want C-style for loops…)

Consistency with functions

Anyone who’s done any functional programming has noticed that statement suites in imperative languages are a lot like function bodies. In fact, most control statements can be converted to a function definition or two, and a call to a higher-order function:

if spam:
    do_stuff()
    do_more_stuff()
else:
    do_different_stuff()
    
def if_body():
    do_stuff()
    do_more_stuff()
def else_body():
    do_different_stuff()
if_func(spam, if_body, else_body)

while eggs:
    eat(cheese)
    
def while_cond():
    return eggs
def while_body():
    eat(cheese)
while_func(while_cond, while_body)
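
For concreteness, minimal definitions of the hypothetical if_func and while_func helpers might look like this:

def if_func(cond, if_body, else_body):
    # call one of the two packaged suites, depending on the condition value
    if cond:
        if_body()
    else:
        else_body()

def while_func(cond, body):
    # keep calling the packaged suite as long as the condition function returns true
    while cond():
        body()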

But that doesn’t work with for-each:

powers = []
for i in range(10):
    def func(x):
        return x**i
    powers.append(func)

powers = []
def for_body(i):
    def func(x):
        return x**i
    powers.append(func)
for_each(range(10), for_body)
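
A minimal for_each helper, for the sake of the comparison, might be:

def for_each(iterable, body):
    # call the packaged loop body once per item, passing the item in as an argument
    for item in iterable:
        body(item)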

And this makes the problem obvious: that i is an argument to the for_body. Calling a function 10 times doesn’t reuse a single variable for its parameter; each time, the parameter is a separate thing.

This is even more obvious with comprehensions, where we already have the equivalent function:[5]

powers = [lambda x: x**i for i in range(10)]

powers = map(lambda i: (lambda x: x**i), range(10))
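
And if you actually run the map version (wrapped in list(), per the footnote), each closure really does get its own i:

powers = list(map(lambda i: (lambda x: x**i), range(10)))
print([f(2) for f in powers])   # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]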

Also consider Ruby, where passing blocks to higher-order-functions[6] is the idiomatic way to loop, and for-each loops are usually described in terms of equivalence to that idiom, and yet, the following Ruby 1.8[7] code produces 10 lambdas that all raise to the same power:

powers = []
10.times do |i|
    powers.push(lambda { |x| x**i })
end

So, this is definitely an argument for #2. Except…

Python

This is where Python differs from the rest of the pack. In most other languages, each suite is a scope.[8] This is especially obvious in languages like C++, D, and Swift, which use RAII, scope-guard statements, etc., where Python would use a with statement (or a try/finally). In such languages, the duality between scopes and functions is much clearer. In particular, if i is already going to go out of scope at the end of the for loop, it makes sense for each individual i to go out of scope at the end of an iteration.

But in Python, not only does i not go out of scope at the end of the for loop, there are comfortable idioms that rely on using the loop variable after the loop has finished. (All such cases could be written by putting the use of the loop variable into the else clause of a for/else statement, but many novices find for/else confusing, and it’s not idiomatic to use it when not necessary.)
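
For example (the names here are hypothetical, and this assumes the sequence contains a match), the common search-then-use pattern leans on the loop variable surviving the loop:

for user in users:
    if user.is_admin:
        break
print("first admin:", user)   # the loop variable is still bound here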

In addition, one of the strengths of Python is that it has simple rules, with few exceptions, wherever possible, which makes it easier to work through the semantics of any code you read. Currently, the entire semantics of variable scope and lifetime can be expressed in a few simple rules, which are easy to hold in your head:

  • Only functions and classes have scopes.
      • Comprehensions are a hidden function definition and call.
  • Lookup goes local, enclosing, global, builtin.
  • Assignment creates a local variable.
      • … unless there’s an explicit global or nonlocal statement.
  • Exceptions are unbound after an except clause.

Adding another rule to this list would mean one more thing to learn. Of course it would also mean not having to learn about the closure-over-loop-variable problem. Of course that problem is a natural consequence of the simple rules of Python, but, nevertheless, everyone runs into it at some point, has to think it through, and then has to remember it. Even after knowing about it, it’s still very easy to screw up and accidentally capture the same variable in a list of functions. (And it’s especially easy to do so when coming back to Python from Swift or C#, which don’t have that problem…)

Mutable values

Let’s forget about closures now, and consider what happens when we iterate over mutable values:

x = [[], [], []]
for i, sublist in enumerate(x):
    sublist.append(i)

If you’re not familiar with Python, try it this way:

x = [[], [], []]
i = 0
for sublist in x:
    sublist.append(i)
    i += 1

What would we expect here? Clearly, if this is legal, the result should be:

x = [[0], [1], [2]]

And there’s no obvious reason it shouldn’t be legal.

So, each sublist isn’t really a constant, but a variable, right? After all, it’s mutable.

Well, yes, each sublist is mutable. But that’s irrelevant to the question of whether the name sublist is rebindable. In languages where variables hold references to values, or are just names for values, like Java (except for native types) or Python, there’s a very obvious distinction here:

a = [1, 2, 3]
b = [4, 5, 6]
b = a          # rebinds the name b to the list a names; [4, 5, 6] is unaffected
b.append(7)    # mutates that shared list: a is now [1, 2, 3, 7] too

It’s perfectly coherent that sublist is a non-rebindable name for a possibly-mutable value. In other words, a new variable each time through the loop.

In a language like C++, things are a little trickier, but it’s just as coherent that sublist is a non-l-modifiable-but-r-mutable l-value (unless sublist is of non-const reference type in which case it’s a modifiable l-value). Anyway, C++ has bigger problems: capturing a variable doesn’t do anything to keep that variable alive past its original scope, and there’s no way to capture by value without copying, so C++ has no choice but to force the developer to specify exactly what he’s trying to capture by reference and what he’s trying to copy.

Anyway, the one thing you definitely don’t expect is for sublist to be mutating the same list in-place each time through (what you get if you write sublist[:] = ... instead of sublist = ...). But you wouldn’t expect that whether sublist is being reused, or is a new variable.
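
A quick way to see that distinction between rebinding the name and mutating the value in place:

x = [[], [], []]
for i, sublist in enumerate(x):
    sublist = [i]        # rebinds the name sublist; x is untouched
print(x)                 # [[], [], []]

x = [[], [], []]
for i, sublist in enumerate(x):
    sublist[:] = [i]     # mutates each list in place
print(x)                 # [[0], [1], [2]]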

So, ultimately, mutability isn’t an argument either way.

Performance

Obviously, creating and destroying a new variable each time through the loop isn’t free.

Most of the time, you’re not keeping a reference to the loop variable beyond the lifetime of the loop iteration. So, why should you pay the cost of creating and destroying that variable?

Well, if you think about it, in most languages, if you elide the creation and destruction, the result is invisible to the user unless the variable is captured. Almost all languages allow optimizers to elide work that has no visible effects. There would be nothing inconsistent about Python, or Java, deciding to reuse the variable where the effect is invisible, and only create a new variable when it’s captured.

So, what about the case where the variable is captured? If we want each closure to see a different value, our only choices are to capture a new variable each time, to copy the variable on capture, or to copy the value instead of capturing the variable. The first is no more costly than the second, and cheaper than the third. It’s the simplest and most obvious way to get the desired behavior.

So, performance isn’t an argument either way, either.

Simplicity

When a new user is learning the language, and writes this:

powers = []
for i in range(10):
    def func(x):
        return x**i
    powers.append(func)

… clearly, they’re expecting 10 separate functions, each raising x to a different power.

And, as mentioned earlier, even experienced developers in Python, Ruby, JavaScript, C#, etc. who have run into this problem before still write such code, and expect it to work as intended; the only difference is that they know how to spot and fix this code when debugging.

So, what’s the intuition here?

If the intuition is that lexical closures are early-bound, or by-value, then we’re in big trouble. They obviously aren’t, and, if they were, that would make closures useless. People use closures all the time in these languages, without struggling over whether they make sense.

If the intuition is that they’re defining a new function each time through the loop because they have a new i, that doesn’t point to any other problems or inconsistencies anywhere else.

And the only other alternative is that nobody actually understands what they’re doing with for-each loops, and we’re all only (usually) writing code that works because we treat them like magic. I don’t think that’s true at all; the logic of these loops is not that complicated (especially in Python).

So, I think this is an argument for #2.

Consistency with other languages

As I mentioned at the start, most languages that have both for-each loops and closures have this problem.

The only language I know of that’s solved it by adding capture by value is C++, and they already needed capture by value for a far more important reason (the dangling reference problem). Not to mention that, in C++, capture by value means something different than it would in an lvalue-less language like Python or JavaScript.

By contrast, C# 5.0 and Ruby 1.9 both changed from “reused-i” to “new-i” semantics, and Lua, Swift, and Scala have used “new-i” semantics from the start.[9] C# and Ruby are particularly interesting, because that was a breaking backward-compatibility change, and they could very easily have offered new syntax (like for new i) instead of changing the semantics of the existing syntax. Eric Lippert’s blog covers the rationale for the decision in C#.

As mentioned earlier, Python’s simpler scoping rules (only functions are scopes, not every suite) do weaken the consistency argument. But I think it still comes down on the side of #2. (And, for a language with suite scopes, and/or a language where loop bodies are some weird not-quite-function things like Ruby, it’s definitely #2.)

Exact semantics

In languages where each suite is a scope, or where loop suites are already function-esque objects like Ruby’s blocks, the semantics are pretty simple. But what about in Python?

The first implementation suggestion for Python came from Greg Ewing. His idea was that whenever a loop binds a cellvar, the interpreter creates a new cell each time through the loop, replacing the local i binding each time. This obviously solves the loop-capture problem, with no performance effect on the more usual case where you aren’t capturing a loop variable.
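
To see the status quo that Greg’s suggestion targets, here’s a quick look at the closure cells (relying, as the next paragraph notes, on CPython implementation details):

def make_powers():
    powers = []
    for i in range(10):
        def func(x):
            return x**i
        powers.append(func)
    return powers

powers = make_powers()
print(powers[2](2), powers[7](2))                            # 512 512
print(powers[2].__closure__[0] is powers[7].__closure__[0])  # True: one shared cell
# Under Greg's suggestion, each iteration would get a fresh cell, so each func
# would keep its own i (0, 1, 2, ...) with no other visible change.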

This works, but, as Guido pointed out, it’s pretty confusing. Normally, a binding just means a name in a dict. The fact that locals and closures are implemented with indexes into arrays instead of dict lookups is an implementation detail of CPython, but Greg’s solution requires that implementation detail. How would you translate the design to a different implementation of Python that handled closures by just capturing the parent dict and using it for dict lookups?[10]

Nick Coghlan suggested that the simplest way to define the semantics is to spell out the translation to a while loop. So the current semantics for our familiar loop are:

_it = iter(range(10))
try:
    while True:
        i = next(_it)
        powers.append(lambda x: x**i)
except StopIteration:
    pass

… and the new semantics are:

_it = iter(range(10))
try:
    while True:
        i = next(_it)
        def _suite(i=i):
            powers.append(lambda x: x**i)
        _suite()
except StopIteration:
    pass

But I think it’s misleading that way. In a comprehension, the i is an argument to the _suite function, not a default parameter value, and the function is built outside the loop. If we reuse the same logic here, we get something a bit simpler to think through:

_it = iter(range(10))
def _suite(i):
    powers.append(lambda x: x**i)
try:
    while True:
        i = next(_it)
        _suite(i)
except StopIteration:
    pass

And now the “no-capture” optimization isn’t automatic, but it’s a pretty simple optimization that could be easily implemented by a compiler (or a plugin optimizer like FAT Python). In CPython terms, if the i in _suite ends up as a cellvar (because it’s captured by something in the suite), you can’t simplify it; otherwise, you can just inline the _suite, and that gives you exactly the same code as in Python 3.5.

I still think there might be a better answer than the comprehension-like answer, but, if so, I haven’t thought of it. My suspicion is that it’s going to be like the comprehension problem: once you see a way to unify the two cases, everything becomes simpler.[11]

Conclusion

So, if you’re designing a new language whose variable semantics are like C#/Java/etc., or like Python/JavaScript/etc., I think you definitely want a for-each statement that declares a new variable each time through the loop–just like C#, Swift, and Ruby.

For an existing language, I think it’s worth looking at existing code to try to find anything that would be broken by making the change backward-incompatibly. If you find any, then consider adding new syntax like for new; otherwise, just do the same thing C# and Ruby did and change the semantics of for.

For the specific case of Python, I’m not sure. I don’t know whether the breaking-compatibility decision that made sense for C# and Ruby makes sense for Python. I also think the new semantics need more thought–and, after that’s worked out, it will depend on how easily the semantics fit into the simple scoping rules, in a way that can be taught to novices and transplants from other languages and then held in their heads. (That really is an important feature of Python, worth preserving.) Also, unlike many other languages, the status quo in Python really isn’t that bad–the idiomatic default-value trick works, and doesn’t have the same verbosity, potential for errors, etc. as, say, JavaScript’s idiomatic anonymous wrapper function definition and call.


  1. In a typical for statement, there’s a statement or suite of statements run for each iteration. In a comprehension or other expression involving for, there may be subsequent for, if, and other clauses, and an output expression. Either way, the loop variable is in scope for all that stuff, but nowhere else.

  2. In Python, only functions and classes define new scopes. This means the loop variable of a for statement is available for the rest of the current function. Comprehensions compile to a function definition, followed by a call of that function on the outermost iterable, so technically the loop variable is available more widely than you’d expect, but that rarely comes up–you’d have to do something like write a nested comprehension where the second for clause uses the variable from the third for clause, but doesn’t use it the first time through.

  3. There are plenty of languages where normal function definitions and lambda definitions actually define different kinds of objects, and normal functions can’t make closures (C++, C#) or even aren’t first-class values (Ruby). In those languages, of course, you can’t reproduce the problem with lambda.

  4. The same trick is frequently used with globals and builtins, for two different purposes. First, len=len allows Python to lookup len as a local name, which is fast, instead of as a builtin, which is a little slower, which can make a difference if you’re using it a zillion times in a loop. Second, len=len allows you to “hook” the normal behavior, extending it to handle a type that forgot a __len__ method, or to add an optional parameter, or whatever, but to still access the original behavior inside the implementation of your hook. Some possible solutions to the “local capture” problem might also work for these uses, some might not, but I don’t think that’s actually relevant to whether they’re good solutions to the intended problem.

  5. In modern Python (3.0+), map is lazy–equivalent to a generator expression rather than to a list comprehension. But let’s ignore that detail for now. Just read it as list(map(...)) if you want to think through the behavior in Python 3.

  6. Well, not higher-order functions, because they can’t take functions, only blocks, in part because functions aren’t first-class values. But the Ruby approximation of HOFs. Ruby is just two-ordered instead of arbitrary-ordered like Lisp or Python.

  7. Ruby 1.9 made a breaking change that’s effectively my #2: the i block parameter is now a new variable each time, which shadows any local i in the block’s caller’s scope, instead of being a single variable in the caller’s scope that gets rebound repeatedly. There were some further changes for 2.0, but they aren’t relevant here.

  8. Ruby is an interesting exception here. The scope rules are pretty much like Python’s–but those rules don’t matter, because loop bodies are almost always written as blocks passed to looping methods rather than as suites within a looping statement or expression. You could argue that this makes the choice obvious for Ruby, in a way that makes it irrelevant to other languages–but it’s actually not obvious how block parameters should be scoped, as evidenced by the fact that things changed between 1.8 and 1.9, primarily to fix exactly this problem.

  9. Most of these also make i a constant. This avoids some potential confusion, at the cost of a restriction that really isn’t necessary. Swift’s design is full of such decisions: when the cost of the restriction is minimal (as it is here), they go with avoiding potential confusion.

  10. If you think it through, there is an answer here, but the point is that it’s far from trivial. And defining semantics in terms of CPython’s specific optimizations, and then requiring people to work back to the more general design, is not exactly a clean way to do things…

  11. Look at the Python 2.7 reference for list displays (including comprehensions), set/dict displays (including comprehensions), and generator expressions. They’re a mess. List comprehensions are given their full semantics. Set and dict comprehensions repeat most of the same text (with different formatting, and with a typo), and still fail to actually define how the key: value pairs in a dict comprehension get handled. Then generator expressions only hint at the semantics. The Python 3.3 reference tries to refactor things to make it simpler. The intention was actually to make it even simpler: [i**2 for i in range(10)] ought to be just an optimization for list(i**2 for i in range(10)), with identical semantics. But it wasn’t until someone tried to write it that way that everyone realized that, in fact, they’re not identical. (Raise a StopIteration from the result expression or a top-level if clause and see what you get.) I think there’s some kind of similar simplification possible here, and I’d like to actually work it through ahead of time, rather than 3 versions after the semantics are implemented and it’s too late to change anything. (Not that I think my suggestion in this post will, or even necessarily should, get implemented in Python anyway, but you know what I mean.)
