Many languages have a for-each
loop. In some, like Python, it’s
the only kind of for
loop:
for i in range(10):
print(i)
In most languages, the loop variable is only in scope within the code
controlled by the for
loop, except in languages that don’t have
granular scopes at all, like Python.
So, is that i
a variable that gets updated each time through the
loop or is it a new constant that gets defined each time through the
loop?
Almost every language treats it as a reused variable. Swift, and C#
since version 5.0, treat it as separate constants. I think Swift is
right here (and C# was right to change, despite the
backward-compatibility pain), and everyone else is wrong.
However, if there is a major language that should be using the
traditional behavior, it’s Python.
Loops and closures
In any language that has lexical closures that capture variables,
you’ll run into the foreach-lambda problem. This has been known in
Python (and JavaScript, and Ruby, etc.) for decades, and is even
covered in Python’s official Programming FAQ, but every new
programmer discovers it on his own, and every new language seems to do
so as well, and it always takes people by surprise.
Consider this:
powers = [lambda x: x**i for i in range(10)]
This creates 10 functions that return x**0
, x**1
, … x**9
, right?
If for-each
works by re-using a variable, as it does in Python (and
JavaScript, …) then no, this creates 10 functions that all return
x**9
(or, in some languages, x**10
). Each function returns x**i
,
where i
is the variable from the scope the function was defined
in. At the end of that scope, i
has the value 9
.
The problem has nothing to do with lambdas or comprehensions. You’ll
get the same behavior this way:
powers = []
for i in range(10):
def func(x):
return x**i
powers.append(func)
Whenever people run into this problem, in any language, they insist
that the language “does closures wrong”–but in fact, the language is
doing closures perfectly.
So, does that mean all those users are wrong?
Traditional solution
In most languages, you can solve the problem by wrapping the function
definition inside another function that you define and call:
for i in range(10):
def make_func(j):
def func(x):
return x**j
return func
powers.append(make_func(i))
This works because the j
parameter gets a reference to the value of
the i
argument, not a reference to the i
variable itself. Then,
func
gets a reference to that j
variable, which is different each
time.
The problem with this solution is that it’s verbose, and somewhat
opaque to the reader. It’s a bit less verbose when you can use
lambdas, but arguably it’s even less readable that way:
for i in range(10):
powers.append((lambda j: (lambda x: x**j))(i))
In some languages, including Python, there’s a simpler way to do this:
for i in range(10):
def func(x, j=i):
return x**j
powers.append(func)
That’s because the default parameter value j
gets a reference to the
value of i
in the defining scope, not a closure cell capturing the
variable i
.
Of course this is a bit “magical”, but people quickly learn to
recognize it in Python. In fact, most experienced Python developers
will spell it i=i
instead of j=i
, because (once you recognize the
idiom) that makes it clear why we’re using a parameter with a default
variable here: to get the value of i
at the time of definition (or,
in some cases, as an optimization–e.g., you can use len=len
to turn
len
into a local variable instead of a builtin within the function,
which makes it a bit faster to look up each time you call it).
The bigger problem is that it makes j
a parameter, which means you
can override the default value with an argument. For example,
powers[3](2, 10)
gives you 1024
, which is almost certainly not
something anyone wanted. You can do tricks like making it
keyword-only, prefixing it with an underscore, etc. to try to
discourage people from accidentally passing an argument, but it’s
still a flaw.
Terminology sidebar
In Python, this issue is often described in terms of late vs. early
binding. There are actually three places things could get bound:
compile time (like constants), function definition time (like default
parameter values), or function call time (like free variables captured by a closure). In
many discussions of late vs. early binding, the distinction is between
run time and compile time, but in this case, it’s between the later
and earlier run time cases: free variables are bound late, at call
time, while default parameter values are bound early, at definition
time. So, capturing a nonlocal is late binding; the “default-value
trick” of passing the value as the default value of an extra parameter
means using early binding instead.
A different way of solving the same problem is to consider capture by
variable vs. capture by value. In C++, variables are lvalues, but
lvalues are also things you can pass around and take references
to. You can write a function that takes an int
, or one that takes a
reference to an int
, spelled int&
(and another that takes a reference to a const int
, and one that takes an rvalue-reference to an int
, with
complicated rules about how you collapse an lvalue reference to an
rvalue reference and how overload distinguishes between an int
and a
reference to const int
when the lvalue is constant, and…). And
closures are built on top of this: there’s no special “closure cell”
or “closure environment”; closures just have (lvalue) references to
variables from the outer scope. Of course taking a reference to
something doesn’t keep it alive, because that would be too easy, so
returning a function that closes over a local variable means you’ve
created a dangling reference, just like returning a pointer to a local
variable. So, C++ also added capture by value: instead of copying an
lvalue reference (which gives you a new reference to the same lvalue),
you can copy an rvalue reference (which steals the reference if it’s a
temporary that would have otherwise gone away, or copies the value
into a new lvalue if not). As it turns out, this gives you another way
to solve the loop problem: if you capture the loop variable by value,
instead of by variable, then of course each closure is capturing a
different value. Confused? You won’t be, after this week’s episode of
C++.
Anyway, you can now forget about C++. In simpler languages like Java,
C#, JavaScript, and Python, people have proposed adding a way to
declare capture by value for closures, to solve the loop problem. For
example, borrowing C++ syntax (sort of):
for i in range(10):
def func[i](x):
return x**i
powers.append(func)
Or, alternatively, one of these proposals:
for i in range(10):
def func(x; i):
return x**i
powers.append(func)
for i in range(10):
def func(x) sharedlocal(i):
return x**i
powers.append(func)
for i in range(10):
def func(x):
sharedlocal i
return x**i
powers.append(func)
No matter how you spell it, the idea is that you’re telling func
to
capture the current value of the i
local from the enclosing scope, rather
than capturing the actual variable i
.
If you think about it, that means that capture by value in a language
like Python is exactly equivalent to early binding (in the “binding at
definition time” sense). And all of these solutions do the exact same
thing as the parameter default-value trick i=i
, except that i
is
no longer visible (and overridable) as a parameter, so it’s no longer
really a “trick”. (In fact, we could even hijack the existing
machinery and store the value in the function
object’s
__defaults__
, if we just add a new member to the code
object to
keep a count of captured local values that come after all the
parameters.)
While we’re talking terminology, when discussing how parameters get
passed in the previous section, is that “pass by value” or “pass by
reference”? If you think that’s a good question, or you think the
problem could be solved by just coming up with a new name, see
this post.
Solutions
Since the traditional solution works, and people obviously can learn
it and get used to it, one solution is to just do nothing.
But if we are going to do something, there are two obvious solutions:
- Provide a way for closures to use early binding or capture by
value. (Again, these options are equivalent in languages like
Python.)
- Provide a way for
for-each
loops to define a new variable each
time through the loop, instead of reusing the same variable.
Either one would solve the problem of closures capturing loop
variables. Either one might also offer other benefits, but it’s hard
to see what they might be. Consider that, in decades of people using
the default-value trick in Python, it’s rarely used (with locals)
for anything but loop variables. And similarly, you can find FAQ
sections and blog posts and StackOverflow questions in every language
discussing the loop problem, and none of them mention any other cases.
The first one obviously has to be optional: you still want closures to
capture some (in fact, most) variables, or they really aren’t
closures. And there’s no obvious way to make an automatic distinction,
so it has to be something the user spells explicitly. (As shown in the
previous section, there are plenty of alternative ways to spell it,
each with room for more bikeshedding.)
The second one could be optional–but does it have to be? The C#
team carefully considered adding a new form like this (in Python
terms):
for new i in range(10):
def func(x):
sharedlocal i
return x**i
powers.append(func)
But they decided that any existing code that actually depends on the
difference between for i
and for new i
is more likely to be buggy
than to be intentionally relying on the old semantics, so, even from a
backward-compatibility point of view, it’s still better to change
things. Of course that had to be balanced with the fact that some
people write code that’s used in both C# 4 and C# 5, and having it do
the right thing in one version and the wrong in the other is pretty
ugly… but even so, they decided to make the breaking change.
I’m going to examine the arguments for each of the two major
alternatives, without considering any backward compatibility issues.
Consistency with C-style for
This one isn’t relevant to languages like Python, which don’t have
C-style for
loops, but many languages do, and they almost always use
the same keyword, or closely related ones, to introduce C-style for
loops and for-each
loops. So, superficially, it seems like they
should be as similar as possible, all else being equal.
In a C-style for
loop, the loop variable is clearly a variable
whose lifetime lasts for the entire loop, not just a single
iteration. That’s obvious from the way you define the loop:
for (int i=0; i != 10; i++)
That third part isn’t generating a value for a new constant (that also
happens to be named i
), it’s mutating or rebinding a variable named
i
. That’s what the ++
operator means. If you change to i + 1
,
you get an infinite loop, where i
stays 0
forever.
And this is why it’s usually legal to modify the loop variable inside
the loop. It allows you to do things like this:
for (int i=0; i != spam.len(); ++i) {
if (spam[i].has_arg) {
process_with_arg(spam[i], spam[i+1]);
++i; // skips the next spam; we already used it
} else {
process_no_arg(spam[i]);
}
}
So, shouldn’t for-each loops be consistent with that?
I don’t think so. Most languages already consider it legal and
reasonable and only slightly unusual to modify the loop variable of a
C-style for
, but not to modify the loop variable of a
for-each
. Why is that? I think it’s because the two loops are
semantically different. They’re not the same thing, just because they
share a keyword.
So, at first glance this seems like an argument for #1, but in fact,
it’s not an argument either way. (And for a new language, or an
existing language that doesn’t already have C-style for
loops,
really, you don’t want C-style for
loops…)
Consistency with functions
Anyone who’s done any functional programming has noticed that
statement suites in imperative languages are a lot like function
bodies. In fact, most control statements can be converted to a
function definition or two, and a call to a higher-order function:
if spam:
do_stuff()
do_more_stuff()
else:
do_different_stuff()
def if_body():
do_stuff()
do_more_stuff()
def else_body():
do_different_stuff()
if_func(spam, if_body, else_body)
while eggs:
eat(cheese)
def while_cond():
return eggs
def while_body():
eat(cheese)
while_func(while_cond, while_body)
But that doesn’t work with for-each
:
powers = []
for i in range(10):
def func(x):
return x**i
powers.append(func)
powers = []
def for_body(i):
def func(x):
return x**i
powers.append(func)
for_each(range(10), for_body)
And this makes the problem obvious: that i
is an argument to the
for_body
. Calling a function 10 times doesn’t reuse a single
variable for its parameter; each time, the parameter is separate
thing.
This is even more obvious with comprehensions, where we already have
the equivalent function:
powers = [lambda x: x**i for i in range(10)]
powers = map(lambda i: (lambda x: x**i), range(10))
Also consider Ruby, where passing blocks to higher-order-functions
is the idiomatic way to loop, and for-each
loops are usually
described in terms of equivalence to that idiom, and yet, the
following Ruby 1.8 code produces 10 lambdas that all raise to the
same power:
powers = []
10.times do |i|
powers.push(lambda |x| { x**i })
end
So, this is definitely an argument for #2. Except…
Python
This is where Python differs from the rest of the pack. In most other
languages, each suite is a scope. This is especially obvious in
languages like C++, D, and Swift, which use RAII, scope-guard
statements, etc. where Python would use a with
statement (or a
try
/finally
). In such languages, the duality between scopes and
functions is much clearer. In particular, if i
is already going to
go out of scope at the end of the for
loop, it makes sense for each
individual i
to go out of scope at the end of an iteratiion.
But in Python, not only does i
not go out of scope at the end of the
for
loop, there are comfortable idioms involving using the loop
variable after its scope. (All such cases could be written by
putting the use of the loop variable into the else
clause of a
for
/else
statement, but many novices find for
/else
confusing,
and it’s not idiomatic to use it when not necessary.)
In addition, one of the strengths of Python is that it has simple
rules, with few exceptions, wherever possible, which makes it easier
to work through the semantics of any code you read. Currently, the
entire semantics of variable scope and lifetime can be expressed in a
few simple rules, which are easy to hold in your head:
- Only functions and classes have scopes.
** Comprehensions are a hidden function definition and call.
- Lookup goes local, enclosing, global, builtin.
- Assignment creates a local variable.
** … unless there’s an explicit
global
or nonlocal
statement.
- Exceptions are unbound after an
except
clause.
Adding another rule to this list would mean one more thing to
learn. Of course it would also mean not having to learn about the
closure-over-loop-variable problem. Of course that problem is a
natural consequence of the simple rules of Python, but, nevertheless,
everyone runs into it at some point, has to think it through, and then
has to remember it. Even after knowing about it, it’s still very easy
to screw up and accidentally capture the same variable in a list of
functions. (And it’s especially easy to do so when coming back to
Python from Swift or C#, which don’t have that problem…)
Mutable values
Let’s forget about closures now, and consider what happens when we
iterate over mutable values:
x = [[], [], []]
for i, sublist in enumerate(x):
sublist.append(i)
If you’re not familiar with Python, try it this way:
x = [[], [], []]
i = 0
for sublist in x:
sublist.append(i)
i += 1
What would we expect here? Clearly, if this is legal, the result
should be:
x = [[0], [1], [2]]
And there’s no obvious reason it shouldn’t be legal.
So, each sublist
isn’t really a constant, but a variable, right?
After all, it’s mutable.
Well, yes, each sublist
is mutable. But that’s irrelevant to the
question of whether the name sublist
is rebindable. In languages
where variables hold references to values, or are just names for
values, like Java (except for native types) or Python, there’s a very
obvious distinction here:
a = [1, 2, 3]
b = [4, 5, 6]
b = a
b.append(7)
It’s perfectly coherent that sublist
is a non-rebindable name for a
possibly-mutable value. In other words, a new variable each time
through the loop.
In a language like C++, things are a little trickier, but it’s just as
coherent that sublist
is a non-l-modifiable-but-r-mutable l-value
(unless sublist
is of non-const reference type in which case it’s a
modifiable l-value). Anyway, C++ has bigger problems: capturing a
variable doesn’t do anything to keep that variable alive past its
original scope, and there’s no way to capture by value without
copying, so C++ has no choice but to force the developer to specify
exactly what he’s trying to capture by reference and what he’s trying
to copy.
Anyway, the one thing you definitely don’t expect is for sublist
to be mutating the same list in-place each time through (what you get
if you write sublist[:] = ...
instead of sublist = ...
). But you
wouldn’t expect that whether sublist
is being reused, or is a new
variable.
So, ultimately, mutability isn’t an argument either way.
Performance
Obviously, creating and destroying a new variable each time through
the loop isn’t free.
Most of the time, you’re not keeping a reference to the loop variable
beyond the lifetime of the loop iteration. So, why should you pay the
cost of creating and destroying that variable?
Well, if you think about it, in most languages, if you elide the
creation and destruction, the result is invisible to the user unless
the variable is captured. Almost all languages allow optimizers to
elide work that has no visible effects. There would be nothing
inconsistent about Python, or Java, deciding to reuse the variable
where the effect is invisible, and only create a new variable when
it’s captured.
So, what about the case where the variable is captured? If we want
each closure to see a different value, our only choices are to capture
a new variable each time, to copy the variable on capture, or to copy
the value instead of capturing the variable. The first is no more
costly than the second, and cheaper than the third. It’s the simplest
and most obvious way to get the desired behavior.
So, performance isn’t an argument either way, either.
Simplicity
When a new user is learning the language, and writes this:
powers = []
for i in range(10):
def func(x):
return x**i
powers.append(func)
… clearly, they’re expecting 10 separate functions, each raising x
to a different power.
And, as mentioned earlier, even experienced developers in Python,
Ruby, JavaScript, C#, etc. who have run into this problem before still
write such code, and expect it to work as intended; the only
difference is that they know how to spot and fix this code when
debugging.
So, what’s the intuition here?
If the intuition is that lexical closures are early-bound, or
by-value, then we’re in big trouble. They obviously aren’t, and, if
they were, that would make closures useless. People use closures all
the time in these languages, without struggling over whether they make
sense.
If the intuition is that they’re defining a new function each time
through the loop because they have a new i
, that doesn’t point to
any other problems or inconsistencies anywhere else.
And the only other alternative is that nobody actually understands
what they’re doing with for-each
loops, and we’re all only (usually)
writing code that works because we treat them like magic. I don’t
think that’s true at all; the logic of these loops is not that
complicated (especially in Python).
So, I think this is an argument for #2.
Consistency with other languages
As I mentioned at the start, most languages that have both for-each
loops and closures have this problem.
The only language I know of that’s solved it by adding capture by
value is C++, and they already needed capture by value for a far more
important reason (the dangling reference problem). Not to mention
that, in C++, capture by value means something different than it would
in an lvalue-less language like Python or JavaScript.
By contrast, C# 5.0 and Ruby 1.9 both changed from "reused-i
" to
"new-i
" semantics, and Lua, Swift, and Scala have used "new-i
" semantics from the
start.. C# and Ruby are particularly interesting, because that was
a breaking backward-compatibility change, and they could very easily
have instead offered new syntax (like for new i
) instead of changing
the semantics of the existing syntax. Eric Lippert’s blog covers
the rationale for the decision in C#.
As mentioned earlier, Python’s simpler scoping roles (only functions
are scopes, not every suite) do weaken the consistency argument. But I
think it still falls on #2. (And, for a language with suite scopes,
and/or a language where loop bodies are some weird not-quite-function
things like Ruby, it’s definitely #2.)
Exact semantics
In languages where each suite is a scope, or where loop suites are
already function-esque objects like Ruby’s blocks, the semantics are
pretty simple. But what about in Python?
The first implementation suggestion for Python came from Greg
Ewing. His idea was that whenever a loop binds a cellvar, the
interpreter creates a new cell each time through the loop, replacing
the local i
binding each time. This obviously solves the
loop-capture problem, with no performance effect on the more usual
case where you aren’t capturing a loop variable.
This works, but, as Guido pointed out, it’s pretty
confusing. Normally, a binding just means a name in a dict. The fact
that locals and closures are implemented with indexes into arrays
instead of dict lookups is an implementation detail of CPython, but
Greg’s solution requires that implementation detail. How would you
translate the design to a different implementation of Python that
handled closures by just capturing the parent dict and using it for
dict lookups?
Nick Coghlan suggested that the simplest way to define the semantics
is to spell out the translation to a while
loop. So the current
semantics for our familiar loop are:
_it = iter(range(10))
try:
while True:
i = next(it)
powers.append(lambda x: x**i)
except StopIteration:
pass
… and the new semantics are:
_it = iter(range(10))
try:
while True:
i = next(it)
def _suite(i=i):
powers.append(lambda x: x**i)
_suite()
except StopIteration:
pass
But I think it’s misleading that way. In a comprehension, the i
is
an argument to the _suite
function, not a default parameter value,
and the function is built outside the loop. If we reuse the same logic
here, we get something a bit simpler to think through:
_it = iter(range(10))
def _suite(i):
powers.append(lambda x: x**i)
try:
while True:
i = next(it)
_suite(i)
except StopIteration:
pass
And now the “no-capture” optimization isn’t automatic, but it’s a
pretty simple optimization that could be easily implemented by a
compiler (or a plugin optimizer like FAT Python). In CPython terms, if
the i
in _suite
ends up as a cellvar (because it’s captured by
something in the suite), you can’t simplify it; otherwise, you can
just inline the _suite
, and that gives you exactly the same code as
in Python 3.5.
I still think there might be a better answer than the
comprehension-like answer, but, if so, I haven’t thought of it. My
suspicion is that it’s going to be like the comprehension problem:
once you see a way to unify the two cases, everything becomes
simpler.
Conclusion
So, if you’re designing a new language whose variable semantics are
like C#/Java/etc., or like Python/JavaScript/etc., I think you
definitely want a for-each
statement that declares a new variable
each time through the loop–just like C#, Swift, and Ruby.
For an existing language, I think it’s worth looking at existing code
to try to find anything that would be broken by making the change
backward-incompatibly. If you find any, then consider some new for new
syntax; otherwise, just do the same thing C# and Ruby did and
change the semantics of for
.
For the specific case of Python, I’m not sure. I don’t know if the
no-backward-compatibility decision that made sense for C# and Ruby
make sense for Python. I also think the new semantics need more
thought–and, after that’s worked out, it will depend on how easily
the semantics fit into the simple scoping rules, in a way which can be
taught to novices and transplants from other languages and then held
in their heads. (That really is an important feature of Python, worth
preserving.) Also, unlike many other languages, the status quo in
Python really isn’t that bad–the idiomatic default-value trick works,
and doesn’t have the same verbosity, potential for errors, etc. as,
say, JavaScript’s idiomatic anonymous wrapper function definition and
call.
View comments