In a recent thread on python-ideas, Nick Coghlan said:
The reason I keep beating my head against this particular wall (cf. PEP 403's @in clauses and PEP 3150's given suites) is that my personal goal for Python is that it should be a tool that lets people express what they are thinking clearly and relatively concisely. As far as I have been able to tell, the persistent requests for "multi-line lambdas", cleaner lambda syntax, etc, are because Python doesn't currently make it easy to express a lot of operations that involve higher order manipulation of "one shot" callables - closures or custom functions where you *don't* want to re-use them, but Python still forces you to pull them out and name them. I think PEP 403 is my best current description of the mental speed bump involved: http://www.python.org/dev/peps/pep-0403/
But I think there are actually two different problems that people tend to lump together—in fact, they're nearly opposite. The first problem is that you sometimes want to build complicated functions in-line. One reason we have lambda is that it lets you define a function—something which is normally done as a block statement—in-line. But our lambda isn't as powerful as our def, so some functions just can't be defined in-line. The second problem is that you often don't even want to define a function, you just want to pass around an expression. Sure, lambda lets you wrap any expression in a function, but it's an extra abstraction that gets in the way.

(Update: After taking another look at D, I've realized there's another use case, which can be solved separately, which leads to solving a wider range of problems than it appears… but it doesn't work for Python. See the sections on lazy arguments and lazy parameters for details.)

They really are separate

Solving either problem does nothing at all to solve the other one. But, because they're both seen as "problems with lambda", they tend to get lumped together, and any solution that solves one problem is likely to get dismissed because it doesn't help the other. For example, when Nick Coghlan suggested a lighter-weight syntax for building simple 0- and 1-argument functions, most of the replies were complaints that it doesn't scale to more arguments or to arbitrary argument lists, or attempts to solve that scaling problem. Maybe a better way to see this is that JavaScript completely solves the first problem by making full-fledged function definitions expressions instead of statements, but that does nothing at all to help the second one—in fact, it makes the syntax for "passing around expressions" even worse than in Python.

Dismissing the first problem

I'm personally not too worried about the first problem. Anyone who programs long enough in JavaScript soon learns what "callback hell" means, and turns to Deferred/Promise (borrowed from Python…) or similar solutions to contain it. Really, the only problem with being forced to define long functions as statements is that you have to write things "out of order", defining the function ahead of the more interesting code that uses it. That's a relatively minor problem, and PEP 403 is a step toward solving it.

Examples of the second problem

There are three paradigm cases where you want to pass around expressions: higher-order functions like map, filter, and takewhile; simple callback binding; and lazy evaluation.

Higher-order functions

If I want to square each value in a sequence with map, I have to write this:
    map(lambda x: x**2, seq)
The problem here isn't the performance hit of the function call, or even the verbosity—although the boilerplate obscuring the interesting part of the code is part of it. The real problem is that there's an extra abstraction here that you have to read through to understand what's happening. Nobody cares that there's a function call going on here, but there is one, and you have to reason around it. And that's exactly why we have comprehensions:
    [x**2 for x in seq]
Here, you're directly mapping the expression over the sequence. Unfortunately, you can't do that in general; only map and filter can be replaced this way. If you want to, say, sort a list based on an expression, too bad; you have to wrap it in a function.
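For example, to sort a list of dicts by one key, the interesting expression has to go right back into a function:
    # The interesting part is d['spam']; the lambda is just packaging.
    sorted(seq, key=lambda d: d['spam'])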

Callback binding

The other place we want to pass expressions around is in simple callbacks. This really is a special case of the previous example, but that fact is less obvious. Most novices don't think of, say, a Button constructor with a callback parameter as a higher-order function, which is why they write this:
    b = Button("#10", callback=choose(10))
Of course that's perfectly legal code, but it's code that calls choose(10), then tries to bind its return value as the callback for a button. Obviously what they really wanted is this:
    b = Button("#10", callback=lambda: choose(10))
But really, they don't want to bind the button to a function, they just want to bind the button to a piece of code—an expression. Unfortunately that's not possible.

Lazy arguments

Finally, sometimes we just want to pass around an expression that may be expensive or dangerous to evaluate, and only evaluate it when it's needed (which may be never) or safe. Logging is a simple example. If I do this:
    logger.debug('response contents: {}'.format(r.contents))
This is going to read r.contents (which may force us to, e.g., wait for an entire web page to download) in order to build the string, even if we've disabled debug logging. You often don't want that. But the only way around it is to check whether debug logging is enabled before every single call to logger.debug, which violates both DRY and encapsulation.
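That is, every call site ends up looking something like this:
    # Repeated boilerplate at every call site, just to avoid evaluating
    # r.contents when debug logging is off:
    if logger.getEffectiveLevel() <= logging.DEBUG:
        logger.debug('response contents: {}'.format(r.contents))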

A lazy_debug function that takes an iterator or a function can solve the problem:
    import logging
    logger = logging.getLogger(__name__)  # assumes a module-level logger

    def lazy_debug_i(message_iter):
        # Only pull the message out of the (single-shot) iterator if needed.
        if logger.getEffectiveLevel() <= logging.DEBUG:
            logger.debug(next(message_iter))

    def lazy_debug_f(message_func):
        # Only call the message-building function if needed.
        if logger.getEffectiveLevel() <= logging.DEBUG:
            logger.debug(message_func())
But as easy as those are to write, they're not much fun to call:
    lazy_debug_i('response contents: {}'.format(r.contents) 
                 for _ in range(1))

    lazy_debug_f(lambda: 'response contents: {}'.format(r.contents))

    lazy_debug_f(partial('response contents: {}'.format, r.contents))

Solutions?

Solving a bit at a time

Comprehensions are a partial solution to the problem. The reason people want a "while clause" or other syntactic version of takewhile is that they want to use an expression, not a function, to terminate iteration, the same way they can use an expression to filter it. (And the linked blog has a bit more explanation of why they want that.)

To some extent, decorators, context managers, conditional expressions, the except expression proposal, and in fact much of the new syntax added to or proposed for Python since the early 2.x days are special-case solutions to this problem. In most of these cases, it's not that we had a function that was unpleasant to use, like map and filter; it's that we never had the function at all, because it would so obviously be unpleasant to use. For example, a short-circuiting conditional expression could easily be a function call:
    def cond(condition, thenfunc, elsefunc):
        if condition:
            return thenfunc()
        else:
            return elsefunc()

    cond(meat, lambda: 2*spam(x), lambda: 3*eggs(y))
But nobody does that, because you have to wrap up the expressions in functions just to defer their evaluation. If you could do this (and still short-circuit evaluation):
    cond(meat, 2*spam(x), 3*eggs(y))
… then there may not have been as much pressure for this:
    2*spam(x) if meat else 3*eggs(y)
Sure, the if expression is a little nicer, but if that were all we gained, would it be worth changing Python's syntax? I doubt it. During the discussion on except expressions, a dozen or so times, someone asked "Why can't you just use a function?" Nobody ever answered "I want to infix the word except" or "to save a few characters". The answer is "to not evaluate the fallback expression if there's no exception", with an implicit "… without having to wrap everything in a lambda" (see the sketch below).

It's perfectly plausible that there just is no general solution to this problem, and the right thing to do is exactly what Python has been doing for the past 20 years: find the common pain points and add special syntax for them. That will never make Python a perfect language, but Python has never striven to be a perfect language, only a very good one. Which is why we're using it. On the other hand, there are some cases that are very common, and it's hard to imagine them ever getting special syntax. For example, passing a key function to a sorting function: we're not going to turn list.sort, sorted, itertools.groupby, and dozens of other functions into some kind of special syntax, are we?
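Back to the except expression for a moment: here's a sketch of the "just use a function" version—trap is a hypothetical name, not any actual proposal's API:
    # The fallback has to be wrapped in a lambda to defer its evaluation;
    # that implicit lambda is exactly what the syntax proposal avoids.
    def trap(expr_func, exc_type, fallback_func):
        try:
            return expr_func()
        except exc_type:
            return fallback_func()

    seq, i, default = [1, 2, 3], 10, 0
    value = trap(lambda: seq[i], IndexError, lambda: default)  # 0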

Lazy parameters

The D language has a partial solution that Python lacks, and it nicely solves the third case (the logging example): declare a function to take lazy parameters, and the caller then automatically passes a lambda that returns each argument, instead of the argument's value. So, in hypothetical Python syntax:
    def lazy_debug(lazy message):
        if logger.getEffectiveLevel() <= logging.DEBUG:
            logger.debug(message())

    lazy_debug('response contents: {}'.format(r.contents))
Pretty neat. In fact, it doesn't just solve the lazy-arguments problem, it solves a much wider class of problems. Any function that wants an expression returning foo can just take a lazy foo. For example, you could wrap up Tkinter like this (simplified):
    class LazyButton(Button):
        def __init__(self, name, lazy callback):
            super().__init__(name, callback)
And now you can write the naive code from before—the code that was broken—and it actually works as intended:
    b = LazyButton("#10", callback=choose(10))
This is impossible in Python, because it relies on static typing: when the D compiler sees the call to lazy_debug (or LazyButton.__init__), it knows that lazy_debug has been declared as a function that takes a lazy string. (If it doesn't know what lazy_debug has been declared as, it just gives you an error and refuses to compile.) In Python, the compiler has no idea what lazy_debug is; it's just a name that will be looked up at runtime and called. All calls are compiled the same way: push the arguments on the stack, then call the callable. In other words, by the time the interpreter is looking at the callable's parameters or flags, all of the arguments have already been evaluated and pushed.
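You can see this with the dis module (expensive is a stand-in for any argument expression):
    import dis

    # The inner call expensive() is evaluated before lazy_debug is ever
    # entered; exact opcodes vary by CPython version, but the order doesn't.
    dis.dis(compile("lazy_debug(expensive())", "<demo>", "eval"))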

But if you think about it, a function whose parameters are all lazy is basically just a Lisp macro. We rarely need the flexibility of evaluating some arguments but not others, and macros are easier to implement, so… we'll come back to that later.

Making lambda prettier

Making lambdas prettier in simple cases doesn't actually solve the problem in any way; you still have the same abstraction. However, it can let your eyes skip past it. For example, if you're attaching 20 callbacks in a row to different widgets, and the form is consistent enough and simple enough, you only really have to read the abstraction once or twice; after that, you'll start reading through it, directly to the expression. So, Nick Coghlan's suggestion to revive and extend PEP 312 might be helpful here. In brief:
  • A 0-argument function can be written without the lambda keyword anywhere it's not syntactically ambiguous (which means anywhere, if you use parens).
  • A 1-argument lambda can be written the same way, with ? as a placeholder for the argument (à la sqlite3).
  • If you have more arguments, you can't use this form.
Some code will look ugly or misleading with the ? or without the lambda keyword (e.g., defining a lambda as a value in a dict display)—but just don't use this form in those cases, and that's not a problem. Here are a few examples written this way:
    map(:?**2, seq)
    b = Button("#10", callback=:choose(10))
    cond(meat, :2*spam(x), :3*eggs(y))
    sorted(seq, key=:?['spam'])

Quoting

The "piece of code" idea highlights one possible solution. In Tcl, and early JS and PHP, the solution was simple: you literally pass around a piece of code, as a string, which gets eval'd in the appropriate context. This is obviously a terrible idea for all kinds of reasons, but most of those reasons go away when you lift it up one level to passing around a piece of code syntactically instead of textually. Lisp is the obvious paradigm here. In many cases, you just quote an expression, and it doesn't become a function, it becomes the syntax for that expression. The higher-order function then just evaluates that expression in the right context. The problem is that this just trades one abstraction—a function definition and call—for another—a quote and evaluation. I think Tcl and Lisp both show that this can sometimes be simpler to wrap your head around in some cases, but it's still not really what we're after.

Macros

Back to Lisp: If you don't want to force callers to quote their code, just define a macro. Then you get syntax rather than values for all of your arguments. And now, there is no abstraction for the user to think about. If I call a macro that maps an expression over a sequence, it just works. The first problem here is that macros are compile-time, not run-time. There are experimental languages with run-time macros, but I don't think anyone's really made it work cleanly (except by just not having a compile time at all…). This seems like something that Microsoft's DLR team would have solved if it were easy to tackle. And I don't know if turning Python into a "two-level" language, with flexible computation both at compile-time and at run-time, is a good idea. I'm sure it wouldn't turn out as bad as C++, but that's hardly a ringing endorsement.

Digression on ASTs

What if you want to do more than just "evaluate this expression in that context"? In Lisp, it's easy, because Lisp's syntax is trivial. The syntax of an expression is just a list of symbols and lists. In any language with real syntax, what you get is going to be an AST, which is a lot more painful to work with. Some ML-derived languages make it nicer with pattern-matching decomposition. And Dylan points the way to implementing a similar solution without ML-style pattern matching. But MacroPy shows how a powerful suite of helpers can make most cases readable with no language changes. And if MacroPy turned out to be the answer to a lot of important questions, I'll bet we'd get more bang for the buck looking into ways to improve the language that make MacroPy easier/better, rather than ways to do things that could already be done with macros.
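To make the "real syntax" point concrete, here's what even a tiny expression looks like once Python parses it (output abbreviated; the exact node fields vary by Python version):
    import ast

    # In Lisp this would just be the nested list (< (expt p 2) n);
    # in Python you get a tree of node objects you have to walk.
    tree = ast.parse('p ** 2 < n', mode='eval')
    print(ast.dump(tree))
    # Expression(body=Compare(left=BinOp(left=Name(id='p'), op=Pow(),
    #     right=Constant(value=2)), ops=[Lt()], comparators=[Name(id='n')]))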

Higher-order expressions

Haskell, despite spelling lambda with a single character, has a different solution for avoiding it: provide a whole slew of higher-order functions that let you create the function you want out of existing pieces, rather than defining it from scratch. In fact, we already have a limited example in Python today:
    b = Button("#10", callback=partial(choose, 10))
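For reference, partial itself needs surprisingly little machinery—a minimal sketch of what it does, ignoring keyword arguments and the other details the real functools.partial handles:
    def partial(func, *args):
        # Freeze some leading arguments now; supply the rest at call time.
        def wrapper(*more_args):
            return func(*args, *more_args)
        return wrapper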
But Haskell goes way beyond that. All functions are curried (meaning they "auto-partial"), and operators can be "sectioned" (another way to partial implicitly), and there are functions, operators, and/or syntax for composing, applying, flipping arguments, turning infix operators into prefix functions and vice-versa, and so on. So, in many cases, it's possible to write a higher-order expression with the right combination of partials, flips, compositions, lifts, etc., that evaluates to a function that computes your first-order expression, and that also looks a lot like that first-order expression. For example, take this lambda, written in Python and Haskell:
    lambda x: x ** 2
    \x -> x ** 2
By converting the operator to a prefix function, flipping its arguments, and partially applying it, you get this (with some cheating in Python—flip isn't in the standard library):
    partial(flip(operator.pow), 2)
    (** 2)
While it got a lot harder to read in Python, it got a lot simpler in Haskell. Unfortunately, this is not always so easy, or even possible. Take another example:
    lambda p: p ** 2 < n
    \p -> p ** 2 < n

    compose(partial(flip(operator.lt), n),
            partial(flip(operator.pow), 2))
    (< n) . (** 2)
Sure, it's a lot better than the Python version, and it's pretty easy to understand for anyone who knows Haskell—but is it really readable as computing the expression p ** 2 < n? I don't think so. There are ways you can get things in the right order, and remove the dot, in each case by adding another abstraction on top of things. But ultimately, all you're doing is creating code whose meaning is easier to guess but harder to actually work through. This kind of code does become almost second-nature to Haskell programmers, but I suspect that's only because anyone who isn't prone to thinking in higher-order terms never becomes a Haskell programmer.
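For reference, here's the machinery the Python versions above assume—neither flip nor compose is in the standard library, so this is a minimal sketch:
    from functools import partial
    import operator

    def flip(func):
        # Swap the two arguments of a binary function.
        return lambda a, b: func(b, a)

    def compose(f, g):
        # compose(f, g)(x) == f(g(x)), like Haskell's (.)
        return lambda x: f(g(x))

    is_small_square = compose(partial(flip(operator.lt), 100),
                              partial(flip(operator.pow), 2))
    is_small_square(9)   # True: 9 ** 2 < 100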

Automatic lambdas

Imagine if you could replace this lambda expression with this simpler expression, with the same meaning:
    lambda p: p ** 2 < n
    _1 ** 2 < n
How could that work? It's not that hard. _1 is a function that, when called, returns its first argument. And it overloads the ** operator, so _1 ** 2 is a function that, when called, returns its first argument squared. And it overloads the < operator, so _1 ** 2 < n is a function that, when called, compares its first argument squared to n. Boost.Lambda builds lambda functions this way in C++. And you can do the same thing in Python. I'm not aware of any real implementation, so I built an incomplete toy version. But there are three big problems.

First, in C++, and even more so in Python, there are some operators that can't be overloaded this way—or that can be overloaded from only one side, or that can be overloaded but have to return a specific type. For example, you can overload _1[idx], but not seq[_1]. You can solve that by adding wrappers like val that take a value and return a function that returns that value, so you can write val(seq)[_1]. It can get a bit ugly, but there are many cases where it's still more readable than an explicit lambda. And again, this isn't supposed to replace all cases of lambda anyway, just the ones it happens to make more readable.

Second, one thing you want to use in expressions all the time is a function call. But you have to overload __call__ to do that—and if you do, there's no way to call the resulting expression! If foo is an implicit lambda function that does something, and foo(_1) is an implicit lambda function that calls whatever foo evaluates to on its first argument, then foo(_1)() is an implicit lambda function that calls whatever foo evaluates to on its first argument and then calls the result with no arguments, right? Boost.Lambda gets away with this thanks to static typing: casting from an auto-lambda to a function gives you something that can actually be called, and assigning to a function variable, passing to a parameter declared as a function, etc., implicitly does that cast. That option is obviously not available in Python. I tried coming up with some DWIM heuristics, but really, I don't think that's doable. You need to write the "cast" explicitly, at which point… you might as well just spell it lambda _1: and be done with it, right? (Plus, then you can use a nicer name than _1, because it doesn't have to be global…)

Third, it's easy for me to say "_1 ** 2 < n returns a function that, when called, compares its first argument squared to n". But how do you implement that? Either you chain together a bunch of lambda definitions and calls, which gets pretty slow to execute and generates unreadable tracebacks, or you write code that keeps track of the expression and flattens everything into a single function, which is pretty complicated. In fact, one of the reasons C++11 has syntax for lambdas is that Boost.Lambda can make compilation incredibly slow. Sure, part of the reason for that is the attempt to write the whole thing in the limited C++ compile-time template language instead of a real, usable language like C++ itself or, better, Python… but it really is not a trivial task.

A language designed around this solution from the start could be very cool. Trying to wedge it into Python after the fact doesn't seem likely to work.
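To make the mechanics concrete, here's a minimal sketch of the chained-lambdas version, overloading only the two operators the example needs (a real version would need every operator, plus the flattening discussed above):
    class Expr:
        """Wraps a one-argument function and builds bigger ones on demand."""
        def __init__(self, func):
            self.func = func
        def __pow__(self, other):
            return Expr(lambda arg: self.func(arg) ** other)
        def __lt__(self, other):
            return Expr(lambda arg: self.func(arg) < other)

    _1 = Expr(lambda arg: arg)

    f = _1 ** 2 < 30
    f.func(5)   # True: 5 ** 2 < 30
    # Note the explicit f.func: overloading __call__ instead would run into
    # exactly the "no way to call the result" problem described above.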

Blocks

Every Ruby fan who sees this will think they know the answer. In Ruby, you don't need to create a lambda to use as a callback; you just use a block: seq.map{|x| x**2}. So, what does that mean? Well, instead of creating a function via a lambda expression, you create a proc via a block expression. It's about the same amount of syntax (and more than, say, a Haskell or C++ lambda). And, while a proc is a different abstraction from a function, it's the same kind of abstraction. In fact, really, all you're doing here is making people learn two different but similar abstractions. That's not a win at all. Not that blocks don't solve some problems; they just don't solve this one.

Conclusion

I doubt there is a complete solution to the problem. But I think a combination of things will help to make Python good enough:
  • Keep finding specific places that can be improved with new syntax.
  • Possibly add the PEP-312-style keyword-less lambda.
  • Get more people playing with MacroPy, at least its quasi-quoting.
But that doesn't mean people shouldn't keep looking for more general solutions.