There are a number of blogs out there that tackle the problems of callbacks for servers, or for Javascript, but novices trying to write Python GUIs shouldn't have to learn about the different issues involved in servers, or a whole different language.

In another post, I showed the two major approaches to writing asynchronous GUI code: threading and callbacks. Both have drawbacks. But there are a number of techniques to get many of the advantages of threads, on top of callbacks. So, to some extent, you can get (part of) the best of both worlds.

Most of these techniques come from the world of network servers. The central issue facing network servers is the same as GUIs--your application has to be written as a bunch of event handlers that can't block or take a long time. But the practical details can be pretty different.

Update #1: The original version of this made it sound as if async-style coroutines for GUIs would be the ideal solution to the problem. They wouldn't, they'd just be the best solution we could have in Python as the language is today. So I added a new section at the end.

Update #2: Callbacks as our Generations' Go To Statement by Miguel de Icaza is the best description I've seen so far of what's wrong with callback hell, and how to fix it. He's coming from a .NET (C#/F#) perspective, but (exaggerating only slightly) C# await/async is exactly how you'd design Python's coroutines and async module if you didn't already have generators, so it's worth reading.

Where we left off

For reference, here's a simple example in a synchronous version (which blocks the GUI unacceptably), a callback version (which is ugly and easy to get wrong), and a threaded version (which fails because it tries to access widgets from outside the main thread):

    def handle_click_sync():
        total = sum(range(1000000000))
        label.config(text=total)

    def handle_click_callback():
        total = 0
        i = 0
        def callback():
            nonlocal i, total
            total += sum(range(i*1000000, (i+1)*1000000))
            i += 1
            if i == 100:
                label.config(text=total)
            else:
                root.after_idle(callback)
        root.after_idle(callback)

    def handle_click_threads():
        def callback():
            total = sum(range(100000000))
            label.config(text=total)
        t = threading.Thread(target=callback)
        t.start()

Ideally, we want to get something that looks as nice as the threading example, but that actually works.

Promises

The simplest solution to avoid callback hell without using threads is to use promises, aka deferreds, objects that wrap up callback chains to make them easier to use.

While this idea came out of Python, where it's really taken off is Javascript. Partly this is because web browsers provide a callback-based API for both the GUI and I/O, and the language doesn't have any of the nice features Python has that make some of the later options possible.

The Twisted documentation explains the idea better than I could. If you can follow Javascript, there are also 69105 good tutorials you can find with a quick Google.

So, I'll just show an example (with a fictitious API, since none of the major GUI frameworks are built around promises):

    def handle_click():
        d = Deferred()
        for i in range(100):
            def callback(subtotal, i=i):  # bind i now, not when called
                d2 = Deferred()
                # compute this chunk on a later event-loop pass
                root.after_idle(lambda: d2.resolve(
                    subtotal + sum(range(i*1000000, (i+1)*1000000))))
                return d2  # returning a Deferred pauses the chain until it fires
            d.addCallback(callback)
        d.addCallback(lambda total: label.config(text=total))
        d.resolve(0)  # kick off the chain with a subtotal of 0
        return d

While it's not quite as nice as the threaded version, it's similar--we don't have to turn the control flow inside-out, we don't have to worry about getting lost in the chain, exceptions will be propagated automatically and with useful tracebacks, and so on.

Also, notice how easy it would be to write a wrapper that maps a callback over an iterable, chaining the calls in series, or that combines things in various other ways. Again, if you can read Javascript, you can find good examples of these kinds of functions online by searching for, e.g., "deferred map".

You can also probably tell that Deferreds aren't that heavy of a wrapper around callbacks, so it's always pretty easy to understand what's happening under the covers. In practice, that isn't useful as often as it sounds--but it's not completely useless, either. But, more importantly, this means it's very easy to wrap a callback-based API in a promise-based API. Again, if you can read Javascript, you can find some "promisify" implementations and see how they work.
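
To back up the claim that there isn't much under the covers, here's a minimal Deferred sketch. This is hypothetical, nowhere near Twisted's real implementation (which also handles errbacks, chained deferreds, and more), but it shows how little machinery is involved:

```python
class Deferred:
    """A toy promise: holds callbacks until a value arrives."""
    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._value = None

    def addCallback(self, fn):
        if self._fired:
            # Already resolved: run the callback immediately.
            self._value = fn(self._value)
        else:
            self._callbacks.append(fn)
        return self

    def resolve(self, value):
        self._value = value
        self._fired = True
        # Each callback's return value feeds the next one.
        while self._callbacks:
            self._value = self._callbacks.pop(0)(self._value)

results = []
d = Deferred()
d.addCallback(lambda v: v + 1).addCallback(results.append)
d.resolve(41)  # results is now [42]
```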

Yielding control

Some GUI frameworks have a way to yield control to the event loop, then resume in place. A few platforms have native support for this kind of thing; on other platforms, frameworks generally fake it on top of one of the other techniques below, but you can assume they did so in a way that makes sense, and just accept it.

Tkinter doesn't have such a mechanism, but I'll pretend that it had a function like wx's Yield. Then we could do this:

    def handle_click():
        total = 0
        for i in range(100):
            total += sum(range(i*1000000, (i+1)*1000000))
            root.yield_control(only_if_needed=True)
        label.config(text=total)

We did have to break the computation up into steps so we could yield every so often, but other than that, we didn't have to change the code at all. That's almost as simple as the threaded code, and without any of the drawbacks. However, yielding like this has some problems of its own.

Imagine how you'd write the sleep example from the previous post with this feature. There's no way to yield control for one second, at least not without an additional yield_sleep method. And likewise, there's no yield_read or yield_urlopen. So, this only really works for the simplest cases. While it wouldn't be impossible for a framework to add functions for all of these other cases, covering every blocking operation this way isn't really feasible.

Nested event loops

If you manually nest one event loop inside another, you can effectively break up your sequential code just by running a single step of the event loop, or a short-lived loop, every so often. Effectively, this gives you a way to write a yield_control function even if one doesn't exist.

However, this has the same problems as yield_control, plus some of its own. If the user clicks the button (or triggers some other handler) while we're in the middle of processing the first press, we'll end up calling one event handler in the middle of another. This means all of your code has to be reentrant.

On top of that, an impatient user, or just a flood of mouse-over events, could push you over the recursion limit.

At any rate, while this is doable with many GUI frameworks, including (with a bit of work) Tkinter, it's usually not the best solution.
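
To make the reentrancy problem concrete, here's a toy event loop (all names made up; in real Tkinter, update() is the nested-loop entry point) where a handler queued before the click ends up firing in the middle of handling it:

```python
from collections import deque

class Loop:
    """A toy event loop standing in for the GUI framework's."""
    def __init__(self):
        self.queue = deque()
    def call_soon(self, fn):
        self.queue.append(fn)
    def run_one_step(self):
        if self.queue:
            self.queue.popleft()()

loop = Loop()
log = []

def handle_click():
    total = 0
    for i in range(10):
        total += sum(range(i*10, (i+1)*10))
        # Yield to the event loop by running one step of it from
        # *inside* this handler -- a nested event loop in miniature.
        loop.run_one_step()
    log.append(total)

# A second handler queued before the click fires *during* the first
# handler -- so every handler has to be reentrant.
loop.call_soon(lambda: log.append('reentered'))
handle_click()
# log is now ['reentered', 4950]: the nested handler ran first.
```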

Greenlets

Greenlets, aka fibers, are cooperative threads. They have an API similar to real threading, except that you have to explicitly tell them when to give control to another thread.

The code looks like a cross between threading code and yielding code. Using a fictitious greenlet API that lets the Tkinter root object also work as a greenlet controller:

    def handle_click():
        def callback():
            total = 0
            for i in range(100):
                root.switch()
                total += sum(range(i*1000000, (i+1)*1000000))
            label.config(text=total)
        t = root.new_greenlet(target=callback)
        root.switch(t)

One advantage of greenlets is that they allow you to compose low-level async functions into higher-level async functions by just spawning a greenlet that calls the low-level functions. So, once you've got a library like gevent for the low-level async I/O, it's almost trivial to wrap up even something as complicated as requests, as you can see in the source to the grequests wrapper. And then we can use it like this:

    def handle_click():
        def callback():
            r = grequests.get('http://example.com')
            soup = BeautifulSoup(r.text)
            label.config(text=soup.find('p').text)
        t = root.new_greenlet(target=callback)
        root.switch(t)

(In fact, gevent itself wraps up large chunks of the stdlib this way, including urlopen, so we didn't even really need this example, but it seemed worth showing anyway.)

The big problem is that greenlets have to be integrated with your event loop framework. And, while there are greenlet-based networking libraries like gevent, there is no greenlet-based GUI library.

If your GUI framework has a way to manually run one iteration of the event loop at a time, you can run the event loop itself inside a greenlet, and then drive it with any greenlet-based event loop, like gevent, or (if you don't need any async I/O) just a trivial scheduler loop. This can be pretty handy with a game framework like pygame. It's rarely used with GUI frameworks like Tkinter, but there's no reason it couldn't be.

Coroutines

You can build coroutines out of Python generators, which allow you to get most of the benefits of greenlets without actually needing greenlets. Instead of switching to another greenlet, you yield from another coroutine (or a future object that wraps up the result of a coroutine). Greg Ewing has a great presentation on this.

The code for a hypothetical coroutine-based GUI framework might look like this:

    def handle_click():
        total = 0
        for i in range(100):
            yield from sleep(0)
            total += sum(range(i*1000000, (i+1)*1000000))
        label.config(text=total)

But there is no such framework.
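
Which is a shame, because the scheduler half isn't much code. Here's a toy round-robin scheduler in the Greg Ewing style (all names are made up, and a real framework would also need timers and I/O integration):

```python
from collections import deque

def sleep(seconds):
    # Toy version: ignores the duration, just yields control once.
    yield

class Scheduler:
    def __init__(self):
        self.ready = deque()
    def spawn(self, coro):
        self.ready.append(coro)
    def run(self):
        while self.ready:
            coro = self.ready.popleft()
            try:
                next(coro)           # run until the next yield
            except StopIteration:
                continue             # coroutine finished
            self.ready.append(coro)  # otherwise, requeue it

results = []

def handle_click():
    total = 0
    for i in range(100):
        yield from sleep(0)  # give other coroutines a turn
        total += sum(range(i*1000, (i+1)*1000))
    results.append(total)

sched = Scheduler()
sched.spawn(handle_click())
sched.run()  # results is now [4999950000]
```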

Also, integrating a coroutine-based GUI framework with an async I/O framework (except by using the hybrid approach of having separate threads for each) looks like a hard problem that nobody's even thought through, much less solved.

Coroutines over callbacks

Instead of building a coroutine scheduler, you can build a coroutine API on top of explicit callbacks. PEP 3156 will add this to the standard library in some future version, and there are already third-party implementations like Twisted's inlineCallbacks or Monocle.

The code looks the same as the above, except that you generally need some kind of decorator to mark your coroutines. So:

    @coroutine
    def handle_click():
        total = 0
        for i in range(100):
            yield from sleep(0)
            total += sum(range(i*1000000, (i+1)*1000000))
        label.config(text=total)
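
Here's a sketch of what such a decorator can do under the covers: drive the generator one step per event-loop callback. The call_soon/run_loop pair is a toy stand-in for a real framework's callback queue (e.g., Tkinter's after_idle), and all the names are made up:

```python
from collections import deque

pending = deque()          # toy stand-in for the framework's callback queue

def call_soon(fn):
    pending.append(fn)

def run_loop():
    while pending:
        pending.popleft()()

def coroutine(fn):
    """Drive the decorated generator one step per event-loop callback."""
    def wrapper(*args, **kwargs):
        gen = fn(*args, **kwargs)
        def step():
            try:
                next(gen)        # run until the next yield
            except StopIteration:
                return           # coroutine finished
            call_soon(step)      # schedule the next step
        call_soon(step)
    return wrapper

results = []

@coroutine
def handle_click():
    total = 0
    for i in range(100):
        yield                    # let other callbacks run between chunks
        total += sum(range(i*1000, (i+1)*1000))
    results.append(total)

handle_click()
run_loop()  # results is now [4999950000]
```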

And, while integrating a GUI framework with an async I/O framework is still a problem, doing so at the callback level is a solved problem. Some post-PEP 3156 project may add it to the standard library, and Twisted already integrates with a variety of GUI frameworks, including Tkinter.

With a fictitious API to match the previous example:

    @coroutine
    def handle_click():
        r = yield from urlopen('http://example.com')
        data = yield from r.read()
        soup = BeautifulSoup(data)
        label.config(text=soup.find('p').text)

Notice that with greenlets, async operations didn't require any kind of marking; you only needed to call switch when you wanted to give up control without waiting on an async operation. With coroutines, you mark both cases with an explicit yield from. Is this extra noise in the coroutine code, or is it a case of "explicit is better than implicit"? That's a matter of opinion, or at least debate.

Implicit coroutines, implicit futures, dataflow variables, …

Async-module-style coroutines are pretty cool, but there's a problem. As soon as I decide to make some previously-synchronous function asynchronous (e.g., maybe I had a function that returns a quick string, but then I realized that sometimes it needs to load that string off a disk over a network share), every function that calls it now has to yield from it—and, of course, has to become a coroutine itself, meaning every function that calls that one has to change too, and so on up the chain. It's certainly better than manually pushing callbacks up the chain and figuring out how to propagate errors and so forth, but… why can't the compiler or interpreter do this for us?

The interpreter can tell that I'm calling an async coroutine without yield from, and complain at me. Couldn't it instead just do the yield from automatically and turn my function into a coroutine (the same way adding any yield or yield from turns it into a generator), and push that all the way up the chain? The top-level function would still need to be changed manually the first time an async coroutine is used anywhere in the code, but that's it.

Unfortunately, that doesn't fit into the model that Python uses. But I don't see why you couldn't design a language that works that way.

But once you're doing that, there may be an easier solution. Python's futures, while a higher-level abstraction than callbacks, are still lower-level than you'd want. First, they're explicit—you can't just use a future string as a string; you have to ask it for its value and then use the result as a string. And there's a reason for that: asking for the value is a blocking operation. Which means you usually don't ask for the value; you attach a callback, and the run loop calls your callback. The key to the coroutine design is that it effectively lets you yield from a future's value instead of attaching a callback.

So, what if there were no implicit yield-from-ing, and maybe even no explicit yield-from-ing, and instead accessing a future's value automatically turned your function into a coroutine and blocked it until the value was ready? That would get you 90% of the benefit of implicit coroutines without the problems.

And then, of course, you could make your futures implicit—a future string is just a string, but the first time you try to use it, you block until it's ready. (This does mean that some cases, like waiting with a timeout, need some other mechanism, but that's pretty easy—allow any value to be used as if it were a future: print('s: {}'.format(timeout(s, 5.0))) waits up to 5 seconds and raises if s is a future string that isn't ready; if s is a plain old string, it just returns immediately.) This is effectively the way Alice ML and a few other research languages approach the async problem.
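
For illustration only, you can fake the feel of an implicit future string with a real OS thread. (Blocking a thread is exactly what a GUI handler can't afford, so this sketch shows the interface, not the mechanism a language would actually use; all the names are made up.)

```python
import threading

class FutureStr:
    """A toy 'implicit future' string: usable like a string,
    but blocks on first access until a value arrives."""
    def __init__(self):
        self._ready = threading.Event()
        self._value = None

    def set(self, value):
        self._value = value
        self._ready.set()

    def __str__(self):
        self._ready.wait()   # block until someone calls set()
        return self._value

s = FutureStr()
# Simulate an async producer delivering the value a bit later.
threading.Timer(0.05, s.set, args=['ready']).start()
print('s: {}'.format(s))     # blocks briefly, then prints "s: ready"
```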

At first glance, it looks like greenlet libraries like gevent give you the same benefits, but that's misleading. They have to patch all of the low-level synchronous functions with more complicated, explicitly greenlet-blocking code. Often those low-level primitives can be composed without thinking about it, but, unlike with futures, that's not universally true. And if you want to add some new functionality, you can't just return a future; you have to do the same work that gevent does to wrap up the new primitives.

And you can simplify things even more than implicit futures with dataflow variables. In Python, a variable is just a binding from a name to an object. In Oz, this is split into two parts, which lets you create structures that contain unbound variables and then bind those partially-bound structures to other variables. I won't get into all of the differences this makes, because if you haven't read CTM and you're serious about programming, you should stop whatever you're doing and go read it, but there's one important benefit here: the system needs some strategy for dealing with access to unbound variables, and one such strategy is simply to block until the variable is bound. This makes everything an implicit future. It leaves it up to the language to schedule things properly (combined with greenlets, this gives you for free the same kind of magic that gevent has to do explicitly), while still making every value available to the program when it matters.

One last thing: sometimes, to understand what's really happening, it helps to think in terms of continuations. A continuation is just the rest of the program after a given point. A few languages (notably Scheme and its descendants) let you explicitly ask for the current continuation, pass it around, and continue it. This is primarily useful for building coroutines (which we can already do in Python without needing to think about continuations).

When you take all of the code from your function after a blocking call, plus a callback that was passed in, and wrap it all up in a callback function that you then pass to that blocking call, what you're doing is recreating continuations explicitly, at a higher level. If you could just get the current continuation and pass it along, you could write the rest of your function synchronously, and then continue the passed-in continuation at the end, without needing to wrap anything up in a callback function. So far, that still sounds less user-friendly than yield from coroutines, because you have to explicitly accept and continue a continuation. But that part is exactly the part we're trying to automate away.

So, imagine a language that allowed you to mark primitives with @blocking, which would give them a hidden extra parameter for a continuation and turn every return into a continue. Calling a @blocking function would then get the current continuation and pass it as that hidden parameter. And that's all you'd have to do. That's exactly what implicit coroutines and implicit futures are trying to accomplish, but they attack it at a higher level. That may still be worth doing, because normally-implicit coroutines and futures that you can access explicitly when you want to are useful programming abstractions in their own right. But attacking the problem at the lowest level may make it easier to solve.

At any rate, none of the ideas in this section would fit into Python (even if most of them weren't rambling sketches walking the thin line between too much detail and inaccuracy for the sake of simplicity…), but they're all worth keeping in your head when thinking about what you could do for Python, and how.