Imagine a simple Tkinter app. (Everything is pretty much the same for most other GUI frameworks, and many frameworks for games and network servers, and even things like SAX parsers, but most novices first run into this with GUI apps, and Tkinter is easy to explore because it comes with Python.)

    from tkinter import *

    def handle_click():
        print('Clicked!')

    root = Tk()
    Button(root, text='Click me', command=handle_click).pack()
    root.mainloop()

Now imagine that, instead of just printing a message, you want it to pop up a window, wait 5 seconds, then close the window. You might try to write this:

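    import time

    def handle_click():
        win = Toplevel(root)
        win.title('Hi!')  # title is a method, not a constructor option
        win.transient(root)
        Label(win, text='Please wait...').pack()
        for i in range(5, 0, -1):
            print(i)
            time.sleep(1)  # blocks the main loop for a full second
        win.destroy()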

But when you click the button, the window doesn't show up. And the main window freezes up and beachballs for 5 seconds.

This is because your event handler hasn't returned, so the main loop can't process any events. It needs to process events to display a new window, respond to messages from the OS, etc., and you're not letting it.
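To see why, here's a toy stand-in for a main loop. This is only a sketch: ToyLoop, post, and log are names I made up for illustration, and a real GUI loop does far more than this. The point is just that handlers run one at a time, so anything that arrives while a handler is busy has to wait until that handler returns.

```python
from collections import deque

class ToyLoop:
    """A made-up stand-in for a GUI main loop: runs queued handlers one at a time."""
    def __init__(self):
        self.pending = deque()
        self.log = []

    def post(self, handler):
        self.pending.append(handler)

    def run(self):
        # Nothing else can run until the current handler returns.
        while self.pending:
            handler = self.pending.popleft()
            handler()

loop = ToyLoop()

def slow_handler():
    loop.log.append('slow handler starts')
    # A "redraw" event arrives while we're busy; it just sits in the queue.
    loop.post(lambda: loop.log.append('redraw'))
    loop.log.append('slow handler returns')

loop.post(slow_handler)
loop.run()
print(loop.log)
```

The redraw only happens after slow_handler has returned, which is exactly what happens to your new window while the handler is sleeping.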

There are two basic ways around this problem: callbacks, or threads. There are advantages and disadvantages of both. And then there are various ways of building thread-like functionality on top of callbacks, which let you get (part of) the best of both worlds, but I'll get to those in another post.

Callbacks

Your event handler has to return in a fraction of a second. But what if you still have code to run? You have to reorganize your code: Do some setup, then schedule the rest of the code to run later. And that "rest of the code" is also an event handler, so it also has to return in a fraction of a second, which means often it will have to do a bit of work and again schedule the rest to run later.

Depending on what you're trying to do, you may want to run on a timer, or whenever the event loop is idle, or every time through the event loop no matter what. In this case, we want to run once/second. In Tkinter, you do this with the after method:

    def handle_click():
        win = Toplevel(root)
        win.title('Hi!')
        win.transient(root)
        Label(win, text='Please wait...').pack()
        i = 5
        def callback():
            nonlocal i
            print(i)
            i -= 1
            if not i:
                win.destroy()
            else:
                root.after(1000, callback)
        root.after(1000, callback)

For a different example, imagine we just have some processing that takes a few seconds because it has so much work to do. We'll do something stupid and simple:

    def handle_click():
        total = sum(range(100000000))
        label.config(text=total)

    root = Tk()
    Button(root, text='Add it up', command=handle_click).pack()
    label = Label(root)
    label.pack()
    root.mainloop()

When you click the button, the whole app will freeze up for a few seconds as Python calculates that sum. So, what we want to do is break it up into chunks:

    def handle_click():
        total = 0
        i = 0
        def callback():
            nonlocal i, total
            total += sum(range(i*1000000, (i+1)*1000000))
            i += 1
            if i == 100:
                label.config(text=total)
            else:
                root.after_idle(callback)
        root.after_idle(callback)

Callback Hell

While callbacks definitely work, there are a lot of problems with them.

First, we've turned our control flow inside-out. Compare the simple for loop to the chain of callbacks that replaced it. And it gets much worse when you have more complicated code.

On top of that, it's very easy to get lost in a callback chain. If you forget to return from a sequential function, you'll just fall off the end of the function and return None. If you forget to schedule the next callback, the operation never finishes.

It's also hard to propagate results through a chain of callbacks, and even harder to propagate errors. Imagine that one callback needs to schedule a second, which needs to schedule a third, and so on--but if there's an exception anywhere in the chain, you want to jump all the way to the last callback. Think about how you'd write that (or just View Source on half the JavaScript apps on the internet), and how easy it would be to get it wrong.
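Here's a rough sketch of what that looks like when you do it by hand. Nothing here is Tkinter: schedule and run_loop are a made-up miniature event loop, and step1/step2 are hypothetical steps, with the second one rigged to fail. Every step has to carry both an on_done and an on_error callback along, and every step has to remember to call exactly one of them.

```python
from collections import deque

pending = deque()  # made-up stand-in for the event loop's queue

def schedule(func, *args):
    pending.append((func, args))

def run_loop():
    while pending:
        func, args = pending.popleft()
        func(*args)

results = []

def step1(on_done, on_error):
    try:
        value = int('10')
    except Exception as e:
        schedule(on_error, e)
    else:
        schedule(step2, value, on_done, on_error)

def step2(value, on_done, on_error):
    try:
        value = value // 0  # deliberately fails
    except Exception as e:
        schedule(on_error, e)  # skip straight to the end of the chain
    else:
        schedule(on_done, value)

schedule(step1,
         lambda v: results.append(('ok', v)),
         lambda e: results.append(('error', type(e).__name__)))
run_loop()
print(results)
```

Forget one of those try/except blocks, or forget to pass on_error along, and the error silently vanishes into the event loop.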

And debugging callback-based code is also no fun, because the stack traceback doesn't show you the function that scheduled you to run later, it only shows you the event loop.

There are solutions to these problems, which I'll cover in another post. But it's worth writing an app or two around explicit callbacks, and dealing with all the problems, so you can understand what's really involved in event-loop programming.

Blocking operations

Sleeping isn't the only thing that blocks. Imagine that you wanted to read a large file off the disk, or request a URL over the internet. How would you do that with callbacks?

We had to replace our sleep with a call to after, passing it the rest of our function as a callback. Similarly, we have to replace our read or urlopen with a call to some function that kicks off the work and then calls our callback when it's done. But most GUI frameworks don't have such functions. And you don't want to try to build something like that yourself.

I/O isn't the only kind of blocking, but it's by far the most common. And there's a nice solution to blocking I/O: asynchronous I/O, using a networking framework. Whether this is as simple as a loop around select or as fancy as Twisted, the basic idea is the same as with a GUI: it's an event loop that you add handlers to.

And there's the problem: your GUI loop and your I/O loop both expect to take over the thread, but they obviously can't both do that.

The solution is to make one loop drive the other. If either framework has a way to run one iteration of the main loop manually, instead of just running forever, you can, with a bit of care, put one in charge of the other. (Even if your framework doesn't have a way to do that, it may have a way to fake it by running an event loop and immediately posting a quit event; Tkinter can handle that.)
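As a rough sketch of that idea, here's a select loop that gives the GUI one turn per iteration. The name run_combined and its parameters are my own invention for illustration. With Tkinter, root.update would be a natural thing to pass as gui_step, since it processes pending events and then returns. The demo below uses a dummy gui_step and a socket pair so it runs without a display.

```python
import select
import socket

def run_combined(gui_step, handlers, is_done, timeout=0.01):
    """Drive a GUI from a select loop (hypothetical helper).

    gui_step: callable that processes pending GUI events and returns.
    handlers: dict mapping socket -> callback(sock), called when readable.
    is_done:  callable returning True when the loop should exit.
    """
    while not is_done():
        # Wait briefly for I/O, so the GUI never starves for long.
        readable, _, _ = select.select(list(handlers), [], [], timeout)
        for sock in readable:
            handlers[sock](sock)
        gui_step()

# Demo with a fake GUI step that just counts how often it ran.
a, b = socket.socketpair()
received = []
ticks = []
a.sendall(b'hello')
run_combined(
    gui_step=lambda: ticks.append(1),
    handlers={b: lambda s: received.append(s.recv(1024))},
    is_done=lambda: bool(received),
)
a.close()
b.close()
```

The short timeout is the "bit of care": too long and the GUI feels sluggish, too short and you're burning CPU polling.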

And the work may have already been done for you. Twisted is a networking framework that can work with most popular GUI frameworks. Qt is a GUI framework with a (somewhat limited) built-in network framework. They both have pretty high learning curves compared to Tkinter, but it's probably easier to learn one of them than to try to integrate, say, Tkinter and a custom select reactor yourself.

Another option is a hybrid approach: Do your GUI stuff in the main thread, and your I/O in a second thread. Both of them can still be callback-driven, and you can localize all of the threading problems to the handful of places where the two have to interact with each other.

Threading

With multithreading, we don't have to reorganize our code at all; we just move all of the work onto a thread:

    import threading

    def handle_click():
        def callback():
            total = sum(range(100000000))
            print(total)
        t = threading.Thread(target=callback)
        t.start()

This kicks off the work in a background thread, which won't interfere with the main thread, and then returns immediately. And, not only is it simpler, you don't have to try to guess how finely to break up your tasks; the OS thread scheduler just magically takes care of it for you. So all is good.

Plus, this works just as well for I/O as it does for computation (better, in fact):

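    from urllib.request import urlopen
    from bs4 import BeautifulSoup  # third-party: beautifulsoup4

    def handle_click():
        def callback():
            r = urlopen('http://example.com')
            data = r.read()
            soup = BeautifulSoup(data, 'html.parser')
            print(soup.find('p').text)
        t = threading.Thread(target=callback)
        t.start()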

But what if we want it to interfere with the main thread? Then we have a problem. And with most frameworks--including Tkinter--calling any method on any GUI widget interferes with the main thread. For example, what we really wanted to do was this:

    def handle_click():
        def callback():
            total = sum(range(100000000))
            label.config(text=total)
        t = threading.Thread(target=callback)
        t.start()

But if we try that, it no longer works. (Or, worse, depending on your platform/version, it often works but occasionally crashes...)

So, we need some way to let the background thread work with the GUI.

on_main_thread

If you had a function on_main_thread that could be called from any thread, with any function, and would run that function on the main thread as soon as possible, this would be easy to solve:

    def handle_click():
        def callback():
            total = sum(range(100000000))
            root.on_main_thread(lambda: label.config(text=total))
        t = threading.Thread(target=callback)
        t.start()

Many GUI frameworks do have such a function. Tkinter, unfortunately, does not.

If you want to, you can pretty easily wrap up all of your widgets with proxy objects that forward method calls through on_main_thread, like this:

    from threading import current_thread

    class ThreadedMixin:
        # Assumes this module is imported on the main thread
        main_thread = current_thread()
        def _forward(self, func, *args, **kwargs):
            if current_thread() != ThreadedMixin.main_thread:
                # on_main_thread is the hypothetical function from above
                self.on_main_thread(lambda: func(*args, **kwargs))
            else:
                func(*args, **kwargs)

    class ThreadSafeLabel(Label, ThreadedMixin):
        def config(self, *args, **kwargs):
            self._forward(super().config, *args, **kwargs)
        # And so on for the other methods

Obviously you'd want to do this programmatically or dynamically instead of writing hundreds of lines of forwarding code.
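One way to do it dynamically is a proxy that intercepts attribute lookups with __getattr__. This is only a sketch: ThreadSafeProxy is a name I made up, it assumes the proxy is created on the main thread, and it assumes you have some on_main_thread function to hand it (here a plain list's append stands in for one, and FakeLabel stands in for a widget, so the example runs without a display).

```python
import threading

class ThreadSafeProxy:
    """Hypothetical wrapper that forwards method calls to the main thread."""
    def __init__(self, target, on_main_thread):
        self._target = target
        self._on_main_thread = on_main_thread
        # Assumption: the proxy is created on the main thread.
        self._main_thread = threading.current_thread()

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr
        def forward(*args, **kwargs):
            if threading.current_thread() is self._main_thread:
                return attr(*args, **kwargs)
            # Off the main thread: defer the call; its return value is lost.
            self._on_main_thread(lambda: attr(*args, **kwargs))
        return forward

class FakeLabel:
    """Stand-in for a widget."""
    def __init__(self):
        self.text = ''
    def config(self, text):
        self.text = text

deferred = []  # stand-in for a real on_main_thread mechanism
label = ThreadSafeProxy(FakeLabel(), deferred.append)

label.config(text='main')  # called on the creating thread: runs directly
text_after_main_call = label.text

t = threading.Thread(target=lambda: label.config(text='worker'))
t.start()
t.join()
text_before_flush = label.text  # the worker's call was only queued

deferred[0]()  # the "main thread" finally runs the deferred call
text_after_flush = label.text
```

Note the trade-off: calls made from a background thread can't return anything useful, because the real call hasn't happened yet when the proxy returns.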

post_event

If you had a function post_event that could be called on any thread to post a custom event to the event queue, you could get the same effect with just a bit of extra work--just write an event handler for that custom event. For example:

    # register_custom_event, Event, and post_event are hypothetical
    def handle_my_custom_event(event):
        label.config(text=event.data)
    root.register_custom_event('<My Custom Event>')
    root.bind('<My Custom Event>', handle_my_custom_event)

    def handle_click():
        def callback():
            total = sum(range(100000000))
            event = Event('<My Custom Event>', data=total)
            root.post_event(event)
        t = threading.Thread(target=callback)
        t.start()

Most GUI frameworks that don't have on_main_thread have post_event. But Tkinter doesn't even have that.

Polling queues

With limited frameworks like Tkinter, the only workaround is to use a Queue, and make Tkinter check the queue every so often, something like this:

    import queue

    q = queue.Queue()

    def on_main_thread(func):
        q.put(func)

    def check_queue():
        # Drain everything that's waiting, then check again in 100ms
        while True:
            try:
                task = q.get(block=False)
            except queue.Empty:
                break
            else:
                root.after_idle(task)
        root.after(100, check_queue)

    root.after(100, check_queue)

While this works, it makes the computer waste effort constantly checking the queue for work to do. This isn't likely to slow things down when your program is busy--but it will make it drain your battery and prevent your computer from going to sleep even when your program has nothing to do. Programs that use a mechanism like this will probably want some way to turn check_queue on and off, so it's only wasting time when you actually have some background work going.
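Here's a rough sketch of one way to do that on/off switching. MainThreadDispatcher, begin_work, and finish are names I made up; the only thing it needs from the root is a Tkinter-style after(ms, func) method. It assumes each background job calls finish exactly once. FakeRoot is a stand-in so the example runs without a display.

```python
import queue
import threading

class MainThreadDispatcher:
    """Hypothetical helper: polls a queue only while work is outstanding."""
    def __init__(self, root, interval_ms=100):
        self.root = root  # anything with a Tkinter-style after(ms, func)
        self.interval_ms = interval_ms
        self.tasks = queue.Queue()
        self.outstanding = 0  # touched only on the main thread
        self.polling = False

    def begin_work(self):
        """Call on the main thread just before starting a background job."""
        self.outstanding += 1
        if not self.polling:
            self.polling = True
            self.root.after(self.interval_ms, self._poll)

    def finish(self, func):
        """Call from the worker: hand func over to run on the main thread."""
        self.tasks.put(func)

    def _poll(self):
        while True:
            try:
                task = self.tasks.get(block=False)
            except queue.Empty:
                break
            task()
            self.outstanding -= 1
        if self.outstanding:
            self.root.after(self.interval_ms, self._poll)
        else:
            self.polling = False  # nothing left: stop waking up

class FakeRoot:
    """Stand-in for Tk: remembers scheduled callbacks, runs them on tick()."""
    def __init__(self):
        self.scheduled = []
    def after(self, ms, func):
        self.scheduled.append(func)
    def tick(self):
        due, self.scheduled = self.scheduled, []
        for func in due:
            func()

root = FakeRoot()
dispatcher = MainThreadDispatcher(root)
shown = []

dispatcher.begin_work()
worker = threading.Thread(
    target=lambda: dispatcher.finish(lambda: shown.append(sum(range(1000)))))
worker.start()
worker.join()
root.tick()  # the poll finds the result, runs it, and stops polling
```

Once the last job has reported in, nothing is scheduled any more, so an idle program goes back to doing nothing at all.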

mtTkinter

There's a wrapper around Tkinter called mtTkinter that effectively builds on_main_thread out of something like check_queue, and then builds thread-safe proxies around all of the Tkinter widgets, so you can use Tkinter as if it were completely thread-safe.

I don't know whether it's really "production-quality". I believe it hasn't been ported to Python 3 either. (2to3 might be enough, but I can't promise that.) And the LGPL licensing may be too restrictive for some projects. But for learning purposes, and maybe for building simple GUIs for your own use, it's worth looking at.

I have a quick&dirty port to 3.x on GitHub if you want to try it.

Threading limits

Unlike with callbacks, if you pile up too many threads, you start adding additional overhead, in both time and space, on top of the cost of actually doing the work.

The solution to this is to use a small pool of threads to service a queue of tasks. The easiest way to do this is with the concurrent.futures module:

    from concurrent.futures import ThreadPoolExecutor

    executor = ThreadPoolExecutor(8)

    def handle_click():
        def callback():
            total = sum(range(100000000))
            root.on_main_thread(lambda: label.config(text=total))
        executor.submit(callback)
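An executor also gives you back a Future from submit, which is one way to get results and errors out of the pool without wrapping everything in your own callback. Here's a minimal sketch, with no GUI involved; report and results are names I made up. The done callback runs on a worker thread (or right away on the submitting thread, if the task had already finished), so in a real GUI app you'd still need something like on_main_thread or a polled queue to get from there to your widgets.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

results = queue.Queue()

def report(future):
    # Not guaranteed to run on the main thread, so only touch the queue here.
    error = future.exception()
    if error is not None:
        results.put(('error', type(error).__name__))
    else:
        results.put(('ok', future.result()))

with ThreadPoolExecutor(max_workers=8) as executor:
    executor.submit(sum, range(1000000)).add_done_callback(report)
    # Summing a bare int raises TypeError inside the worker.
    executor.submit(sum, 1000000).add_done_callback(report)

outcomes = sorted(results.get(timeout=5) for _ in range(2))
print(outcomes)
```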

Shared data

The biggest problem with threads is that any shared data needs to be synchronized, or you have race conditions. The general problem, and the solutions, are covered well all over the net.

But GUI apps add an additional problem: Your main thread can't block on a synchronization object that could be held for more than a fraction of a second, or your whole GUI freezes up. So, you need to make sure you never wait on a sync object for more than a brief time (either by making sure nobody else can hold the object for too long, or by using timeouts and retries).
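Here's a rough sketch of the timeout-and-retry approach. The helper name with_lock_soon and its parameters are made up for illustration; it only assumes the root has a Tkinter-style after(ms, func) method. FakeRoot is a stand-in so the example runs without a display.

```python
import threading

def with_lock_soon(root, lock, action, retry_ms=50, timeout=0.01):
    """Run action() under lock without blocking the main thread for long."""
    if lock.acquire(timeout=timeout):
        try:
            action()
        finally:
            lock.release()
    else:
        # Somebody else holds the lock: let the event loop run, then retry.
        root.after(retry_ms,
                   lambda: with_lock_soon(root, lock, action, retry_ms, timeout))

class FakeRoot:
    """Stand-in for Tk: remembers scheduled callbacks, runs them on tick()."""
    def __init__(self):
        self.scheduled = []
    def after(self, ms, func):
        self.scheduled.append(func)
    def tick(self):
        due, self.scheduled = self.scheduled, []
        for func in due:
            func()

root = FakeRoot()
lock = threading.Lock()
shared = []

lock.acquire()  # pretend a worker is holding the lock
with_lock_soon(root, lock, lambda: shared.append('updated'))
blocked_result = list(shared)  # nothing happened yet; a retry is scheduled
retries_scheduled = len(root.scheduled)

lock.release()  # the "worker" finishes
root.tick()     # the retry now gets the lock and runs the action
```

The handler never waits more than the timeout, so the GUI stays responsive even if a worker holds the lock longer than it should.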

It's been more than a decade since Typical Programmer Greg Jorgensen taught the world about Abject-Oriented Programming.

Much of what he said still applies, but other things have changed. Languages in the Abject-Oriented space have been borrowing ideas from another paradigm entirely—and then everyone realized that languages like Python, Ruby, and JavaScript had been doing it for years and just hadn't noticed (because these languages do not require you to declare what you're doing, or even to know what you're doing). Meanwhile, new hybrid languages borrow freely from both paradigms.

This other paradigm—which is actually older, but was largely constrained to university basements until recent years—is called Functional Addiction.

A Functional Addict is someone who regularly gets higher-order—sometimes they may even exhibit dependent types—but still manages to retain a job.

Retaining a job is of course the goal of all programming. This is why some of these new hybrid languages, like Rust, check all borrowing, from both paradigms, so extensively that you can make regular progress for months without ever successfully compiling your code, and your managers will appreciate that progress. After all, once it does compile, it will definitely work.

Closures

It's long been known that Closures are dual to Encapsulation.

As Abject-Oriented Programming explained, Encapsulation involves making all of your variables public, and ideally global, to let the rest of the code decide what should and shouldn't be private.

Closures, by contrast, are a way of referring to variables from outer scopes. And there is no scope more outer than global.

Immutability

One of the reasons Functional Addiction has become popular in recent years is that to truly take advantage of multi-core systems, you need immutable data, sometimes also called persistent data.

Instead of mutating a function to fix a bug, you should always make a new copy of that function. For example:

function getCustName(custID)
{
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

When you discover that you actually wanted fields 2 and 3 rather than 1 and 2, it might be tempting to mutate the state of this function. But doing so is dangerous. The right answer is to make a copy, and then try to remember to use the copy instead of the original:

function getCustName(custID)
{
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

function getCustName2(custID)
{
    custRec = readFromDB("customer", custID);
    fullname = custRec[2] + ' ' + custRec[3];
    return fullname;
}

This means anyone still using the original function can continue to reference the old code, but as soon as it's no longer needed, it will be automatically garbage collected. (Automatic garbage collection isn't free, but it can be outsourced cheaply.)

Higher-Order Functions

In traditional Abject-Oriented Programming, you are required to give each function a name. But over time, the name of the function may drift away from what it actually does, making it as misleading as comments. Experience has shown that people will only keep one copy of their information up to date, and the CHANGES.TXT file is the right place for that.

Higher-Order Functions can solve this problem:

function []Functions = [
    lambda(custID) {
        custRec = readFromDB("customer", custID);
        fullname = custRec[1] + ' ' + custRec[2];
        return fullname;
    },
    lambda(custID) {
        custRec = readFromDB("customer", custID);
        fullname = custRec[2] + ' ' + custRec[3];
        return fullname;
    },
]

Now you can refer to these functions by order, so there's no need for names.

Parametric Polymorphism

Traditional languages offer Abject-Oriented Polymorphism and Ad-Hoc Polymorphism (also known as Overloading), but better languages also offer Parametric Polymorphism.

The key to Parametric Polymorphism is that the type of the output can be determined from the type of the inputs via Algebra. For example:

function getCustData(custId, x)
{
    if (x == int(x)) {
        custRec = readFromDB("customer", custId);
        fullname = custRec[1] + ' ' + custRec[2];
        return int(fullname);
    } else if (x.real == 0) {
        custRec = readFromDB("customer", custId);
        fullname = custRec[1] + ' ' + custRec[2];
        return double(fullname);
    } else {
        custRec = readFromDB("customer", custId);
        fullname = custRec[1] + ' ' + custRec[2];
        return complex(fullname);
    }
}

Notice that we've called the variable x. This is how you know you're using Algebraic Data Types. The names y, z, and sometimes w are also Algebraic.

Type Inference

Languages that enable Functional Addiction often feature Type Inference. This means that the compiler can infer your typing without you having to be explicit:


function getCustName(custID)
{
    // WARNING: Make sure the DB is locked here or
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

We didn't specify what will happen if the DB is not locked. And that's fine, because the compiler will figure it out and insert code that corrupts the data, without us needing to tell it to!

By contrast, most Abject-Oriented languages are either nominally typed—meaning that you give names to all of your types instead of meanings—or dynamically typed—meaning that your variables are all unique individuals that can accomplish anything if they try.

Memoization

Memoization means caching the results of a function call:

function getCustName(custID)
{
    if (custID == 3) { return "John Smith"; }
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

Non-Strictness

Non-Strictness is often confused with Laziness, but in fact Laziness is just one kind of Non-Strictness. Here's an example that compares two different forms of Non-Strictness:

/****************************************
*
* TO DO:
*
* get tax rate for the customer state
* eventually from some table
*
****************************************/
// function lazyTaxRate(custId) {}

function callByNameTaxRate(custId)
{
    /****************************************
    *
    * TO DO:
    *
    * get tax rate for the customer state
    * eventually from some table
    *
    ****************************************/
}

Both are Non-Strict, but the second one forces the compiler to actually compile the function just so we can Call it By Name. This causes code bloat. The Lazy version will be smaller and faster. Plus, Lazy programming allows us to create infinite recursion without making the program hang:

/****************************************
*
* TO DO:
*
* get tax rate for the customer state
* eventually from some table
*
****************************************/
// function lazyTaxRateRecursive(custId) { lazyTaxRateRecursive(custId); }

Laziness is often combined with Memoization:

function getCustName(custID)
{
    // if (custID == 3) { return "John Smith"; }
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

Outside the world of Functional Addicts, this same technique is often called Test-Driven Development. If enough tests can be embedded in the code to achieve 100% coverage, or at least a decent amount, your code is guaranteed to be safe. But because the tests are not compiled and executed in the normal run, or indeed ever, they don't affect performance or correctness.

Conclusion

Many people claim that the days of Abject-Oriented Programming are over. But this is pure hype. Functional Addiction and Abject Orientation are not actually at odds with each other, but instead complement each other.