1. There are a number of blogs out there that tackle the problems of callbacks for servers, or for Javascript, but novices trying to write Python GUIs shouldn't have to learn about the different issues involved in servers, or a whole different language.

    In another post, I showed the two major approaches to writing asynchronous GUI code: threading and callbacks. Both have drawbacks. But there are a number of techniques to get many of the advantages of threads, on top of callbacks. So, to some extent, you can get (part of) the best of both worlds.

    Most of these techniques come from the world of network servers. The central issue facing network servers is the same as GUIs--your application has to be written as a bunch of event handlers that can't block or take a long time. But the practical details can be pretty different.

    Update #1: The original version of this made it sound as if async-style coroutines for GUIs would be the ideal solution to the problem. They wouldn't, they'd just be the best solution we could have in Python as the language is today. So I added a new section at the end.

    Update #2: Callbacks as our Generations' Go To Statement by Miguel de Icaza is the best description I've seen so far of what's wrong with callback hell, and how to fix it. He's coming from a .NET (C#/F#) perspective, but (exaggerating only slightly) C# await/async is exactly how you'd design Python's coroutines and async module if you didn't already have generators, so it's worth reading.

    Where we left off

    For reference, here's a simple example in a synchronous version (which blocks the GUI unacceptably), a callback version (which is ugly and easy to get wrong), and a threaded version (which fails because it tries to access widgets from outside the main thread):

        def handle_click_sync():
            total = sum(range(1000000000))
            label.config(text=total)
    
        def handle_click_callback():
            total = 0
            i = 0
            def callback():
                nonlocal i, total
                total += sum(range(i*1000000, (i+1)*1000000))
                i += 1
                if i == 100:
                    label.config(text=total)
                else:
                    root.after_idle(callback)
            root.after_idle(callback)
    
        def handle_click_threads():
            def callback():
                total = sum(range(100000000))
                label.config(text=total)
            t = threading.Thread(target=callback)
            t.start()
    

    Ideally, we want to get something that looks as nice as the threading example, but that actually works.

    Promises

    The simplest solution to avoid callback hell without using threads is to use promises, aka deferreds, objects that wrap up callback chains to make them easier to use.

    While this idea came out of Python, where it's really taken off is Javascript. Partly this is because web browsers provide a callback-based API for both the GUI and I/O, and the language doesn't have any of the nice features Python has that make some of the later options possible.

    The Twisted documentation explains the idea better than I could. If you can follow Javascript, there are also 69105 good tutorials you can find with a quick Google.

    So, I'll just show an example (with a fictitious API, since none of the major GUI frameworks are built around promises):

        def handle_click():
            d = Deferred()
            def make_step(i):
                def step(subtotal):
                    d2 = Deferred()
                    root.after_idle(lambda: d2.resolve(
                        subtotal + sum(range(i*1000000, (i+1)*1000000))))
                    return d2
                return step
            for i in range(100):
                d.addCallback(make_step(i))
            d.addCallback(lambda total: label.config(text=total))
            d.resolve(0)
            return d
    

    While it's not quite as nice as the threaded version, it's similar--we don't have to turn the control flow inside-out, we don't have to worry about getting lost in the chain, exceptions will be propagated automatically and with useful tracebacks, and so on.

    Also, notice how easy it would be to write a wrapper that maps a callback over an iterable, chaining the calls in series. Or various other ways of combining things. Again, if you can read Javascript, you can find some good examples of these kinds of functions online by searching, e.g., "deferred map".

    You can also probably tell that Deferreds aren't that heavy of a wrapper around callbacks, so it's always pretty easy to understand what's happening under the covers. In practice, that isn't useful as often as it sounds--but it's not completely useless, either. But, more importantly, this means it's very easy to wrap a callback-based API in a promise-based API. Again, if you can read Javascript, you can find some "promisify" implementations and see how they work.
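    Here's what both of those might look like, sketched with a toy Deferred (since none of the major GUI frameworks ship one). `deferred_map`, `promisify`, and `double_later` are all illustrative names; a real library's versions would also chain results through the callback list and propagate errors:

```python
class Deferred:
    """Toy promise: resolve once, run callbacks in order. A real
    library also chains results and propagates errors."""
    def __init__(self):
        self._callbacks = []
        self._resolved = False
        self._value = None

    def addCallback(self, func):
        if self._resolved:
            func(self._value)
        else:
            self._callbacks.append(func)
        return self

    def resolve(self, value):
        self._resolved = True
        self._value = value
        callbacks, self._callbacks = self._callbacks, []
        for func in callbacks:
            func(value)

def deferred_map(func, iterable):
    """Map func (an item -> Deferred function) over iterable in
    series, resolving the returned Deferred with all the results."""
    items = iter(iterable)
    results = []
    d = Deferred()

    def step(_=None):
        try:
            item = next(items)
        except StopIteration:
            d.resolve(results)
            return
        def on_result(r):
            results.append(r)
            step()
        func(item).addCallback(on_result)

    step()
    return d

def promisify(callback_style_func):
    """Wrap a function that takes a one-shot callback as its last
    argument so it returns a Deferred instead."""
    def wrapper(*args):
        d = Deferred()
        callback_style_func(*args, d.resolve)
        return d
    return wrapper

# Demo: a callback-style "async double", promisified and mapped.
def double_later(x, callback):
    callback(x * 2)      # a real API would call back from the event loop

out = []
deferred_map(promisify(double_later), [1, 2, 3]).addCallback(out.append)
# out is now [[2, 4, 6]]
```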

    Yielding control

    Some GUI frameworks have a way to yield control to the event loop, then resume in-place. A few platforms have native support for this kind of thing; on other platforms, they generally fake it on top of one of the other techniques below, but you can assume they did so in a way that makes sense and just accept it.

    Tkinter doesn't have such a mechanism, but I'll pretend that it had a function like wx's Yield. Then we could do this:

        def handle_click():
            total = 0
            for i in range(100):
                total += sum(range(i*1000000, (i+1)*1000000))
                root.yield_control(only_if_needed=True)
            label.config(text=total)
    

    We did have to break the computation up into steps so we could yield every so often, but other than that, we didn't have to change the code at all. That's almost as simple as the threaded code, and without any of the drawbacks. However, yielding like this has some problems of its own.

    Imagine how you'd write the sleep example from the previous post with this feature. There's no way to yield control for one second, at least not without an additional yield_sleep method. And likewise, there's no yield_read or yield_urlopen. So, this only really works for the simplest cases. While it wouldn't be impossible for some framework to include functions for all of these other cases, it would be infeasible.

    Nested event loops

    If you manually nest one event loop inside another, you can effectively break up your sequential code just by running a single step of the event loop, or a short-lived loop, every so often. Effectively, this gives you a way to write a yield_control function even if one doesn't exist.

    However, this has the same problems as yield_control, plus some additional problems. If the user clicks the button (or triggers some other handler) while we're in the middle of processing the first press, we'll end up calling one event handler in the middle of another. This means your code has to all be reentrant.
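    To make the reentrancy problem concrete, here's a toy event loop (the names are illustrative, not any real framework's API) where a second click arrives while the first handler is still running. The nested loop runs the second handler to completion in the middle of the first:

```python
import collections

event_queue = collections.deque()    # toy stand-in for the GUI's queue
log = []

def yield_control():
    # A short-lived nested loop: run everything pending right now.
    # In real Tkinter, this is roughly what root.update() does.
    for _ in range(len(event_queue)):
        event_queue.popleft()()

def handle_click():
    log.append('start')
    for i in range(3):
        # ... one chunk of the long computation would go here ...
        yield_control()
    log.append('done')

# Simulate a second click arriving while the first is being handled:
event_queue.append(handle_click)
handle_click()
# log == ['start', 'start', 'done', 'done'] -- the second handler ran
# to completion *inside* the first, so both must be reentrant.
```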

    On top of that, an impatient user, or just a flood of mouse-over events, could push you over the recursion limit.

    At any rate, while this is doable with many GUI frameworks, including (with a bit of work) Tkinter, it's usually not the best solution.

    Greenlets

    Greenlets, aka fibers, are cooperative threads. They have an API similar to real threading, except that you have to explicitly tell them when to give control to another thread.

    The code looks like a cross between threading code and yielding code. Using a fictitious greenlet API that lets the Tkinter root object also work as a greenlet controller:

        def handle_click():
            def callback():
                total = 0
                for i in range(100):
                    root.switch()
                    total += sum(range(i*1000000, (i+1)*1000000))
                label.config(text=total)
            t = root.new_greenlet(target=callback)
            root.switch(t)
    

    One advantage of greenlets is that they allow you to compose low-level async functions into higher-level async functions by just spawning a greenlet that calls the low-level functions. So, once you've got a library like gevent for the low-level async I/O, it's almost trivial to wrap up even something as complicated as requests, as you can see in the source to the grequests wrapper. And then we can use it like this:

        def handle_click():
            def callback():
                r = grequests.get('http://example.com')
                soup = BeautifulSoup(r.text)
                label.config(text=soup.find('p').text)
            t = root.new_greenlet(target=callback)
            root.switch(t)
    

    (In fact, gevent itself wraps up large chunks of the stdlib this way, including urlopen, so we didn't even really need this example, but it seemed worth showing anyway.)

    The big problem is that greenlets have to be integrated with your event loop framework. And, while there are greenlet-based networking libraries like gevent, there is no greenlet-based GUI library.

    If your GUI framework has a way to manually run one iteration of the event loop at a time, you can run the event loop itself inside a greenlet, and then drive it with any greenlet-based event loop, like gevent, or (if you don't need any async I/O) just a trivial scheduler loop. This can be pretty handy with a game framework like pygame. It's rarely used with GUI frameworks like Tkinter, but there's no reason it couldn't be.

    Coroutines

    You can build coroutines out of Python generators, which allow you to get most of the benefits of greenlets without actually needing greenlets. Instead of switching to another greenlet, you yield from another coroutine (or a future object that wraps up the result of a coroutine). Greg Ewing has a great presentation on this.
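    The core of the idea fits in a few lines: a scheduler that round-robins a queue of generators, each of which yields whenever it wants to give up control. This is a toy sketch with no futures, no I/O waiting, and no error handling:

```python
import collections

class Scheduler:
    """Round-robin over a queue of generator-based coroutines."""
    def __init__(self):
        self.ready = collections.deque()

    def spawn(self, coro):
        self.ready.append(coro)

    def run(self):
        while self.ready:
            coro = self.ready.popleft()
            try:
                next(coro)               # run until its next yield
            except StopIteration:
                continue                 # finished: drop it
            self.ready.append(coro)      # otherwise, requeue it

def counter(name, n, log):
    for i in range(n):
        log.append((name, i))
        yield                            # hand control back

log = []
sched = Scheduler()
sched.spawn(counter('a', 2, log))
sched.spawn(counter('b', 2, log))
sched.run()
# log == [('a', 0), ('b', 0), ('a', 1), ('b', 1)] -- interleaved
```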

    The code for a hypothetical coroutine-based GUI framework might look like this:

        def handle_click():
            total = 0
            for i in range(100):
                yield from sleep(0)
                total += sum(range(i*1000000, (i+1)*1000000))
            label.config(text=total)
    

    But there is no such framework.

    Also, integrating a coroutine-based GUI framework with an async I/O framework (except by using the hybrid approach of having separate threads for each) looks like a hard problem that nobody's even thought through, much less solved.

    Coroutines over callbacks

    Instead of building a coroutine scheduler, you can build a coroutine API on top of explicit callbacks. PEP 3156 will add this to the standard library in some future version, and there are already third-party frameworks like Twisted's inlineCallbacks, or Monocle.

    The code looks the same as the above, except that you generally need some kind of decorator to mark your coroutines. So:

        @coroutine
        def handle_click():
            total = 0
            for i in range(100):
                yield from sleep(0)
                total += sum(range(i*1000000, (i+1)*1000000))
            label.config(text=total)
    

    And, while integrating a GUI framework with an async I/O framework is still a problem, doing so at the callback level is a solved problem. Some post-PEP 3156 project may add it to the standard library, and Twisted already integrates with a variety of GUI frameworks, including Tkinter.

    With a fictitious API to match the previous example:

        @coroutine
        def handle_click():
            r = yield from urlopen('http://example.com')
            data = yield from r.read()
            soup = BeautifulSoup(data)
            label.config(text=soup.find('p').text)
    

    Notice that with greenlets, async operations didn't require any kind of marking; you only needed to call switch when you wanted to give up control without waiting on an async operation. With coroutines, you mark both cases with an explicit yield from. Is this extra noise in the coroutine code, or is it a case of "explicit is better than implicit"? That's a matter of opinion, or at least debate.
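    Here's a sketch of what such a decorator does under the covers, using a toy call_soon-style queue in place of Tkinter's after_idle. Real implementations (inlineCallbacks, the PEP 3156 design) also route the results and exceptions of yielded futures back into the generator; this version only handles bare yields:

```python
import functools

pending = []                 # toy stand-in for the event loop's queue

def call_soon(func):
    pending.append(func)

def run_loop():
    while pending:
        pending.pop(0)()

def coroutine(func):
    """Drive the generator one step at a time, rescheduling the next
    step as a callback so the event loop gets control in between."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        def step():
            try:
                next(gen)
            except StopIteration:
                return
            call_soon(step)
        call_soon(step)
    return start

results = []

@coroutine
def handle_click():
    total = 0
    for i in range(100):
        yield                # give the loop a turn between chunks
        total += sum(range(i*1000, (i+1)*1000))
    results.append(total)

handle_click()
run_loop()
# results == [sum(range(100000))]
```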

    Implicit coroutines, implicit futures, dataflow variables, …

    Async-module-style coroutines are pretty cool, but there's a problem. As soon as I decide to make some previously-synchronous function asynchronous (e.g., maybe I had a function that returns a quick string, but then I realized that sometimes it needs to load it off disk over a network share), every function that calls it has to now yield from it—and, of course, has to become a coroutine itself, meaning every function that called it has to adjust, and so on up the chain. It's certainly better than manually pushing callbacks up the chain and figuring out how to propagate errors and so forth, but… why can't the compiler or interpreter do this for us?

    The interpreter can tell that I'm calling an async coroutine without yield from and complain at me. Could it just automatically do a yield from and turn my function into a coroutine (the same way adding any yield or yield from turns it into a generator) instead, and push that all the way up the chain? My top-level function would need to be changed manually the first time you use an async coroutine anywhere in the code, but that's it.

    Unfortunately, that doesn't fit into the model that Python uses. But I don't see why you couldn't design a language that works that way.

    But once you're doing that, there may be an easier solution. Python's futures are, while a higher-level abstraction than callbacks, still lower-level than you'd want. First, they're explicit—you can't just use a future string as a string, you have to ask it for its value and then use the result as a string. And there's a reason for that: asking for the value is a blocking operation. Which means you usually don't ask for the value; you attach a callback, and the runloop calls your callback. The key to the coroutine design is that it effectively lets you yield from a future's value instead of attaching a callback. So, what if there were no implicit yield-from-ing, and maybe even no explicit yield-from-ing, and instead, accessing a future's value automatically turned your function into a coroutine and blocked it until the value was ready? That would get you 90% of the benefit of implicit coroutines without the problems.

    And then, of course, you could make your futures implicit—a future string is just a string, but the first time you try to use it, you block until it's ready. (This does mean that some cases, like waiting with a timeout, need some other mechanism, but that's pretty easy—allow you to use any value as if it were a future; print('s: {}'.format(timeout(s, 5.0))) waits up to 5 seconds for s and raises if it's not ready when s is a future string, and just returns immediately when s is a plain old string.) This is effectively the way Alice ML and a few other research languages approach the async problem.
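    Python can't make this fully transparent (a future str isn't really a str), but a rough approximation shows the flavor: a value computed in the background, where the first attempt to *use* it blocks until it's bound, a bit like a dataflow variable. ImplicitFuture is an illustrative name, not a real library:

```python
import threading

class ImplicitFuture:
    """Start computing a value in the background; the first attempt
    to use the value blocks until it's bound."""
    def __init__(self, func):
        self._ready = threading.Event()
        self._value = None
        def run():
            self._value = func()
            self._ready.set()
        threading.Thread(target=run).start()

    @property
    def value(self):
        self._ready.wait()           # block until bound
        return self._value

    def __str__(self):
        return str(self.value)       # using it as a string forces a wait

s = ImplicitFuture(lambda: 'hello')
message = 's: {}'.format(s)          # blocks (briefly) until s is bound
# message == 's: hello'
```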

    At first glance, it looks like greenlet libraries like gevent give you the same benefits, but that's misleading. They have to patch all of the low-level synchronous methods with more complicated explicit greenlet-blocking code. Often those low-level primitives can be composed without thinking about it, but, unlike futures, that's not universally true. And if you want to add some new functionality, you can't just return a future, you have to do the same work that gevent does to wrap up the new primitives.

    And you can simplify things even more than implicit futures with dataflow variables. In Python, a variable is just a binding from a name to an object. But in Oz, this is split into two parts, which allows you to create structures that contain unbound variables and then bind those partially-bound structures to variables. I won't get into all of the differences this makes, because if you haven't read CTM and you're serious about programming, you should stop whatever you're doing and go read it, but there's one important benefit here: The system needs some strategy for dealing with access to unbound variables, and one such strategy is just to block until the variable is bound. This makes everything an implicit future, and it leaves it up to the language to schedule things properly (when combined with greenlets, this gives you the same kind of magic that gevent had to do explicitly, but for free), but also makes everything available to the program when it matters.

    One last thing: Sometimes, to understand what's really happening, it helps to think in terms of continuations. A continuation is just the rest of the program after a given point. A few languages (notably Scheme and its descendants) let you explicitly ask for a continuation, pass it around, and continue it. This is primarily useful for building coroutines (which we can already do in Python without needing to think about continuations).

    When you take all of the code from your function after a blocking call, plus a callback that was passed in, and wrap it all up in a callback function that you then pass to that blocking call, what you're doing is recreating continuations explicitly and at a higher level. If you could just get the current continuation and pass it, you could then write the rest of your function synchronously, and then continue the passed-in continuation at the end, without needing to wrap anything up in a callback function. So far, that still sounds less user-friendly than yield from coroutines, because you have to explicitly accept and continue a continuation. But that part is exactly the part we're trying to automate away.

    So, imagine a language that allowed you to mark primitives with @blocking, which would give them a hidden extra parameter for a continuation, and turn every return into a continue. And then, calling a @blocking function would get the current continuation and pass it as that hidden parameter. And that's all you'd have to do. That's exactly what both implicit coroutines and implicit futures are trying to accomplish, but they're attacking it at a higher level. That may be worth doing, because normally-implicit continuations and futures that you can access explicitly when you want to are useful programming abstractions in their own right. But attacking it at the lowest level may make the problem easier to solve.

    At any rate, none of the ideas in this section would fit into Python (even if most of them weren't rambling messages that may fall off the thin line between too much detail and inaccuracy for the sake of simplicity…), but they're all worth keeping in your head when thinking about what you could do for Python, and how.

  2. Imagine a simple Tkinter app. (Everything is pretty much the same for most other GUI frameworks, and many frameworks for games and network servers, and even things like SAX parsers, but most novices first run into this with GUI apps, and Tkinter is easy to explore because it comes with Python.)

        def handle_click():
            print('Clicked!')
        root = Tk()
        Button(root, text='Click me', command=handle_click).pack()
        root.mainloop()
    

    Now imagine that, instead of just printing a message, you want it to pop up a window, wait 5 seconds, then close the window. You might try to write this:

        def handle_click():
            win = Toplevel(root)
            win.title('Hi!')
            win.transient(root)
            Label(win, text='Please wait...').pack()
            for i in range(5, 0, -1):
                print(i)
                time.sleep(1)
            win.destroy()
    

    But when you click the button, the window doesn't show up. And the main window freezes up and beachballs for 5 seconds.

    This is because your event handler hasn't returned, so the main loop can't process any events. It needs to process events to display a new window, respond to messages from the OS, etc., and you're not letting it.

    There are two basic ways around this problem: callbacks, or threads. There are advantages and disadvantages of both. And then there are various ways of building thread-like functionality on top of callbacks, which let you get (part of) the best of both worlds, but I'll get to those in another post.

    Callbacks

    Your event handler has to return in a fraction of a second. But what if you still have code to run? You have to reorganize your code: Do some setup, then schedule the rest of the code to run later. And that "rest of the code" is also an event handler, so it also has to return in a fraction of a second, which means often it will have to do a bit of work and again schedule the rest to run later.

    Depending on what you're trying to do, you may want to run on a timer, or whenever the event loop is idle, or every time through the event loop no matter what. In this case, we want to run once/second. In Tkinter, you do this with the after method:

        def handle_click():
            win = Toplevel(root)
            win.title('Hi!')
            win.transient(root)
            Label(win, text='Please wait...').pack()
            i = 5
            def callback():
                nonlocal i
                print(i)
                i -= 1
                if not i:
                    win.destroy()
                else:
                    root.after(1000, callback)
            root.after(1000, callback)
    

    For a different example, imagine we just have some processing that takes a few seconds because it has so much work to do. We'll do something stupid and simple:

        def handle_click():
            total = sum(range(1000000000))
            label.config(text=total)
    
        root = Tk()
        Button(root, text='Add it up', command=handle_click).pack()
        label = Label(root)
        label.pack()
        root.mainloop()
    

    When you click the button, the whole app will freeze up for a few seconds as Python calculates that sum. So, what we want to do is break it up into chunks:

        def handle_click():
            total = 0
            i = 0
            def callback():
                nonlocal i, total
                total += sum(range(i*1000000, (i+1)*1000000))
                i += 1
                if i == 100:
                    label.config(text=total)
                else:
                    root.after_idle(callback)
            root.after_idle(callback)

    Callback Hell

    While callbacks definitely work, there are a lot of problems with them.

    First, we've turned our control flow inside-out. Compare the simple for loop to the chain of callbacks that replaced it. And it gets much worse when you have more complicated code.

    On top of that, it's very easy to get lost in a callback chain. If you forget to return from a sequential function, you'll just fall off the end of the function and return None. If you forget to schedule the next callback, the operation never finishes.

    It's also hard to propagate results through a chain of callbacks, and even harder to propagate errors. Imagine that one callback needs to schedule a second which needs to schedule a third and so on--but if there's an exception anywhere on the chain, you want to jump all the way to the last callback. Think about how you'd write that (or just look at half the Javascript apps on the internet, which you can View Source for), and how easy it would be to get it wrong.

    And debugging callback-based code is also no fun, because the stack traceback doesn't show you the function that scheduled you to run later, it only shows you the event loop.

    There are solutions to these problems, which I'll cover in another post. But it's worth writing an app or two around explicit callbacks, and dealing with all the problems, so you can understand what's really involved in event-loop programming.

    Blocking operations

    Sleeping isn't the only thing that blocks. Imagine that you wanted to read a large file off the disk, or request a URL over the internet. How would you do that with callbacks?

    We had to replace our sleep with a call to after, passing it the rest of our function as a callback. Similarly, we have to replace our read or urlopen with a call to some function that kicks off the work and then calls our callback when it's done. But most GUI frameworks don't have such functions. And you don't want to try to build something like that yourself.

    I/O isn't the only kind of blocking, but it's by far the most common. And there's a nice solution to blocking I/O: asynchronous I/O, using a networking framework. Whether this is as simple as a loop around select or as fancy as Twisted, the basic idea is the same as with a GUI: it's an event loop that you add handlers to.
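    Here's roughly what the "loop around select" end of that spectrum looks like, sketched with a connected socket pair so it runs standalone. `add_reader` and `reactor_step` are illustrative names, but this is the same handlers-plus-loop shape as a GUI:

```python
import select
import socket

handlers = {}                        # socket -> handler

def add_reader(sock, handler):
    handlers[sock] = handler

def reactor_step(timeout=1.0):
    # One pass of the event loop: wait for readable sockets, then
    # dispatch. Handlers must not block, just like GUI handlers.
    readable, _, _ = select.select(list(handlers), [], [], timeout)
    for sock in readable:
        handlers[sock](sock)

# Demo with a connected pair of sockets.
a, b = socket.socketpair()
received = []

def on_readable(sock):
    received.append(sock.recv(1024))

add_reader(b, on_readable)
a.sendall(b'hello')
reactor_step()
a.close()
b.close()
# received == [b'hello']
```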

    And there's the problem: your GUI loop and your I/O loop both expect to take over the thread, but they obviously can't both do that.

    The solution is to make one loop drive the other. If either framework has a way to run one iteration of the main loop manually, instead of just running forever, you can, with a bit of care, put one in charge of the other. (Even if your framework doesn't have a way to do that, it may have a way to fake it by running an event loop and immediately posting a quit event; Tkinter can handle that.)

    And the work may have already been done for you. Twisted is a networking framework that can work with most popular GUI frameworks. Qt is a GUI framework with a (somewhat limited) built-in network framework. They both have pretty high learning curves compared to Tkinter, but it's probably easier to learn one of them than to try to integrate, say, Tkinter and a custom select reactor yourself.

    Another option is a hybrid approach: Do your GUI stuff in the main thread, and your I/O in a second thread. Both of them can still be callback-driven, and you can localize all of the threading problems to the handful of places where the two have to interact with each other.
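    A sketch of that hybrid shape, with a Queue as the single point of contact between the two threads. `fetch` here is just a stand-in for real I/O, and on the GUI side `drain_results` would be driven by a timer like root.after:

```python
import queue
import threading

to_gui = queue.Queue()               # the one point of contact

def fetch(url):
    # Stand-in for real (possibly callback-driven) I/O work.
    return 'contents of {}'.format(url)

def io_thread(urls):
    for url in urls:
        to_gui.put(fetch(url))       # Queue.put is thread-safe

def drain_results():
    # On the GUI side this would run from a timer (e.g. root.after)
    # and update widgets; here it just collects what has arrived.
    results = []
    while True:
        try:
            results.append(to_gui.get(block=False))
        except queue.Empty:
            return results

t = threading.Thread(target=io_thread, args=(['a', 'b'],))
t.start()
t.join()
results = drain_results()
# results == ['contents of a', 'contents of b']
```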

    Threading

    With multithreading, we don't have to reorganize our code at all, we just move all of the work onto a thread:

        def handle_click():
            def callback():
                total = sum(range(100000000))
                print(total)
            t = threading.Thread(target=callback)
            t.start()
    

    This kicks off the work in a background thread, which won't interfere with the main thread, and then returns immediately. And, not only is it simpler, you don't have to try to guess how finely to break up your tasks; the OS thread scheduler just magically takes care of it for you. So all is good.

    Plus, this works just as well for I/O as it does for computation (better, in fact):

        def handle_click():
            def callback():
                r = urlopen('http://example.com')
                data = r.read()
                soup = BeautifulSoup(data)
                print(soup.find('p').text)
            t = threading.Thread(target=callback)
            t.start()
    

    But what if we want it to interfere with the main thread? Then we have a problem. And with most frameworks--including Tkinter--calling any method on any GUI widget interferes with the main thread. For example, what we really wanted to do was this:

        def handle_click():
            def callback():
                total = sum(range(100000000))
                label.config(text=total)
            t = threading.Thread(target=callback)
            t.start()
    

    But if we try that, it no longer works. (Or, worse, depending on your platform/version, it often works but occasionally crashes...)

    So, we need some way to let the background thread work with the GUI.

    on_main_thread

    If you had a function on_main_thread that could be called on any thread, with any function, and get it to run on the main thread as soon as possible, this would be easy to solve:

        def handle_click():
            def callback():
                total = sum(range(100000000))
                root.on_main_thread(lambda: label.config(text=total))
            t = threading.Thread(target=callback)
            t.start()
    

    Many GUI frameworks do have such a function. Tkinter, unfortunately, does not.

    If you want to, you can pretty easily wrap up all of your widgets with proxy objects that forward method calls through on_main_thread, like this:

        class ThreadedMixin:
            main_thread = current_thread()
            def _forward(self, func, *args, **kwargs):
                if current_thread() != ThreadedMixin.main_thread:
                    self.on_main_thread(lambda: func(*args, **kwargs))
                else:
                    func(*args, **kwargs)
    
        class ThreadSafeLabel(Label, ThreadedMixin):
            def config(self, *args, **kwargs):
                self._forward(super().config, *args, **kwargs)
            # And so on for the other methods
    

    Obviously you'd want to do this programmatically or dynamically instead of writing hundreds of lines of forwarding code.
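    For example, here's a sketch of doing it dynamically with __getattr__, so one proxy class covers every widget and method. The fake label and the list standing in for the real on_main_thread machinery are just there to make the sketch self-contained:

```python
import threading

class ThreadSafeProxy:
    """Forward every method call on a widget through on_main_thread
    when called from a background thread. One class covers all
    widgets -- no hand-written stubs."""
    def __init__(self, widget, on_main_thread, main_thread):
        self._widget = widget
        self._on_main_thread = on_main_thread
        self._main_thread = main_thread

    def __getattr__(self, name):
        attr = getattr(self._widget, name)
        if not callable(attr):
            return attr
        def forward(*args, **kwargs):
            if threading.current_thread() is not self._main_thread:
                self._on_main_thread(lambda: attr(*args, **kwargs))
            else:
                attr(*args, **kwargs)
        return forward

# Demo with a fake widget and a list standing in for the real
# on_main_thread machinery.
class FakeLabel:
    def __init__(self):
        self.text = None
    def config(self, text=None):
        self.text = text

queued = []
label = FakeLabel()
proxy = ThreadSafeProxy(label, queued.append, threading.current_thread())

proxy.config(text='direct')          # main thread: runs immediately
t = threading.Thread(target=lambda: proxy.config(text='later'))
t.start(); t.join()                  # background thread: gets queued
queued.pop(0)()                      # "main thread" runs the queued call
```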

    post_event

    If you had a function post_event that could be called on any thread to post a custom event to the event queue, you could get the same effect with just a bit of extra work--just write an event handler for that custom event. For example:

        def handle_my_custom_event(event):
            label.config(text=event.data)
        root.register_custom_event('<My Custom Event>')
        root.bind('<My Custom Event>', handle_my_custom_event)
    
        def handle_click():
            def callback():
                total = sum(range(100000000))
                event = Event('<My Custom Event>', data=total)
                root.post_event(event)
            t = threading.Thread(target=callback)
            t.start()
    

    Most GUI frameworks that don't have on_main_thread have post_event. But Tkinter doesn't even have that.

    Polling queues

    With limited frameworks like Tkinter, the only workaround is to use a Queue, and make Tkinter check the queue every so often, something like this:

        q = queue.Queue()
    
        def on_main_thread(func):
            q.put(func)
    
        def check_queue():
            while True:
                try:
                    task = q.get(block=False)
                except queue.Empty:
                    break
                else:
                    root.after_idle(task)
            root.after(100, check_queue)
    
        root.after(100, check_queue)
    

    While this works, it makes the computer waste effort constantly checking the queue for work to do. This isn't likely to slow things down when your program is busy--but it will make it drain your battery and prevent your computer from going to sleep even when your program has nothing to do. Programs that use a mechanism like this will probably want some way to turn check_queue on and off, so it's only wasting time when you actually have some background work going.
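    One sketch of that on/off switch: only schedule check_queue while there's outstanding work, waking the poller from on_main_thread. The toy after/pump pair stands in for Tkinter's real after and mainloop so the sketch runs standalone; note that making start/stop fully thread-safe would also need a lock around the polling flag:

```python
import queue

timers = []                      # toy stand-in for root.after's timers

def after(ms, func):
    timers.append(func)

def pump():                      # toy stand-in for the mainloop
    while timers:
        timers.pop(0)()

q = queue.Queue()
polling = False

def start_polling():
    global polling
    if not polling:
        polling = True
        after(100, check_queue)

def stop_polling():
    global polling
    polling = False

def on_main_thread(func):
    q.put(func)
    start_polling()              # wake the poller only when there's work

def check_queue():
    while True:
        try:
            task = q.get(block=False)
        except queue.Empty:
            break
        task()
    if q.empty():
        stop_polling()           # idle: stop burning CPU
    if polling:
        after(100, check_queue)

ran = []
on_main_thread(lambda: ran.append(1))
pump()
# ran == [1], and polling has switched itself back off
```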

    mtTkinter

    There's a wrapper around Tkinter called mtTkinter that effectively builds on_main_thread out of something like check_queue, and then builds thread-safe proxies around all of the Tkinter widgets, so you can use Tkinter as if it were completely thread-safe.

    I don't know whether it's really "production-quality". I believe it hasn't been ported to Python 3 either. (2to3 might be enough, but I can't promise that.) And the LGPL licensing may be too restrictive for some projects. But for learning purposes, and maybe for building simple GUIs for your own use, it's worth looking at.

    I have a quick&dirty port to 3.x on GitHub if you want to try it.

    Threading limits

    Unlike callbacks, if you pile up too many threads, you start adding additional overhead, in both time and space, on top of the cost of actually doing the work.

    The solution to this is to use a small pool of threads to service a queue of tasks. The easiest way to do this is with the futures module:

        executor = ThreadPoolExecutor(8)
    
        def handle_click():
            def callback():
                total = sum(range(100000000))
                root.on_main_thread(lambda: label.config(text=total))
            executor.submit(callback)
    

    Shared data

    The biggest problem with threads is that any shared data needs to be synchronized, or you have race conditions. The general problem, and the solutions, are covered well all over the net.

    But GUI apps add an additional problem: Your main thread can't block on a synchronization object that could be held for more than a fraction of a second, or your whole GUI freezes up. So, you need to make sure you never wait on a sync object for more than a brief time (either by making sure nobody else can hold the object for too long, or by using timeouts and retries).
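    For example (a sketch, with illustrative names, not from any particular library), the main thread can try to acquire a lock with a short timeout, and reschedule itself with after instead of blocking when it loses the race:

```python
import threading

lock = threading.Lock()   # protects shared_results
shared_results = []       # appended to by background threads (under lock)

def poll_results(root, label, retry_ms=50):
    """Main-thread handler: try the lock briefly; retry later rather than block."""
    if lock.acquire(timeout=0.01):   # wait at most 10ms so the GUI stays live
        try:
            if shared_results:
                label.config(text=shared_results[-1])
        finally:
            lock.release()
    else:
        # Couldn't get the lock quickly -- reschedule instead of freezing the GUI.
        root.after(retry_ms, lambda: poll_results(root, label, retry_ms))
```

    The 10ms timeout is arbitrary; the point is only that it's short enough that a user can't perceive the stall, and that the failure path reschedules rather than waits.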

  3. So you've installed Python from an official binary installer on python.org's Releases page, you've installed Xcode from the App Store and the Command Line Tools from Xcode, you've installed pip from its setup script. And now, you try to "pip-X.Y install pyobjc" and it fails with a whole slew of obscure error messages. 

    An easy workaround: Don't

    The official binary installer seems like the easy way to do things, but it's not. It's built to work with every version of OS X from 10.6 to 10.9. This means whenever you build a package, it will try to build that package to work with every version of OS X from 10.6 to 10.9. This is very hard to do—especially on a 10.8 or 10.9 machine with Xcode 5.

    If you're planning to build applications for binary distribution with, e.g., py2app, and you want them to work on an older version of OS X than you have, then you need to get this working. (Although even then, you might be better off building Python exactly the way you want, instead of using the binary installation.) So far, I haven't been able to get this working with Xcode 5; I've been using an old machine that I don't update.

    For almost everyone else, it's unnecessary wasted effort.

    Python 2

    If you're using Python 2, just stick with Apple's pre-installed 2.7.2. Having multiple 2.7 installations at the same time is already a huge headache, and the added problems with building packages… is it really worth it?

    Python 3

    While it may seem counter-intuitive, building it yourself makes everything easier, because you end up with a Python installation tailored to your build toolchain, not to the Python Mac build machine's toolchain.

    And if you use Homebrew, building it yourself is just "brew install python3". Plus, you get setuptools and pip (that work with your system), and a newer sqlite3, real readline, gdbm, and a few other things you wouldn't have thought of.

    When are they going to fix it?

    I know that the pythonmac SIG are aware of the problem. In fact, the problem has been around for a long time; it's just that the workarounds they've used since 10.6 no longer work. I have no idea what they're planning to do about it. You might want to watch the pythonmac-sig mailing list for progress, or join in to help.

    The problem

    There are actually two problems.

    gcc is gone

    The official Python.org binaries are built with Apple's custom gcc-4.2, as supplied by Xcode 3.2.

    Xcode 4 stopped supplying gcc-4.2, but offered a transitional compiler called llvm-gcc-4.2 (because it used a custom gcc-4.2 frontend hooked up to the llvm backend), and the toolchain came with wrappers named things like "gcc-4.2" and "g++-4.2" and so on. This actually had some problems building Python itself, but for building extension modules—even complex ones like numpy and pyobjc—you usually got away with it.

    Xcode 5 dropped llvm-gcc-4.2 as well. Now, all you've got is clang. And, while "gcc" is a wrapper around clang, "gcc-4.2" does not exist at all. So, many extensions will just fail to build, because they're looking for a compiler named "gcc-4.2" (or a C++ compiler named "g++-4.2", or a linker frontend named "gcc-4.2", or…). The new compiler—which Apple calls "Apple LLVM 5.0 (clang-500.2.76) (based on LLVM 3.3svn)", just to make it impossible for anyone to refer to—does a much better job than llvm-gcc-4.2; if you can just get distutils to use it everywhere, everything pretty much just works.

    In some cases, just passing "CC=clang CXX=clang++" environment variables to the build will work. You can get further by also adding "MAINCC=clang LINKCC=clang". Anything that needs to run a configure script will still end up picking up gcc-4.2, however, and there may be similar issues with projects that first build a local distutils.cfg or similar.
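    For the cases where environment variables are enough, that looks something like this (the package is just an example; adjust the pip name to your Python version):

```shell
# Point the build at clang wherever distutils would otherwise look for gcc-4.2.
export CC=clang
export CXX=clang++
export MAINCC=clang
export LINKCC=clang

# Then build as usual, e.g.:
# pip-3.3 install pyobjc
```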

    One workaround is to edit /Library/Frameworks/Python.framework/Versions/X.Y/lib/pythonX.Y/config-X.Ym/Makefile, changing every reference to gcc and g++ to instead reference clang and clang++, then cross your fingers. This seems to work.

    Alternatively, you could create a symlink, or a hardlink, at /usr/local/bin/gcc-4.2 pointing to /usr/bin/gcc, and likewise for g++-4.2, and cross your fingers even tighter. I haven't tried this.

    10.8 is the oldest SDK

    We've always been at war with Eastasia, and we've always been compiling for 10.8. There has never been an older SDK. References in your configure scripts to MacOS10.6.sdk are errors.

    Many extensions will build just fine without the 10.6 SDK—but they'll quietly build for your native system, which defeats the purpose of building a redistributable application.

    You can still find the 10.6 and 10.7 SDKs in older Xcode packages from Apple (and, for 10.7, you can download the latest Command Line Tools for Lion, which is just the SDK slightly repackaged). Then you can copy them into /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/ and… whether they'll actually work, I don't know. They won't have an SDKSettings.plist file. They won't be registered in the list of known SDKs; the GUI and xcodebuild certainly won't find them, but maybe specifying them on the command line will work. Or maybe only if you use absolute paths.
