1. Let's say you have a network server built around a single-threaded event loop—gevent, Twisted, asyncore, something simple you built on top of select.select, whatever. It's important to handle every event quickly. If you take 30 seconds responding to a request from one connection, you can't deal with anyone else's requests, or accept new connections, for that whole 30 seconds.

    But what if the request actually takes 30 seconds' worth of work to handle?

    This is exactly what threads are for. You package up a job that does the 30 seconds of work and responds to the request when it's done, then kick it out of the event loop to run in parallel. Now you can move on to the next event without waiting for it to finish.

    The same thing applies to GUI programs. Clicking a button can't stop the interface from responding for 30 seconds. So, if the button means there's 30 seconds of work to do, kick it out of the event loop.

    Thread pools

    The idea of a thread pool is pretty simple: Your code doesn't have to worry about finding a thread to use, or starting a thread and making sure there aren't too many, or anything like that. You just package up a job, and tell the pool to run it as soon as it can. If there's an idle thread, it'll run your job immediately. If not, your job will go into a queue to be run in its turn.

    You can build a thread pool pretty easily, but you don't have to, because Python has a really simple one in the standard library: concurrent.futures. (If you're using Python 3.1 or earlier, including 2.7, download and install the backport.)

    Let's say you've built a server that looks like this:

        class FooHandler(BaseHandler):
            def ping(self, responder):
                responder('pong')
            def time(self, responder):
                responder(time.time())
            def add(self, responder, x, y):
                responder(x+y)
            def sumrange(self, responder, count):
                responder(sum(range(count)))
    

    The problem is that if someone calls sumrange with a count of 1000000000, that may take 30 seconds, and during those 30 seconds, your server is not listening to anyone else. So, how do we put that on a thread?

        from concurrent.futures import ThreadPoolExecutor

        class FooHandler(BaseHandler):
            def __init__(self):
                self.executor = ThreadPoolExecutor(max_workers=8)
            def ping(self, responder):
                responder('pong')
            def time(self, responder):
                responder(time.time())
            def add(self, responder, x, y):
                responder(x+y)
            def sumrange(self, responder, count):
                self.executor.submit(lambda: responder(sum(range(count))))
    

    That's all there is to it. Now, sumrange can take as long as it wants, and nobody else is blocked.

    Process pools

    What happens if two people call sumrange(1000000000) at the same time?

    Since each request gets a separate thread, and your machine probably has at least two cores, it should still take around 30 seconds to answer both of them, right?

    Not in Python. The GIL (Global Interpreter Lock) prevents two Python threads from doing CPU work at the same time. If your threads are doing more I/O than CPU—which is usually the case for servers—that's not a problem, but sumrange is clearly all CPU work. So, it'll take 60 seconds to answer both of them.
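    A minimal sketch of the scenario above, assuming nothing beyond the stdlib: two CPU-bound jobs submitted to a thread pool. Both finish with the right answer, but because of the GIL their bytecode runs one thread at a time, so the wall-clock time is roughly the sum of the two jobs, not the max.

```python
# Two CPU-bound jobs on a thread pool. They both complete correctly,
# but the GIL serializes the actual CPU work, so you get no speedup.
from concurrent.futures import ThreadPoolExecutor

def sumrange(count):
    return sum(range(count))

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(sumrange, 1_000_000) for _ in range(2)]
    results = [f.result() for f in futures]

print(results)
```

Swapping in ProcessPoolExecutor (as described below) lets the two jobs use two cores, at the cost of pickling the arguments and results.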

    Not only that, but in Python 2, there were some problems with the GIL that could cause it to take more than twice as long, and even to intermittently block up the main thread.

    What's the solution?

    Use processes instead of threads. There's more overhead to passing information between processes than between threads, but all we're passing here is an integer and some "responder" object that's probably ultimately just a simple wrapper around a socket handle.

    There are also restrictions on what you can pass. Depending on how that responder was coded, you may be able to pass it, you may not. If you can, the change is as easy as this:

        from concurrent.futures import ProcessPoolExecutor

        def __init__(self):
            self.executor = ProcessPoolExecutor(max_workers=8)
    

    If the responder object wasn't coded in a way that lets you pass it between processes, you'll get an error like this:

        _pickle.PicklingError: Can't pickle <function <lambda> at 0x...>: attribute lookup builtins.function failed
    

    Futures

    So, what do you do when you get that error?

    If you can't give the background process the responder object, you need to have it return you the value, so you can call responder on it. But how do you do that?

    That "executor.submit" call actually returns something, called a future. We ignored it before, but it's exactly what we need now. A future represents a result that will exist later. In our case, the job is sum(range(count)), so the result is just the sum that the child process will pass back when it finishes all that work.

    What good is that?

    Well, you can do four basic things with a future:

    First, you can wait for it to finish by calling result(), but that brings you right back to the starting point—you sit around for 30 seconds blocking the event loop.

    You can wait on a batch of futures at once. That's pretty cool, because it means that a single thread could wait on the results of 1000 jobs run across 8 child processes. But you don't want to have to write that thread if you don't have to, right? Ideally, your event loop framework would know how to wait for futures the same way it waits for sockets. But, unless you're using the PEP 3156 prototype, yours probably doesn't.
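    A quick sketch of that batch-waiting option, assuming only the stdlib: concurrent.futures.wait blocks one caller until every future in the batch is finished.

```python
# Wait on a whole batch of futures at once with concurrent.futures.wait.
from concurrent.futures import ThreadPoolExecutor, wait

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(sum, range(n)) for n in (10, 100, 1000)]
    done, not_done = wait(futures)   # blocks until all three are finished

totals = sorted(f.result() for f in done)
print(totals)
```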

    You can poll a future, or a batch of futures, to see if it's done. Which means you could just poll your whole list of futures each time through the event loop. But what if some poor user is waiting for his sum, and no other events happen for 10 minutes? So you need to add some kind of timeout or on_timer event or equivalent to make sure you keep checking at least, say, once/second if there are any futures to check.
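    The polling option looks something like this sketch (the sleep stands in for whatever timer tick your event loop already has): each time around the loop, check which futures are done and handle their results without blocking.

```python
# Poll futures non-blockingly from a loop using Future.done().
from concurrent.futures import ThreadPoolExecutor
import time

with ThreadPoolExecutor(max_workers=2) as executor:
    pending = {executor.submit(sum, range(n)): n for n in (10, 100)}
    results = {}
    while pending:
        for future in list(pending):
            if future.done():              # non-blocking check
                results[pending.pop(future)] = future.result()
        time.sleep(0.01)                   # stand-in for the loop's timer tick
```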

    Finally, you can just attach a callback that will get run whenever the future is finished. Which… doesn't have any down side at all. That's all you want here: to run responder on the result whenever it's ready.

    This is a bit tricky to get your head around, but an example shows how simple it actually is:

        def sumrange(self, responder, count):
            future = self.executor.submit(lambda: sum(range(count)))
            future.add_done_callback(lambda f: responder(f.result()))
    

  2. People want sorted collections in the stdlib, but no one has ever made a solid proposal to add them.

    Some people want a SortedList that maintains sorted order as elements are added and removed (without requiring linear-time shuffling); some want a SortedDict that keeps its keys in sorted order (and doesn't require hashing); SortedSet is less critical, but an obvious extension.

    Every so often, someone proposes adding sorted collections to the stdlib. However, usually, it's in the middle of a thread about something else, like a wrapper around heapq, or making OrderedDict easier to use, or even extending sum (don't ask). There have been two suggestions that would have given us sorted collections, but one was hidden in a proposal to replace the basic list type, and the other was hidden in a proposal to add a general binary tree library; in both cases, the fact that we would get sorted dictionaries for free was a minor afterthought.

    The idea came up for a completely different reason in the discussions leading up to PEP 456: Secure and interchangeable hash algorithm: While hash tables have amortized constant average performance, they have linear worst-case performance, and generating pathological cases that trigger such performance is a way to DoS a web server. A collection with logarithmic average time, but also logarithmic worst-case time, is a safe way to avoid that. Even though sorting is irrelevant to the problem they're trying to solve, the obvious types to use are based on sorting, and the same types you'd use if sorting was the goal.

    Don't we already have heapq and bisect?

    Yes, we do. 

    Of course they have horrible C-style APIs rather than OO APIs, but that's not much of a problem. The bisect docs have a link to a SortedCollection recipe at ActiveState, and a bit of searching turns up multiple ActiveState recipes and PyPI modules wrapping up both modules.

    The problem is that neither of these is a good sorted list type, much less a sorted dictionary. A heap isn't randomly accessible by either key or index, and a bisect-based list doesn't provide any way to mutate in better than linear time.
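    That tradeoff is easy to see with the stdlib bisect module itself: searching a sorted list is logarithmic, but every insertion has to shift the tail of the list over.

```python
# bisect gives O(log N) search on a sorted list, but insertion is
# O(N), because list.insert has to shift everything after the slot.
import bisect

data = []
for x in [5, 1, 4, 2, 3]:
    bisect.insort(data, x)        # O(log N) search + O(N) shift

# Lookup is the fast part:
i = bisect.bisect_left(data, 4)   # O(log N)
```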

    Other languages

    Wikipedia has an article describing the mapping types (both hashed and sorted) in a wide variety of languages. But briefly:

    Most other "scripting" languages (Ruby, Perl, Lua, Common.JS, PHP, etc.) do not have a sorted collection in the stdlib, but most "system" languages (Java, C++, .NET, GHC Haskell, etc.) do.

    Almost every language that has sorted collections has a SortedDict-like type, plus either a SortedList or a SortedSet. Some also have things like SortedMultiDict (which isn't as useful in Python, given how easy it would be to use a SortedDefaultDict(list) instead) and SortedMultiSet (again, a SortedCounter would take care of most uses better), or even multi-indexable sorted collections. A few languages (such as ObjC) provide tools to make a wide variety of uses (including multi-indexed collections, nested index paths, etc.) easier to build, instead of providing the concrete types.

    Almost all of these are implemented with red-black trees, although Java also adds skip-list implementations.

    Performance characteristics

    There are a variety of different implementations that effectively make everything O(log N)—inserting a new element, deleting an element, looking up a key, etc. This is not as good as the amortized O(1) average time for hash tables, although most implementations can also guarantee O(log N) worst-case time, while hash tables are generally O(N). At any rate, searching a sorted collection is much better than the O(N) time for unsorted collections, and inserting or deleting is much better than the O(N) time for sorted dynamic arrays.

    Within that logarithmic time, there are a lot of tradeoffs. B-trees and their relatives have a much larger logarithmic base, but also have better cache locality (for small collections) and paging characteristics (for very large ones). Skip lists are faster at accessing ranges, and easier to implement without locks or with fine-grained locks (hence Java's ConcurrentSkipListMap), while red-black tree algorithms are easy to internally parallelize. AVL trees are very rigidly balanced, which is great when you do a lot more searches than mutations; not so much when you don't. Red-black trees offer very consistent lookup times. Weight-balanced trees and treaps can make more frequently accessed values faster to access. And so on. I don't know of a good survey paper or other link that discusses all of the tradeoffs.

    Key requirements

    The requirements on the keys of a sorted collection are a little more complicated than those of a dict or a set, and they're not automatically checkable. On the other hand, collections.SortedDict won't be one of the half-dozen types that novices are introduced to in Python right off the bat, so it doesn't have to be quite as simple or foolproof.

    Comparability

    With a dict, the main key requirement is simple: every key must be hashable. Every operation automatically tests this requirement, and provides a nice exception if it fails. Of course it's possible to trick the code by, e.g., returning a different hash each time you're asked, but anything non-malicious will either work, or fail obviously.

    With a sorted dict, the main key requirement is not quite as simple: every key must be comparable with every other key, and that comparison must define a strict weak order (or, if the sort isn't stable, a strict total order—but that only matters for a sorted multidict…). And this does not get automatically tested; in general, for any given operation, the supplied key is only tested against log N other keys; if it's comparable to all of them, and the ordering is not obviously inconsistent, there will be no error.

    Note that heapq and bisect already have this problem—in fact, bisect can't even verify that your list is sorted. And plenty of other stdlib modules have requirements that can't be automatically verified.

    One very obvious and important type, float (and complex, and numpy.float*, etc.), does not meet the requirements, because if any of the values are NaN, you don't have an order. However, this is a widespread problem that already affects many core operations in Python (and almost every other language), including list.sort and sorted, so there's no new problem with sorted collections.
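    Here's the NaN problem in two lines: every comparison with NaN is False, so sorted silently produces a list that isn't actually in order.

```python
# NaN breaks the strict weak order: every comparison with it is False,
# so sorted() can return a list that is not in non-decreasing order,
# with no error raised.
nan = float('nan')
out = sorted([3.0, nan, 1.0])
is_sorted = all(a <= b for a, b in zip(out, out[1:]))
```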

    Mutability

    For dict and friends, the keys are required to be immutable, or only mutable in ways that don't affect hashing and equality. And dict is important enough that any type that is hashable, but doesn't meet these requirements, is often considered broken. Of course a dict can't check that its values are hash-immutable and equality-immutable, but in practice, just checking that they're hashable (which it does automatically) is pretty close. 

    For a sorted collection, the keys are required to be immutable, or only mutable in ways that don't affect comparison. But there is no convention that all comparable types should be immutably comparable. This is another way that the requirements for a sorted collection aren't automatically checkable.

    It would be possible to use hash just for verification. Equality-immutability is not sufficient for comparability-immutability, and hashing is not necessary for equality-immutability, but it would catch at least a good number of problem types, and not have that many false positives. (It's rarely that onerous to add a hash method to an immutable class.) However, hashing can be expensive for some types. And is "not that many" good enough? More importantly, this is a very unobvious test. When a dict refuses your type because it's not hashable, that makes sense; if a SortedDict does the same, it will be baffling; what does hashing have to do with sorting?

    Finally, some languages/libraries (e.g., C++ std::map and friends) allow mutable keys that you promise not to mutate while they're in the collection, and at least one person has suggested the same here. But this is a much more complicated requirement to explain than immutability, for very little benefit. And really, the exact same benefit could be added to set and dict, and Python decided against it there. And nobody could think of a really compelling use case. But, since mutability isn't actually tested, you're just violating the documentation if you use "temporarily-immutable" keys, and if you can figure out how to do it safely, well, you're a consenting adult.

    Interface

    SortedDict

    Obviously a SortedDict has to support all of the operations in collections.abc.Mapping. Of course most of those operations will be logarithmic time (even the ItemView, KeyView, and ValueView are logarithmic to index), but that doesn't change the API.

    However, there are additional operations a SortedDict can provide:
    • __getitem__, __setitem__, and __delitem__ can take slices.
      • d[low:high] returns all of the values where low <= key < high, and similar for set and del.
      • Leaving out start or end (or both) works just as well as it does for lists.
      • Negative indices or slice components don't make any sense, as these are keys, not indices.
      • A skip value is feasible, but doesn't seem all that useful (except maybe as a reverse flag).
      • __delitem__, and __setitem__ with a slice that isn't an exact replacement, may not be O(log N) for all possible implementations (in fact, they're worst-case O(log^2 N) for red-black trees).
    • item_slice, key_slice, and value_slice methods.
    • prev, succ, floor, and ceil on a key (or prev_key, prev_item, etc.).
    • min and max (or min_key, min_item, max_key, and max_item)
      • pop_min and pop_max
      • nsmallest and nlargest
      • pop_nsmallest, pop_nlargest
    Lookup by index is possible, but probably not necessary, as d[d.keys()[n]] takes care of it.
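    To make the proposed operations concrete, here's a toy sketch of part of that interface, built on bisect purely for illustration (so mutation is O(N) rather than the O(log N) a real tree would give, and it cheats by using a hash dict for values). The class and its exact method names are hypothetical, following the bikeshed list above; none of this is stdlib code.

```python
# A toy, bisect-backed sketch of a few proposed SortedDict operations:
# slicing by key range, key_slice, floor, and min/max.
import bisect

class ToySortedDict:
    def __init__(self, items=()):
        self._keys = []
        self._values = {}                  # toy shortcut: hashes keys
        for k, v in items:
            self[k] = v

    def __setitem__(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key) # O(N) here; O(log N) in a tree
        self._values[key] = value

    def __getitem__(self, key):
        if isinstance(key, slice):
            # d[low:high]: all values with low <= key < high
            lo = 0 if key.start is None else bisect.bisect_left(self._keys, key.start)
            hi = len(self._keys) if key.stop is None else bisect.bisect_left(self._keys, key.stop)
            return [self._values[k] for k in self._keys[lo:hi]]
        return self._values[key]

    def key_slice(self, low, high):
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_left(self._keys, high)
        return self._keys[lo:hi]

    def floor(self, key):
        # largest key <= the given key, or None if there is none
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i - 1] if i else None

    def min_key(self):
        return self._keys[0]

    def max_key(self):
        return self._keys[-1]

d = ToySortedDict([(5, 'e'), (1, 'a'), (3, 'c')])
```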

    Not all of these are necessarily worth adding, and of course the names are up for bikeshedding.

    Whichever methods are added should be made part of a collections.abc.SortedMapping, which inherits from Mapping. Which methods are required by the ABC and which can be implemented in terms of others is up for investigation. There would of course then be a collections.abc.MutableSortedMapping, and collections.SortedDict would implement that.

    I believe this also implies SortedItemView, SortedKeyView, and SortedValueView ABCs (and un-exported concrete implementations as a detail of SortedDict, just as for the dict equivalents).

    The ABCs don't define construction. However, the obvious constructor for a SortedDict is the constructor for dict, plus a keyword-only "key" argument. (It might be worth having the key accessible after construction as an attribute, too.) The problem is that you can construct a dict as dict(a=1, b=2, c=3), so "key" is ambiguous! I don't have an answer to that one…
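    The ambiguity is easy to demonstrate with plain dict: in the keyword-argument constructor, "key" is just another mapping entry, so a hypothetical SortedDict(key=str.lower) couldn't tell a sort-key function from an entry named "key".

```python
# dict's keyword-argument constructor treats "key" as an entry,
# which is exactly what makes a key= sort parameter ambiguous.
d = dict(a=1, b=2, key=3)
```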

    The quasi-documented __missing__ method could be different from dict.__missing__, because it can be faster if it remembers the insert position. However, this may not be necessary (it seems like it should be only a 33% speedup in theory, and even less in practice, because the rebalance will probably be slower than the search on average).

    Note that this interface is very close to the one defined in bintrees 2.0, but not identical. (In particular, bintrees cannot take a key parameter, which means you have to decorate-sort-undecorate to use it.)

    SortedSet

    A SortedSet can do everything a Set can, so again, we have an ABC subclass SortedSet, and MutableSortedSet, and collections.SortedSet implements collections.abc.MutableSortedSet. (There doesn't seem to be a compelling use case for a collections.FrozenSortedSet).

    The added methods are:
    • slice (returns all members in the specified range)
    • prev, succ, floor, ceil
    • min, max, pop_min, pop_max, nsmallest, nlargest, pop_nsmallest, pop_nlargest
    Again, the constructor needs to take an optional key.

    SortedList

    A SortedSequence can do everything a Sequence can, but a MutableSortedSequence cannot do __setitem__, append, extend, or insert. (It can still do __delitem__.) Also, note that indexing is logarithmic instead of constant.

    Meanwhile, SortedSequence adds a few methods:
    • indices (like item_slice and slice)
    • prev, succ, floor, ceil
    • min, max, pop_min, pop_max, nsmallest, nlargest, pop_nsmallest, pop_nlargest
    • bisect_left, bisect, bisect_right?
    • insort_left, insort, insort_right? (for manually maintaining stable order)
    And MutableSortedSequence adds:
    • add
    • update
    And SortedList implements MutableSortedSequence, and has a constructor like list's but with a key parameter.

    Note that this interface is in some ways more like MutableSet than like MutableSequence. That seems unavoidable. However, in other ways—most notably, indexing—it's very un-set-like. It's possible that the two could be merged into a single abc, and that the concrete classes could be merged as well, but that might lead to more confusion than benefit.

    Alternate implementations

    It should be relatively easy to adapt a third-party type like blist.sorteddict to fit collections.abc.MutableSortedMapping. And of course the ABC should do as much work as possible, as it does for the existing ABCs. As mentioned earlier, the exact set of methods that are required vs. default-implemented needs to be explored.

    Adapting blist is particularly important, because it's probably the most popular sorted collection library available, and it's also an oddball in many ways, some of which would make it desirable even if there were an alternative in the stdlib.

    Subclasses

    Subclassing these types should be as easy as subclassing the non-sorted equivalents, but note the open issue with SortedDict.__missing__.

    SortedDefaultDict could definitely be useful. Whether it needs to be included in the stdlib, listed as a recipe or a link to a recipe on ActiveState, or left as an exercise for the user probably depends on how easy it is to implement (which may depend on whether __missing__ is optimized).

    SortedCounter seems like it will be useful less often, but still has some use. Again, how it should be included/referenced probably depends on how easy it is to implement.

    The blist library comes with weakref-based collections that automatically remove members when no longer alive. I don't think this needs to be part of the collections library, but again, it might be worth a link to a recipe.

    Implementation

    As mentioned earlier, nearly every language that has sorted collections uses red-black trees. Unless someone makes a solid case for a different type, it seems like the default assumption should be that Python should also use red-black trees.

    However, that doesn't mean red-black trees are definitely the answer. For example, if blist turns out to be the easiest module to bring into the stdlib and maintain there, I don't think it would be worth rejecting because it uses a different implementation. Or, if skip lists turn out to be useful for some other purpose (e.g., a coroutine-friendly lock-free priority queue for tulip's scheduler), it might be worth reusing them instead of adding a red-black tree implementation alongside it.

    Optimizations

    One thing not often considered in sort trees is the performance of building a tree all at once from an existing collection (or iterator). Obviously, SortedList(foo) is always going to be slower than list(foo) because of the cost of building tree nodes, etc. However, ideally, it should be slower than timsort only by a constant factor—in particular, O(N) or near it instead of O(N log N) in the key cases. At least, creating a SortedFoo from an already-sorted collection should be fast enough that the optimization of "just copy the tree (possibly transforming each node)" to create a SortedFoo from a SortedBar isn't necessary.

    There are some other optimizations worth considering. For example, with a B-tree, calling update with a bunch of values within a small range will go much faster if you either pre-sort the values, or just partition the values into subtrees and then sort only at the bottom, than if you insert one by one. Which means you might even want to batch and lazily insert adds. That one doesn't make any sense for balanced trees or treaps, but there are other optimizations that might.

    The bintrees library provides a more complex view type, TreeSlice; blist, on the other hand, uses copy-on-write subtrees. If these optimizations aren't necessary for list and other types, are they actually worth adding for sorted collections?

    There are some obvious optimizations that can be done in various special cases. For example, if all of the keys are smallish integers, you can use a bucket- or radix-based structure instead of a general tree structure, for constant-time operations (well, really something like O(log K), where K is the largest key) instead of O(log N). Given that we don't do such optimizations for sorted and sort, I don't think we need to do them for sorted collections.

    Banyan has some interesting optimizations based on tree augmentation, as well as some obvious ones for things like homogenous collections.

    Pre-existing implementations

    The bintrees library is already 90% of the way to implementing this design. It has a solid pure-Python implementation that's known to work in (and even somewhat optimized for) PyPy 2.0, and a C-accelerated implementation that's reasonably fast in CPython 3.3 (and 2.7), although it uses Cython rather than native C. It also includes two other trees (an AVL tree and an unbalanced tree), but those can be stripped out easily. It's actively maintained. Unfortunately, the author does not want to maintain it as a stdlib module.

    The Banyan library is also pretty close, and seems to be actively maintained, although I don't know if it's as mature. However, there is no pure-Python implementation, and the native implementation is C++ rather than C (and inherently so, as it relies on template metaprogramming).

    There's also blist. This uses a custom B+tree, with some unique optimizations. It would also need more changes than bintrees, and a pure-Python implementation (currently, there's an underlying C library and a bunch of Python modules built on top of it). On the other hand, it's probably in more widespread use by applications than any other sorted collection library. And, given that the author proposed blist for the stdlib once, there's a decent chance he'd be willing to again.

    There are a number of other red-black tree implementations on PyPI to look at; I don't know of any that are maintained as well as bintrees, but it's worth doing a survey.

    Finally, building a red-black tree from scratch isn't all _that_ hard.

    Staged introduction

    We could just add the ABCs first, wait for library authors to adapt their code to fit, and then pick one to migrate to the stdlib.

    We could even provide a "registry", so the user can just ask for "a MutableSortedMapping" and, depending on what's installed on his system, end up with blist or bintrees or rbtree or whatever, as with BeautifulSoup picking a parser. (This would probably still require adding a pure-Python implementation with the right algorithmic complexity as a fallback, but it wouldn't need C acceleration, optimizations for already-sorted data, etc.)

    However, similar ideas have been suggested for other proposals over the last few months, and roundly shouted down, so I don't think this is seriously tenable. We need a production-quality implementation for the stdlib or nothing's getting into the stdlib.

    Other open issues

    Attractive nuisance

    There are problems where a sorted collection seems like the obvious solution (especially to a novice), but really isn't.

    For example, getting the M largest lines from an N-line file can be done in O(N log N) time with a SortedList, but it can be done in O(N log M) time, and with a smaller constant factor to boot, using heapq.

    In fact, if you aren't going to mutate the collection after sorting, sorted is faster by a significant constant factor than SortedList—and the fact that a regular sorted list is more painful to use (via bisect) makes the attraction even stronger.

    And if you just want a SortedDict because you're mechanically porting code from Java or C++ and want the equivalent type in Python… you shouldn't be doing that. If your algorithm depends on having a logarithmic sorted mapping, the Python implementation will too, but if your Java or C++ implementation is just using SortedMap or std::map because it was convenient, the Python implementation should not.

    One true sorted collection?

    Today, we have a choice of various sorted collections; a few seconds on PyPI turned up dozens, plus libraries that it's relatively simple to implement a sorted collection on top of. We can choose between red-black trees, skip lists, or B-trees. We can even choose data structures that are slower in general, but faster in some use cases (e.g., OrderedDict plus calling sort() for any search after an insert, and using bisect() for searching, is faster if you do only a few large batches of inserts). If we want red-black trees, we can choose between bintrees, rbtree, red-black-tree, Banyan, or half a dozen others. So, maybe attempting to define the One True Sorted Collection is a bad idea?

    I don't think this is really a problem. The stdlib is full of libraries that aren't the only way to do something, just a way that's usually good and rarely surprising. For example, the existence of MySQL, BeautifulSoup, and lxml doesn't mean sqlite3, html.parser, and ElementTree are bad ideas. In fact, providing a good-enough SortedDict, along with a SortedMapping ABC that alternatives can conform to, should make it easier for users to experiment with different types to optimize their code, not harder.



  3. You're writing a GUI app using Tkinter or PySide or your favorite GUI library, and testing it in-place, everything works.

    Then you build a .app bundle out of it, double-click it in Finder, and it can't read your environment variables. (In fact, the "open" command, AppleScript, LaunchAgents… anything but directly running the program in the shell has the same problem.)

    You check to make sure you've added them in your ~/.bash_profile. You open a new shell in Terminal, and there they are. So why can't your app see them?

    Quick fix

    If you're familiar with linux or other Unix systems, and are just looking for a fix, and don't care about how and why it works, do this:
    • Instead of typing "export FOO=bar", do "launchctl setenv FOO bar".
    • Instead of adding "export FOO=bar" to ~/.bash_profile, do "/usr/libexec/PlistBuddy -c 'Add :FOO string bar' ~/.MacOSX/environment.plist".
    • Instead of adding "FOO=bar; export FOO" to /etc/profile, add "setenv FOO bar" to /etc/launchd.conf.

    What's going on?

    On most other Unix desktops, like a typical linux GNOME setup, your file manager is a child of your X11 session, which is a child of your login shell, so it's inherited all the settings from that login shell. 

    When you double-click an app, it either uses the shell to launch the app, or spawns it directly as a child process. So, your app is going to inherit the environment of either your original login shell, or a new shell; either way, it's going to get all the settings in your ~/.bash_profile, /etc/profile, etc. (assuming your shell is bash).

    On OS X, Finder.app is a child of your launchd session, which is a child of the root launchd session, which is process 1. There's no shell anywhere. And when you double-click an app, it asks LaunchServices to run the app. So, your app doesn't inherit anything from Finder, and even if it did, Finder hasn't inherited anything from your login shell.

    So, instead of configuring your login shell, you need to configure launchd.

    The way to configure launchd is with the launchctl command.

    As the manpage explains, launchctl talks to the current launchd process, so any changes it makes aren't going to be persistent to new sessions, but "These commands can be stored in $HOME/.launchd.conf or /etc/launchd.conf to be read at the time launchd starts."

    Except, as the launchd.conf manpage mentions, $HOME/.launchd.conf is "currently unsupported". So, how do you add user environment variables?

    There's an older mechanism for specifying environment variables for a user session in an environment.plist file, and for backward compatibility reasons, launchd reads that file at startup. So, until user launchd.conf files are supported, you have to use that mechanism instead. You probably don't have a .MacOSX directory, so you'll need to create that, and create the plist file. And, if you do have them, they may be in binary plist form rather than XML. So you probably want to use PlistBuddy.
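    For reference, the resulting ~/.MacOSX/environment.plist is just an ordinary XML plist mapping variable names to values; this sketch assumes a single FOO=bar variable, as in the examples above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>FOO</key>
    <string>bar</string>
</dict>
</plist>
```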

    How do I do app-specific environment variables?

    On most Unix systems, if you want a program to run with app-specific environment variables, you rename the program to "_foo" (or, more likely, hide it somewhere like /usr/lib), then create a wrapper shell script "foo" that just does a few exports and runs the real program.

    This won't work on OS X. If you want to actually launch an app as an app (so it does the right thing with the Dock, menubar, etc.), you have to launch it via the "open" tool, or some other LaunchServices-based tool, which means it won't inherit your shell environment.

    Fortunately, you don't have to do this on OS X in the first place. The Info.plist file inside every .app bundle has a key named LSEnvironment. When LaunchServices launches the app, it first sets all of the environment variables from the dict under that key.
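    In Info.plist terms, that looks something like this (again, the variable name and value are placeholders):

```xml
<key>LSEnvironment</key>
<dict>
    <key>MY_VAR</key>
    <string>/some/value</string>
</dict>
```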

    What about plain Unix executables?

    Plain Unix executables, like "ls" and "grep", aren't normally launched by LaunchServices; you just run them from the shell, so of course they inherit your shell's environment.

    And if you do something like double-click /bin/ls in Finder, LaunchServices doesn't launch ls, it launches Terminal.app (by default; you can actually change it to iTerm.app or anything else you want, the same way you can change what app handles text files or HTML files), which will open a new window and fire up a new shell running "/bin/ls". So, it will inherit your profile environment.

    Why so crazy?

    Given the distinction between app bundles and executable files, this design makes perfect sense. Of course not every Unix fan likes the idea of app bundles, or bundles in general. But once you accept that idea, you can't run a bundle from the shell, so why should LaunchServices fake a shell when running a bundle? If you want something you can run from a shell, write a bare executable rather than an app bundle. Or just run the bare executable out of your app, e.g., "/Applications/Foo.app/Contents/MacOS/foo" instead of "open /Applications/Foo.app".

    Of course there are limitations to what bare executables can do with the GUI and other parts of the Cocoa environment. But those limitations are directly caused by the fact that they're not launched by LaunchServices and don't have a bundle.

    Anyway, this design allows Apple to unify almost everything into launchd: init scripts, rc.d scripts, cronjobs, inetd, the NeXT-style SystemStarter, watchdogs, loginwindow, all the stuff buried in LaunchServices that you couldn't access directly… it's all done by one pretty simple program.

    Historical note

    Before launchd, OS X used a variety of different methods to launch apps, inherited from NeXTStep, Classic Mac OS, and *BSD. Most ways you'd launch a program inherited their settings from the loginwindow process. When loginwindow logged in a user, it read environment settings out of a file called ~/.MacOSX/environment.plist. Earlier versions of launchd would also read that file for backward compatibility reasons, but that's no longer true in 10.8 and later.

  4. There's a pair of intertwined threads in python-ideas about coming up with a way to break out of list comprehensions and generator expressions early.

    Oscar Benjamin gives a good example:

        isprime = all(n % p for p in takewhile(lambda p: p**2 < n, primes_seen))
    

    That's ugly. What you really want is to be able to write something like:

        isprime = all(n % p for p in primes_seen while p ** 2 < n)
    

    This is exactly the same as replacing filter calls with if clauses, so it ought to be solvable, and worth solving, right?

    But there are serious problems with a while clause that an if clause doesn't have. And, as the thread demonstrates, nobody's been able to come up with a better syntax that doesn't have at least as many problems. Also, as Nick Coghlan pointed out earlier in the thread, there's probably a reason that filter is a builtin and takewhile is not. So, the cost is higher, and, while the benefit is equivalent, it's for a much less common case.

    I think it's worth backing up to see why we have this problem.

    Ultimately, the problem—if there is one—is that Python got comprehensions "wrong". But I'm not sure it's actually wrong at all.

    Incomprehensibility

    Python borrowed comprehensions directly from Haskell. A comprehension is a sequence of for and if clauses, which are interpreted as the equivalent nested statements. In other words, this:

        (line for file in files for line in file if not line.startswith('#'))

    … means:

        def lines(files):
            for file in files:
                for line in file:
                    if not line.startswith('#'):
                        yield line
    
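    You can check that equivalence concretely, using StringIO objects as stand-ins for real files (an assumption for the demo; real code would iterate over open file objects):

```python
import io

def make_files():
    # Stand-in "files" for the demo; real code would use open()
    return [io.StringIO("# comment\nreal line\n"),
            io.StringIO("data\n# skip me\n")]

def lines(files):
    for file in files:
        for line in file:
            if not line.startswith('#'):
                yield line

genexp = (line for file in make_files() for line in file
          if not line.startswith('#'))
assert list(genexp) == list(lines(make_files())) == ['real line\n', 'data\n']
```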

    And this is exactly why we can't add a while clause: you can't map it to a nested while statement. For example, imagine you want to parse the RFC822 headers from a list of files:

        (line for file in files for line in file while line.strip())
    
        def lines(files):
            for file in files:
                for line in file:
                    while line.strip():
                        yield line
    

    While that's legal code, it doesn't mean what you want—it means an infinite sequence of the first line in the first file.

    And the problem is the same for every other syntax everyone's come up with. In particular, anything based on break isn't going to work because break doesn't nest.

    You could invent an until statement that can be nested:

        (line for file in files for line in file until not line.strip())
    
        def lines(files):
            for file in files:
                for line in file:
                    until not line.strip():
                        yield line
    

    … but that's a statement nobody's ever going to use.

    You can read the thread for all of the different alternatives proposed, but they all have the same problem.

    But really, you don't care about nesting if statements. After all, there's no reason to ever write the first statement below given that it's always equivalent to, and less readable than, the second:

        if foo:
            if bar:
    
        if foo and bar:

    And the same is going to be true for while/break/until/whatever. In fact, you explicitly don't want it nested.

    And that leads to a different way to design comprehensions: A sequence of nested for clauses, each of which can have exactly one optional if condition attached—defined as, in effect, applying a filter. To make things clearer, we might even want to add punctuation. After all, when you translate a non-trivial comprehension into English, you definitely use punctuation. For example:

        (line for file in files, for line in file if not line.startswith('#'))

    That design makes it easy to attach a new kind of condition onto each nested for loop: After the optional if condition, there's an optional while condition, defined as applying a takewhile. So:

        (line for file in files, for line in file while line.strip())

    Most Lisp-family languages that have borrowed comprehensions do something similar to this (e.g., Clojure, Racket), and most of them have while conditions. But I can't think of a single language that does Haskell-style comprehensions that has a while condition. And I don't think that's an accident.
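    For comparison, here's what that per-loop while condition would desugar to with today's tools: a takewhile wrapped around the nested iterable (a sketch, using a StringIO stand-in for a real file of RFC822-style headers):

```python
import io
from itertools import takewhile

# A StringIO stand-in for a real file: headers, a blank line, then a body
files = [io.StringIO("Header-1: a\nHeader-2: b\n\nbody\n")]

# Each "while line.strip()" condition becomes a takewhile on that loop
gen = (line for file in files
            for line in takewhile(lambda l: l.strip(), file))
assert list(gen) == ['Header-1: a\n', 'Header-2: b\n']
```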

    Lambda the Ultimate Pain in the Neck

    But if we borrowed comprehensions straight out of Haskell, how could we have gotten them wrong? Haskell doesn't have this problem, does it?

    Well, in Haskell, the right answer is to use takewhile—but the reason that works is that it's not nearly as painful as in Python.

    The problem with takewhile is that you need to turn your expression into a function. For a trivial expression, wrapping it in a lambda (or an out-of-line def) means there's more boilerplate noise than actual functionality.

    Going back to Oscar Benjamin's example:

        lambda p: p**2 < n

    Besides being almost twice as long, it also requires you to create a new p argument just to pass the existing p value to—and it's confusing if you name them both p, and even more confusing if you don't.

    But Haskell's lambdas are no better than Python's:

        \p -> p**2 < n

    Sure, they're a bit briefer, but that doesn't make them more readable.

    The reason this works in Haskell is that you usually don't need a lambda at all. For example, these are equivalent functions:

        \p -> 2**p
        (**) 2
    

    And so are these:

        \i -> n < i
        (<) n
    

    The Python equivalents (using partial from functools and lt from the operator module) are:

        from functools import partial
        from operator import lt

        partial(pow, 2)
        partial(lt, n)
    

    Partly this is because Haskell lets you access any infix operator as a prefix function just by wrapping it in parens—but the real key is that functions are always curried, which means you get automatic partials. It's not the "pow" and "lt" that are a problem there, it's the "partial".
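    A quick, runnable illustration of those partials:

```python
from functools import partial
from operator import lt

# partial(pow, 2) fixes pow's first argument: it computes 2 ** p
assert partial(pow, 2)(10) == 1024

# partial(lt, n) fixes lt's first argument: it computes n < i
n = 5
assert partial(lt, n)(7)       # 5 < 7 is True
assert not partial(lt, n)(3)   # 5 < 3 is False
```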

    And combining those two Haskell features means that these are also equivalent:

        \p -> p**2
        (** 2)
    
        \i -> i < n
        (< n)
    

    There's no way to do that in Python without a function to flip the args around:

        from functools import partial
        from operator import lt

        def flip(f):
            def flipped(x, y):
                return f(y, x)
            return flipped

        partial(flip(pow), 2)  # p -> p**2
        partial(flip(lt), n)   # i -> i < n
    

    Meanwhile, Haskell makes it trivial to compose functions. There are even better alternatives, but here's the basic one that you know by the end of the first chapter of any tutorial:

        (< n) . (** 2)

    What would that look like in Python?

        from functools import partial
        from operator import lt

        def flip(f):
            def flipped(x, y):
                return f(y, x)
            return flipped

        def compose(f, g):
            def composed(x):
                return f(g(x))
            return composed

        compose(partial(flip(lt), n), partial(flip(pow), 2))
    

    Compare that to the lambda we were hoping to eliminate:

        lambda p: p**2 < n
    

    It's pretty obvious that this is not an improvement!

    Programming in Haskell is all about building functions out of functions. Programming in Python is not. This certainly doesn't mean Haskell is a better language, it just means that Haskell-style comprehensions are more useful in Haskell than in Python.

    Is there another way around the problem?

    C++

    C++03 had an even worse problem with lambda than Python: it didn't have lambda at all. This meant that many of the powerful tools in the <algorithm> library—equivalents to map, filter, etc.—were rarely used, and especially not in simple cases.

    For example, if you want to turn an explicit loop like this:

        template <typename In, typename Out>
        void double_all(In first, In last, Out out) {
            // ("double" is a C++ keyword, so the function needs another name)
            for (In it = first; it != last; ++it)
                *out++ = *it + *it;
        }
    

    … into a transform call, it looks something like this:

        template <typename T>
        struct doubler {
            T operator()(T i) const { return i + i; }
        };

        template <typename In, typename Out>
        void double_all(In first, In last, Out out) {
            transform(first, last, out,
                      doubler<typename iterator_traits<In>::value_type>());
        }
    

    Nobody's going to use that unless the logic is pretty complex. Which means many C++ programmers never even learn it, so they don't use it even when the logic is complex enough to be worth it.

    In C++11, you can write it with a lambda:

        template <typename In, typename Out>
        void double_all(In first, In last, Out out) {
            typedef typename iterator_traits<In>::value_type T;
            transform(first, last, out, [](T i) { return i + i; });
        }
    

    … which gets you to about the same place as Python.

    But before C++11, there were a number of attempts to build lambda purely out of libraries. Most of them were expression tree libraries, like Boost.Lambda, which let you write things like this:

        template <typename In, typename Out>
        void double_all(In first, In last, Out out) {
            transform(first, last, out, _1 + _1);
        }
    

    Now that's more like it. It's not beautiful—it requires magic variables, and it's not at all clear that "_1 + _1" is actually a function—but who ever expects C++ to be beautiful? More seriously, it ends up being much harder for the compiler to deal with, which means at best your compilation takes 5x longer; at worst your actual code also runs slower (because things didn't get inlined and optimized ideally), and people do expect C++ to be fast…

    Anyway, if you had this in Python, you could just write this:

        out = map(_1 + _1, seq)  # ("in" is a Python keyword, so use another name)
    

    And can we build the same thing in Python? Sure. An expression tree is just a callable object that overrides all of the usual operators to return new callable objects.

    Here's a quick hack:

        class Placeholder(object):
            def __init__(self, func=lambda arg: arg):
                # By default, a Placeholder is the identity function
                self.func = func
            def __call__(self, arg):
                return self.func(arg)
            def __add__(self, addend):
                # _1 + x builds a new function that adds x to its argument
                return Placeholder(lambda arg: self.func(arg) + addend)
            def __radd__(self, addend):
                return Placeholder(lambda arg: addend + self.func(arg))
            # etc.

        _1 = Placeholder()

        numbers = map(_1 + 1, range(5))
        print(list(numbers))
    

    Again, this is a quick hack—it won't work with _1 + _1, or a _2, and it's missing a way to handle method calls (which you can take care of with __getattr__) and normal functions (which you need some kind of wrapper for)… but this is enough to show the idea. And once you fill in the "# etc." with the comparison and power operators, it's enough to solve our initial problem:

        isprime = all(n % p for p in takewhile(_1 ** 2 < n, primes_seen))

    In fact, with some nasty _getframe hackery, you might be able to make this even nicer. (Note that sys._getframe().f_code.co_varnames[1] is 'p'…)
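    To show that this actually works, here's a self-contained sketch with __pow__ and __lt__ filled in from the "# etc.", and a hardcoded primes_seen list standing in for the real sequence:

```python
from itertools import takewhile

class Placeholder:
    def __init__(self, func=lambda arg: arg):
        self.func = func
    def __call__(self, arg):
        return self.func(arg)
    def __pow__(self, exponent):
        # _1 ** 2 builds a function computing arg ** 2
        return Placeholder(lambda arg: self.func(arg) ** exponent)
    def __lt__(self, other):
        # (_1 ** 2) < n builds a function computing arg ** 2 < n
        return Placeholder(lambda arg: self.func(arg) < other)

_1 = Placeholder()
primes_seen = [2, 3, 5, 7, 11, 13]  # hardcoded for the demo

n = 29
assert all(n % p for p in takewhile(_1 ** 2 < n, primes_seen))      # 29 is prime
n = 33
assert not all(n % p for p in takewhile(_1 ** 2 < n, primes_seen))  # 33 = 3 * 11
```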

    But is this really the answer we want?

    Summary

    It's obviously far too late to change Python from Haskell-style comprehensions to Clojure-style comprehensions.

    There's no way Python will have all the Haskell features that make takewhile good enough as-is, because they would be bad features for a language like Python.

    And expression trees don't seem like the right answer.

    So, what is the right answer?

    Sometimes, the right answer is to just define the function out-of-line:

        def is_candidate(p):
            return p**2 < n
        isprime = all(n % p for p in takewhile(is_candidate, primes_seen))
    

    And PEP 403 might help here:

        @in isprime = all(n % p for p in takewhile(is_candidate, primes_seen))
        def is_candidate(p):
            return p**2 < n
    

    But more often, I think, you just want to split the expression up:

        candidate_primes = takewhile(lambda p: p**2 < n, primes_seen)
        isprime = all(n % p for p in candidate_primes)
    
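    Filled out with concrete values, the split version runs as-is (a sketch; primes_seen is hardcoded here, where the real code would be tracking primes found so far):

```python
from itertools import takewhile

n = 97
primes_seen = [2, 3, 5, 7, 11, 13]  # hardcoded for the demo

candidate_primes = takewhile(lambda p: p**2 < n, primes_seen)
isprime = all(n % p for p in candidate_primes)
assert isprime  # 97 is prime
```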

    Either way, can you really argue that this is less readable than the alternatives available in other languages?

    Many experienced programmers—especially those who came from C-family languages or Python 2.x—instinctively rebel against this, because they think there's a cost to storing that intermediate value. But there isn't. That's the beauty of iterators and generators. You're doing the exact same sequence of operations no matter how many times you split it up. And this means that you can mix and match the tricks of imperative-style programming and all the tricks of functional-declarative-style programming and use whatever reads the best.

    If you're feeling jealous of other languages, consider that Haskell requires you to write imperative-style code inside a monad with completely different syntax, while C++ requires you to write functional-style code in a completely different language that runs at compile time, only operates on integers and types, and has possibly the worst syntax anyone has ever come up with for a Turing-complete language without intentionally trying.
