1. There are a lot of questions on StackOverflow asking "what's the deal with self?"

    Many of them are asking a language-design question: Why does Python require explicit self when other languages like C++ and friends (including Java), JavaScript, etc. do not? Guido has answered that question many times, most recently in Why explicit self has to stay.

    But some people are asking a more practical question: Coming from a different language (usually Java), they don't know how to use self properly. Unfortunately, most of those questions end up getting interpreted as the language-design question because the poster is either a StackOverflow novice who never figures out how to answer comments on his question (or just never comes back to look for them) or a programming novice who never figures out how to ask his question.

    So I'll answer it here.

    tl;dr

    The short version is very simple, so let's start with that:

    • When defining a method, always include an extra first parameter, "self".
      • Yes, that's different from Java, C++, JavaScript, etc., where no such parameter is declared.
    • Inside a method definition, always access attributes (that is, members of the instance or its class) with dot syntax, as in "self.spam".
      • Yes, that's different from C++ and friends, where sometimes "spam" means "this->spam" (or "this.spam", in some languages) as long as it's not ambiguous. In Python, it never means "self.spam".
    • When calling a method on an object, as in "breakfast.eat(9)", the object is passed as the first ("self") argument.
      • Yes, that's different from C++ and friends, where instead of being passed normally as a first argument it's hidden under the covers and accessible through a magic "this" keyword.
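    In code, the three rules look like this (a tiny sketch; the Meal class and its food attribute are invented for illustration):

```python
class Meal:
    def __init__(self, food):          # rule 1: extra first parameter, self
        self.food = food               # rule 2: attributes always go through self.
    def eat(self, hour):
        return '{} at {}:00'.format(self.food, hour)

breakfast = Meal('spam')
print(breakfast.eat(9))                # rule 3: breakfast is passed as self
```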

    Exceptions to the rules

    Most of these should never affect novices, but it's worth putting them all in one place, from most common to least:
    • When calling a method on the class itself, instead of on one of its instances, as in "Meal.eat(breakfast, 9)", you have to explicitly pass an instance as the first ("self") argument ("breakfast" in the example).
    • A bound method, like "breakfast.eat", can be stored, passed around, and called just like a function. Whenever it's eventually called, "breakfast" will still be passed as the first "self" argument.
    • An unbound method, like "Meal.eat", can also be stored, passed around, and called just like a function. In fact, in 3.0+, it is just a plain old function. Whenever it's eventually called, you still need to pass an instance explicitly as the first "self" argument.
    • @classmethods take a "cls" parameter instead. Whether you call these on the class itself, or on an instance of the class, the class itself gets passed as the first ("cls") argument. These are often used for "alternate constructors", like datetime.now().
    • @staticmethods do not take any extra parameter. Whether you call these on the class itself, or on an instance, nothing gets passed as an extra argument. These are almost never used.
    • __new__ is always a @staticmethod, even though you don't declare it that way, and even though it actually acts more like a @classmethod.
    • If you need to create a bound method explicitly for some reason (e.g., to monkeypatch an object without monkeypatching its type), you need to construct one manually using types.MethodType(func, obj, type(obj)) (in 3.x, the type argument is gone, so it's just types.MethodType(func, obj)).
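    A sketch of the most common of these exceptions together (again, Meal is an invented example class):

```python
class Meal:
    def __init__(self, food):
        self.food = food
    def eat(self, hour):
        return '{} at {}:00'.format(self.food, hour)
    @classmethod
    def default(cls):                  # alternate constructor: gets the class as cls
        return cls('spam')
    @staticmethod
    def describe():                    # no extra argument at all
        return 'a meal'

breakfast = Meal('eggs')
print(Meal.eat(breakfast, 9))          # calling on the class: pass self explicitly
f = breakfast.eat                      # bound method: breakfast travels with it
print(f(9))                            # same as breakfast.eat(9)
print(Meal.default().food)             # classmethod works on class or instance
print(breakfast.describe())            # staticmethod: nothing extra passed
```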

    What does a name name?

    In a language like C++ or Java, when you type a variable name all by itself, it can mean one of many different things. At compile time, the compiler checks for a variety of different kinds of things that could be declared, in order, and picks the first matching declaration it finds. The rules are something like this (although not exactly the same in every related language): 
    1. If you declared a variable with that name in the function definition, it's a local variable. (Actually, it's a bit more complicated, because every block in a C-family language is its own scope, but let's ignore that.)
    2. If you declared a static variable with that name in the function definition, it's a static variable. (These are globals in disguise, except that they don't conflict with other globals of the same name defined elsewhere.)
    3. If you declared a parameter with that name in the function definition, it's a function parameter. (These are basically the same as local variables.)
    4. If you declared a variable with that name in the function that the current local function is defined inside, it's a closure ("non-local") variable.
    5. If you declared a member with that name in the class definition, it's an instance variable. (Each instance has its own copy of this variable.)
    6. If you declared a class member with that name in the class definition, it's a class variable. (All instances of the class share a single copy of this variable, but each subclass has a different single copy for all of its instances.)
    7. If you declared a static member with that name in the class definition, it's a static class variable. (All instances of all subclasses share a single copy of this variable—it's basically a global variable in disguise.)
    8. Otherwise, it's a global variable.
    If you use dot-syntax with "this", like "this.spam" (in C++, "this->spam"), or with the class, like "Meat.spam" (in C++, "Meat::spam"), you can avoid those rules and unambiguously specify the thing you want—even if there's a local variable named "spam", "this.spam" is still the instance variable.

    Python doesn't have declarations. This means you don't have to write "var spam" all over the place as you do in JavaScript (or "const char * (Eggs::*Foo)(const char *)" as in C++) just to create local variables. Even better, you don't need to declare your class's instance variables anywhere; just create them in __init__, or any other time you want to, and they're members.

    That gets rid of a whole lot of useless boilerplate in the code that doesn't provide much benefit. But it does mean you lose what little benefit that boilerplate would have provided—in particular, the compiler can't tell whether you wanted a global variable or an instance variable named "spam", because there's nowhere to check whether such an instance variable exists.

    Therefore, when you're used to having a choice between "spam" and "this.spam", in Python you always have to write "self.spam". (And when you have a choice between "spam" and "Meat.spam", possibly with "this.spam" as an option, in Python you always have to write "Meat.spam", possibly with "self.spam" as an option.)
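    A quick sketch of the difference (Meat, spam, and eggs are invented names):

```python
class Meat:
    spam = 'shared'                    # class attribute: one copy for the class

    def __init__(self):
        self.eggs = 'per-instance'     # instance attribute: created on self

    def show(self):
        # a bare "spam" here would be a NameError, never the class attribute
        return (self.spam, Meat.spam, self.eggs)

m = Meat()
print(m.show())                        # ('shared', 'shared', 'per-instance')
```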

    If you want to know why, you really need to read the article linked above. But briefly, besides the fact that the advantages of eliminating declarations are much bigger than the costs, "Explicit is better than implicit."

    So, what rules does Python use for deciding what kind of variable a name names?
    1. If you list the variable name in a global statement, it's a global variable.
    2. If you list the variable name in a nonlocal statement (3.0+ only), it's a closure variable.
    3. If you assign to the variable name somewhere in the current function, it's local.
    4. If the variable name is included in the parameter list for the function definition, it's a parameter (meaning it's local).
    5. If the same name is a local or closure variable in the function the current function was defined inside, it's a closure variable.
    6. Otherwise, it's global.
    Notice that, while a few of these (1, 2, and 4) are kind of similar to "declarations", the others (3, 5, and 6) are not at all the same. The fact that you don't accidentally get global variables when you forget to declare things (as in JavaScript) is another advantage of Python's way of doing things.
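    Here's a short sketch exercising those rules (invented names throughout; note that nonlocal, like rule 2, is 3.0+ only):

```python
x = 'global'

def outer():
    y = 'closure'
    def inner():
        global x                  # rule 1: x is the global
        nonlocal y                # rule 2 (3.0+): y is outer's variable
        z = 'local'               # rule 3: assigned here, so z is local
        x = 'changed global'
        y = 'changed closure'
        return z
    inner()
    return y

print(outer())                    # 'changed closure'
print(x)                          # 'changed global'
```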

  2. Tkinter makes slapping together a simple GUI very easy. But unfortunately, many of its features aren't very well documented. The only way to figure out how to do something is often to figure out what Tk objects it's using, look up the Tcl/Tk documentation for those objects, then try to work out how to access them through the Tkinter layer.

    One of the places this most often comes up is validating Entry boxes. Novices expect that Entry must have some kind of <Change> event, and if they could just find the right name and bind it, they could handle validation there.

    The first obvious thought is to bind <KeyPress> or <KeyRelease>. But that doesn't work, because some key presses don't change anything (like Tab), and there are ways to change the contents without typing anything (like pasting or dragging). If you work hard enough, you can find all the right events to bind and filter out the right cases and get the equivalent of a <Change> event…

    But after doing so, you can't do anything useful in the event handler! All of these events get fired before the contents of the Entry have been changed. So all you can validate is whatever used to be there, which isn't very helpful.

    There is a way around this, but it's very clunky: your event handler can set up the real handler to run the next time through the event loop, via after_idle. In that real handler, when you access the contents of the Entry, you get the new contents.

    Surely there must be a better way.

    And there is. In fact, two different ways. And I'll explain both, with links to the Tcl/Tk docs. Hopefully, after reading this, you'll not only know how to validate Entry boxes, but also how to figure out how to do things in Tkinter that aren't explained anywhere.

    But first, make sure you've read the first few sections of the Tkinter docs, at least up to the section called Mapping Basic Tk into Tkinter, but ideally the whole chapter.

    Example Program

    Let's create a dead-simple stupid program (using Python 3; just change the "tkinter" to "Tkinter" and it will work in Python 2):
        from tkinter import *
    
        class MyFrame(Frame):
            def __init__(self, parent):
                Frame.__init__(self, parent)
                self.text = Label(self, text='Name')
                self.text.pack()
                self.name = Entry(self)
                self.name.pack()
                self.name.focus_set()
                self.submit = Button(self, text='Submit', width=10, 
                                     command=self.callback)
                self.submit.pack()
                self.entered = Label(self, text='You entered: ')
                self.entered.pack()
    
            def callback(self):
                self.entered.config(text='You entered: ' + self.name.get())
                self.name.delete(0, END)
    
        root = Tk()
        frame = MyFrame(root)
        frame.pack()
        root.mainloop()
    
    Now, we want to validate this in the simplest way possible: when the Entry is empty, the Button should be disabled.

    Validation

    Tk Entry boxes have a validation feature. Unfortunately, the Tkinter docs only mention this off-hand in one place, and give no more information than that they "support validate and validatecommand". Unless you were already a Tcl/Tk expert, you'd have no idea what this means. But at least you can Google for "Tk entry validatecommand", which should get you to the docs here.

    Reading those docs, we want our command to get called whenever the Entry is edited, which means we want to set the "validate" value to "key". That's easy.

    We also want our command to get called with the updated value of the Entry, so that we can tell whether it's empty or not. For this, we want to use the substitution "%P". But how do we do that?

    This is the tricky part that you won't find anywhere in the docs. The Tcl/Tk docs say to do it "just as you would in a bind script", but in a Python/Tkinter event binding you just pass a callable, and it gets called with some arguments that are specified by Tkinter. That doesn't work here.

    Instead, you have to manually do what bind does for you under the covers: what you actually pass is a tuple of a function ID and one or more argument strings. To get that function ID, you tell Tkinter to register your callable, which returns the ID. Then, when Tk tries to call the function by ID, Tkinter looks up your registered callable and calls it, with arguments matching your string argument spec.

    Your validate method can do whatever it wants, but at the end it has to return True to allow the change, False to reject it, or None to disable itself (so the Entry is no longer validated). If you return False to reject the change, the Entry contents will not be changed (just as if the user hadn't typed/pasted/whatever anything). And, if you've set an invalidcommand, it will get called. Just like the validatecommand, the invalidcommand has to be a function ID and argument strings.

    Sound confusing? Yeah, it is, especially since it's not documented anywhere. But it's not that hard once you get the hang of it.

    First, we create a validate method. It's going to take the "%P" argument, so let's call the parameter "P":
        def validate(self, P):
            self.submit.config(state=(NORMAL if P else DISABLED))
            return True
    
    Now, in our constructor, we have to register that method, and just pass that ID along with the "%P" string as the validatecommand (and "key" as the validate):
        def __init__(self, parent):
            # ...
            vcmd = parent.register(self.validate)
            self.name = Entry(self, validate='key', validatecommand=(vcmd, '%P'))
            self.name.pack()
            # …
    
    One last thing: because the validate method doesn't get called until the Entry changes, you'll want to either start the Button off disabled, or manually call the validate method at the end of the constructor (making sure to pass the appropriate value for the P parameter, of course).

    You can find a complete version of the code at Pastebin.

    Actually, one more one last thing—which doesn't come up very often, but will confuse the hell out of you if it does. If your validatecommand (or invalidcommand) modifies the Entry directly or indirectly (e.g., by calling set on its StringVar), the validation will get disabled as soon as your function returns. (This is how Tk prevents an infinite loop of validate triggering another validate.) You have to turn it back on (by calling config). But you can't do that from inside the function, because it gets disabled after your function returns. So you need to do something like this:
        def validate(self, P):
            if P == 'hi':
                self.name.delete(0, END)
                self.name.insert(0, 'hello')
                self.after_idle(lambda: self.name.config(validate='key'))
                return None
            else:
                return True
    

    Variable tracing

    There's another way to do this, which existed before Tcl/Tk had validation commands.

    Tk lets you attach a variable to an Entry widget; the variable will hold the current contents of the Entry at any time. Of course it has to be a Tcl variable, not a Python variable, but Python/Tkinter lets you create a Tcl variable with StringVar and related classes. (Also see Coupling Widget Variables in the Python docs.)

    So far, that doesn't sound useful. But Tcl has another feature called variable tracing. You can attach an observer callback that gets called whenever a variable is read (accessed), or written (assigned a new value), or unset (deleted). The existence of the trace function is documented in Tkinter, but that's as far as it goes; there's just a big "FIXME: describe the mode argument and how the callback should look, and when it is called." However, another page, called A Validating Entry Widget, serves as an example. It still doesn't document the API, but it happens to show exactly what we want to do with tracing.

    To find out how trace actually works rather than blindly copy-pasting magic code, you have to turn to the Tcl/Tk docs again. And of course that still doesn't tell you how Tkinter maps between Python and Tcl. So, here's the deal:

    To set a trace on a StringVar or other variable, you call its trace method with two arguments: a mode and a callback, just as the docs say. The "r" and "u" modes aren't very useful, but the "w" mode is called whenever the variable is written--which happens every time the Entry you've attached it to changes contents. When your callback is called, it gets three arguments: name1, name2, and mode.

    Together, name1 and name2 provide the Tcl name of the variable. For an array or other collection, name1 is the array variable and name2 is the index into the array (as a string, like "2"—everything in Tcl is a string). For a scalar, like a string or integer, name1 is the scalar variable and name2 is an empty string. Since Tkinter doesn't make it easy to create and wrap Tcl arrays, name2 will always be empty. But what's name1? Tkinter creates Tcl variables dynamically, giving them names like PY_VAR0. You can find the name of the Tcl variable underlying any StringVar as its _name attribute. So, if you have 10 identical Entry boxes, and you want to run the same code when any of them changes, but still be able to tell which one it was, you can use name1 for that. That being said, it's a lot easier to just create 10 separate closures around the same function in Python and not bother with the _name nonsense. So, you'll rarely use these arguments either.

    And that's why the sample just declares the callback with *dummy for the parameters.

    Unlike a validation function, the trace function can't interfere with what's happening. In fact, by the time your function gets called, the Entry has already been modified and the variable has already been updated. If you want to reject (or modify) the change, you have to do that manually, rather than just returning False.

    As before, first we'll write our validate method:
        def validate(self, name, index, mode): # or just self, *dummy
            self.submit.config(state=(NORMAL if self.namevar.get() else DISABLED))
    

    And now, we'll hook it up:
        def __init__(self, parent):
            # ...
            self.namevar = StringVar()
            self.namevar.trace('w', self.validate)
            self.name = Entry(self, textvariable=self.namevar)
            self.name.pack()
            # …
    
    And that's all there is to it. Again, complete code is at Pastebin.

    You may be wondering about performance. Isn't attaching debugging hooks ridiculously slow? Maybe, but who cares if you waste hundreds of microseconds on every user input, when user inputs take at least 1000x as long as that? If you're really worried about that kind of thing, you shouldn't be using Tkinter—everything it does is Tcl under the hood, and everything Tcl does is building and evaluating string commands in the most wasteful way you can possibly imagine (and that's before you even add all the Tkinter Tcl<->Python bridging on top of it). If your GUI is responsive enough (which it almost certainly is), adding variable tracing will not change that.

    Deciding which one to use

    In Python, there should be only one obvious way to do it. But here, there are two ways to do it. Which one is the obvious one?

    The key thing to note here is that they're not the same thing. They only overlap in functionality for the simplest use cases:

    • Validation gets called before the modification goes through, and you can reject the change. Tracing gets called after the modification goes through.
    • Validation can take a wide variety of parameters, including things like the index into the string at which the change happened; tracing takes no useful parameters (although you can easily fetch the new value from the variable itself).
    • Validation can hook things like focusout instead of key (meaning that instead of constantly telling the user "that's not a valid phone number" after each character until he's finally done, you can let him type whatever he wants and then check it after he tabs to the next field). Tracing hooks all changes, no matter what user event or code triggered them.
    • Validation doesn't require a StringVar, and generally can't take advantage of one usefully; tracing obviously does and can.
    • Validation is intended for validating user input; tracing is intended for debugging.
    Basically, if you already wanted a StringVar, tracing is often a good idea; even if you didn't, in simple cases, tracing is simpler. But in general, validation is the right thing.

    Other widgets

    There are other widgets that can take text entry besides Entry. What if you wanted to validate one of them?

    Some of them handle validatecommand, or textvariable (or the same thing under the name "variable" or "valid"), or both. This often isn't documented. You can always try adding the extra keyword arguments—if the widget doesn't handle validatecommand, it'll tell you that with a pretty simple exception. That works great for, say, the ttk.Entry widget, which (as you'd hope) works as a drop-in replacement for the stock Entry. 

    In some other cases, you can access the real Tk widgets under the Python or Tcl wrappers—e.g., if you've installed itk and the Tkinter wrappers around it, itk.EntryField effectively has a ttk.Entry underneath it.

    And then there's Text.

    Text

    Text (and things like ScrolledText that wrap or emulate it) doesn't have anything like Entry's validation. There's no validatecommand, textvariable, variable, value… And if you look through the Tk docs for Text, there's nothing that looks even remotely useful from the Tk side.

    Well, there's a reason that Text doesn't do validation—it's meant to be a (possibly rich) text editor, not a simple multi-line Entry field. But unfortunately, as long as Tkinter doesn't come with a simple multi-line Entry field, people are going to use Text. In fact, even in Tcl/Tk, people often use Text as a multi-line Entry field, which is why people have come up with different ways to extend it with validation and/or textvariable, as shown on the Tk wiki. There just is no way to validate Text widgets cleanly, but sometimes there's nothing else to use but Text widgets.

    If you were using Tcl, it wouldn't be at all hard to extend or wrap the Tk text command (see the wiki link above). But if you're using Python, you have to do that in Tcl, and then wrap the resulting command in Python/Tkinter. Almost nobody who uses Python wants to learn Tcl and the internal guts of Tkinter; if you don't want to either, this is simply not an option.

    If you weren't using Tk, all of the other major cross-platform widget frameworks with Python bindings (Qt, wxWidgets, Gtk+, …) and platform-specific GUI bindings (PyWin32, PyObjC, …) have ways to write multi-line text controls with some way to validate them. Sure, they all have a higher learning curve (and none of them come pre-installed with Python), but if you're banging your head against the wall trying to make Tkinter do things that it can't do, you're probably wasting more effort than you're saving.

    If you insist on staying with pure Python/Tkinter, you just have to accept its limitations. And that may be fine. Go back to the hack mentioned in the introduction to this post—if you bind all the relevant events, you can get a handler called before the change goes through, and use after_idle in that handler to get a second handler called after the change goes through, and… is that good enough for your GUI? If so, do it.

  3. In Python 2.x Tkinter code, you see a lot of stuff like this:
        class MyFrame(Frame):
            def __init__(self, parent, n):
                Frame.__init__(self, parent)
                self.n = n
    
    Why?

    Inheritance and overriding

    Some people start on Tkinter before getting far enough into learning Python. You should definitely read the Classes chapter in the Python tutorial, but I'll summarize the basics.

    First, you're subclassing (inheriting from) the Tkinter class Frame. This means instances of your class are also instances of Frame, and can be used like Frames, and can use the internal behavior of Frame.

    When someone constructs a MyFrame object by typing MyFrame(root, 3), that creates a new MyFrame instance and calls new_instance.__init__(root, 3).

    You've overridden the parent's __init__ method with your own, so that's what gets called.

    But if you want to act like a Frame, you need to make sure all the stuff that gets done in constructing a Frame object also gets done in constructing your object. In particular, whatever Tkinter does under the covers to create a new widget in the window manager, connect it up to its parent, etc. all has to get done. So, you need Frame.__init__ to get called for your object to work.

    Unlike some other languages, Python doesn't automatically call base class constructors for you; you have to do it explicitly. The advantage of this is that you get to choose which arguments to pass to the parent--which is handy in cases (like this example) where you want to take extra arguments.
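    The same pattern works with any classes, not just Tkinter ones. Here's a sketch with invented Widget and Labeled classes, where the subclass accepts an extra argument and forwards only the rest to the base class:

```python
class Widget(object):
    def __init__(self, parent):
        self.parent = parent

class Labeled(Widget):
    def __init__(self, parent, n):
        Widget.__init__(self, parent)  # explicit call, with the args we choose
        self.n = n                     # the extra argument stays with us

w = Labeled('root', 3)
print(w.parent, w.n)                   # root 3
```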

    Not so super

    The normal way to do this is to use the super function, like this:
        class MyFrame(Frame):
            def __init__(self, parent, n):
                super(MyFrame, self).__init__(parent)
                self.n = n
    
    Unfortunately, that doesn't work with Tkinter, because Tkinter uses classic classes instead of new-style classes. So, you have to do this clunky thing instead where you call the method on the class instead of calling it like a normal method.

    You don't really want to learn all about classic classes, because they're an obsolete technology. (In Python 3, they no longer exist.) You just need to know two things:
    • Never use classic classes when you can help it. If you don't have anything to put as your base class, use object.
    • When you can't help it and are forced to use classic classes (as with Tkinter), you can't use super, so you have to call methods directly on the class.
    But how does the clunky thing work?

    Unbound methods

    If you want all the gory details, see How Methods Work. But I'll give a short version here.

    Normally, in Python—like most other object-oriented languages—you call methods on objects. The way this works is a bit surprising: foo.bar(spam) actually constructs a "bound method" object foo.bar, then calls it like a function, with foo and spam as the arguments. That foo then becomes the self parameter that you have to put in every method definition.

    Since classes themselves are just another kind of object, you can call methods on them too, like FooType.bar(spam). But here, Python doesn't have any bound object to get passed as your self parameter—it constructs an "unbound method" FooType.bar, then calls it with just spam as an argument, so there's nothing to match up with your self parameter. (Python could have been designed to pass the FooType class itself as the self parameter, but that would be confusing more often than it would be useful. When you want that behavior—for example, to create "alternate constructors" like datetime.datetime.now—you have to ask for it explicitly, with the @classmethod decorator.) So, you have to pass it yourself.

    In other words, in this code, the two method calls at the end are identical:
        class FooType(object):
            def bar(self, spam):
                print self, spam
        foo = FooType()
        foo.bar(2)
        FooType.bar(foo, 2)
    
    So, why would you ever use the clumsy and verbose second form? Basically, only when you have to. Maybe you want to pass the unbound method around to call on an object that will be created later. Maybe you had to look up the method dynamically. Or maybe you've got a classic class, and you're trying to call a method on your base class.
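    For example, you can hand an unbound method to map, which will call it later with each instance as the self argument (Meal and shout are invented names; this sketch works the same in 2.x and 3.x):

```python
class Meal(object):
    def __init__(self, food):
        self.food = food
    def shout(self):
        return self.food.upper()

meals = [Meal('spam'), Meal('eggs')]
# Meal.shout gets called once per instance, each passed as self
print(list(map(Meal.shout, meals)))    # ['SPAM', 'EGGS']
```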

  4. Does Python pass by value, or pass by reference?

    Neither.

    If you twist around how you interpret the terms, you can call it either. Java calls the same evaluation strategy "pass by value", while Ruby calls it "pass by reference". But the Python documentation carefully avoids using either term, which is smart.

    (I've been told that Jeff Knupp has a similar blog post called Is Python call-by-value or call-by-reference? Neither. His seems like it might be more accessible than mine, and it covers most of the same information, so maybe go read that if you get confused here, or maybe even read his first and then just skim mine.)

    Variables and values

    The distinction between "pass by value" and "pass by reference" is all about variables. Which is why it doesn't apply to Python.

    In, say, C++, a variable is a typed memory slot. Values live in these slots. If you want to put a value in two different variables, you have to make a copy of the value. Meanwhile, one of the types that variables can have is "reference to a variable of type Foo". Pass by value means copying the value from the argument variable into the parameter variable. Pass by reference means creating a new reference-to-variable value that refers to the argument variable, and putting that in the parameter variable.

    In Python, values live somewhere on their own, and have their own types, and variables are just names for those values. There is no copying, and there are no references to variables. If you want to give a value two names, that's no problem. And when you call a function, that's all that happens—the parameter becomes another name for the same value. It's not the same as pass by value, because you're not copying values. It's not the same as pass by reference, because you're not making references to variables. There are incomplete analogies with both, but really, it's a different thing from either.
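    You can see this directly with the is operator:

```python
spam = [1, 2, 3]
eggs = spam             # a second name for the same value: no copy is made
print(eggs is spam)     # True: both names refer to the one list object
eggs.append(4)
print(spam)             # [1, 2, 3, 4]: there was only ever one list
```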

    A simple example

    Consider this Python code:

        def spam(eggs):
            eggs.append(1)
            eggs = [2, 3]
    
        ham = [0]
        spam(ham)
        print(ham)
    

    If Python passed by value, eggs would have a copy of the value [0]. No matter what it did to that copy, the original value, in ham, would remain untouched. So, it would print out [0].

    If Python passed by reference, eggs would be a reference to the variable ham. It could mutate ham's value, and even put a different value in it. So, it would print out [2, 3].

    But in fact, eggs becomes a new name for the same value [0] that ham is a name for. When it mutates that value, ham sees the change. When it rebinds eggs to a different value, ham is still naming the original value. So, it prints out [0, 1].

    Ways to confuse yourself

    People learn early in their schooling that pass by value and pass by reference are the two options, and when they learn Python, they try to figure out which one of the two it does. If you think about assignment, but not about mutability, Python looks like pass by value. If you think about mutability, but not assignment, Python looks like pass by reference. But it's neither.

    Why does Java call this "pass by value"?

    In Java, there are actually two types of values—native values, like integers, and class values. With native values, when you write "int i = 1", you're creating an int-typed memory slot and copying the number 1 into it. But with class values, when you type "Foo foo = new Foo()", you're creating a new Foo somewhere, and also creating a reference-to-Foo-typed memory slot and copying a reference to that new Foo into it. When you call a function that takes an int argument, Java copies the int value from your variable to the parameter. And when you call a function that takes a Foo argument, Java copies the reference-to-Foo value from your variable to the parameter. So, in that sense, Java is calling by value. (This is basically the same thing that most Python implementations actually do deep under the covers, but you don't normally have to, or want to, think about it. Java puts that right up on the surface.)

    Why does Ruby call this "pass by reference"?

    Basically, Ruby is emphasizing the fact that you can modify things by passing them as arguments.

    This makes more sense for Ruby than for Python, because Ruby is chock full of mutating methods, and there's an in-place way to do almost everything, while Python only has in-place methods for the handful of things where it really matters. For example, in Python, there is no in-place equivalent of filter, but in Ruby, select! is the in-place equivalent of select.

    But really, it's still sloppy terminology in Ruby. If you assign to a parameter inside a function (or block), it does not affect the argument any more than in Python.

    So what should we call it?

    People have tried to come up with good names. Before Python ever existed, Barbara Liskov realized that there was an entirely different evaluation strategy from pass by value and pass by reference, and called it "call by sharing". Others have called it "pass by object". Or, just to confuse novices even further, "pass by value reference". But none of these names ever gained traction; nobody is going to know what these names mean if you use them.

    So, just don't call it anything. Anyone who tries to answer a question by saying, "Well, Python passes by reference [or by value, or by some obscure term no one has ever heard of], so…" is just confusing things, not explaining.

    When Python's parameter evaluation strategy actually matters, you have to describe how it works and how it matters. So just do that.

    When it doesn't matter, don't bring it up.

    Ways to un-confuse yourself

    Instead of trying to get your head around how argument passing works in Python, get your head around how variables work. That's where the big difference from languages like Java and C++ lies, and once you understand that difference, argument passing becomes simple and obvious.

    How do I do real pass by reference, then?

    You don't. If you want mutable objects, use mutable objects. If you want a function to re-bind a variable given to it as an argument, you're almost always making the same mistake as when you try to dynamically create variables by name.

    And that should be a clue to one way you can get around it when "almost always" isn't "always": Just pass the name, as a string. (And, if it's not immediately obvious what namespace to look the name up in, pass that information as well.)

    But usually, it's both simpler and clearer to wrap your state up in some kind of mutable object—a list, a dict, an instance of some class, even a generic namespace object—and pass that around and mutate it.
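    Both workarounds might look something like this (a minimal sketch; all the names are made up):

```python
# Option 1: pass the name as a string, plus the namespace to look it up in
def rebind(namespace, name, value):
    namespace[name] = value

ns = {'x': 1}
rebind(ns, 'x', 42)
assert ns['x'] == 42

# Option 2 (usually simpler and clearer): wrap the state in a mutable object
class Box:
    def __init__(self, value):
        self.value = value

def double(box):
    box.value *= 2   # mutate the shared object; no rebinding needed

b = Box(21)
double(b)
assert b.value == 42
```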

    What about closures?

    Yes, using closures can also be a way around needing to pass by reference. This shouldn't be too surprising—closures and objects are dual concepts in a lot of ways.

    I meant how do closures work, if there's no pass by reference?

    Oh, you just had to ask, didn't you. :)

    The simple, but inaccurate, way to think about it is that a closure stores a name as a string and a scope to look it up in. Just like I just told you not to do. But it's happening under the covers. In particular, your Python code never uses the variable name as a string. The fact that the interpreter uses the variable name as a string doesn't matter; it always uses names as strings. If you look at the attributes of a function object and its code object, you'll see the names of any globals you reference (including modules, top-level functions, etc.) any locals you create, the function parameters themselves, etc.

    But if you look carefully (or think about it hard enough), you'll realize that this isn't actually sufficient for closures. Closures are implemented by storing special cell objects in the function object, which are basically references to variables in some nonlocal frame, and there are special bytecodes to load and store those variables. (The code that owns the frame uses those bytecodes to access the variables it exposes to the closure; the code inside the closure uses them to access the variables it has cells for.) You can even see one of these cell objects in Python, and read the value of the referenced variable (func.__closure__[0].cell_contents, if you're dying of curiosity), although you can't write the value this way.
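    You can poke at those cells interactively; a minimal sketch:

```python
def make_counter():
    count = 0
    def bump():
        nonlocal count
        count += 1
        return count
    return bump

counter = make_counter()
counter()
counter()
# The closed-over variable lives in a cell object attached to the function:
print(counter.__closure__[0].cell_contents)  # → 2
```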

    And if you're now thinking, "Aha, with some bytecode hacking, I could fake call by reference"... Well, yeah, but do you really think that will be simpler than passing around a 1-element list?

  5. In database apps, you often want to create tables, views, and indices only if they don't already exist, so they do the setup work the first time, but don't blow away all of your data every subsequent time. So SQL has a special "IF NOT EXISTS" clause you can add to the various CREATE statements.

    Occasionally, you want to do the same thing in Python. For example, this StackOverflow user likes to re-run the same script over and over in the interactive session (e.g., by hitting F5 in the IDE). That's kind of an odd thing to do in general, but it's not hard to imagine cases where it makes sense. For example, you might be expanding or debugging part of the script, and want to use the rest of the script while you do so.

    Normally, that wouldn't be a problem, but what if the script being modified created some global variables, or class or function attributes, etc., and you didn't want those to be overwritten?
    That might sound like an anti-pattern, but imagine that you have a function that you've memoized with functools.lru_cache, and it's cached hundreds of expensive values. If you replace it with a new copy of the function, it'll have an empty cache.

    Of course the right thing to do is to factor out the script into separate modules, and have the script import the stable code instead of just including it directly. But you don't always want to take a break from actively hacking on code to refactor it.

    The easy (but ugly) way

    You can always do this:
        from functools import lru_cache

        @lru_cache()
        def _square(x):
            return x*x

        try:
            square
        except NameError:
            square = _square
    
    And if you only have to do it to one function, maybe that's the best answer. But if you have to do it to a dozen functions, that'll get ugly, and all that repetition is an invitation to copy and paste and screw it up somewhere. So, what you want to do is factor it out into a function.

    But how? What you want is something like this:
        create_if_not_exists(square, _square)
    
    In a language like C++, you'd do that by taking a reference to a function variable as the first parameter, but you can't pass a reference to the square variable into the function, because that doesn't make any sense in Python; variables aren't things you can take references to.

    You might be able to use some horrible frame hacks to pass the value in and have the function figure out the name from the calling frame, but this is already hacky enough. You might be able to do it with MacroPy, but there are probably cooler ways you can solve the original problem once you're using macros.

    Strings as names

    The key thing to notice is that ultimately, a variable name is just a string that gets looked up in the appropriate scope. Any frame hack, macro, etc. would just be getting the name as a string and setting its value by name anyway, so why not make that explicit?

    This is one of those examples that shows that, while usually you don't want to dynamically create variables, occasionally you do.

    So, how do you do it?

    There are three options.

    • Use exec to declare the variable global or nonlocal and then reassign it.
    • Call setattr on the enclosing scope object.
    • Use the globals dict.

    First, using exec for reflection is almost always the wrong answer, so let's just rule that out off the bat.

    The setattr solution is more flexible, but in this case I think that's actually a negative. The whole point of what we're trying to do is to modify the global scope by (re-)executing a script. If it doesn't work when you instead execfile the script in the middle of a function… good!

    The way to create a global variable dynamically is:
        def create(name, value):
            globals()[name] = value
    

    The "if not exists" part


    Of course create('square', _square) does the exact same thing as just square = _square. We wanted to only bind square if it doesn't exist, not rebind it no matter what.

    Once you think of it as dict value assignment, the answer is obvious:
        def create_if_not_exists(name, value):
            globals().setdefault(name, value)
    
    And that's the whole trick.

    Decorators

    Except it's not the whole trick; there's one more thing we can do: Turn it into a decorator.
        def create_if_not_exists(name):
            def wrap(func):
                globals().setdefault(name, func)
                return func
            return wrap
    

    Getting the name for free

    Remember when I said that you can't get the name of a variable? Well, a decorator is only going to be called on functions or classes, and almost always on a function or class created with a def or class statement (or the result of decorating such a thing), which means it has a name built in, as its __name__ attribute.

    The problem is, this is the name of the actual implementation function, _square, not the name we want to bind it to, square. But those two names should give you an idea: If you just make that underscore prefix a naming convention (and it already fits in with existing Python naming conventions pretty well), you _can_ get the name to bind to. So:
        def create_if_not_exists(func):
            assert func.__name__[0] == '_'
            globals().setdefault(func.__name__[1:], func)
            return func
    
    And now, all we need to do to prevent the new (decorated) function from overwriting the old one is to attach this decorator:
        @create_if_not_exists
        @lru_cache()
        def _square(x):
            return x*x
    
    And now, we really are done. How much simpler can you get than that?

    When you want to rebind square

    Maybe square was supposed to be part of the safe, static code that you don't want to blow away with every re-run, but then you found a bug in it. How do you load the new version?

    Simple: either del square before hitting F5, or square = _square after hitting F5.

    A brief discursion on recursion

    Renaming functions after they're created doesn't play well with recursive functions. In Python, recursive functions call themselves by global lookup on their own name. So, if you write this:
        @create_if_not_exists
        @lru_cache()
        def _fact(n):
            if n < 2: return 1
            return n * _fact(n-1)
    
    … your original _fact function is recursively calling whatever happens to be named _fact at run time. Which means that, after a re-run, it's going to be calling the new _fact function, with its new and separate cache, which makes the whole cache thing worthless.

    The answer is simple: Call yourself by your public name, not your private name.
        @create_if_not_exists
        @lru_cache()
        def _fact(n):
            if n < 2: return 1
            return n * fact(n-1)
    
    Now, your original _fact function, which you've also bound to fact (and, for that matter, the new _fact that you haven't bound to fact) will call fact, which is still the original function. Tada.

  6. Sometimes you want to write a round-trippable __repr__ method--that is, you want the string representation to be valid Python code that generates an equivalent object to the one you started with.

    First, ask yourself whether you really want this. A round-trippable repr is very nice for playing with your objects in an interactive session--but if you're trying to use repr as a serialization format, don't do that.

    Now that you're sure you want to do it, here are a few simple rules that novices often get wrong.

    What if I can't write a round-trippable repr?

    Sometimes there's some format that's useful for debugging purposes, even though it's not round-trippable. The repr of a BeautifulSoup node is the raw HTML or XML for the node's subtree. The repr of a list is only round-trippable if it doesn't contain itself. The repr of a NumPy array is round-trippable if it's small, but elided if it's too big. And so on. This can be perfectly reasonable.

    In these cases, generally, you should avoid making anything that looks round-trippable, and also avoid anything that looks like the default repr with the angle brackets and address or id. But note that lists and NumPy arrays both violate that guideline.

    If you can't think of anything useful, definitely don't try to fake the default repr with the angle brackets and address or id. Just leave the default. If you have some base class that overrides __repr__ and you want to undo that, just call object.__repr__(self).
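    For example (the class names here are made up):

```python
class Base:
    def __repr__(self):
        return 'Base(...)'

class Leaf(Base):
    def __repr__(self):
        # Undo the inherited __repr__ and fall back to the default
        return object.__repr__(self)
```

    Now repr(Leaf()) gives the familiar default form, angle brackets, id and all.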

    A repr should look like a constructor call

    The basic rule is that, for any type that doesn't have a built-in literal representation (and those types already have built-in repr methods), the repr should look like a constructor call. Look at what repr does with various stdlib classes:
        datetime.datetime(2013, 11, 7, 12, 1, 49, 915797)
        bytearray(b'abcdef')
        deque([1, 2, 3], maxlen=4)
    
    That's what you want to do with your classes. So, if you've written this:
        class Breakfast:
            def __init__(self, spam, eggs=0, beans=0):
                self.spam, self.eggs, self.beans = spam, eggs, beans
            def more_spam(self, spam):
                self.spam += spam
            def __repr__(self):
                ???
    
        breakfast = Breakfast(3, beans=10)
        breakfast.more_spam(4)
        print(repr(breakfast))
    
    ... you should get something like this:
        meals.Breakfast(spam=7, eggs=0, beans=10)
    

    What arguments?

    Obviously, the arguments are whichever arguments will generate an equivalent object.

    What's an equivalent object? Well, if you've designed your objects to be comparable, usually the answer is that whatever __eq__ considers equal is what you want. If you haven't, it may be a bit harder to muddle out the right rule, but it's worth doing--and if you can't think of one, you probably shouldn't be generating a round-trippable repr for your type.

    In the case of the Breakfast type, it's obvious: there are three attributes, and the constructor sets them all to the three arguments it gets, so just pass them.

    To keyword, or not to keyword?

    These are both perfectly valid, equivalent representations:
        Breakfast(7, 0, 10)
        Breakfast(spam=7, eggs=0, beans=10)
    
    Which one is better? Well, the first one is more concise, but the second one is more explicit. So they're each better than the other.

    The important thing to keep in mind is that you're doing this for readability and usability. If the parameters are obvious enough that you usually pass the arguments without keywords, generate the arguments without keywords. Otherwise, with. That's really the only factor that goes into the decision.

    Which class?

    It's tempting to hard-code the class name into the string. But what if someone builds a subclass of your class, and doesn't add any new constructor parameters, maybe doesn't even override __init__? Why should they have to override __repr__ as well? If that's at all plausible for your type, use the type name dynamically.

    (Also, if it seems reasonable that someone might rename your class for some reason, e.g., while wrapping it up, use the type name dynamically.)
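    A sketch of what using the type name dynamically looks like, with the Breakfast example from above (the FullEnglish subclass is made up for illustration):

```python
class Breakfast:
    def __init__(self, spam, eggs=0, beans=0):
        self.spam, self.eggs, self.beans = spam, eggs, beans
    def __repr__(self):
        # type(self).__name__, not a hard-coded 'Breakfast', so
        # subclasses get a correct repr for free
        return '{}(spam={!r}, eggs={!r}, beans={!r})'.format(
            type(self).__name__, self.spam, self.eggs, self.beans)

class FullEnglish(Breakfast):
    pass

print(repr(FullEnglish(3, beans=10)))  # → FullEnglish(spam=3, eggs=0, beans=10)
```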

    Finally, if you want the repr to be round-trippable, you want to use the qualified name of the class. But how qualified? Do you want just the __qualname__, or do you want to add the module name as well? Note that datetime and deque in the stdlib examples above made different choices. Ultimately, the best way to decide is whether you expect your users to be typing "datetime.datetime", or just "deque"--that is, is your module intended to be used with "from module import class1, class2", or with "import module"?

    Note that the fully-qualified name of a class is just:
        '{}.{}'.format(type(self).__module__, type(self).__qualname__)
    
    (You might be tempted to just use the repr of the class itself, but with a standard metaclass that gives you something like "<class 'meals.Breakfast'>", angle brackets and all, which isn't valid code.)

    Always delegate to repr, never str

    A common mistake is to write one of the following:
        def __repr__(self):
            return 'Breakfast({}, {}, {})'.format(self.spam, self.eggs, self.beans)
    
        def __repr__(self):
            return 'Breakfast(%s, %s, %s)' % (self.spam, self.eggs, self.beans)
    
    That's perfectly fine when your members are small integers, but what if they're, say, strings? The str of a string is not a valid string literal. It's even worse if the string has invisible characters, or Unicode characters that your console can't represent. Even worse if it has a comma in the middle of it, or quotes. And if you've got both bytes and unicode strings, calling str on a bytes is a bad idea--and, in 2.x, calling it on a unicode object is an even worse one.

    Don't try to fix this by manually quoting things; if you ever find yourself typing "%s" or "{}", you're probably doing something wrong. If you get to the point where you're trying to figure out how to escape quotes in the middle of it, you're definitely doing something wrong.

    And of course strings aren't the only problem. For example, a datetime's str is an ISO-format string, with or without microseconds, with no timezone information. It's not just something that isn't a valid expression on its own, it's something that's painful to parse even when you know you need to parse it (although 3.4 or 3.5 should make that better).

    The solution is to use the repr of each member, not the str:
        def __repr__(self):
            return 'Breakfast({!r}, {!r}, {!r})'.format(self.spam, self.eggs, self.beans)
    
        def __repr__(self):
            return 'Breakfast(%r, %r, %r)' % (self.spam, self.eggs, self.beans)
    
    Note that the other way around isn't as universally true: sometimes it makes sense for str to delegate to repr (as all of the built-in collections do).

    Putting it all together

    In this silly example, we don't have enough information to decide whether we want the argument names, or whether we want the module name. Even in real life, that can happen. It's generally better to err on the side of being too explicit than too implicit, so let's do that here:
        def __repr__(self):
            return '{}.{}(spam={!r}, eggs={!r}, beans={!r})'.format(
                type(self).__module__, type(self).__qualname__,
                self.spam, self.eggs, self.beans)
    

  7. Many novices notice that, for many types, repr and eval are perfect opposites, and assume that this is a great way to serialize their data:
        def save(path, *things):
            with open(path, 'w') as f:
                for thing in things:
                    f.write(repr(thing) + '\n')
    
        def load(path):
            with open(path) as f:
                return [eval(line) for line in f]
    

    If you get lucky, you start running into problems, because some objects don't have a round-trippable repr. If you don't get lucky, you run into the _real_ problems later on.

    Notice that the same basic problems come up designing network protocols as file formats, with most of the same solutions.

    The obvious problems

    By default, a custom class--like many third-party and even stdlib types--will just look like <spam.Spam at 0x12345678>, which you can't eval. And these are the most fun kinds of bugs--the save succeeds, with no indication that anything went wrong, until you try to load the data later and find that all your useful information is missing.

    You can add a __repr__ method to your own types (which can be tricky; I'll get to that later), and maybe even subclass or patch third-party types, but eventually you run into something that just doesn't have an obvious round-trippable representation. For example, what string could you eval to re-generate an ElementTree node?

    Besides that, there are types that are often round-trippable, but aren't when they're too big (like NumPy arrays), or in some other cases you aren't likely to run into until you're deep into development (e.g., lists that contain themselves).

    The real problems

    Safety and security

    Let's say you've written a scheduler program. I look at the config file, and there's a line like this (with the quotes):
        "my task"
    
    What do you think will happen if I change it to this?
        __import__("os").system("rm -rf /")
    
    The eval docs explicitly point this out: "See ast.literal_eval() for a function that can safely evaluate strings with expressions containing only literals." Since the set of objects that can be safely encoded with repr and eval is not much wider than the set of objects that can be encoded as literals, this can be a solution to the safety problem. But it doesn't solve most of the other problems.
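    For example:

```python
from ast import literal_eval

# Literals (and containers of literals) round-trip fine:
assert literal_eval("[1, 2, {'a': 3}]") == [1, 2, {'a': 3}]

# Anything else raises instead of executing:
try:
    literal_eval('__import__("os").system("rm -rf /")')
except ValueError:
    print('refused to evaluate')
```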

    Robustness

    Using repr/eval often leads to bugs that only appear in certain cases you may not have thought to test for, and that are very hard to track down when they do appear.

    For example, if you accidentally write an f without quotes when you meant to write "f", that _might_ get an error at eval time... or, if you happen to have a variable named f lying around at eval time, it'll get whatever value is in that variable. (Given the name, chances are it's a file that you expected to be short-lived, which now ends up being kept open for hours, causing some bug half-way across your program...)

    And the fact that repr looks human-readable (as long as the human is the developer) makes such un-caught mistakes even more likely once you start editing the files by hand.

    Using literal_eval solves this problem, but not in the best way. You will usually get an error at read time, instead of successfully reading in garbage. But it would be a lot nicer to get an error at write time, and there's no way to do that with repr.

    Portability

    It would be nice if your data were usable by other programs. Python representations look like they should be pretty portable--they look like JavaScript, and Ruby, and most other "scripting" languages (and in some cases, even like valid initializers for C, C++, etc.).

    But each of those languages is a little bit different. They don't backslash-escape the same characters in strings, and have different rules about what unnecessary/unknown backslashes mean, or what characters are allowed without escaping. They have different rules for quoting things with quotes in them. Almost all scripting languages agree on the basic two collection types (list/array and dict/hash/object), but most of them have at least one other native collection type that the others don't. For example, {1, 2, 3} is perfectly legal JavaScript, but it doesn't mean a set of three numbers.

    Unicode

    Python string literals can have non-ASCII characters in them... but only if you know what encoding they're in. Source files have a coding declaration to specify that. But data files don't (unless you decide you want to incorporate PEP 263 into your data file spec, and write the code to parse the coding declarations, and so on).

    Fortunately, repr will unicode-escape any strings you give it. (Some badly-designed third-party types may not do that properly, so you'll have to fix them.)

    But this means repr is not at all readable for non-English strings. A Chinese name turns into a string of \u1234 sequences.

    What to use instead

    The key is that you want to use a format designed for data storage or interchange, not one that just happens to often work.

    JSON

    JavaScript Object Notation is a subset of JavaScript literal syntax, which also happens to be a subset of Python literal syntax. But JSON has advantages over literal_eval.

    It's a de facto standard language for data interchange. Python comes with a json module; every other language you're likely to use has something similar either built in or easily available; there are even command-line tools to use JSON from the shell. Good text editors understand it.

    There's a good _reason_ it's a de facto standard: It's a good balance between easy for machines to generate, and parse, and validate, and easy for humans to read and edit. It's so simple that a JSON generator can add easy-to-twiddle knobs to let you trade off between compactness and pretty-printing, etc. (and yes, Python's stdlib module has them).

    Finally, it's based on UTF-8 instead of ASCII (or Latin-1 or "whatever"), so it doesn't have to escape any characters except a few special invisible ones; a Chinese name will look like a string of Chinese characters (in a UTF-8-compatible editor or viewer, at least).
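    A quick sketch with the stdlib json module (note that Python's json.dumps escapes non-ASCII by default; ensure_ascii=False is what keeps the text readable):

```python
import json

data = {'name': '張三', 'scores': [90, 85]}

s = json.dumps(data, ensure_ascii=False, indent=2)
assert '張三' in s               # stays readable, not \uXXXX escapes
assert json.loads(s) == data    # and round-trips exactly
```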

    pickle

    The pickle module provides a Python-specific serialization framework that's incredibly powerful and flexible.

    Most types can be serialized without you even having to think about it. If you need to customize the way a class is pickled, you can, but you usually don't have to. It's robust. Anything that can't be pickled will give you an error at save time, rather than at load time. It's generally fast and compact.

    Pickle even understands references. For example, if you have a list with 20 references to the same list, and you dump it out and restore it with repr/eval, or JSON, you're going to get back a list of 20 separate copies that are equal, but not identical; with Pickle, you get what you started with. This also means that pickle will only use 1/20th as much storage as repr or JSON for that list. And it means pickle can dump things with circular dependencies.
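    The shared-reference behavior is easy to demonstrate:

```python
import json
import pickle

inner = [1, 2]
outer = [inner, inner]             # two references to one list

p = pickle.loads(pickle.dumps(outer))
assert p[0] is p[1]                # pickle preserves the sharing

j = json.loads(json.dumps(outer))
assert j[0] == j[1]
assert j[0] is not j[1]            # JSON gives back two separate copies
```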

    But of course it's not a silver bullet.

    Pickle is not meant to be readable, and in fact the default format isn't even text-based.

    Pickle does avoid the kind of accidental problems that make eval unsafe, but it's no more secure against malicious data.

    Pickle is not only Python-specific, but your-program-specific. In order to load an instance of a custom class, pickle has to be able to find the same class in the same module that it used at save time.

    Between JSON and pickle

    Sometimes you need to store data types that JSON can't handle, but you don't want all the flexibility (and insecurity and non-human-readability) of pickle.

    The json module itself can be extended to tell it how to encode your types; simple types can often be serialized as just the class name and the __dict__; complex types can mirror the pickling API. jsonpickle can do a lot of this work for you.
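    A minimal sketch of the class-name-plus-__dict__ approach (the Point class, the encoder name, and the '__class__' key are all made up for illustration):

```python
import json

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

class SimpleEncoder(json.JSONEncoder):
    def default(self, obj):
        # Fallback for types json doesn't know: encode as class name + __dict__
        return {'__class__': type(obj).__name__, **obj.__dict__}

s = json.dumps(Point(1, 2), cls=SimpleEncoder)
assert json.loads(s) == {'__class__': 'Point', 'x': 1, 'y': 2}
```

    A matching object_hook passed to json.loads can then rebuild the objects on the way back in.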

    YAML is a superset of JSON that adds a number of basic types (like datetime), and an extensibility model, plus an optional human-friendly (indentation-based) alternative syntax. You can restrict yourself to the safe subset of YAML (no extension types), or you can use it to build something nearly as powerful as pickle (simple types encoded as basically the class name plus the __dict__, complex types using a custom encoding).

    There are also countless other serialization formats out there, some of which fall within this range. Many of them are better when you have a more fixed, static shape for your objects. Some of them, you'll be forced to use because you're talking to some other program that only talks ASN.1 or plist or whatever. Wikipedia has a comparison table, with links.

    Beyond JSON and pickle

    Generally, you don't really need to serialize your objects, you need to serialize your data in such a way that you can create objects as-needed.

    For example, let's say you have a set of Polynomial objects, where each Polynomial has a NumPy array of coefficients. While you could pickle that set, there's a much simpler idea: Just store a list instead of a set, and each element as a list of up to N coefficients instead of an object. Now you've got something you can easily store as JSON, or even just CSV.
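    Sketched out (the Polynomial class here is hypothetical):

```python
import json

class Polynomial:
    def __init__(self, coeffs):
        self.coeffs = list(coeffs)

polys = {Polynomial([1, 0, 2]), Polynomial([3, 4])}

# Serialize the data, not the objects: a sorted list of coefficient lists
s = json.dumps(sorted(p.coeffs for p in polys))

# ... and rebuild equivalent objects whenever you need them
restored = [Polynomial(c) for c in json.loads(s)]
assert sorted(p.coeffs for p in restored) == [[1, 0, 2], [3, 4]]
```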

    Sometimes, you can reduce everything to a single list or dict of strings, or of fixed records (each a list of N strings), or to things that you already know how to serialize to one of those structures. A list of strings without newlines can be serialized as just a plain text file, with each string on its own line. If there are newlines, you can escape them, or you can use something like netstrings. Fixed records are perfect for CSV. If you have a dict instead of a list, use dbm. For many applications, simple tools like that are all you need.

    Often, using one of those trivial formats at the top level, and JSON or pickle for relatively small but flexible objects underneath, is a good solution. The shelve module can automatically pickle simple objects into a dbm for you, but you can build similar things yourself relatively easily.

    When your data turn out to have more complex relationships, you may find out that you're better off thinking through your data model, separate from "just a bunch of objects". If you have tons of simple entities with complex relations, and need to search in terms of any of those relations, a relational database like sqlite3 is perfect. On the other hand, if your data are best represented as non-trivial documents with simple relations, and you need to search based on document structure, CouchDB might make more sense. Sometimes an object-relational mapper like SQLAlchemy can help you connect your live object model to your data model. And so on.

    But whatever you have, once you've extracted the data model, it'll be a lot easier to decide how to serialize it.

  8. There are a number of blogs out there that tackle the problems of callbacks for servers, or for Javascript, but novices trying to write Python GUIs shouldn't have to learn about the different issues involved in servers, or a whole different language.

    In another post, I showed the two major approaches to writing asynchronous GUI code: threading and callbacks. Both have drawbacks. But there are a number of techniques to get many of the advantages of threads, on top of callbacks. So, to some extent, you can get (part of) the best of both worlds.

    Most of these techniques come from the world of network servers. The central issue facing network servers is the same as GUIs--your application has to be written as a bunch of event handlers that can't block or take a long time. But the practical details can be pretty different.

    Update #1: The original version of this made it sound as if async-style coroutines for GUIs would be the ideal solution to the problem. They wouldn't, they'd just be the best solution we could have in Python as the language is today. So I added a new section at the end.

    Update #2: Callbacks as our Generations' Go To Statement by Miguel de Icaza is the best description I've seen so far of what's wrong with callback hell, and how to fix it. He's coming from a .NET (C#/F#) perspective, but (exaggerating only slightly) C# await/async is exactly how you'd design Python's coroutines and async module if you didn't already have generators, so it's worth reading.

    Where we left off

    For reference, here's a simple example in a synchronous version (which blocks the GUI unacceptably), a callback version (which is ugly and easy to get wrong), and a threaded version (which fails because it tries to access widgets from outside the main thread):

        def handle_click_sync():
            total = sum(range(1000000000))
            label.config(text=total)
    
        def handle_click_callback():
            total = 0
            i = 0
            def callback():
                nonlocal i, total
                total += sum(range(i*1000000, (i+1)*1000000))
                i += 1
                if i == 100:
                    label.config(text=total)
                else:
                    root.after_idle(callback)
            root.after_idle(callback)
    
        def handle_click_threads():
            def callback():
                total = sum(range(100000000))
                label.config(text=total)
            t = threading.Thread(target=callback)
            t.start()
    

    Ideally, we want to get something that looks as nice as the threading example, but that actually works.

    Promises

    The simplest solution to avoid callback hell without using threads is to use promises, aka deferreds, objects that wrap up callback chains to make them easier to use.

    While this idea came out of Python, where it's really taken off is Javascript. Partly this is because web browsers provide a callback-based API for both the GUI and I/O, and the language doesn't have any of the nice features Python has that make some of the later options possible.

    The Twisted documentation explains the idea better than I could. If you can follow Javascript, there are also 69105 good tutorials you can find with a quick Google.

    So, I'll just show an example (with a fictitious API, since none of the major GUI frameworks are built around promises):

        def handle_click():
            d = Deferred()
            d.resolve(0)
            for i in range(100):
                def callback(subtotal, i=i):
                    d2 = Deferred()
                    root.after_idle(lambda: d2.resolve(
                        subtotal + sum(range(i*1000000, (i+1)*1000000))))
                    return d2
                d.addCallback(callback)
            d.addCallback(lambda total: label.config(text=total))
            return d
    

    While it's not quite as nice as the threaded version, it's similar--we don't have to turn the control flow inside-out, we don't have to worry about getting lost in the chain, exceptions will be propagated automatically and with useful tracebacks, and so on.

    Also, notice how easy it would be to write a wrapper that maps a callback over an iterable, chaining the calls in series. Or various other ways of combining things. Again, if you can read Javascript, you can find some good examples of these kinds of functions online by searching for, e.g., "deferred map".
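    For the curious, here's roughly what such a wrapper might look like. This is a hedged sketch, not any real library's API: the Deferred below is a toy stand-in with just enough chaining to run, and schedule stands for whatever defers a call to your event loop (e.g., lambda cb: root.after_idle(cb) in Tkinter).

```python
class Deferred:
    """Toy Deferred: just enough chaining to demonstrate the idea."""
    def __init__(self):
        self._callbacks, self._fired, self._result = [], False, None
    def resolve(self, result):
        self._fired, self._result = True, result
        for cb in self._callbacks:
            self._result = cb(self._result)
        self._callbacks = []
    def addCallback(self, cb):
        if self._fired:
            self._result = cb(self._result)
        else:
            self._callbacks.append(cb)
        return self

def deferred_map(func, iterable, schedule):
    """Apply func to each item in series, one scheduler tick per item.

    schedule is whatever defers a call to the event loop.
    """
    items, results, d = iter(iterable), [], Deferred()
    def step():
        try:
            item = next(items)
        except StopIteration:
            d.resolve(results)   # all done: fire the deferred with the list
            return
        results.append(func(item))
        schedule(step)           # hand the next item to the next tick
    schedule(step)
    return d
```

    Driving it with a fake scheduler (just a list of pending calls) is enough to watch the serial chaining work.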

    You can also probably tell that Deferreds aren't that heavy of a wrapper around callbacks, so it's always pretty easy to understand what's happening under the covers. In practice, that isn't useful as often as it sounds--but it's not completely useless, either. But, more importantly, this means it's very easy to wrap a callback-based API in a promise-based API. Again, if you can read Javascript, you can find some "promisify" implementations and see how they work.

    Yielding control

    Some GUI frameworks have a way to yield control to the event loop, then resume in place. A few platforms have native support for this kind of thing; on other platforms, the framework generally fakes it on top of one of the other techniques below, but you can assume it did so in a way that makes sense and just use it.

    Tkinter doesn't have such a mechanism, but I'll pretend that it had a function like wx's Yield. Then we could do this:

        def handle_click():
            total = 0
            for i in range(100):
                total += sum(range(i*1000000, (i+1)*1000000))
                root.yield_control(only_if_needed=True)
            label.config(text=total)
    

    We did have to break the computation up into steps so we could yield every so often, but other than that, we didn't have to change the code at all. That's almost as simple as the threaded code, and without any of the drawbacks. However, yielding like this has some problems of its own.

    Imagine how you'd write the sleep example from the previous post with this feature. There's no way to yield control for one second, at least not without an additional yield_sleep method. And likewise, there's no yield_read or yield_urlopen. So, this only really works for the simplest cases. While it wouldn't be impossible for some framework to include functions for all of these other cases, it would be infeasible.

    Nested event loops

    If you manually nest one event loop inside another, you can break up your sequential code just by running a single step of the event loop, or a short-lived loop, every so often. Effectively, this gives you a way to write a yield_control function even if one doesn't exist.
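    Tkinter can actually do this: update() runs a nested pass of the event loop, processing everything pending, and update_idletasks() is the tamer variant that runs only idle callbacks, so other event handlers can't re-enter you. A minimal sketch of a homemade yield_control:

```python
def yield_control(root, only_idle=False):
    """Run a nested pass of the Tk event loop, then resume in place."""
    if only_idle:
        root.update_idletasks()  # idle callbacks only: no reentrancy from user events
    else:
        root.update()            # everything pending, including other handlers
```

    Call it between chunks of a long computation; just remember that with update(), every handler reachable from the event loop must now be reentrant.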

    However, this has the same problems as yield_control, plus some additional problems. If the user clicks the button (or triggers some other handler) while we're in the middle of processing the first press, we'll end up calling one event handler in the middle of another. This means your code has to all be reentrant.

    On top of that, an impatient user, or just a flood of mouse-over events, could push you over the recursion limit.

    At any rate, while this is doable with many GUI frameworks, including (with a bit of work) Tkinter, it's usually not the best solution.

    Greenlets

    Greenlets, aka fibers, are cooperative threads. They have an API similar to real threading, except that you have to explicitly tell them when to give control to another thread.

    The code looks like a cross between threading code and yielding code. Using a fictitious greenlet API that lets the Tkinter root object also work as a greenlet controller:

        def handle_click():
            def callback():
                total = 0
                for i in range(100):
                    root.switch()
                    total += sum(range(i*1000000, (i+1)*1000000))
                label.config(text=total)
            t = root.new_greenlet(target=callback)
            root.switch(t)
    

    One advantage of greenlets is that they allow you to compose low-level async functions into higher-level async functions by just spawning a greenlet that calls the low-level functions. So, once you've got a library like gevent for the low-level async I/O, it's almost trivial to wrap up even something as complicated as requests, as you can see in the source to the grequests wrapper. And then we can use it like this:

        def handle_click():
            def callback():
                r = grequests.get('http://example.com')
                soup = BeautifulSoup(r.text)
                label.config(text=soup.find('p').text)
            t = root.new_greenlet(target=callback)
            root.switch(t)
    

    (In fact, gevent itself wraps up large chunks of the stdlib this way, including urlopen, so we didn't even really need this example, but it seemed worth showing anyway.)

    The big problem is that greenlets have to be integrated with your event loop framework. And, while there are greenlet-based networking libraries like gevent, there is no greenlet-based GUI library.

    If your GUI framework has a way to manually run one iteration of the event loop at a time, you can run the event loop itself inside a greenlet, and then drive it with any greenlet-based event loop, like gevent, or (if you don't need any async I/O) just a trivial scheduler loop. This can be pretty handy with a game framework like pygame. It's rarely used with GUI frameworks like Tkinter, but there's no reason it couldn't be.
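    As a sketch of that idea, here's a trivial stdlib-only scheduler loop that pumps the GUI one pass at a time and gives other tasks a turn in between; with gevent you'd replace time.sleep with gevent.sleep and the tasks with greenlet switches. (drive, done, and others are names I made up for this sketch.)

```python
import time

def drive(root, done, others=(), interval=0.02):
    """Trivial scheduler: one pass of the GUI's event loop, then a turn
    for each other task, until done() says to stop."""
    while not done():
        root.update()        # one iteration of the (e.g., Tk) event loop
        for task in others:
            task()           # e.g., switch into a worker greenlet
        time.sleep(interval)
```

    The interesting part is that the GUI loop is just another task here; nothing about it has to be "in charge".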

    Coroutines

    You can build coroutines out of Python generators, which allow you to get most of the benefits of greenlets without actually needing greenlets. Instead of switching to another greenlet, you yield from another coroutine (or a future object that wraps up the result of a coroutine). Greg Ewing has a great presentation on this.

    The code for a hypothetical coroutine-based GUI framework might look like this:

        def handle_click():
            total = 0
            for i in range(100):
                yield from sleep(0)
                total += sum(range(i*1000000, (i+1)*1000000))
            label.config(text=total)
    

    But there is no such framework.

    Also, integrating a coroutine-based GUI framework with an async I/O framework (except by using the hybrid approach of having separate threads for each) looks like a hard problem that nobody's even thought through, much less solved.

    Coroutines over callbacks

    Instead of building a coroutine scheduler, you can build a coroutine API on top of explicit callbacks. PEP 3156 will add this to the standard library in some future version, and there are already third-party frameworks like Twisted's inlineCallbacks, or Monocle.

    The code looks the same as the above, except that you generally need some kind of decorator to mark your coroutines. So:

        @coroutine
        def handle_click():
            total = 0
            for i in range(100):
                yield from sleep(0)
                total += sum(range(i*1000000, (i+1)*1000000))
            label.config(text=total)
    

    And, while integrating a GUI framework with an async I/O framework is still a problem, doing so at the callback level is a solved problem. Some post-PEP 3156 project may add it to the standard library, and Twisted already integrates with a variety of GUI frameworks, including Tkinter.

    With a fictitious API to match the previous example:

        @coroutine
        def handle_click():
            r = yield from urlopen('http://example.com')
            data = yield from r.read()
            soup = BeautifulSoup(data)
            label.config(text=soup.find('p').text)
    

    Notice that with greenlets, async operations didn't require any kind of marking; you only needed to call switch when you wanted to give up control without waiting on an async operation. With coroutines, you mark both cases with an explicit yield from. Is this extra noise in the coroutine code, or is it a case of "explicit is better than implicit"? That's a matter of opinion, or at least debate.

    Implicit coroutines, implicit futures, dataflow variables, …

    Async-module-style coroutines are pretty cool, but there's a problem. As soon as I decide to make some previously-synchronous function asynchronous (e.g., maybe I had a function that returns a quick string, but then I realized that sometimes it needs to load it off disk over a network share), every function that calls it has to now yield from it—and, of course, has to become a coroutine itself, meaning every function that called it has to adjust, and so on up the chain. It's certainly better than manually pushing callbacks up the chain and figuring out how to propagate errors and so forth, but… why can't the compiler or interpreter do this for us?

    The interpreter can tell that I'm calling an async coroutine without yield from and complain at me. Could it just automatically do a yield from and turn my function into a coroutine (the same way adding any yield or yield from turns it into a generator) instead, and push that all the way up the chain? My top-level function would need to be changed manually the first time you use an async coroutine anywhere in the code, but that's it.

    Unfortunately, that doesn't fit into the model that Python uses. But I don't see why you couldn't design a language that works that way.

    But once you're doing that, there may be an easier solution. Python's futures are, while a higher-level abstraction than callbacks, still lower-level than you'd want. First, they're explicit--you can't just use a future string as a string; you have to ask it for its value and then use the result as a string. And there's a reason for that: asking for the value is a blocking operation. Which means you usually don't ask for the value; you attach a callback, and the run loop calls your callback. The key to the coroutine design is that it effectively lets you yield from a future's value instead of attaching a callback.

    So, what if there were no implicit yield-from-ing, and maybe even no explicit yield-from-ing, and instead accessing a future's value automatically turned your function into a coroutine and blocked it until the value was ready? That would get you 90% of the benefit of implicit coroutines without the problems. And then, of course, you could make your futures implicit--a future string is just a string, but the first time you try to use it, you block until it's ready. (This does mean that some cases, like waiting with a timeout, need some other mechanism, but that's pretty easy--allow any value to be used as if it were a future: print('s: {}'.format(timeout(s, 5.0))) waits up to 5 seconds and raises if s is a future string that isn't ready, and just returns immediately if s is a plain old string.) This is effectively the way Alice ML and a few other research languages approach the async problem.
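    You can get a feel for implicit futures in today's Python, though only as a toy. The FutureStr below (every name here is made up for the sketch) starts computing immediately but defers the blocking until someone actually looks at the value, and timeout is the escape hatch described above:

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)

class FutureStr:
    """Toy implicit future: acts like a string-to-be; using it blocks."""
    def __init__(self, fn, *args):
        self._future = _pool.submit(fn, *args)  # start computing now
    def _value(self):
        return self._future.result()            # block until ready
    def __str__(self):
        return self._value()
    def __eq__(self, other):
        return self._value() == other
    def __add__(self, other):
        return self._value() + other

def timeout(value, seconds):
    """Wait up to `seconds` if it's a future; plain values return at once."""
    if isinstance(value, FutureStr):
        return value._future.result(timeout=seconds)
    return value
```

    A real implementation would have to forward every string operation (and decide what identity, hashing, and so on mean), which is exactly why this belongs in the language rather than in a library.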

    At first glance, it looks like greenlet libraries like gevent give you the same benefits, but that's misleading. They have to patch all of the low-level synchronous methods with more complicated explicit greenlet-blocking code. Often those low-level primitives can be composed without thinking about it, but, unlike futures, that's not universally true. And if you want to add some new functionality, you can't just return a future, you have to do the same work that gevent does to wrap up the new primitives.

    And you can simplify things even more than implicit futures with dataflow variables. In Python, a variable is just a binding from a name to an object. But in Oz, this is split into two parts--a name binds to a variable, and the variable binds to a value--which allows you to create structures that contain unbound variables and then bind those partially-bound structures to variables. I won't get into all of the differences this makes, because if you haven't read CTM and you're serious about programming, you should stop whatever you're doing and go read it, but there's one important benefit here: The system needs some strategy for dealing with access to unbound variables, and one such strategy is just to block until the variable is bound. This makes everything an implicit future, and it leaves it up to the language to schedule things properly (when combined with greenlets, this gives you the same kind of magic that gevent had to do explicitly, but for free), while still making everything available to the program when it matters.

    One last thing: Sometimes, to understand what's really happening, it helps to think in terms of continuations. A continuation is just the rest of the program after a given point. A few languages (notably Scheme and its descendants) let you explicitly ask for the current continuation, pass it around, and continue it. This is primarily useful for building coroutines (which we can already do in Python without needing to think about continuations).

    When you take all of the code from your function after a blocking call, plus a callback that was passed in, and wrap it all up in a callback function that you then pass to that blocking call, what you're doing is recreating continuations explicitly, at a higher level. If you could just get the current continuation and pass it along, you could write the rest of your function synchronously, and then continue the passed-in continuation at the end, without needing to wrap anything up in a callback function. So far, that still sounds less user-friendly than yield from coroutines, because you have to explicitly accept and continue a continuation. But that part is exactly the part we're trying to automate away.

    So, imagine a language that allowed you to mark primitives with @blocking, which would give them a hidden extra parameter for a continuation and turn every return into a continue. Calling a @blocking function would then get the current continuation and pass it as that hidden parameter. And that's all you'd have to do. That's exactly what implicit coroutines and implicit futures are trying to accomplish, but they attack the problem at a higher level. That may be worth doing, because normally-implicit coroutines and futures that you can access explicitly when you want to are useful programming abstractions in their own right. But attacking it at the lowest level may make the problem easier to solve.

    At any rate, none of the ideas in this section would fit into Python (even if most of them weren't rambling messages that may fall off the thin line between too much detail and inaccuracy for the sake of simplicity…), but they're all worth keeping in your head when thinking about what you could do for Python, and how.


  9. Imagine a simple Tkinter app. (Everything is pretty much the same for most other GUI frameworks, and many frameworks for games and network servers, and even things like SAX parsers, but most novices first run into this with GUI apps, and Tkinter is easy to explore because it comes with Python.)

        def handle_click():
            print('Clicked!')
        root = Tk()
        Button(root, text='Click me', command=handle_click).pack()
        root.mainloop()
    

    Now imagine that, instead of just printing a message, you want it to pop up a window, wait 5 seconds, then close the window. You might try to write this:

        def handle_click():
            win = Toplevel(root, title='Hi!')
            win.transient()
            Label(win, text='Please wait...').pack()
            for i in range(5, 0, -1):
                print(i)
                time.sleep(1)
            win.destroy()
    

    But when you click the button, the window doesn't show up. And the main window freezes up and beachballs for 5 seconds.

    This is because your event handler hasn't returned, so the main loop can't process any events. It needs to process events to display a new window, respond to messages from the OS, etc., and you're not letting it.

    There are two basic ways around this problem: callbacks, or threads. There are advantages and disadvantages of both. And then there are various ways of building thread-like functionality on top of callbacks, which let you get (part of) the best of both worlds, but I'll get to those in another post.

    Callbacks

    Your event handler has to return in a fraction of a second. But what if you still have code to run? You have to reorganize your code: Do some setup, then schedule the rest of the code to run later. And that "rest of the code" is also an event handler, so it also has to return in a fraction of a second, which means often it will have to do a bit of work and again schedule the rest to run later.

    Depending on what you're trying to do, you may want to run on a timer, or whenever the event loop is idle, or every time through the event loop no matter what. In this case, we want to run once/second. In Tkinter, you do this with the after method:

        def handle_click():
            win = Toplevel(root, title='Hi!')
            win.transient()
            Label(win, text='Please wait...').pack()
            i = 5
            def callback():
                nonlocal i, win
                print(i)
                i -= 1
                if not i:
                    win.destroy()
                else:
                    root.after(1000, callback)
            root.after(1000, callback)
    

    For a different example, imagine we just have some processing that takes a few seconds because it has so much work to do. We'll do something stupid and simple:

        def handle_click():
            total = sum(range(1000000000))
            label.config(text=total)
    
        root = Tk()
        Button(root, text='Add it up', command=handle_click).pack()
        label = Label(root)
        label.pack()
        root.mainloop()
    

    When you click the button, the whole app will freeze up for a few seconds as Python calculates that sum. So, what we want to do is break it up into chunks:

        def handle_click():
            total = 0
            i = 0
            def callback():
                nonlocal i, total
                total += sum(range(i*1000000, (i+1)*1000000))
                i += 1
                if i == 100:
                    label.config(text=total)
                else:
                    root.after_idle(callback)
            root.after_idle(callback)

    Callback Hell

    While callbacks definitely work, there are a lot of problems with them.

    First, we've turned our control flow inside-out. Compare the simple for loop to the chain of callbacks that replaced it. And it gets much worse when you have more complicated code.

    On top of that, it's very easy to get lost in a callback chain. If you forget to return from a sequential function, you'll just fall off the end of the function and return None. If you forget to schedule the next callback, the operation never finishes.

    It's also hard to propagate results through a chain of callbacks, and even harder to propagate errors. Imagine that one callback needs to schedule a second which needs to schedule a third and so on--but if there's an exception anywhere on the chain, you want to jump all the way to the last callback. Think about how you'd write that (or just look at half the Javascript apps on the internet, which you can View Source for), and how easy it would be to get it wrong.
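    To make that concrete, here's the skeleton of what every step of a hand-rolled chain has to do (the names are made up for this sketch): thread an errback alongside the callback, and never forget to use it.

```python
def step(work, on_success, on_error, schedule):
    """Schedule one chunk of work; route its result--or its exception."""
    def run():
        try:
            result = work()
        except Exception as e:
            on_error(e)        # forget this branch and errors just vanish
        else:
            on_success(result)
    schedule(run)
```

    Every single step needs this boilerplate, and every step is a chance to get it wrong; this is exactly the code that promise libraries factor out for you.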

    And debugging callback-based code is also no fun, because the stack traceback doesn't show you the function that scheduled you to run later, it only shows you the event loop.

    There are solutions to these problems, which I'll cover in another post. But it's worth writing an app or two around explicit callbacks, and dealing with all the problems, so you can understand what's really involved in event-loop programming.

    Blocking operations

    Sleeping isn't the only thing that blocks. Imagine that you wanted to read a large file off the disk, or request a URL over the internet. How would you do that with callbacks?

    We had to replace our sleep with a call to after, passing it the rest of our function as a callback. Similarly, we have to replace our read or urlopen with a call to some function that kicks off the work and then calls our callback when it's done. But most GUI frameworks don't have such functions. And you don't want to try to build something like that yourself.

    I/O isn't the only kind of blocking, but it's by far the most common. And there's a nice solution to blocking I/O: asynchronous I/O, using a networking framework. Whether this is as simple as a loop around select or as fancy as Twisted, the basic idea is the same as with a GUI: it's an event loop that you add handlers to.

    And there's the problem: your GUI loop and your I/O loop both expect to take over the thread, but they obviously can't both do that.

    The solution is to make one loop drive the other. If either framework has a way to run one iteration of the main loop manually, instead of just running forever, you can, with a bit of care, put one in charge of the other. (Even if your framework doesn't have a way to do that, it may have a way to fake it by running an event loop and immediately posting a quit event; Tkinter can handle that.)
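    For example, here's a sketch of putting Tkinter in charge of a select loop, by polling with a zero timeout on a timer (handle_readable is a hypothetical dispatcher; a real reactor does quite a bit more than this):

```python
import select

def poll_sockets(root, socks, handle_readable, interval_ms=50):
    """Let the Tk event loop drive non-blocking socket polling."""
    def poll():
        if socks:
            # timeout 0 means "just check": never block the GUI thread
            readable, _, _ = select.select(socks, [], [], 0)
            for s in readable:
                handle_readable(s)
        root.after(interval_ms, poll)   # reschedule ourselves
    root.after(interval_ms, poll)
```

    This is the crude version of what Twisted's GUI reactors do properly; it wastes a little time polling, but it keeps both loops happy in one thread.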

    And the work may have already been done for you. Twisted is a networking framework that can work with most popular GUI frameworks. Qt is a GUI framework with a (somewhat limited) built-in network framework. They both have pretty high learning curves compared to Tkinter, but it's probably easier to learn one of them than to try to integrate, say, Tkinter and a custom select reactor yourself.

    Another option is a hybrid approach: Do your GUI stuff in the main thread, and your I/O in a second thread. Both of them can still be callback-driven, and you can localize all of the threading problems to the handful of places where the two have to interact with each other.

    Threading

    With multithreading, we don't have to reorganize our code at all, we just move all of the work onto a thread:

        def handle_click():
            def callback():
                total = sum(range(100000000))
                print(total)
            t = threading.Thread(target=callback)
            t.start()
    

    This kicks off the work in a background thread, which won't interfere with the main thread, and then returns immediately. And, not only is it simpler, you don't have to try to guess how finely to break up your tasks; the OS thread scheduler just magically takes care of it for you. So all is good.

    Plus, this works just as well for I/O as it does for computation (better, in fact):

        def handle_click():
            def callback():
                r = urlopen('http://example.com')
                data = r.read()
                soup = BeautifulSoup(data)
                print(soup.find('p').text)
            t = threading.Thread(target=callback)
            t.start()
    

    But what if we want it to interfere with the main thread? Then we have a problem. And with most frameworks--including Tkinter--calling any method on any GUI widget interferes with the main thread. For example, what we really wanted to do was this:

        def handle_click():
            def callback():
                total = sum(range(100000000))
                label.config(text=total)
            t = threading.Thread(target=callback)
            t.start()
    

    But if we try that, it no longer works. (Or, worse, depending on your platform/version, it often works but occasionally crashes...)

    So, we need some way to let the background thread work with the GUI.

    on_main_thread

    If you had a function on_main_thread that could be called on any thread, with any function, and get it to run on the main thread as soon as possible, this would be easy to solve:

        def handle_click():
            def callback():
                total = sum(range(100000000))
                root.on_main_thread(lambda: label.config(text=total))
            t = threading.Thread(target=callback)
            t.start()
    

    Many GUI frameworks do have such a function. Tkinter, unfortunately, does not.

    If you want to, you can pretty easily wrap up all of your widgets with proxy objects that forward method calls through on_main_thread, like this:

        from threading import current_thread
    
        class ThreadedMixin:
            main_thread = current_thread()
            def _forward(self, func, *args, **kwargs):
                if current_thread() != ThreadedMixin.main_thread:
                    self.on_main_thread(lambda: func(*args, **kwargs))
                else:
                    func(*args, **kwargs)
    
        class ThreadSafeLabel(Label, ThreadedMixin):
            def config(self, *args, **kwargs):
                self._forward(super().config, *args, **kwargs)
            # And so on for the other methods
    

    Obviously you'd want to do this programmatically or dynamically instead of writing hundreds of lines of forwarding code.
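    For instance, a generic proxy can forward every method call through on_main_thread (still the hypothetical function from above) without writing a wrapper per method:

```python
from threading import current_thread

class ThreadSafeProxy:
    """Wrap a widget; calls from other threads get posted to the main one."""
    _main = current_thread()
    def __init__(self, widget, on_main_thread):
        self._widget = widget
        self._post = on_main_thread
    def __getattr__(self, name):
        attr = getattr(self._widget, name)
        if not callable(attr):
            return attr
        def forward(*args, **kwargs):
            if current_thread() is ThreadSafeProxy._main:
                return attr(*args, **kwargs)
            self._post(lambda: attr(*args, **kwargs))  # no return value off-thread
        return forward
```

    Note the limitation hiding in that last comment: off the main thread, a forwarded call can't return anything, which is exactly why fire-and-forget calls like config are the easy case.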

    post_event

    If you had a function post_event that could be called on any thread to post a custom event to the event queue, you could get the same effect with just a bit of extra work--just write an event handler for that custom event. For example:

        def handle_my_custom_event(event):
            label.config(text=event.data)
        root.register_custom_event('<My Custom Event>')
        root.bind('<My Custom Event>', handle_my_custom_event)
    
        def handle_click():
            def callback():
                total = sum(range(100000000))
                event = Event('<My Custom Event>', data=total)
                root.post_event(event)
            t = threading.Thread(target=callback)
            t.start()
    

    Most GUI frameworks that don't have on_main_thread have post_event. But Tkinter doesn't even have that.

    Polling queues

    With limited frameworks like Tkinter, the only workaround is to use a Queue, and make Tkinter check the queue every so often, something like this:

        q = queue.Queue()
    
        def on_main_thread(func):
            q.put(func)
    
        def check_queue():
            while True:
                try:
                    task = q.get(block=False)
                except queue.Empty:
                    break
                else:
                    root.after_idle(task)
            root.after(100, check_queue)
    
        root.after(100, check_queue)
    

    While this works, it makes the computer waste effort constantly checking the queue for work to do. This isn't likely to slow things down when your program is busy--but it will make it drain your battery and prevent your computer from going to sleep even when your program has nothing to do. Programs that use a mechanism like this will probably want some way to turn check_queue on and off, so it's only wasting time when you actually have some background work going.

    mtTkinter

    There's a wrapper around Tkinter called mtTkinter that effectively builds on_main_thread out of something like check_queue, and then builds thread-safe proxies around all of the Tkinter widgets, so you can use Tkinter as if it were completely thread-safe.

    I don't know whether it's really "production-quality". I believe it hasn't been ported to Python 3 either. (2to3 might be enough, but I can't promise that.) And the LGPL licensing may be too restrictive for some projects. But for learning purposes, and maybe for building simple GUIs for your own use, it's worth looking at.

    I have a quick&dirty port to 3.x on GitHub if you want to try it.

    Threading limits

    Threads, unlike callbacks, aren't free: if you pile up too many of them, you start adding overhead, in both time and space, on top of the cost of actually doing the work.

    The solution to this is to use a small pool of threads to service a queue of tasks. The easiest way to do this is with the futures module:

        executor = ThreadPoolExecutor(8)
    
        def handle_click():
            def callback():
                total = sum(range(100000000))
                root.on_main_thread(lambda: label.config(text=total))
            executor.submit(callback)
    

    Shared data

    The biggest problem with threads is that any shared data needs to be synchronized, or you have race conditions. The general problem, and the solutions, are covered well all over the net.

    But GUI apps add an additional problem: Your main thread can't block on a synchronization object that could be held for more than a fraction of a second, or your whole GUI freezes up. So, you need to make sure you never wait on a sync object for more than a brief time (either by making sure nobody else can hold the object for too long, or by using timeouts and retries).
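    The timeout-and-retry version of that can be as simple as this sketch (update_when_free is a made-up name): try the lock briefly, and if it's busy, come back on a timer instead of blocking the GUI.

```python
def update_when_free(root, lock, do_update, retry_ms=50):
    """Never block the GUI on a lock: retry later instead."""
    if lock.acquire(timeout=0.01):       # wait at most 10ms
        try:
            do_update()
        finally:
            lock.release()
    else:
        root.after(retry_ms,
                   lambda: update_when_free(root, lock, do_update, retry_ms))
```

    The GUI stays responsive either way; the update just lands a tick or two later when the lock is contended.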


  10. So you've installed Python from an official binary installer on python.org's Releases page, you've installed Xcode from the App Store and the Command Line Tools from Xcode, you've installed pip from its setup script. And now, you try to "pip-X.Y install pyobjc" and it fails with a whole slew of obscure error messages. 

    An easy workaround: Don't

    The official binary installer seems like the easy way to do things, but it's not. It's built to work with every version of OS X from 10.6 to 10.9. This means whenever you build a package, it will try to build that package to work with every version of OS X from 10.6 to 10.9. This is very hard to do—especially on a 10.8 or 10.9 machine with Xcode 5.

    If you're planning to build applications for binary distribution with, e.g., py2app, and you want them to work on an older version of OS X than you have, then you need to get this working. (Although even then, you might be better off building Python exactly the way you want, instead of using the binary installation.) So far, I haven't been able to get this working with Xcode 5; I've been using an old machine that I don't update.

    For almost everyone else, it's unnecessary wasted effort.

    Python 2

    If you're using Python 2, just stick with Apple's pre-installed 2.7.2. Having multiple 2.7 installations at the same time is already a huge headache, and the added problems with building packages… is it really worth it?

    Python 3

    While it may seem counter-intuitive, building it yourself makes everything easier, because you end up with a Python installation tailored to your build toolchain, not to the Python Mac build machine's toolchain.

    And if you use Homebrew, building it yourself is just "brew install python3". Plus, you get setuptools and pip (that work with your system), and a newer sqlite3, real readline, gdbm, and a few other things you wouldn't have thought of.

    When are they going to fix it?

    I know that the pythonmac SIG is aware of the problem. In fact, the problem has been around for a long time; it's just that the workarounds they've used since 10.6 no longer work. I have no idea what they're planning to do about it. You might want to watch the pythonmac-sig mailing list for progress, or join in to help.

    The problem

    There are actually two problems.

    gcc is gone

    The official Python.org binaries are built with Apple's custom gcc-4.2, as supplied by Xcode 3.2.

    Xcode 4 stopped supplying gcc-4.2, but offered a transitional compiler called llvm-gcc-4.2 (because it used a custom gcc-4.2 frontend hooked up to the llvm backend), and the toolchain came with wrappers named things like "gcc-4.2" and "g++-4.2" and so on. This actually had some problems building Python itself, but for building extension modules—even complex ones like numpy and pyobjc—you usually got away with it.

    Xcode 5 dropped llvm-gcc-4.2 as well. Now, all you've got is clang. And, while "gcc" is a wrapper around clang, "gcc-4.2" does not exist at all. So, many extensions will just fail to build, because they're looking for a compiler named "gcc-4.2" (or a C++ compiler named "g++-4.2", or a linker frontend named "gcc-4.2", or…). The new compiler—which Apple calls "Apple LLVM 5.0 (clang-500.2.76) (based on LLVM 3.3svn)", just to make it impossible for anyone to refer to—does a much better job than llvm-gcc-4.2; if you can just get distutils to use it everywhere, everything pretty much just works.

    In some cases, just passing "CC=clang CXX=clang++" environment variables to the build will work. You can get further by also adding "MAINCC=clang LINKCC=clang". Anything that needs to run a configure script will _still_ end up picking up gcc-4.2, however, and there may be similar issues with projects that first build a local distutils.cfg or similar.
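Concretely, that looks something like this (a sketch; pip-X.Y stands in for your versioned pip, as elsewhere in this post):

```shell
# Export rather than prefix a single command, so that child
# processes—including configure scripts—inherit the settings too.
export CC=clang CXX=clang++ MAINCC=clang LINKCC=clang
# then, e.g.:
#   pip-X.Y install pyobjc
```

As noted above, even this won't catch everything; configure-based builds may still go looking for gcc-4.2 by name.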

    One workaround is to edit /Library/Frameworks/Python.framework/Versions/X.Y/lib/pythonX.Y/config-X.Ym/Makefile, changing every reference to gcc and g++ to clang and clang++, then cross your fingers. This seems to work.
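A sketch of that Makefile edit with sed (fix_makefile is my name for it, not anything standard; sed makes a .bak backup, but check the result by hand before trusting it):

```shell
fix_makefile() {
    # Rewrite compiler references in a Python config Makefile.
    # Handle the versioned names first, and g++ before gcc, so no
    # name is left half-rewritten (e.g. "gcc-4.2" -> "clang-4.2").
    sed -i.bak \
        -e 's/g++-4\.2/clang++/g' -e 's/gcc-4\.2/clang/g' \
        -e 's/g++/clang++/g' -e 's/gcc/clang/g' "$1"
}
# e.g., for the framework build (X.Y = your Python version):
#   sudo fix_makefile /Library/Frameworks/Python.framework/Versions/X.Y/lib/pythonX.Y/config-X.Ym/Makefile
```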

    Alternatively, you could create a symlink, or a hardlink, at /usr/local/bin/gcc-4.2 pointing to /usr/bin/gcc, and likewise for g++-4.2, and cross your fingers even tighter. I haven't tried this.
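That symlink workaround might look like this (untested, as noted; link_shims is my name for it, and /usr/local/bin must come before anything else that provides these names on the build's PATH):

```shell
link_shims() {
    # $1 = dir with the real clang wrappers, $2 = dir for the gcc-4.2 shims
    ln -sf "$1/gcc" "$2/gcc-4.2"
    ln -sf "$1/g++" "$2/g++-4.2"
}
# As described above (run with sudo):
#   link_shims /usr/bin /usr/local/bin
```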

    10.8 is the oldest SDK

    We've always been at war with Eastasia, and we've always been compiling for 10.8. There has never been an older SDK. References in your configure scripts to MacOSX10.6.sdk are errors.

    Many extensions will build just fine without the 10.6 SDK—but they'll quietly build for your native system, which defeats the purpose of building a redistributable application.

    You can still find the 10.6 and 10.7 SDKs in older Xcode packages from Apple (and, for 10.7, you can download the latest Command Line Tools for Lion, which is just the SDK slightly repackaged). Then you can copy them into /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/ and… whether they'll actually work, I don't know. They won't have an SDKSettings.plist file. They won't be registered in the list of known SDKs; the GUI and xcodebuild certainly won't find them, but maybe specifying them on the command line will work. Or maybe only if you use absolute paths.
