1. There are a lot of questions on StackOverflow asking "what's the deal with self?"

    Many of them are asking a language-design question: Why does Python require explicit self when other languages like C++ and friends (including Java), JavaScript, etc. do not? Guido has answered that question many times, most recently in Why explicit self has to stay.

    But some people are asking a more practical question: Coming from a different language (usually Java), they don't know how to use self properly. Unfortunately, most of those questions end up getting interpreted as the language-design question because the poster is either a StackOverflow novice who never figures out how to answer comments on his question (or just never comes back to look for them) or a programming novice who never figures out how to ask his question.

    So I'll answer it here.

    tl;dr

    The short version is very simple, so let's start with that:

    • When defining a method, always include an extra first parameter, "self".
      • Yes, that's different from Java, C++, JavaScript, etc., where no such parameter is declared.
    • Inside a method definition, always access attributes (that is, members of the instance or its class) with dot syntax, as in "self.spam".
      • Yes, that's different from C++ and friends, where sometimes "spam" means "this->spam" (or "this.spam", in some languages) as long as it's not ambiguous. In Python, it never means "self.spam".
    • When calling a method on an object, as in "breakfast.eat(9)", the object is passed as the first ("self") argument.
      • Yes, that's different from C++ and friends, where instead of being passed normally as a first argument it's hidden under the covers and accessible through a magic "this" keyword.
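
    The three rules above can be sketched in a few lines of Python 3. The Meal class, its eat method, and the spam attribute are made-up names that echo the examples in the bullets; treat this as a minimal illustration rather than anything from a real library.

```python
class Meal:
    def __init__(self, spam):
        # Rule 2: attributes are always reached through self.
        self.spam = spam

    def eat(self, amount):
        # Rule 1: self is declared as an explicit first parameter.
        # A bare "spam" here would be a local or global name, never self.spam.
        self.spam -= amount
        return self.spam


breakfast = Meal(10)
# Rule 3: breakfast is passed as self, and 9 as amount.
remaining = breakfast.eat(9)
print(remaining)
```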

    Exceptions to the rules

    Most of these should never affect novices, but it's worth putting them all in one place, from most common to least:
    • When calling a method on the class itself, instead of on one of its instances, as in "Meal.eat(breakfast, 9)", you have to explicitly pass an instance as the first ("self") argument ("breakfast" in the example).
    • A bound method, like "breakfast.eat", can be stored, passed around, and called just like a function. Whenever it's eventually called, "breakfast" will still be passed as the first "self" argument.
    • An unbound method, like "Meal.eat", can also be stored, passed around, and called just like a function. In fact, in 3.0+, it is just a plain old function. Whenever it's eventually called, you still need to pass an instance explicitly as the first "self" argument.
    • @classmethods take a "cls" parameter instead. Whether you call these on the class itself, or on an instance of the class, the class itself gets passed as the first ("cls") argument. These are often used for "alternate constructors", like datetime.now().
    • @staticmethods do not take any extra parameter. Whether you call these on the class itself, or on an instance, nothing gets passed as an extra argument. These are almost never used.
    • __new__ is always a @staticmethod, even though you don't declare it that way, and even though it actually acts more like a @classmethod.
    • If you need to create a bound method explicitly for some reason (e.g., to monkeypatch an object without monkeypatching its type), you need to construct one manually using types.MethodType(func, obj, type(obj)) in 2.x, or types.MethodType(func, obj) in 3.0+, where the third argument is no longer accepted.
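
    Most of the exceptions in this list can be seen side by side in one short Python 3 sketch. The Meal class and the big, describe, and double names are invented for illustration, and the two-argument types.MethodType call is the Python 3 form.

```python
import types


class Meal:
    def __init__(self, spam):
        self.spam = spam

    def eat(self, amount):
        self.spam -= amount
        return self.spam

    @classmethod
    def big(cls):
        # cls is the class, whether called as Meal.big() or breakfast.big().
        return cls(100)

    @staticmethod
    def describe():
        # Neither self nor cls is passed.
        return "a meal"


breakfast = Meal(10)

# Calling through the class: the instance is passed explicitly.
via_class = Meal.eat(breakfast, 1)

# A bound method remembers its instance wherever it is stored.
bound = breakfast.eat
via_bound = bound(2)

# In Python 3, Meal.eat is a plain function; self must still be supplied.
unbound = Meal.eat
via_unbound = unbound(breakfast, 3)

# A classmethod called on an instance still receives the class.
brunch = breakfast.big()


def double(self):
    self.spam *= 2
    return self.spam


# Bind a function to this one object without touching the Meal class.
breakfast.double = types.MethodType(double, breakfast)
doubled = breakfast.double()
```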

    What does a name name?

    In a language like C++ or Java, when you type a variable name all by itself, it can mean one of many different things. At compile time, the compiler checks for a variety of different kinds of things that could be declared, in order, and picks the first matching declaration it finds. The rules are something like this (although not exactly the same in every related language): 
    1. If you declared a variable with that name in the function definition, it's a local variable. (Actually, it's a bit more complicated, because every block in a C-family language is its own scope, but let's ignore that.)
    2. If you declared a static variable with that name in the function definition, it's a static variable. (These are globals in disguise, except that they don't conflict with other globals of the same name defined elsewhere.)
    3. If you declared a parameter with that name in the function definition, it's a function parameter. (These are basically the same as local variables.)
    4. If you declared a variable with that name in the function that the current local function is defined inside, it's a closure ("non-local") variable.
    5. If you declared a member with that name in the class definition, it's an instance variable. (Each instance has its own copy of this variable.)
    6. If you declared a class member with that name in the class definition, it's a class variable. (All instances of the class share a single copy of this variable, but each subclass has a different single copy for all of its instances.)
    7. If you declared a static member with that name in the class definition, it's a static class variable. (All instances of all subclasses share a single copy of this variable—it's basically a global variable in disguise.)
    8. Otherwise, it's a global variable.
    If you use dot-syntax with "this", like "this.spam" (in C++, "this->spam"), or with the class, like "Meat.spam" (in C++, "Meat::spam"), you can avoid those rules and unambiguously specify the thing you want—even if there's a local variable named "spam", "this.spam" is still the instance variable.

    Python doesn't have declarations. This means you don't have to write "var spam" all over the place as you do in JavaScript (or "const char * (Eggs::*Foo)(const char *)" as in C++) just to create local variables. Even better, you don't need to declare your class's instance variables anywhere; just create them in __init__, or any other time you want to, and they're members.

    That gets rid of a whole lot of useless boilerplate in the code that doesn't provide much benefit. But it does mean you lose what little benefit that boilerplate would have provided—in particular, the compiler can't tell whether you wanted a global variable or an instance variable named "spam", because there's nowhere to check whether such an instance variable exists.

    Therefore, when you're used to having a choice between "spam" and "this.spam", in Python you always have to write "self.spam". (And when you have a choice between "spam" and "Meat.spam", possibly with "this.spam" as an option, in Python you always have to write "Meat.spam", possibly with "self.spam" as an option.)

    If you want to know why, you really need to read the article linked above. But briefly, besides the fact that the advantages of eliminating declarations are much bigger than the costs, "Explicit is better than implicit."

    So, what rules does Python use for deciding what kind of variable a name names?
    1. If you list the variable name in a global statement, it's a global variable.
    2. If you list the variable name in a nonlocal statement (3.0+ only), it's a closure variable.
    3. If you assign to the variable name somewhere in the current function, it's local.
    4. If the variable name is included in the parameter list for the function definition, it's a parameter (meaning it's local).
    5. If the same name is a local or closure variable in the function the current function was defined inside, it's a closure variable.
    6. Otherwise, it's global.
    Notice that, while a few of these (1, 2, and 4) are kind of similar to "declarations", the others (3, 5, and 6) are not at all the same. The fact that you don't accidentally get global variables when you forget to declare things (as in JavaScript) is another advantage of Python's way of doing things.
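
    Here is a rough Python 3 sketch of those six rules in action. All of the names (counter, bump, make_accumulator, broken) are made up for the example; the comments say which rule decides each name.

```python
counter = 0                    # rule 6: module level, so global


def bump():
    global counter             # rule 1: listed in a global statement
    counter += 1
    return counter


def make_accumulator(start):   # rule 4: start is a parameter, so local
    total = start              # rule 3: assigned here, so local

    def add(amount):
        nonlocal total         # rule 2: rebinds the enclosing total
        total += amount
        return total

    def peek():
        return total           # rule 5: only read here, found in the enclosing function

    return add, peek


def broken():
    # Rule 3 again: the assignment makes counter local to the whole function,
    # so the read on the right-hand side fails with UnboundLocalError.
    counter = counter + 1
    return counter


add, peek = make_accumulator(10)
results = (bump(), add(5), peek())
print(results)
```

    Calling broken() is the classic surprise: there is no declaration to forget, but an assignment anywhere in the function is enough to make the name local.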

  2. Tkinter makes slapping together a simple GUI very easy. But unfortunately, many of its features aren't very well documented. The only way to figure out how to do something is often to figure out what Tk objects it's using, look up the Tcl/Tk documentation for those objects, then try to work out how to access them through the Tkinter layer.

    One of the places this most often comes up is validating Entry boxes. Novices expect that Entry must have some kind of <Change> event, and if they could just find the right name and bind it, they could handle validation there.

    The first obvious thought is to bind <KeyDown> or <KeyUp>. But that doesn't work, because some key presses don't change anything (like Tab), and there are ways to change the contents without typing anything (like pasting or dragging). If you work hard enough, you can find all the right events to bind and filter out the right cases and get the equivalent of a <Change> event…

    But after doing so, you can't do anything useful in the event handler! All of these events get fired before the contents of the Entry have been changed. So all you can validate is whatever used to be there, which isn't very helpful.

    There is a way around this, but it's very clunky: your event handler can schedule the real handler to run the next time through the event loop, via after_idle. When that real handler accesses the contents of the Entry, it gets the new contents.

    Surely there must be a better way.

    And there is. In fact, two different ways. And I'll explain both, with links to the Tcl/Tk docs. Hopefully, after reading this, you'll not only know how to validate Entry boxes, but also how to figure out how to do things in Tkinter that aren't explained anywhere.

    But first, make sure you've read the first few sections of the Tkinter docs, at least up to the section called Mapping Basic Tk into Tkinter, but ideally the whole chapter.

    Example Program

    Let's create a dead-simple stupid program (using Python 3; just change the "tkinter" to "Tkinter" and it will work in Python 2):
        from tkinter import *
    
        class MyFrame(Frame):
            def __init__(self, parent):
                Frame.__init__(self, parent)
                self.text = Label(self, text='Name')
                self.text.pack()
                self.name = Entry(self)
                self.name.pack()
                self.name.focus_set()
                self.submit = Button(self, text='Submit', width=10, 
                                     command=self.callback)
                self.submit.pack()
                self.entered = Label(self, text='You entered: ')
                self.entered.pack()
    
            def callback(self):
                self.entered.config(text='You entered: ' + self.name.get())
                self.name.delete(0, END)
    
        root = Tk()
        frame = MyFrame(root)
        frame.pack()
        root.mainloop()
    
    Now, we want to validate this in the simplest way possible: when the Entry is empty, the Button should be disabled.

    Validation

    Tk Entry boxes have a validation feature. Unfortunately, the Tkinter docs only mention this off-hand in one place, and give no more information than that they "support validate and validatecommand". Unless you were already a Tcl/Tk expert, you'd have no idea what this means. But at least you can Google for "Tk entry validatecommand", which should get you to the docs here.

    Reading those docs, we want our command to get called whenever the Entry is edited, which means we want to set the "validate" value to "key". That's easy.

    We also want our command to get called with the updated value of the Entry, so that we can tell whether it's empty or not. For this, we want to use the substitution "%P". But how do we do that?

    This is the tricky part that you won't find anywhere in the docs. The Tcl/Tk docs say to do it "just as you would in a bind script", but in a Python/Tkinter event binding you pass a callable, and it gets called with some arguments that are specified by Tkinter. That doesn't work here.

    Instead, you have to manually do what bind does for you under the covers: What you actually pass is a tuple of function ID and one or more argument strings. To get that function ID, you tell Tkinter to register your callable and return an ID. Then, when Tkinter tries to call the function by ID, Tkinter looks up your registered callable and calls it, with arguments matching your string argument spec.

    Your validate method can do whatever it wants, but at the end it has to return True to allow the change, False to reject it, or None to disable itself (so the Entry is no longer validated). If you return False to reject the change, the Entry contents will not be changed (just as if the user hadn't typed/pasted/whatever anything). And, if you've set an invalidcommand, it will get called. Just like the validatecommand, the invalidcommand has to be a function ID and argument strings.

    Sound confusing? Yeah, it is, especially since it's not documented anywhere. But it's not that hard once you get the hang of it.

    First, we create a validate method. It's going to take the "%P" argument, so let's call the parameter "P":
        def validate(self, P):
            self.submit.config(state=(NORMAL if P else DISABLED))
            return True
    
    Now, in our constructor, we have to register that method, and just pass that ID along with the "%P" string as the validatecommand (and "key" as the validate):
        def __init__(self, parent):
            # ...
            vcmd = parent.register(self.validate)
            self.name = Entry(self, validate='key', validatecommand=(vcmd, '%P'))
            self.name.pack()
            # …
    
    One last thing: because the validate method doesn't get called until the Entry changes, you'll want to either start the Button off disabled, or manually call the validate method at the end of the constructor (making sure to pass the appropriate value for the P parameter, of course).

    You can find a complete version of the code at Pastebin.

    Actually, one more one last thing—which doesn't come up very often, but will confuse the hell out of you if it does. If your validatecommand (or invalidcommand) modifies the Entry directly or indirectly (e.g., by calling set on its StringVar), the validation will get disabled as soon as your function returns. (This is how Tk prevents an infinite loop of validate triggering another validate.) You have to turn it back on (by calling config). But you can't do that from inside the function, because it gets disabled after your function returns. So you need to do something like this:
        def validate(self, P):
            if P == 'hi':
                self.name.delete(0, END)
                self.name.insert(0, 'hello')
                self.after_idle(lambda: self.name.config(validate='key'))
                return None
            else:
                return True
    

    Variable tracing

    There's another way to do this, which existed before Tcl/Tk had validation commands.

    Tk lets you attach a variable to an Entry widget; the variable will hold the current contents of the Entry at any time. Of course it has to be a Tcl variable, not a Python variable, but Python/Tkinter lets you create a Tcl variable with StringVar and related classes. (Also see Coupling Widget Variables in the Python docs.)

    So far, that doesn't sound useful. But Tcl has another feature called variable tracing. You can attach an observer callback that gets called whenever a variable is read (accessed), or written (assigned a new value), or unset (deleted). The existence of the trace function is documented in Tkinter, but that's as far as it goes; there's just a big "FIXME: describe the mode argument and how the callback should look, and when it is called." However, another page, called A Validating Entry Widget, serves as an example. It still doesn't document the API, but it happens to show exactly what we want to do with tracing.

    To find out how trace actually works rather than blindly copy-pasting magic code, you have to turn to the Tcl/Tk docs again. And of course that still doesn't tell you how Tkinter maps between Python and Tcl. So, here's the deal:

    To set a trace on a StringVar or other variable, you call its trace method with two arguments: a mode and a callback, just as the docs say. The "r" and "u" modes aren't very useful, but the "w" mode is called whenever the variable is written--which happens every time the Entry you've attached it to changes contents. When your callback is called, it gets three arguments: name1, name2, and mode.

    Together, name1 and name2 provide the Tcl name of the variable. For an array or other collection, name1 is the array variable and name2 is the index into the array (as a string, like "2"—everything in Tcl is a string). For a scalar, like a string or integer, name1 is the scalar variable and name2 is an empty string. Since Tkinter doesn't make it easy to create and wrap Tcl arrays, name2 will always be empty. But what's name1? Tkinter creates Tcl variables dynamically, giving them names like PY_VAR0. You can find the name of the Tcl variable underlying any StringVar as its _name attribute. So, if you have 10 identical Entry boxes, and you want to do the same code when any of them change, but be able to tell which one it is, you can use name1 for that. That being said, it's a lot easier to just create 10 separate closures around the same function in Python and not bother with the _name nonsense. So, you'll rarely use these arguments either.

    And that's why the example just declares the callback with *dummy for the parameters.
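
    The "separate closures" idea might look something like this in Python 3. Two assumptions to flag: the field labels and the changes list are hypothetical, and the sketch uses trace_add("write", ...), the spelling available since Python 3.6, rather than trace('w', ...). It also uses tkinter.Tcl(), which gives a bare Tcl interpreter with no window, so the traces can be tried without a GUI; in a real program each variable would be handed to an Entry as its textvariable.

```python
import tkinter as tk

changes = []   # hypothetical log of which field changed, in order


def make_tracer(label):
    # Each call returns a separate closure that remembers its own label.
    def on_write(*dummy):      # name1, name2, and mode are ignored
        changes.append(label)
    return on_write


# A Tcl interpreter without Tk is enough for variables and traces.
interp = tk.Tcl()
fields = {}
for label in ("first", "second", "third"):
    var = tk.StringVar(master=interp)
    var.trace_add("write", make_tracer(label))
    fields[label] = var

fields["second"].set("spam")
fields["first"].set("eggs")
print(changes)
```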

    Unlike a validation function, the trace function can't interfere with what's happening. In fact, by the time your function gets called, the Entry has already been modified and the variable has already been updated. If you want to reject (or modify) the change, you have to do that manually, rather than just returning False.
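
    One way to do that manual rejection is to remember the last acceptable value and write it back when the new one is no good. This is only a sketch under a few assumptions: the digits-only rule and the names digits, last_good, and digits_only are invented, it uses the Python 3.6+ trace_add spelling, and it runs against a windowless tkinter.Tcl() interpreter instead of a real Entry.

```python
import tkinter as tk

interp = tk.Tcl()
digits = tk.StringVar(master=interp)
last_good = [""]               # most recent acceptable value


def digits_only(*dummy):
    new = digits.get()
    if new == "" or new.isdigit():
        last_good[0] = new
    else:
        # Too late to veto the change, so undo it by writing the old value back.
        digits.set(last_good[0])


digits.trace_add("write", digits_only)
digits.set("123")
digits.set("12a")              # rejected: the trace restores "123"
print(digits.get())
```

    The user would briefly have the bad value in the variable before it is put back, which is exactly the difference from returning False in a validatecommand.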

    As before, first we'll write our validate method:
        def validate(self, name, index, mode): # or just self, *dummy
            self.submit.config(state=(NORMAL if self.namevar.get() else DISABLED))
    

    And now, we'll hook it up:
        def __init__(self, parent):
            # ...
            self.namevar = StringVar()
            self.namevar.trace('w', self.validate)
            self.name = Entry(self, textvariable=self.namevar)
            self.name.pack()
            # …
    
    And that's all there is to it. Again, complete code is at Pastebin.

    You may be wondering about performance. Isn't attaching debugging hooks ridiculously slow? Maybe, but who cares if you waste hundreds of microseconds on every user input, when user inputs take at least 1000x as long as that? If you're really worried about that kind of thing, you shouldn't be using Tkinter—everything it does is Tcl under the hood, and everything Tcl does is building and evaluating string commands in the most wasteful way you can possibly imagine (and that's before you even add all the Tkinter Tcl<->Python bridging on top of it). If your GUI is responsive enough (which it almost certainly is), adding variable tracing will not change that.

    Deciding which one to use

    In Python, there should be only one obvious way to do it. But here, there are two ways to do it. Which one is the obvious one?

    The key thing to note here is that they're not the same thing. They only overlap in functionality for the simplest use cases:

    • Validation gets called before the modification goes through, and you can reject the change. Tracing gets called after the modification goes through.
    • Validation can take a wide variety of parameters, including things like the index into the string at which the change happened; tracing takes no useful parameters (although you can easily fetch the new value from the variable itself).
    • Validation can hook things like focusout instead of key (meaning that instead of constantly telling the user "that's not a valid phone number" after each character until he's finally done, you can let him type whatever he wants and then check it after he tabs to the next field). Tracing hooks all changes, no matter what user event or code triggered them.
    • Validation doesn't require a StringVar, and generally can't take advantage of one usefully; tracing obviously does and can.
    • Validation is intended for validating user input; tracing is intended for debugging.
    Basically, if you already wanted a StringVar, tracing is often a good idea; even if you didn't, in simple cases, tracing is simpler. But in general, validation is the right thing.
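
    To make the focusout case concrete, here is a hedged Python 3 sketch of a phone-number field that is only checked when the user leaves it. The 555-1234 style pattern and the names PHONE, is_phone, and build_phone_entry are assumptions for illustration; the "%P" substitution should just be the current contents when the trigger is focusout, since no edit is pending.

```python
import re
import tkinter as tk

PHONE = re.compile(r"\d{3}-\d{4}")   # hypothetical format, e.g. 555-1234


def is_phone(text):
    # Tk calls this with the "%P" substitution as a string.
    return PHONE.fullmatch(text) is not None


def build_phone_entry(parent, on_invalid):
    # Register both callables to get the IDs Tk can call back through.
    vcmd = parent.register(is_phone)
    icmd = parent.register(on_invalid)
    return tk.Entry(parent, validate="focusout",
                    validatecommand=(vcmd, "%P"),
                    invalidcommand=(icmd, "%P"))
```

    With a real root window, something like build_phone_entry(root, lambda text: print("bad:", text)).pack() would wire it up; nothing is checked until focus moves to another widget.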

    Other widgets

    There are other widgets that can take text entry besides Entry. What if you wanted to validate one of them?

    Some of them handle validatecommand, or textvariable (or the same thing under the name "variable" or "value"), or both. This often isn't documented. You can always try adding the extra keyword arguments; if the widget doesn't handle validatecommand, it'll tell you that with a pretty simple exception. That works great for, say, the ttk.Entry widget, which (as you'd hope) works as a drop-in replacement for the stock Entry.

    In some other cases, you can access the real Tk widgets under the Python or Tcl wrappers—e.g., if you've installed itk and the Tkinter wrappers around it, itk.EntryField effectively has a ttk.Entry underneath it.

    And then there's Text.

    Text

    Text (and things like ScrolledText that wrap or emulate it) doesn't work anything like Entry. There's no validatecommand, textvariable, variable, value… And if you look through the Tk docs for Text, there's nothing that looks even remotely useful from the Tk side.

    Well, there's a reason that Text doesn't do validation: it's meant to be a (possibly rich) text editor, not a simple multi-line Entry field. But unfortunately, as long as Tkinter doesn't come with a simple multi-line Entry field, people are going to use Text. In fact, even in Tcl/Tk, people often use Text as a multi-line Entry field, which is why people have come up with different ways to extend it with validation and/or textvariable, as shown on the Tk wiki. There just is no way to validate Text widgets cleanly, but sometimes there's nothing else to use but Text widgets.

    If you were using Tcl, it's not at all hard to extend or wrap the Tk text command (see the wiki link above). But if you're using Python, you have to do that in Tcl/Tk, and then wrap the resulting command in Python/Tkinter. Almost nobody who uses Python wants to learn Tcl and the internal guts of Tkinter. If that's you, this is simply not an option.

    If you weren't using Tk, all of the other major cross-platform widget frameworks with Python bindings (Qt, wxWidgets, Gtk+, …) and platform-specific GUI bindings (PyWin32, PyObjC, …) have ways to write multi-line text controls with some way to validate them. Sure, they all have a higher learning curve (and none of them come pre-installed with Python), but if you're banging your head against the wall trying to make Tkinter do things that it can't do, you're probably wasting more effort than you're saving.

    If you insist on staying with pure Python/Tkinter, you just have to accept its limitations. And that may be fine. Go back to the hack mentioned in the introduction to this post—if you bind all the relevant events, you can get a handler called before the change goes through, and use after_idle in that handler to get a second handler called after the change goes through, and… is that good enough for your GUI? If so, do it.

  3. In Python 2.x Tkinter code, you see a lot of stuff like this:
        class MyFrame(Frame):
            def __init__(self, parent, n):
                Frame.__init__(self, parent)
                self.n = n
    
    Why?

    Inheritance and overriding

    Some people start on Tkinter before getting far enough into learning Python. You should definitely read the Classes chapter in the Python tutorial, but I'll summarize the basics.

    First, you're subclassing (inheriting from) the Tkinter class Frame. This means instances of your class are also instances of Frame, and can be used like Frames, and can use the internal behavior of Frame.

    When someone constructs a MyFrame object by typing MyFrame(root, 3), that creates a new MyFrame instance and calls new_instance.__init__(root, 3).

    You've overridden the parent's __init__ method with your own, so that's what gets called.

    But if you want to act like a Frame, you need to make sure all the stuff that gets done in constructing a Frame object also gets done in constructing your object. In particular, whatever Tkinter does under the covers to create a new widget in the window manager, connect it up to its parent, etc. all has to get done. So, you need Frame.__init__ to get called for your object to work.

    Unlike some other languages, Python doesn't automatically call base class constructors for you; you have to do it explicitly. The advantage of this is that you get to choose which arguments to pass to the parent--which is handy in cases (like this example) where you want to take extra arguments.
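
    A toy example, with no Tkinter involved, shows what goes wrong when the base class constructor is skipped. Widget, Forgetful, and Careful are hypothetical stand-ins for Frame and two possible subclasses; the sketch is written for Python 3.

```python
class Widget:
    def __init__(self, parent):
        # Setup that every subclass instance relies on.
        self.parent = parent
        self.children = []


class Forgetful(Widget):
    def __init__(self, parent, n):
        self.n = n                      # base-class setup never runs


class Careful(Widget):
    def __init__(self, parent, n):
        Widget.__init__(self, parent)   # pass along only what the base expects
        self.n = n


careful = Careful("root", 3)
forgetful = Forgetful("root", 3)
print(hasattr(careful, "children"), hasattr(forgetful, "children"))
```

    The Forgetful instance looks fine until something touches an attribute the base class was supposed to create, which is roughly what happens to a Frame subclass that never calls Frame.__init__.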

    Not so super

    The normal way to do this is to use the super function, like this:
        class MyFrame(Frame):
            def __init__(self, parent, n):
                super(MyFrame, self).__init__(parent)
                self.n = n
    
    Unfortunately, that doesn't work with Tkinter, because Tkinter uses classic classes instead of new-style classes. So, you have to do this clunky thing instead where you call the method on the class instead of calling it like a normal method.

    You don't really want to learn all about classic classes, because they're an obsolete technology. (In Python 3, they no longer exist.) You just need to know two things:
    • Never use classic classes when you can help it. If you don't have anything to put as your base class, use object.
    • When you can't help it and are forced to use classic classes (as with Tkinter), you can't use super, so you have to call methods directly on the class.
    But how does the clunky thing work?

    Unbound methods

    If you want all the gory details, see How Methods Work. But I'll give a short version here.

    Normally, in Python—like most other object-oriented languages—you call methods on objects. The way this works is a bit surprising: foo.bar(spam) actually constructs a "bound method" object foo.bar, then calls it like a function, with foo and spam as the arguments. That foo then becomes the self parameter that you have to put in every method definition.

    Since classes themselves are just another kind of object, you can call methods on them too, like FooType.bar(spam). But here, Python doesn't have any bound object to pass as your self parameter: it constructs an "unbound method" FooType.bar, then calls it with just spam as an argument, so there's nothing to match up with your self parameter. (Python could have been designed to pass the FooType class itself as the self parameter, but that would be confusing more often than it would be useful. When you want that behavior, for example to create "alternate constructors" like datetime.datetime.now, you have to ask for it explicitly, with the @classmethod decorator.) So, you have to pass it yourself.

    In other words, in this code, the two method calls at the end are identical:
        class FooType(object):
            def bar(self, spam):
                print self, spam
        foo = FooType()
        foo.bar(2)
        FooType.bar(foo, 2)
    
    So, why would you ever use the clumsy and verbose second form? Basically, only when you have to. Maybe you want to pass the unbound method around to call on an object that will be created later. Maybe you had to look up the method dynamically. Or maybe you've got a classic class, and you're trying to call a method on your base class.
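
    The first two of those cases can be sketched as follows. This one is written for Python 3, where FooType.bar is a plain function rather than an unbound method object, but the calls look the same either way; the name attribute and the return string are made up so there is something to check.

```python
class FooType:
    def __init__(self, name):
        self.name = name

    def bar(self, spam):
        return "%s got %s" % (self.name, spam)


# Grab the method from the class before any instance exists.
method = FooType.bar

# Later, call it on objects created afterwards, passing each one as self.
foos = [FooType("a"), FooType("b")]
results = [method(foo, 2) for foo in foos]

# Looking the method up dynamically by name works the same way.
looked_up = getattr(FooType, "bar")
same = looked_up(foos[0], 2) == foos[0].bar(2)
print(results, same)
```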