1. There are hundreds of questions on StackOverflow that all ask variations of the same thing. Paraphrasing:
    lst is a list of strings and numbers. I want to convert the numbers to int but leave the strings alone. How do I do that?
    This immediately gets a half-dozen answers that all do some equivalent of:
        lst = [int(x) if x.isdigit() else x for x in lst]
    
    This has a number of problems, but they all come down to the same two:
    • "Numbers" is vague. You can assume it means only integers based on "I want to convert the numbers to int", but does it mean Python integer literals, things that can be converted with the int function with no base, or things that can be converted with the int function with base=0, or something different entirely, like JSON numbers or Excel numbers or the kinds of input you expect your 3rd-grade class to enter?
    • Whichever meaning you actually wanted, isdigit() does not test for that.
    The right answer depends on what "numbers" actually means.

    If it means "things that can be converted with the int function with no base", the right answer—as usual in Python—is to just try to convert with the int function:
        def tryint(x):
            try:
                return int(x)
            except ValueError:
                return x
        lst = [tryint(x) for x in lst]
    
    Of course if you mean something different, that's not the right answer. Even "valid integer literals in Python source" isn't the same rule. (For example, 099 is an invalid literal in both 2.x and 3.x, and 012 is valid in 2.x but probably not what you wanted, but int('099') and int('012') give 99 and 12.) That's why you have to actually decide on a rule that you want to apply; otherwise, you're just assuming that all reasonable rules are the same, which is a patently false assumption. If your rule isn't actually "things that can be converted with the int function with no base", then the isdigit check is wrong, and the int(x) conversion is also wrong.
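    To see that these really are different rules, compare int with ast.literal_eval, which applies the source-literal rule (a quick sketch; the exact error message varies by version):

```python
import ast

# int() happily accepts leading zeros...
print(int('099'))  # 99
print(int('012'))  # 12

# ...but they aren't valid Python 3 literals, so literal_eval rejects them:
try:
    ast.literal_eval('099')
except SyntaxError as e:
    print('rejected:', type(e).__name__)  # rejected: SyntaxError
```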

    What specifically is wrong with isdigit?

    I'm going to assume that you already thought through what you meant by "number", and the decision was "things that can be converted to int with the int function with no base", and you're just looking for how to LBYL that so you don't have to use a try.

    Negative numbers

    Obviously, -234 is an integer, but just as obviously, "-234".isdigit() is going to be false, because - is not a digit.

    Sometimes people try to solve this by writing all(c.isdigit() or c == '-' for c in x). But, besides being a whole lot slower and more complicated, that's even more wrong. It means that 123-456 now looks like an integer, so you're going to pass it to int without a try, and you're going to get a ValueError from your comprehension.
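    You can verify the failure directly (looks_like_int here is just a name for the flawed check):

```python
def looks_like_int(x):
    # The flawed all() check from above.
    return all(c.isdigit() or c == '-' for c in x)

print(looks_like_int('123-456'))  # True -- so we'd go ahead and call int...
try:
    int('123-456')
except ValueError as e:
    print(e)  # ...which blows up anyway
```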

    Of course you can solve that problem with (x[0].isdigit() or x[0] == '-') and x[1:].isdigit(), and now maybe every test you've thought of passes. But it will give you "1" instead of converting that to an integer, and it will raise an IndexError for an empty string.

    One of these might be correct for handling negative integer numerals:
        x.isdigit() or x.startswith('-') and x[1:].isdigit()
        re.match(r'-?\d+', x)
    
    But is it obvious that either one is correct? The whole reason you wanted to use isdigit is to have something simple, obviously right, and fast, and you already no longer have that. And we're not even nearly done yet.

    Positive numbers

    +234 is an integer too. And int will treat it as one. But the code above won't. So now, whatever you did for -, you have to do the same thing for +. Which is pretty ugly if you're using the non-regex solution:
        lst = [int(x) if x.isdigit() or x.startswith(('-', '+')) and x[1:].isdigit() else x
               for x in lst]
    

    Whitespace

    The int function allows the numeral to be surrounded by whitespace. But isdigit does not. So, now you have to add .strip() before the isdigit() call. Except we don't just have one isdigit call; to fix the other problems we've had to go with two isdigit calls and a startswith, and surely you don't want to call strip three times. Or we've switched to a regex. Either way, now we've got:
        lst = [int(x) if x.isdigit() or x.startswith(('-', '+')) and x[1:].isdigit() else x
               for x in (x.strip() for x in lst)]
        lst = [int(x) if re.match(r'\s*[+-]?\d+\s*', x) else x for x in lst]
    

    What's a digit?

    The isdigit function tests for characters that are in the Number, Decimal Digit category. In Python 3.x, that's the same rule the int function uses.

    But 2.x doesn't use the same rule. If you're using a unicode, it's not entirely clear what int accepts, but it's not all Unicode digits, at least not in all Python 2.x implementations and versions; if you're using a str encoded in your default encoding, int still accepts the same set of digits, but isdigit only checks ASCII digits.

    Plus, if you're using either 2.x or 3.0-3.2, and you've got a "narrow" Python build (like the default builds for Windows from python.org), isdigit is actually checking each UTF-16 code point, not each character, so for "\N{MATHEMATICAL SANS-SERIF DIGIT ZERO}", isdigit will return False, but int should accept it.

    So, if your user types in an Arabic number like ١٠٤, the isdigit check may mean you end up with "١٠٤", or it may mean you end up with the int 104, or it may be one on some platforms and the other on other platforms.
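    On Python 3.3+ the two rules finally agree, so this particular trap is gone there (a quick check):

```python
s = '\u0661\u0660\u0664'  # the Arabic-Indic digits for 104
print(s.isdigit())  # True
print(int(s))       # 104
```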

    I can't even think of any way to LBYL around this problem except to just say that your code requires 3.3+.

    Have I thought of everything?

    I don't know. Do you know? If you don't, how are you going to write code that handles the things we haven't thought of?

    Other rules might be even more complicated than the int with no base rule. For different use cases, users might reasonably expect 0x1234 or 1e10 or 1.0 or 1+0j or who knows what else to count as integers. The way to test for whatever it is you want to test for is still simple: write a conversion function for that, and see if it fails. Trying to LBYL it means that you have to write most of the same logic twice. Or, if you're relying on int or literal_eval or whatever to provide some or all of that logic, you have to duplicate its logic.
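    That conversion-function approach generalizes: whatever your rule is, write it (or compose it out of converters) once, and EAFP it. A sketch, with tryconvert as a hypothetical helper:

```python
def tryconvert(x, *converters):
    """Return the first successful conversion of x, or x unchanged.

    Each converter encodes one rule for 'number'; a converter that
    rejects x should raise ValueError.
    """
    for convert in converters:
        try:
            return convert(x)
        except ValueError:
            pass
    return x

lst = ['1', 'spam', '0x10', '2.5']
print([tryconvert(x, int, lambda s: int(s, 0), float) for x in lst])
# [1, 'spam', 16, 2.5]
```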

  2. In Haskell, you can section infix operators. This is a simple form of partial evaluation. Using Python syntax, the following are equivalent:
        (2*)
        lambda x: 2*x
    
        (*2)
        lambda x: x*2
    
        (*)
        lambda x, y: x*y
    
    So, can we do the same in Python?
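    Before touching the grammar, it's worth noting how close the standard library already gets, at least for left sections and bare operators (a sketch using functools.partial and operator):

```python
from functools import partial
import operator

double = partial(operator.mul, 2)          # roughly (2*)
times = operator.mul                       # roughly (*)
halve = lambda x: operator.truediv(x, 2)   # (/2) has no direct spelling

print(double(5), times(3, 4), halve(9))  # 10 12 4.5
```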

    Grammar

    The first form, (2*), is unambiguous. There is no place in Python where an operator can be legally followed by a close-paren. That works for every binary operator, including boolean and comparison operators. But the grammar is a bit tricky. Without lookahead, how do you make sure that a '(' followed by an expr followed by a binary operator followed by a ')' doesn't start parsing as just a parenthesized expression? I couldn't find a way to make this unambiguous without manually hacking up ast.c. (If you're willing to do that, it's not hard, but you shouldn't be willing to do that.)

    The second form, (*2) looks a lot simpler--no lookahead problems. But consider (-2). That's already legal Python! So, does that mean you can't section the + and - operators?

    The third form, (*) is the simplest. But it's really tempting to want to be able to do the same with unary operators. Why shouldn't I be able to pass (~) or (not) around as a function instead of having to use operator.invert and operator.not_? And of course that brings us right back to the problem with + and - being ambiguous. (Plus, it makes the compile step a little harder. But that's not a huge deal.)

    I solved these problems using a horrible hack: sectioned operators are enclosed in parens and colons. This looks hideous, but it did let me get things building so I can play with the idea. Now there's no lookahead needed—a colon inside parens isn't valid for anything else (unless you want to be compatible with my bare lambda hack…). And to resolve the +/- issue, only the binary operators can be sectioned, which also means (: -3* :) is a SyntaxError instead of meaning lambda x: -3 * x. Ick. But, again, it's good enough to play with it.

    The key grammar change looks like this:
        atom: ('(' [yield_expr|testlist_comp] ')' |
               '(' ':' sectionable_unop ':' ')' |
               '(' ':' sectionable_binop ':' ')' |
               '(' ':' expr sectionable_binop ':' ')' |
               '(' ':' sectionable_binop expr ':' ')' |
               '[' [testlist_comp] ']' |
               '{' [dictorsetmaker] '}' |
               NAME | NUMBER | STRING+ | '...' | 'None' | 'True' | 'False')
    

    What about precedence?

    Ignored. It only matters if you want to be able to section expressions made up of multiple operators, like (2+3*). Which I don't think you do. For non-trivial cases, there are no readability gains for operator sectioning, and having to think about precedence actually might be a readability cost. If you still don't want to use lambda, do what you'd do in Haskell and compose (2+) with (3*).
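    That composition needs no new syntax either; a minimal sketch (compose here is a hypothetical helper, not a builtin):

```python
def compose(f, g):
    # (f . g)(x) == f(g(x)), as in Haskell
    return lambda x: f(g(x))

add2 = lambda x: 2 + x   # (2+)
mul3 = lambda x: 3 * x   # (3*)
print(compose(add2, mul3)(4))  # 2 + 3*4 == 14
```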

    AST

    For the AST, each of those four productions creates a different node type. Except that you _also_ need separate node types for normal binary operators, comparison operators, and boolean operators, because they have different enums for their operators. So I ended up with 10 new types: UnOpSect, BinOpSect, BinOpSectLeft, BinOpSectRight, CmpOpSect, and so on. There's probably a better way to do this.

    Symbol table

    How do you deal with an anonymous argument in the symbol table for the function we're going to generate? You don't want to have to create a whole args structure just to insert a name just so you can refer to it in the compiler. Plus, whatever name you pick could collide with a name in the parent scope, hiding it from a lambda or a comprehension that you define inside the expr. (Why would you ever do that? Who knows, but it's legal.)

    This problem must have already been solved. After all, generator expressions have created hidden functions that don't collide with any names in the outer scope since they were first created, and in 3.x all comprehensions do that. It's a little tricky to actually get at these hidden functions, but here's one way to do it:
        >>> def f(): (i for i in [])
        >>> f.__code__.co_consts
        (None, <code object <genexpr> at 0x10bc57a50, file "<stdin>", line 1>, 'f.<locals>.<genexpr>')
        >>> f.__code__.co_consts[1].co_varnames
        ('.0', 'i')
    
    So, the parameter is named .0 which isn't legal in a def or lambda and can't be referenced. Clever. And once you dig into symtable.c, you can see that this is handled in a function named symtable_implicit_arg. So:
        VISIT(st, expr, e->v.BinOpSectLeft.right);
        if (!symtable_enter_block(st, binopsect,
                                  FunctionBlock, (void *)e, e->lineno,
                                  e->col_offset))
            VISIT_QUIT(st, 0);
        if (!symtable_implicit_arg(st, 0))
            VISIT_QUIT(st, 0);
        if (!symtable_exit_block(st, (void *)e))
            VISIT_QUIT(st, 0);
    

    Compiler

    The compilation works similarly to lambda. Other than sprintf'ing up a nice name instead of just <lambda>, and the fact that everything is simpler when there's exactly one argument with no defaults and no keywords, everything is the same except the body, which looks like this:
        ADDOP_I_IN_SCOPE(c, LOAD_FAST, 0);
        VISIT_IN_SCOPE(c, expr, e->v.BinOpSectLeft.right);
        ADDOP_IN_SCOPE(c, binop(c, e->v.BinOpSectLeft.op));
        ADDOP_IN_SCOPE(c, RETURN_VALUE);
        co = assemble(c, 1);
    
    I did have to create that ADDOP_I_IN_SCOPE macro, but that's trivial.

    Does it work?

        >>> (: *2 :)
        <function <2> at 0x10bc9f048>
        >>> (: *2 :).__code__.co_varnames
        ('.0',)
        >>> (: *2 :)(23)
        46
    
    As you can see, I screwed up the name a bit.

    More importantly, I screwed up nonlocal references in the symtable. I think I need to visit the argument? Anyway, what happens is this:
        >>> a = 23
        >>> (: *a :)(23)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          File "<stdin>", line 1, in <a>
        SystemError: no locals when loading 'a'
    
    But that's much better than the segfault I expected. :)

    Is it useful?

    Really, most of the obvious use cases for this are already handled by bound methods, like spam.__add__ instead of (spam+), and the operator module, like operator.add instead of (+). Is that perfect? No:
    • spam.__add__ isn't as flexible as (spam+), because the latter will automatically handle calling its argument's __radd__ when appropriate.
    • Often, you want to section with literals. Especially with integers. But 0.__add__ is ambiguous between a method on an integer literal or a float literal followed by garbage, and therefore a SyntaxError, so you need 0 .__add__ or (0).__add__.
    • For right-sectioning, spam.__radd__ to mean (+spam) isn't so bad, but spam.__gt__ to mean (<spam) is a bit less readable.
    Still, it's hard to find a non-toy example where (<0) is all that useful. Most examples I look at, what I really want is something like lambda x: x.attr < 0. In Haskell I'd probably write that by the rough equivalent of composing operator.attrgetter('attr') with (<0). But, even if you pretend that attribution is an operator (even though it isn't) and add sectioning syntax for it, and you use the @ operator for compose (as was proposed and rejected at least twice during the PEP 465 process and at least once since…), the best you can get is (<0) @ (.attr) which still doesn't look nearly as readable to me in Python as the lambda.
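    The first bullet is easy to demonstrate: the bound method returns NotImplemented in exactly the cases where the operator would have fallen back to the other operand's __radd__ (Frac here is a toy class):

```python
class Frac:
    def __init__(self, n):
        self.n = n
    def __radd__(self, other):
        return Frac(other + self.n)

print((1).__add__(Frac(2)))   # NotImplemented -- int doesn't know Frac
print((1 + Frac(2)).n)        # 3 -- the + operator fell back to Frac.__radd__
```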

    And, without a compelling use case, I'm not sure it's worth spending more time debugging this, or trying to think of a clever way to make it work without the colons and without lookahead, or coming up with a disambiguating rule for +/-. (It's obviously never going to make it into core…)

    Anything else worth learning here?

    When I was having problems getting the symbol table set up (which I still didn't get right…), I realized there's another way to tackle this: just stop at the AST, which is the easy part. The result, when run normally, is that any operator-sectioning expression resolves to an empty tuple, which doesn't seem all that useful… but you've got an AST node that you can transform with, say, MacroPy. And converting the meaningless AST node into a valid lambda node in Python is a lot easier than building the symbol table and bytecodes in C. Plus, you don't have to rebuild Python every time you make a change.

    I don't think this is an argument for adding do-nothing AST structures to the core, of course… but as a strategy for hacking on Python, I may start with that next time around.

  3. Many people—especially people coming from Java—think that using try/except is "inelegant", or "inefficient". Or, slightly less meaninglessly, they think that "exceptions should only be for errors, not for normal flow control".

    These people are not going to be happy with Python.
    You can try to write Python as if it were Java or C, using Look-Before-You-Leap code instead of Easier-to-Ask-Forgiveness-than-Permission, returning error codes instead of raising exceptions for things that aren't "really" errors, etc. But you're going to end up with non-idiomatic, verbose, and inefficient code that's full of race conditions.

    And you're still going to have exceptions all over the place anyway, you're just hiding them from yourself.

    Hidden exceptions

    Iteration

    Even this simple code has a hidden exception:
        for i in range(10):
            print(i)
    
    Under the covers, it's equivalent to:
        it = iter(range(10))
        while True:
            try:
                i = next(it)
            except StopIteration:
                break
            else:
                print(i)
    
    That's how iterables work in Python. An iterable is something you can call iter on and get an iterator. An iterator is something you can call next on repeatedly and get 0 or more values and then a StopIteration.
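    A minimal hand-written iterator makes the protocol concrete (CountDown is a toy example):

```python
class CountDown:
    """Yields n, n-1, ..., 1, then raises StopIteration like any iterator."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

print(list(CountDown(3)))  # [3, 2, 1]
```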

    Of course you can try to avoid that by calling the two-argument form of next, which lets you provide a default value instead of getting an exception. But under the covers, next(iterator, default) is basically implemented like this:
        try:
            return next(iterator)
        except StopIteration:
            return default
    
    So, even when you go out of your way to LBYL, you still end up EAFPing.

    Operators

    This even simpler code also has hidden exception handling:
        print(a+b)
    
    Under the covers, a+b looks something like this:
        def _checkni(ret):
            if ret is NotImplemented: raise NotImplementedError
            return ret

        def add(a, b):
            try:
                if issubclass(type(b), type(a)):
                    try:
                        return _checkni(type(b).__radd__(b, a))
                    except (NotImplementedError, AttributeError):
                        return _checkni(type(a).__add__(a, b))
                else:
                    try:
                        return _checkni(type(a).__add__(a, b))
                    except (NotImplementedError, AttributeError):
                        return _checkni(type(b).__radd__(b, a))
            except (NotImplementedError, AttributeError):
                raise TypeError("unsupported operand type(s) for +: '{}' and '{}'".format(
                    type(a).__name__, type(b).__name__))
    

    Attributes

    Even the simple dot syntax in the above examples hides further exception handling. Or, for a simpler example, this code:
        print(spam.eggs)
    
    Under the covers, spam.eggs looks something like this:
        spam.__getattribute__('eggs')
    
    So far, so good. But, assuming you didn't define your own __getattribute__ method, what does the object.__getattribute__ that you inherit do? Something like this:
        def _searchbases(cls, name):
            for c in cls.__mro__:
                try:
                    return c.__dict__[name]
                except KeyError:
                    pass
            raise KeyError(name)
    
        def __getattribute__(self, name):
            try:
                return self.__dict__[name]
            except KeyError:
                pass
            try:
                return _searchbases(type(self), name).__get__(self, type(self))
            except KeyError:
                pass
            try:
                getattr = _searchbases(type(self), '__getattr__')
            except KeyError:
                raise AttributeError("'{}' object has no attribute '{}'".format(
                    type(self).__name__, name))
            return getattr(self, name)
    
    Of course I cheated by using cls.__mro__, cls.__dict__ and descriptor.__get__ above. Those are recursive calls to __getattribute__. They get handled by base cases for object and type.

    hasattr

    Meanwhile, what if you want to make sure a method or value attribute exists before you access it?

    Python has a hasattr function for exactly that purpose. How does that work? As the docs say, "This is implemented by calling getattr(object, name) and seeing whether it raises an AttributeError or not."
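    In other words, hasattr is roughly this (in Python 3 it only swallows AttributeError; 2.x swallowed almost everything):

```python
def my_hasattr(obj, name):
    # Roughly what hasattr does under the covers.
    try:
        getattr(obj, name)
        return True
    except AttributeError:
        return False

print(my_hasattr([], 'append'), my_hasattr([], 'eggs'))  # True False
```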

    Again, even when you try to LBYL, you're still raising and handling exceptions.

    Objections

    People who refuse to believe that Python isn't Java always raise the same arguments against EAFP.

    Exception handling is slow

    No, it isn't. Except in the sense that Python itself is horribly slow, which is a sense that almost never matters (and, in the rare cases when it does, you're not going to use Python, so who cares?).

    First, remember that, at least if you're using CPython, every bytecode goes through the interpreter loop, every method call and attribute lookup is dynamically dispatched by name, every function call involves a heavy-duty operation of building a complex frame object and executing multiple bytecodes, and under the covers all the values are boxed up. In other words, Python isn't C++.

    But let's do a quick comparison of the simplest possible function, then the same function plus a try statement:
        def spam():
            return 2
        def eggs():
            try:
                return 2
            except Exception:
                return 0
    
    When I time these with %timeit on my laptop, I get 88.9ns and 90.8ns. So, that's 2% overhead. On a more realistic function, the overhead is usually below measurability.
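    If you want to reproduce that measurement, here's a sketch with timeit; the absolute numbers will differ on your machine, but the relative overhead should stay tiny:

```python
import timeit

def spam():
    return 2

def eggs():
    try:
        return 2
    except Exception:
        return 0

for f in (spam, eggs):
    # Seconds for a million calls == microseconds per call.
    t = min(timeit.repeat(f, number=1_000_000, repeat=5))
    print(f.__name__, '{:.1f} ns/call'.format(t * 1000))
```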

    In fact, even in C++, you'll see pretty much the same thing, unless you're using a compiler from the mid-90s. People who say "exceptions are slow" really don't know what they're talking about in any language.

    But it's especially true in Python. Compare that 1.9ns cost to the 114ns cost of looking up spam as a global and calling it. If you're looking to optimize something here, the 128% overhead is surely a lot more important than the 2%.

    What about when you actually raise an exception? That's a bit more expensive. It costs anywhere from 102ns to 477ns. So, that could almost quintuple the cost of your function! Yes, it could—but only if your function isn't actually doing anything. How many functions do you write that take less than 500ns to run, and which you run often enough that it makes a difference, where optimizing out 477ns is important but optimizing out 114ns isn't? My guess would be none.

    And now, go back and look at the for loop from the first section. If you iterate over a million values, you're doing the 1.9ns wasted cost 999,999 times—buried inside a 114ns cost of calling next each time, itself buried in the cost of whatever actual work you do on each element. And then you're doing the 477ns wasted cost 1 time. Who cares?

    Exceptions should only be for exceptional cases

    Sure, but "exceptional" is a local term.

    Within a for loop, reaching the end of the loop is exceptional. To code using the loop, it's not. So the for loop handles the StopIteration locally.

    Similarly, in code reading chunks out of a file, reaching the end of the file is exceptional. But in code that reads a whole file, reaching the end is a normal part of reading the whole file. So, you're going to handle the EOFError at a low level, while the higher-level code will just receive an iterator or list of lines or chunks or whatever it needs.

    Raise exceptions, and handle them at the level at which they're exceptional—which is also generally going to be the level where you know how to handle them.

    Sometimes that's the lowest possible level, in which case there isn't much difference between using exceptions and returning (value, True) or (None, False). But often it's many levels up, in which case using exceptions guarantees that you can't forget to check and percolate the error upward to the point where you're prepared to deal with it. That, in a nutshell, is why exceptions exist.
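    Here's a sketch of that shape, with hypothetical parse_record/load_config functions: the low level just raises, and the level that knows what a bad record means is the one that handles it:

```python
def parse_record(line):
    # Lowest level: a malformed record is exceptional here, so raise.
    key, sep, value = line.partition('=')
    if not sep:
        raise ValueError('malformed record: {!r}'.format(line))
    return key, int(value)

def load_config(lines):
    # This level knows a bad record just means "skip it", so the
    # exception stops percolating here.
    config = {}
    for line in lines:
        try:
            key, value = parse_record(line)
        except ValueError:
            continue
        config[key] = value
    return config

print(load_config(['a=1', 'garbage', 'b=2']))  # {'a': 1, 'b': 2}
```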

    Exceptions only work if you use them everywhere

    That's true. And it's a serious problem in C++ (and even more in ObjC). But it's not a problem in Python—unless you go out of your way to create a problem by fighting against Python. Python uses exceptions everywhere. So does all the idiomatic Python code you're going to be interfacing with. So exceptions work.

    C++ wasn't designed around exceptions in the first place. This means:
    • C++ has a mishmash of APIs (many inherited from C), some raising exceptions, others returning error values.
    • C++ doesn't make it easy to wrap up error returns in exceptions. For example, your compiler almost certainly doesn't come with a helper function that wraps up a libc or POSIX function by checking for nonzero return and constructing an exception out of the errno and the name of the function—and, even if it did, that function would be painful to use everywhere.
    • C++ accesses functions from C libraries just like C, meaning none of them raise exceptions. And similarly for accessing Java functions via JNI, or ObjC functions via ObjC++, or even Python functions via the Python C API. Compare that to Python bindings written with ctypes, cffi, Cython, SIP, SWIG, manually-built extension modules, Jython, PyObjC, etc.
    • C++ makes it very easy to design classes that end up in an inconsistent state (or at least leak memory) when an exception is thrown; you have to manually design an RAII class for everything that needs cleanup, manage garbage yourself, etc. to get exception safety.
    In short, you can write exception-safe code in C++ if you exercise sufficient discipline, and make sure all of the other code you deal with also exercises such discipline or go out of your way to write boilerplate-filled wrappers for all of it.

    By comparison, you can write exception-safe code in Python just by not doing anything stupid.

    Exceptions can't be used in an expression

    This one is actually true. It might be nice to be able to write:
        process(d[key] except KeyError: None)
    
    Of course that particular example, you can already do with d.get(key), but not every function has exception-raising and default-returning alternatives, and those that do don't all do it the same way (e.g., str.find vs. str.index), and really, doesn't expecting everyone to write two versions of each function seem like a pretty big DRY violation?
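    The str.find/str.index pair shows the two styles side by side:

```python
s = 'spam'
print(s.find('a'))   # 2 -- both agree when the substring is there
print(s.find('x'))   # -1 -- error-code style
try:
    s.index('x')
except ValueError as e:
    print('index raised:', e)  # exception style
```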

    This argument is often a bit oversold—it's rarely that important to cram something non-trivial into the middle of an expression (and you can always just wrap it in a function when it is), so it's usually only a handful of special cases where this comes up, all of which have had alternatives for decades by now.

    Still, in a brand-new language an except expression seems like a better choice than d[k] vs. d.get(k) and so on. And it might even be worth adding today (as PEP 463 proposes).

    But that's not a reason to avoid exceptions in your code.

    What about Maybe types, callback/errback, Null-chaining, Promise.fail, etc.?

    What about them? Just like exceptions, these techniques work if used ubiquitously, but not if used sporadically. In Python, you can't use them ubiquitously unless you wrap up every single builtin, stdlib, or third-party idiomatic exception-raising function in a Maybe-returning function.

    (I'm ignoring the fact that most of these don't provide any information about the failure beyond that there was a failure, because it's simple to extend most of them so they do. For example, instead of a Maybe a that's Just a or Nothing, use one that's Just a or Error msg, with the same monad rules, and you're done.)

    So, if you're using Haskell, use Maybe types; if you're using Node.js, use promises; if you're using Python, use exceptions. Which just takes us back to the original point: if you don't want to use exceptions, don't use Python.

    Race conditions

    I mentioned at the top that, among other problems, trying to use LBYL everywhere is going to lead to code that's full of race conditions. Many people don't seem to understand this concept.

    External resources

    Are these two pieces of code functionally equivalent?
        with tempfile.NamedTemporaryFile(dir=os.path.dirname(path), delete=False) as f:
            f.write(stuff)
            if not os.path.isdir(path):
                os.replace(f.name, path)
                return True
            else:
                return False
    
        with tempfile.NamedTemporaryFile(dir=os.path.dirname(path), delete=False) as f:
            f.write(stuff)
            try:
                os.replace(f.name, path)
                return True
            except IsADirectoryError:
                return False
    
    What if the user renamed a directory to the old path between your isdir check and your replace? You're going to get an IsADirectoryError—one that you almost certainly aren't going to handle properly, because you thought you designed your code to make that impossible. (In fact, if you wrote that code, you probably didn't think to handle any of the other possible errors…)

    But you can make this far worse than just an unexpected error. For example, what if you were overwriting a file rather than atomically replacing it, and you used os.access to check that he's actually allowed to replace the file? Then he can replace the file with a symlink between the check and the open, and get you to overwrite any file he's allowed to symlink, even if he didn't have write access. This may sound like a ridiculously implausible edge case, but it's a real problem that's been used to exploit real servers many times. See time-of-check-to-time-of-use at Wikipedia or at CWE.

    Plus, the first one is much less efficient. When the path is a file—which is, generally, the most common and most important case—you're making two Python function calls instead of one, two syscalls instead of one, two filesystem accesses (which could be going out over the network) instead of one. When the path is a directory—which is rare—they'll both take about the same amount of time.

    Concurrency

    Even without external resources like files, you can have the same problems if you have any internal concurrency in your code—e.g., because you're using threading or multiprocessing.

    Are these the same?
        if q.empty():
            return None
        else:
            return q.get()
    
        try:
            return q.get(block=False)
        except Empty:
            return None
    
    Again, the two are different, and the first one is the one that's wrong.

    In the first one, if another thread gets the last element off the queue between your empty check and your get call, your code will end up blocking (possibly causing a deadlock, or just hanging forever because that was the last-ever element).

    In the second one, there is no "between"; you will either get an element immediately, or return None immediately.

    Conclusion

        try:
            use_exceptions()
        except UserError:
            sys.exit("Don't use Python")
    

  4. If you look at Python tutorials and sample code, proposals for new language features, blogs like this one, talks at PyCon, etc., you'll see spam, eggs, gouda, etc. all over the place. Why?

    Metasyntactic variables

    If you're writing some toy code that doesn't do anything (e.g., it just demonstrates some syntax), there are obviously no meaningful names to give the variables and types in that code. What you need are words that are obviously meaningless, and obviously placeholders for the meaningful names that you'd use in real code.

    Of course there are no such words (except maybe "um" and "like" and the like), so the programming community has to invent a few and use them by convention. These are called metasyntactic variables; Wikipedia explains why they're called that, and some of the history. In other languages, they're usually called foo, bar, baz, and qux.

    Python has its own unique set of metasyntactic variables, which are actual words, but words unlikely to appear in normal code. This has the advantage that the pattern can be extended in new ways and everyone will intuitively know what you mean. Except, of course, that you have to know the pattern.

    Spam, eggs, cheese, beans, toast, and ham

    Python is named after Monty Python, because Python's inventor, Guido van Rossum, is a big fan, like many computer geeks. "Spam" is one of Monty Python's most famous skits. Most of the words in the skit are the repetitive names of the heavily-spam-focused breakfast dishes on the menu, plus a group of Vikings singing a song about Spam. So, Python uses the ingredients of those dishes for its metasyntactic variables.

    Here's the menu:

    • egg and bacon
    • egg, sausage, and bacon
    • egg and spam
    • egg, bacon, and spam
    • egg, bacon, sausage, and spam
    • spam, bacon, sausage, and spam
    • spam, egg, spam, spam, bacon, and spam
    • spam, spam, spam, egg, and spam
    • spam, spam, spam, spam, spam, spam, baked beans, spam, spam, spam, and spam
    • lobster thermidor aux crevettes with a Mornay sauce, garnished with truffle paté, brandy, and a fried egg on top, and spam
    So, you can see where the standard metasyntactic variables in Python come from.

    Well, almost. Cheese, ham, and toast aren't even mentioned in the skit (although ham does appear in one of the silly names in the credits of the episode), and beans only appears once, while bacon and sausage are all over the place. So, why?

    Back in the early 90s, we didn't have YouTube and t'Netflix. In them days, near 30 year ago, if we wanted to watch our favorite old shows, we were glad to find them on videotapes. Without subtitles. Or readable picture quality. Or, often, videotapes. And that was if you had a VCR. We never had a VCR, you used to have to spool the tape by hand and try to read it with a compass needle. If you were lucky enough to have hands, that is, we couldn't afford 'em. Still, we were happy in them days. We couldn't go to t'Wikipedia on t'World Wide Web to look up information we'd forgotten. We had t'Yorkshire Wide Web, which had nought but an ASCII art picture of a terrier, only it was in EBCDIC, mind, so you had to translate it by hand. We had to remember things for ourself. Except we were too busy remembering other people's things. Only job you could get back then. Paid thruppence a week, working 24 hours a day. Started when we were three years old and got our first lunch break at six. We'd get a crust o' stale bread and back to work for t'next three years. 'Course lunch break wasn't paid, we had to pay the owner to take the time off, then pay him to come back to work, but it was a living, and we were happy to have it. We could take the money home to our Dad and maybe he'd only kill us and dissolve our bones in acid once or twice before supper. You try and tell the young people of today that, and they won't believe you.

    When you need a superclass of Spam, Eggs, and Cheese, or a class that has spam, eggs, and cheese members, you've got Breakfast, and sometimes Menu. Since none of these types have any obvious verbs associated with them, when you need to talk about methods, you'll occasionally see waitress.serve(spam) or song = [viking.sing() for viking in vikings].
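    Put together, a typical throwaway example built entirely from these placeholder names might look like this (the names are deliberately meaningless; that's the point):

```python
class Breakfast:
    # A placeholder class holding placeholder members.
    def __init__(self, spam, eggs, cheese):
        self.spam, self.eggs, self.cheese = spam, eggs, cheese

def serve(breakfast):
    # A placeholder function doing nothing interesting.
    return [breakfast.spam, breakfast.eggs, breakfast.cheese]

print(serve(Breakfast('spam', 'eggs', 'cheese')))
```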

    Gouda and edam

    Guido van Rossum comes from Holland. Not the Netherlands, the part of the Netherlands called Holland. Although he's also from the Netherlands. And he's Dutch, too. Maybe this is why the most famous line in the Zen of Python is "There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch."

    Gouda and Edam are two famously Dutch cheeses. So, when you need to create meaningless instances of a meaningless type in your example code, the type is usually Cheese, and the instances will be gouda and edam, because those are two famously Dutch cheeses. (If you need more, use some other Dutch cheeses. Just don't use griene tsiis or nayl tsiis, because those aren't really food, they're just something that Frisians invented so they can claim their language sounds like English.)

    Tulips are also an important part of Dutch history, especially in the North Holland area that Guido is from. Most famously (if not accurately), the Dutch cornered the market on tulips and then created a speculative bubble that ruined their economy, which is what allowed the English to take over as the leaders of the financial world. So, obviously, that's what the prerelease version of the new asyncio library for Python 3.4 was called. But don't eat tulips, as they're mildly toxic and taste horrible.

    Counting to four three

    Sometimes, examples will be numbered 1, 2, and 4. Especially when someone's making a point about 0-based vs. 1-based indexing. This is a reference to a running joke in Monty Python and the Holy Grail about Arthur having problems counting to three. In particular, the Holy Hand Grenade of Antioch scene has a biblical quote from Armaments 2:9-21:
    And the LORD spake, saying, 'First shalt thou take out the Holy Pin. Then shalt thou count to three, no more, no less. Three shall be the number thou shalt count, and the number of the counting shall be three. Four shalt thou not count, neither count thou two, excepting that thou then proceed to three. Five is right out. Once the number three, being the third number, be reached, then lobbest thou thy Holy Hand Grenade of Antioch towards thy foe, who being naughty in My sight, shall snuff it.'
    There's also a joke from the Spanish Inquisition skit, where one of the cardinals is trying to enumerate the one two three four diverse chief weapons of the Inquisition.

    Hungarian

    "My hovercraft is full of eels" obviously needs no explanation; what might need an explanation is why translating from Hungarian would be relevant in programming.

    Hungarian notation, named for the famously Hungarian Xerox/Microsoft employee Charles Simonyi, means encoding the type of a variable into its name. For example, instead of having a variable "name" or "pos", you'd have "szName", which tells you that it's a zero-terminated string, or "rwPos", which tells you that it's a row rather than column position. Relatedly, in Perl and related languages, instead of "name" or "names" you'd have "$name", which tells you that it's a scalar, and "%names", which tells you that it's a hash (dictionary). In Python, none of this is considered idiomatic. If you need separate row and column positions, go ahead and call them "row_pos" and "col_pos", but don't try to come up with a standard "rw" abbreviation and apply it to all row variables whether needed for disambiguation or not. So, if you've translated sample code from (usually) Windows-specific C++ or Visual Basic to Python, and come up with a bunch of variable names like "szName", you've mistranslated from Hungarian.
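    The mistranslation looks something like this (all names made up for illustration):

```python
# Mistranslated from Hungarian: type tags encoded into every name.
szName = "Brian"
rwCursor, colCursor = 3, 7

# Idiomatic Python: plain names, qualified only where disambiguation
# actually helps the reader.
name = "Brian"
row_pos, col_pos = 3, 7
```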

    This is usually a sign of a bigger and more general problem: Translating code at the line-by-line level from a very different language is almost always going to give you non-idiomatic, inefficient, unmaintainable code. This actually comes up more often nowadays with people trying to write Java code, or occasionally even Scheme/OCaml/Haskell code, in Python. But it's the same problem.

    "My hovercraft is full of eels" is shorthand for both of these ideas. If someone says that as a comment on your code, it means your code doesn't look like Python.

    Most control structures in most programming languages, including Python, are subordinating conjunctions, like "if", "while", and "except", although "with" is a preposition, and "for" is a preposition used strangely (although not as strangely as in C…).

    But there's a whole spectrum of subordinating conjunctions that Python hasn't found a use for.

    Negative subordinators

    Some languages, most famously Perl, provide "unless" and "until".

    While people do regularly propose these be added to Python, they don't really add much. Anything that could be written "unless condition: body" can just as readably be written "if not condition: body", so there's no real need to use up these keywords or make the grammar harder to fit into people's heads.

    Besides, if we're going to have "unless" and "until", why not also negate the others, which are much harder to simulate in such a trivial way?

    "without context: body" ensures that the context is exited before the body, instead of after. The security benefits of this should be obvious—you can't accidentally overwrite critical system files if you can be sure they're closed before you try to write them.

    "try: body1 including E: body2" is hard to read directly as English—but then the same is true of "try: body1 except E: body2". The best you can do is "try to body1, except if E happens somewhere, in which case body2 instead." Similarly, "try to body1, including if E happens somewhere, in which case body2 as well." Just as with "except", describing the semantics of "including" concretely is easier than trying to make it intuitive: it's a continuable exception handler, just like in the Common Lisp Condition System. And yes, this means that the exception handler runs in the scope that raised, not in the scope where it's defined, so it can be (ab)used for dynamic scoping. Have fun!

    Finally, if the "for" statement runs the body once with each element of the iterable, the "butfor" statement obviously runs the body exactly once, with all of the elements that aren't in the iterable. Note that this provides a syntactic rather than library-driven way to force collection in non-refcounted implementations of Python.

    Where or given

    In mathematical proofs, and in even more abstract domains like Haskell programming, "where" is used to bind a variable for the duration of a single expression. Of course "let" does the same thing, but it's generally more readable to use "let" before the main expression for binding first-order values and "where" after the expression for binding function values, or occasionally very complicated first-order values whose expression isn't central to the main point.

    PEP 3150 proposed adding this syntax to Python. However, because code written for NumPy, many SQL expression libraries, etc. tends to make extensive use of functions named "where", the PEP was changed to suggest the preposition "given" instead of "where".

    Before and after

    The most common use proposed for "before" and "after" is as prepositions (sometimes along with "around"), to add code that's executed before or after some function's body (e.g., to check pre- and post-conditions).

    But as conjunctions, they could be used differently: whenever any assignment or other mutation would make the condition true, first any "before" bodies are executed, then the assignment is performed, then any "after" bodies are executed. If, after any of those bodies, the condition is no longer true (or going to be true, for multiple "before" bodies), the rest aren't executed.

    For example:

        before letter == 'k':
            letter = 'a'

        letter = 'a'
        for _ in range(26):
            letter = chr(ord(letter) + 1)
            print(letter, end='')

    This would print out "abcdefghijabcdefghijabcdef".

    This could even be used for cross-thread or -process synchronization. This is effectively like attaching callbacks to futures, but allowing you to treat any expression as a future.

    This makes sense for dataflow languages, Make, maybe even Prolog-style languages… but does it make sense for Python?

    Well, we do have similar features in many GUI libraries. For example, Tkinter allows you to attach a validation function that gets run before any change to an Entry widget, which can reject the change. Cocoa has spamWillFry: and spamDidFry: notifications. And so on.

    The big problem with such features is that they're always implemented in an ad-hoc way. The API is radically different in each GUI library, and often not even complete. For example, Tkinter only has a "before", not an "after" (although attaching a variable and adding a trace callback can simulate that); its "before" only works on Entry widgets (and not, say, Text); and it has a bizarre interface.

    Another problem is that you can only really watch a simple variable, not a complex expression. For example, if I wanted to do something after a+b > 10, I'd have to ask for notifications on both a and b and check their sum each time. Why can't the language do that for me?

    And finally, a library has to be able to identify all the places that it might change whatever someone might be watching and remember to notify whoever is watching. Again, why can't the language do that for me?

    Well, the obvious answer is that the language would have to monitor every single change to anything and check it against every registered "before" or "after" condition. In fact, most debuggers provide a simplified version of this, called "watchpoints", and, even with hardware support, it's often too slow to run your whole program with any watchpoints enabled.

    But computers are getting faster all the time. Or at least more parallel. And this is one case where the very things that usually make parallelism more painful might instead be beneficial. The code could be triggered by cache coherence discipline, and CPU manufacturers already have no choice but to make cache coherence controllers very complicated and highly optimized or they'll break every C program ever written.

    Whenever

    There are two possible meanings to a "whenever" statement—but fortunately, they are not syntactically ambiguous.

    "whenever condition: body" is similar to an "after" statement, except that rather than only executing its body the first time the condition becomes true, it executes its body every time the condition becomes true after having been false. It is well known that edge-triggered notifications are the most efficient way to implement reactor-type code, and that non-self-resetting notifications are easier to deal with than one-shot notifications, so this would almost certainly have uses in highly performant network code.

    On the other hand, "whenever: body" with no condition simply asks that body be executed at some point. Presumably this would happen at the next "await" or "yield from", or, if for some reason someone has written a program that never uses either of those features (maybe legacy code from Python 2.x), as an atexit handler.

    Lest

    While "lest" isn't used as commonly in modern English as it could be, there's really no good alternative synonym.

    "lest condition: body" speculatively executes the body, then, only if that would make the condition true, commits it.

    One of the problems with STM (software transactional memory) is that most languages don't have very good syntax to press into service. While constructs like Python's "with" statement (or, in C++, a bare block and an RAII object) can be used to represent a transaction, it's more than a little misleading, because normally a "with" statement's context manager cannot control the execution of the statement's body. A "lest" statement, on the other hand, is a perfect way to represent STM syntactically.

    As If

    Adding a two-keyword statement would probably be a little more controversial than some of the other proposals contained herein, but it has the advantage that both are already keywords. (If a single keyword were desired, "though" could be an option, though it's an archaic meaning, and likely to be confused with the more common meaning used in this very sentence.)

    This is similar to the PEP 3150 "where"/"given" statement, but far more powerful: rather than just creating a temporary binding, this creates whatever changes are needed to temporarily make the condition true, then executes the body with those changes in effect, then reverts the changes.

    For example, many students are assigned to create a Caesar cipher program, and they get as far as "letter = chr(ord(letter) + 1)", but then don't know how to prevent "z" from mapping to "{". This addition would make that task trivial:

        as if letter <= 'z':
            print(letter)

    Of course in this case, it would be a matter of implementation quality whether a given Python interpreter printed "a", "z", or even a NUL character.
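    For comparison, here's how the wrap-around is actually handled in real Python, with modular arithmetic (a minimal sketch, shifting only lowercase letters):

```python
def shift(letter, by=1):
    # Wrap within 'a'..'z' so that 'z' maps back to 'a' instead of '{'.
    return chr((ord(letter) - ord('a') + by) % 26 + ord('a'))

print(shift('a'))  # b
print(shift('z'))  # a
```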

    Because

    In normal execution, "because condition: body" is similar to "if", except that it raises an exception if the condition is false instead of just skipping the body.

    However, this can be used as a hook for semantic profilers. Instead of just blaming execution time on function calls, a semantic profiler can blame specific conditions. If the program spends 80% of its time executing code "because spam", then clearly the way to optimize it is to either make "spam" less likely, or, if that's not possible, to handle it more efficiently.

    A because statement can also provide useful hints to a JIT compiler. The compiler can assume that the condition is true when deciding which code path to optimize, knowing that if it's not true an exception will be thrown.

    It could also be useful to add a "whereas" statement, that has the same effect as "because" in normal execution, but has the opposite meaning to a semantic profiler.

    Whenever

    A "whenever" statement is similar to a "before" or "after" statement combined with a "while" statement. As soon as the condition becomes true, the body begins executing in a loop, in a new thread, until the condition becomes false.

    So

    "so condition: body" has elements of both "lest" and "whenever", but is potentially much more powerful.

    At any point, if any condition (the same as "condition" or otherwise) is being checked for any statement, and "condition" is false, and "condition" being true would change the truth value of the condition being checked, "body" is run. If "condition" still remains false, an exception is raised.

    Of course using a "so" statement in any program that also uses "before" and "after" is liable to cause severe performance problems, which would be hard to diagnose without judicious use of "because" statements. But surely that falls under "consenting adults", and anyone who cares about performance should be using Fortran anyway.

    Implementation notes

    Although I haven't yet tried creating patches for any of these new syntactic forms, it's worth considering that they are already handled in two languages closely related to Python (pseudocode and English), so there should be little trouble.

    There are two ways that some Python programmers overuse lambda. Doing this almost always makes your code less readable, and for no corresponding benefit.

    Don't use lambda for named functions

    Some programmers don't understand that in Python, def and lambda define exactly the same kind of function object. (Especially since the equivalent is not true in C++, Ruby, and many other languages.)

    The differences between def and lambda are:

    • def gives you a named function, lambda an anonymous function.
    • def lets you include statements in the body, lambda only an expression.
    • lambda can be used within an expression, def cannot.
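    It's easy to verify that both statements build the exact same kind of object (a sketch using a throwaway square function):

```python
def square(x):
    return x * x

square_l = lambda x: x * x

# Both are ordinary function objects of the same type...
print(type(square) is type(square_l))  # True
# ...but only def records a useful name.
print(square.__name__)    # square
print(square_l.__name__)  # <lambda>
```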

    So, when you want to, e.g., define a short callback in the middle of an async request or a GUI widget constructor, especially when it doesn't have an obvious good name, use lambda.

    But when you want to define a named function, use def.

    In other words, don't write this:

    • iszero = lambda x: hash(x) == hash(0)
    (There are other problems with that function, but let's ignore them…)

    What's wrong with lambda for named functions?

    • Following idioms matters.
    • That "iszero" function up there may be bound to the module-global name "iszero"—but if it shows up in a traceback, or you print it out at the interactive prompt, or use the inspect module on it, its name is actually <lambda>. That's not nearly as useful.
    • The def statement's syntax parallels function call syntax: "def iszero(x):" means it's called with "iszero(x)". That isn't true for "iszero = lambda x:".
    • If you later need to change the body to, say, add a try/except around it, you can't do that in a lambda.

    But isn't lambda a lot more concise?

    In exchange for dropping the 6-letter "return" keyword, and possibly a pair of parens, you've replaced a 3-letter "def" keyword with a 6-letter "lambda" keyword, and added a space and an equals sign. How much more concise do you think that's going to be? Test it yourself; you save 1-3 characters this way; that's it.

    If you're thinking that lambda can go on one line and def can't, of course it can:
    • def iszero(x): return hash(x) == hash(0)
    Sure, PEP 8 says that one-line compound statements are "generally discouraged", but it includes plenty of "Yes" examples that do exactly that; when the recommendations section of a document that itself is only a guideline specifically points out that something is just a "usually don't" rather than a hard "don't", that means something.

    And, more importantly, avoiding making your code arguably unpythonic by transforming it into something definitely even more unpythonic is not helping.

    And if you're just doing it to trick your linter, I shouldn't have to explain why using a linter but then tricking it is kind of pointless.

    Don't use lambda when you don't need a function

    In Python, there should be one, and only one, obvious way to do it. And yet we have both comprehensions and higher-order functions like map and filter. Why?

    When you have an expression to map or filter with, a comprehension is the obvious way to do it.

    When you have a function to map or filter with, the higher-order function is the obvious way to do it.

    It's a little silly (and maybe inefficient) to wrap a function call in an expression just to avoid calling map; it's a lot sillier (and more inefficient) to wrap an expression in a function just so you _can_ call map. Compare:
    • vals = (x+2*y for x, y in zip(xs, ys)) # good
    • vals = map(lambda (x, y): x+2*y, zip(xs, ys)) # bad, and illegal in Python 3
    • vals = map(lambda xy: xy[0]+2*xy[1], zip(xs, ys)) # even worse, but legal
    The point of your code is not calling a function that adds some numbers, it's just adding some numbers. There's no clearer way to write that than "x+2*y".

    Of course the silliest of all is doing both of these things—you have a function, which you wrap in an expression which you then wrap in a function again:
    • vals = map(spam, vals) # good
    • vals = map(lambda x: spam(x), vals) # really?

    Borderline cases

    Sometimes, you have a function, but it may not be very obvious to all readers:
    • vals = (x+y for x, y in zip(xs, ys)) # good
    • vals = map(operator.add, xs, ys) # also good
    Or the only expression that can replace a function is a higher-order expression that some readers may not grasp:
    • vals.sort(key=functools.partial(spam, maxval)) # decent
    • vals.sort(key=lambda x: spam(maxval, x)) # also decent
    Here, it's really a judgment call. It depends who you expect to be reading your code, and what feels most natural to you, and maybe even what the surrounding code looks like.

    Of course when you need to do something that you can't quite do with the higher-order tools that Python has provided, it's even more obviously a judgment call. You don't want to write (or install off PyPI) a right-partial, argument-flip, or compose function just to avoid one use of lambda—but to avoid 200 uses of lambda throughout a project, maybe you do.

    Comprehensions only handle map and filter

    As great as comprehensions are, they don't let you replace functools.reduce or itertools.dropwhile, only map and filter. If you are going to use functions like that, you don't have any choice but to wrap up your transforming expression or predicate test in a function. Of course sometimes that's a reason not to use those functions (Guido likes to say inside every reduce is a for loop screaming to get out), but sometimes it isn't. Again, each case is a judgment call.

    So why do we even have lambda?

    Because every time Guido suggests getting rid of it, the rest of the community shouts him down. :)

    More seriously, lambda is great for exactly the kinds of examples I gave above—a short and simple button callback, or sorting key, or dropwhile predicate; doing something that's just outside the grasp of partial; etc.

    Sometimes you need a function that's anonymous, trivial, and can be written in-line within an expression. And that's why we have lambda.

  7. Some languages have a very strong idiomatic style—in Python, Haskell, or Swift, the same code by two different programmers is likely to look a lot more similar than in Perl, Lisp, or C++.

    There's an advantage to this—and, in particular, an advantage to you sticking to those idioms. It means that whenever you do violate the idioms, that's extra meaning in your code for free. That makes your code more concise and more readable at the same time, which is always nice.

    In Python, in particular, many of PEP 8's recommendations are followed nearly ubiquitously by experienced Python developers. (Certain domains have their own overriding recommendations—NumPy-based scientific and numeric code, Django-based web applications, internal code at many large corporations, etc.—but in those cases, it's just a matter of some different idioms to follow; the situation is otherwise the same.) So, it's worth following the same recommendations if you ever want anyone else to read your code (whether because they're submitting patches, taking over maintenance, or helping you with it on StackOverflow).

    Example

    One of PEP 8's recommendations is that you test for empty sequences with "if not seq:". But occasionally, you have a variable that could be either None or a list, and you want to handle empty lists differently from None. Or you have a variable that absolutely should not be a tuple, only a list, and if someone violated that, an exception would be better than doing the wrong thing. You can probably think of other good cases. In all those cases, it makes a lot more sense to write "if seq == []:".
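    For instance (a hypothetical helper), "not seq" can't tell these cases apart, but "seq == []" can:

```python
def describe(items):
    # items is either None ("never loaded") or a list (possibly empty).
    if items == []:       # true only for an empty list, not for None
        return "loaded, but empty"
    if items is None:
        return "not loaded yet"
    return f"{len(items)} item(s)"

print(describe(None))      # not loaded yet
print(describe([]))        # loaded, but empty
print(describe(['spam']))  # 1 item(s)
```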

    If you've followed PEP 8's recommendations throughout your code, and then you write "if seq == []:" in one function, it will be obvious to the reader that there's something special going on, and they'll quickly be able to figure out what it is.

    But if you've ignored the recommendations and used "if seq == []:" all over, or, worse, mixed "if not seq:", "if seq == []:", "if len(seq) == 0:", etc. without rhyme or reason, then "if seq == []:" has no such meaning. Either the reader has to look at every single comparison and figure out what you're really testing for, or just assume that none of them have anything worth looking at.

    Of course you can work around that. You can add a comment, or an unnecessary but explicit test for None, to provide the same signal. But that's just extra noise cluttering up your code. If you only cry wolf when there's an actual wolf, crying wolf is all you ever have to do.

