Many tutorials on object-oriented programming conflate inheritance and subtyping. In fact, it's often considered part of the OO dogma that they should be conflated.

This is wrong for Python in a wide variety of ways.

Substitutability

Let's start with the Liskov Substitution Principle: A is a subtype of B iff instances of type A can substitute for instances of type B without affecting correctness or semantics. That's what it means to be a subtype.*

* Three things to note here, without getting into details (yet). First, mutability makes this more complicated than most people think at first—see the section on the ellipse-circle problem in the linked article. Second, subtyping in languages like Python that have more complicated signatures—optional parameters, keyword-only parameters, vararg parameters, etc.—is harder to define rigorously, but all of those cases are easy to understand intuitively (e.g., a method that takes two integers can be overridden by a method that takes two integers and an optional string, because any call to the supertype's method will just be passing two integers, which is valid for the subtype's method). Third, Python makes extensive use of what are implicitly union types and parametric types, but has no obvious way to represent them explicitly—a strstr-like function that returns None if not found is of type (str, str)->(int | NoneType); a read4k function could be described as of type (TextIOBase->str) | (IOBase->bytes) or of type (IOBase<T> -> String<T>). The MyPy-based static typing syntax for Python 3.5 or 3.6 will make some of this representable, but it's still not going to be completely parametric in the sense of ML or Haskell.

Nominal vs. structural substitutability

In many modern OO languages, only subclasses can be subtypes (and all subclasses should be subtypes, but this is usually only partly enforced). If I write a function that takes a File object, so I can read from it a line at a time (or 4K at a time), you can only call my function with something that inherits from File.*

In Python, this not only isn't necessary, it isn't idiomatic. If I write a function that wants to read a line at a time, I document that I want a file-like object that iterates a line at a time, and I don't verify that what I've gotten is-a file in any way, I just try to iterate it and expect to get strings. You can pass me an io.StringIO (which is basically a TextIOWrapper that isn't wrapping a file), or just a list of strings, or some type from a library that knew nothing about my library and vice-versa, and everything still works.
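To make that concrete, here's a minimal sketch of my own (the function and the sample inputs aren't from any particular library): a line-counting function that documents itself as taking a file-like object, but never checks—it just iterates.

    import io

    def count_nonblank_lines(lines):
        # Works on anything that iterates over strings.
        return sum(1 for line in lines if line.strip())

    # A real file would work the same way (count_nonblank_lines(open(path))),
    # but so does anything else that yields strings:
    print(count_nonblank_lines(io.StringIO("spam\n\neggs\n")))   # 2
    print(count_nonblank_lines(["spam\n", "\n", "eggs\n"]))      # 2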

That's the difference between nominal and structural** (or behavioral***) subtyping, and that difference is half of what duck typing is all about. I'll get to the other half in a bit.

* A few languages (like C++) make things more complicated by allowing coercive typing: any type A that can be implicitly coerced to type B is a subtype of B, at least in some contexts… But let's ignore that.

** Structural subtyping can also include layout. In Python, since you generally store attributes in a dict rather than at specific offsets, layout is irrelevant beyond the names and types of the attributes. (Except, of course, that "generally" doesn't mean "always"; you can use __slots__, or inherit from ctypes.Structure, or write an extension module, or use nothing but descriptors in the class dictionary, or…) Also, in Python, classes generally don't even directly describe the attributes of their instances; they're created dynamically in __init__ (or even later), so checking that something is a subtype isn't even a sensible question except dynamically.

*** Behavioral subtyping has two different meanings: it can mean subtyping relationships that include semantics beyond those normally discussed in type theory, but it can also mean ad-hoc structural subtyping of exactly the kind Python uses.

What about isinstance?

But hold on. If I subtype without subclassing in Python, the result isn't really substitutable, because there's nothing stopping a function from calling isinstance and detecting that I'm not really what I'm claiming to be, right?

Well, that's true. But it's also true that even if I do subclass, there's nothing stopping a function from calling type (or looking at my __class__ attribute, or using the inspect module) and detecting that I'm not really what I'm claiming to be.

This is why idioms and conventions are important. In Java or C++, switching on isinstance-like checks (e.g., C++ dynamic_cast) is perfectly acceptable, because it gives you the exact same results that virtual function dispatch would give you. Switching on the exact type is strongly discouraged, because it effectively makes subtyping impossible.

In Python, isinstance is considered better than type, but still to be avoided whenever possible, precisely because it doesn't give you the same results that dynamic method lookup would give you, and it effectively makes subtyping impossible.
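Here's a rough illustration of that difference (the toy classes are hypothetical): an exact-type check rejects even a genuine subclass, and even isinstance rejects a perfectly usable duck-typed substitute.

    class LoggingList(list):
        # A real subclass of list (imagine it logging its mutations).
        pass

    class ListLike:
        # A duck-typed sequence, unrelated to list.
        def __getitem__(self, i): return i
        def __len__(self): return 3

    ll, dl = LoggingList([1, 2, 3]), ListLike()
    print(type(ll) is list)       # False: exact-type checks break even real subclasses
    print(isinstance(ll, list))   # True:  isinstance at least respects subclassing...
    print(isinstance(dl, list))   # False: ...but still rejects a usable duck type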

ABCs: practicality beats purity

So, structural subtyping good, nominal subtyping bad… But there really are some cases where it's useful to be able to check whether A is a subtype of B (this is just a specific case of "there are some cases where LBYL is more useful than EAFP"), and how else are you going to do that but isinstance?

But in most of those cases, you should use an abstract base class, not a normal base class. Don't check isinstance(arg, list), check isinstance(arg, collections.abc.MutableSequence). Note that this is exactly the same as using interfaces vs. base classes in Java or C#—the difference being that in Java, you always have to define an interface, while in Python you always can define an ABC, but usually don't have to.
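For example, here's a sketch of an LBYL check written against the ABC rather than a concrete type (the function itself is mine):

    from collections.abc import MutableSequence

    def extend_in_place(seq, items):
        # Check against the abstract type, not a concrete one.
        if not isinstance(seq, MutableSequence):
            raise TypeError("need a mutable sequence")
        seq.extend(items)

    extend_in_place([1, 2], [3])             # list is registered as a MutableSequence
    extend_in_place(bytearray(b"ab"), b"c")  # so is bytearray; isinstance(seq, list) would reject it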

But notice that ABCs don't require inheritance for subclassing. In fact, there are three ways to make a class a subclass of an ABC (see the sketch after this list):
  • In the subclass's implementation, by inheriting from the ABC.
  • In the ABC's implementation, by writing a subclass hook that returns True for the subclass. (Note that this is usually done to implement structural subtyping, but it can just as easily be used to supertype a specific existing class that you can't modify.)
  • Outside of both implementations, in an application that just wants to make the two play together nicely, by registering the class with the ABC.
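Here's a minimal sketch of all three routes, using a made-up Quacker ABC (the class names and the structural hook are mine):

    from abc import ABCMeta

    class Quacker(metaclass=ABCMeta):
        # A hypothetical ABC for anything with a quack() method.
        @classmethod
        def __subclasshook__(cls, C):
            # Route 2: a subclass hook in the ABC's implementation.
            if cls is Quacker and any("quack" in B.__dict__ for B in C.__mro__):
                return True
            return NotImplemented

    class Mallard(Quacker):       # Route 1: inherit from the ABC.
        def quack(self): return "quack"

    class RobotDuck:              # Never heard of Quacker, but has the right method.
        def quack(self): return "QUACK"

    class Decoy: pass
    Quacker.register(Decoy)       # Route 3: register from the outside.

    print(issubclass(Mallard, Quacker))    # True
    print(issubclass(RobotDuck, Quacker))  # True
    print(issubclass(Decoy, Quacker))      # True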
And meanwhile, Python doesn't even strictly require substitutability for ABCs. A TextIOBase is-a IOBase, even though IOBase.readline returns a bytes-like object and TextIOBase.readline returns a str-like object. Why? Well, as I'll explain later in the section on mixins, IOBase is more useful as a mixin class than as a subtyping class, and it's useful for IOBase to provide a default readline implementation for any subclass that provides readinto but doesn't provide readline. Speaking from a purist point of view, this means that TextIOBase isn't a subtype of IOBase at all. But from a practical point of view, it just means that readline isn't part of the IOBase type, even though it's part of the IOBase ABC.

What it all means

In (purist) theory, Python doesn't allow subtyping at all (because you can always call type), but in practice, Python uses subtyping all over the place—and as long as you write reasonably Pythonic code, it works wonderfully.

That's because duck typing, not subclassing, is the normal way to represent subtyping; inheritance is only one of three equally valid ways to implement subclassing; isinstance is used sparingly and only when you know you have ABCs to deal with; type is used only for debugging and very rare edge cases.

To anyone coming to Python from Lisp, Smalltalk, or JavaScript, this is all pretty obvious, but people coming from Java or C# (or, worse, languages that either mix nominal and duck typing haphazardly like Ruby, or mix them in a complicated way that many people never quite figure out like C++) end up writing non-Pythonic code because they think they need subclassing all over the place, when they actually don't.

Is this really subtyping at all?

The reason subtyping is important in OO in the first place is that it's how OO does polymorphism—that is, functions that can work with different types. You can write a function that parses an RFC822 header out of any file-like object, and it doesn't matter whether that object is a file on disk, a transport wrapper around a socket, or a wrapper around a string that you've built up in memory.

Above, I've described Python (and other duck-typed languages) as using implicit, dynamic, structural subtyping for its polymorphism. But there's a different way to look at it: You can just argue that Python's polymorphism isn't based on subtypes at all. Instead, its polymorphic functions are generic functions. The fact that many of those functions are written as OO-style methods and called using OO-style dot syntax doesn't necessarily mean there's any subtyping going on.

Well, I could point out that any language whose type system is powerful enough to represent both parametric types and union types has generic functions that can fully simulate OO methods (pause here for every Haskell programmer to say "duh"), although the converse is not true (pause again), and that duck typing automatically gives you a sufficiently powerful type system except for the compile-time part (kick off an async task to deal with the Haskell programmers who are now saying "the except is the whole point"*).

But subtyping is the most useful way to express much of what people frequently do with Python. For example, extend my RFC822 parser to an HTTP handler: It reads an RFC822 header by reading line by line until it gets a blank line, then it does a binary read on the rest of the data. Sure, you _could_ describe that as taking type {__iter__: ()->Iterable<String>, detach: ()->{read: ()->Iterable<Byte>}}, but, at least for purposes of human documentation (as opposed to, say, compiler type checking or optimization), it's much more useful to say it takes any TextIOBase. Of course you could further document that it only uses part of the TextIOBase interface (which adds helpful information, because it allows me to give you a duck-typed file-like object without having to implement the whole ABC), but you'd still want to start with TextIOBase if you want humans to understand your code.
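For the record, here's roughly what the header-reading half of that looks like (a sketch of my own, not stdlib code); note that all it really uses from TextIOBase is line iteration:

    import io

    def read_headers(f):
        # Documented as taking any readable TextIOBase; actually just iterates lines.
        headers = {}
        for line in f:
            line = line.rstrip("\r\n")
            if not line:          # blank line ends the header block
                break
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
        return headers

    print(read_headers(io.StringIO("Subject: spam\nFrom: eggs@example.com\n\nbody...")))
    # {'Subject': 'spam', 'From': 'eggs@example.com'}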

And anyway, if you don't want to think of Python as having subtyping at all, then you're obviously not going to implement subtyping with inheritance in Python, so I have nothing to convince you of.

* Briefly, they're forgetting the fact that both Haskell and Python—and many other dynamic languages, but no other remotely mainstream static language—allow you to write functions and data structures that are not expressible at all in ML, Java, etc. The fact that Haskell can catch errors at compile time, rather than runtime, and optimize at compile time rather than just by tracing-JIT (which works OK for time optimization, not so much for space…), etc. is definitely an extra feature on top of that, which Python doesn't have. But it doesn't add any expressiveness that Python doesn't have. (Now, a language that had a sufficiently powerful type system plus overloading might be able to express things that Python can't, but so far, C++ and Swift are the closest any mainstream languages have come to that, and they don't come very close.) Anyway, I can write further on this if anyone is interested and/or shaking their heads in anger and disbelief.

Not necessarily beautiful, but mutated

Getting back to the LSP, the novel idea that makes it so important is that it applies to mutable objects.

After all, for immutable objects, subtyping is simple: A is a subtype of B if, for every attribute in B, A has an attribute with the same name whose type is a subtype. For methods, being a subtype means all arguments* are supertypes, and the return value is a subtype. There are a few complexities for more complicated signatures like those Python allows, but, while they're hard to get down rigorously, they're intuitively obvious.**

But mutation makes everything trickier. First, for A to be a subtype of B, every directly mutable attribute must be of the same type, not a subtype, because you have to be able to assign the same values. And, while methods are obviously not usually mutable, they can be mutating, and a mutating method has to be able to mutate the object in the same way.***

The (in)famous example here is Circle and Ellipse. One of the standard examples for OO design is that a Circle is-a Ellipse whose width and height are always the same. Any non-mutating methods on an Ellipse will work perfectly well on a Circle. But if width is mutable, or there's a method that sets the width independently from the height, a Circle no longer is-a Ellipse. (If this isn't clear, consider a non-mutating stretched(aspect) method that returns a new Ellipse stretched to the given aspect ratio, vs. a mutating stretch(aspect) method that stretches self to the given aspect ratio. Obviously, Circle can implement the first one just fine, but can't implement the second.****)
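A minimal sketch of that (the class and method definitions are mine):

    class Ellipse:
        def __init__(self, width, height):
            self.width, self.height = width, height
        def stretched(self, aspect):
            # Non-mutating: returns a new Ellipse, so Circle can inherit it unchanged.
            return Ellipse(self.width * aspect, self.height)
        def stretch(self, aspect):
            # Mutating: changes self in place.
            self.width *= aspect

    class Circle(Ellipse):
        def __init__(self, diameter):
            super().__init__(diameter, diameter)
        # stretched() is fine: a stretched circle is just an Ellipse.
        # There is no correct stretch(): either the result is no longer a
        # Circle (broken invariant) or the aspect is ignored (broken contract).

    c = Circle(10)
    print(type(c.stretched(2)).__name__)  # Ellipse -- fine
    c.stretch(2)                          # inherited as-is: now c.width != c.height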

For a more common real-life example, consider a sorted sequence—something that acts like a tuple, but always has all of its elements in sorted order. A sorted tuple can do anything a tuple can do, and can be used anywhere a tuple can be used, but it can also do other things, like find all values within a certain range.***** But now consider a mutable sorted sequence—something that acts like a list, but always has all of its elements in sorted order. This is not a subtype of a mutable sequence (although it is still a subtype of sequence, and of sorted sequence). The most basic mutable sequence operation is seq[idx] = value, with the postcondition that seq[idx] == value; a mutable sorted sequence can't provide that. Or consider calling seq.sort(key=different_key). In fact, almost any mutating sequence method makes no sense for a sorted sequence.
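Here's a sketch of why (SortedList here is hypothetical and deliberately minimal):

    import bisect

    class SortedList:
        # Hypothetical: list-like, but always kept in sorted order.
        def __init__(self, items=()):
            self._items = sorted(items)
        def __len__(self): return len(self._items)
        def __getitem__(self, i): return self._items[i]
        def add(self, value):
            bisect.insort(self._items, value)   # the extra, sorted-only operation
        def __setitem__(self, i, value):
            # No way to satisfy the MutableSequence postcondition
            # (seq[i] == value afterwards) *and* stay sorted:
            del self._items[i]
            bisect.insort(self._items, value)

    s = SortedList([3, 1, 2])
    s[0] = 99
    print(s[0])   # 2, not 99 -- the mutable-sequence contract is broken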

* Except for self, of course. Or the equivalent. Which can get pretty complicated when you're dealing with multiple dispatch—including the ad hoc multiple dispatch that Python's operator protocols provide—but let's not get into that here.

** For example, think about optional arguments: (int, int, int=0)->int is obviously a subtype of (int, int)->int.

*** To make that rigorous, you generally put it in terms of how subtyping handles either preconditions and postconditions, or invariants (or both), plus a history constraint.

**** Unless you incorporate type-changing into your language's OO model. In that case, if you define Circle such that it's no longer an invariant that every mutation leave it a Circle, only that every mutation leave it an Ellipse, then a Circle is-a Ellipse again. I believe people actually have written serious code like this in Lisp, and Python allows it as well (normal classes allow you to mutate the __class__ attribute on an instance), but I don't know of anyone taking advantage of it.
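For the curious, the mechanism looks like this (toy classes; nothing here is recommending the practice):

    class Ellipse: pass
    class Circle(Ellipse): pass

    c = Circle()
    print(type(c).__name__)   # Circle
    c.__class__ = Ellipse     # a degenerate mutation "demotes" the instance
    print(type(c).__name__)   # Ellipse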

***** Well, a tuple can do that, it just takes linear time instead of logarithmic. And, if tuple slicing returned sub-views instead of copies, a sorted tuple could also return a view for this value-slicing operation, while a regular tuple would have to return a copy. That raises the whole question of whether, how, and/or to what extent performance guarantees are part of a type, which I definitely don't want to get into here.

So what is inheritance for in Python?

Subtyping

Again, ABCs aren't completely useless for subtyping, and there are some cases where isinstance tests make sense.

And, to add to what I said before, they help document the requirements of a group of related functions, which would be handy even if they didn't help to check those requirements. Consider the C++ standard library, and its predecessor the STL: it relies on clearly-defined "concepts" that specify a number of abstract types, to allow pieces to be connected up without being coupled together, but the language has no support for putting the concepts into the actual implementation. (It came close to getting such support in C++11, and probably will have something eventually…) The fact that Python's type system is powerful enough to represent its concepts can't possibly hurt; it means you can inherit your class from collections.abc.Sequence, or register with it, instead of just adding a comment saying "# this class acts like a sequence". It's obvious, easy to spot, and even introspectable (which can be a pretty important thing for a language that's built around experimenting with components in a REPL). And within the ABCs themselves, inheritance is similarly useful. A Sequence is literally defined as a Sized, Iterable Container with some additional methods, not just in the documentation, but in the code.

Hooks

Often, the simplest way to provide some library functionality is to write the main code and provide a series of hooks for application developers to override whichever parts they want. And one obvious way to allow people to override code is method overriding.

For example, look at socketserver. A TCPServer object handles all the basic functionality of listening on a server socket, accepting client sockets, and processing their requests. There are a number of different levels at which the application can hook that processing, but generally you do it by subclassing BaseRequestHandler and overriding one or more of its methods. Or you can use an HTTPServer, which hooks TCPServer for you, and then subclass BaseHTTPRequestHandler and override some of its methods instead.
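A minimal sketch of that pattern (the echo behavior and the port are my own choices):

    import socketserver

    class EchoHandler(socketserver.BaseRequestHandler):
        # Hook into TCPServer's request processing by overriding handle().
        def handle(self):
            data = self.request.recv(1024)     # self.request is the client socket
            self.request.sendall(data.upper())

    # Typical usage (not run here):
    # server = socketserver.TCPServer(("localhost", 9999), EchoHandler)
    # server.serve_forever()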

Of course the same thing could be done with a more functional hook interface—instead of subclassing BaseRequestHandler and overriding its handle method, you'd create a handle function that takes some kind of request parameter, then call set_handle_hook(my_handle_function). When the hook functions are all stateless or independent, this is identical, but when they have state that may be shared between them, the obvious place to put that state is in the self parameter—basically, the same reason you sometimes want to use objects instead of functions with an explicit cookie argument (or closures) in the first place.

Mixins (implementation sharing)

When you're learning OO theory, one of the first things you learn is that inheritance is for sharing interfaces, not implementation. But in many modern languages, that isn't true: you often want to build a class by composing behavior from multiple mixin or trait classes. Some languages have special syntax to do this, but in a language with proper multiple inheritance like Python, you can do this just by inheriting from mixins the same way you inherit from supertypes. For example, if you want to make a socketserver server concurrent, you can just inherit from ThreadingMixIn or ForkingMixIn as well as TCPServer (or write your own concurrency mixin that works differently).
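For example, a threaded server is just multiple inheritance from the mixin and the server class—the stdlib's own ThreadingTCPServer is defined essentially this way:

    import socketserver

    class ThreadedEchoServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
        # ThreadingMixIn overrides process_request() to spawn a thread per request;
        # everything else comes from TCPServer.
        pass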

And there's nothing stopping a class from being an interface and a mixin at the same time. In fact, this is common and idiomatic in Python. For example, most of the collection ABCs are written this way—on the one hand, they're a way to check types nominally, but on the other hand, they make it easier to implement structural subtypes. For example, you implement __getitem__, and the Sequence mixin automatically implements __iter__ for you. This isn't always the simplest way to go in Python (the total_ordering decorator shows another approach), but it's often a good design.
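A sketch of getting the mixin half for free (Range2D is made up): implement __getitem__ and __len__, and Sequence fills in __iter__, __contains__, index, and count.

    from collections.abc import Sequence

    class Range2D(Sequence):
        # Hypothetical: all (x, y) points of a w-by-h grid, row by row.
        def __init__(self, w, h):
            self.w, self.h = w, h
        def __len__(self):
            return self.w * self.h
        def __getitem__(self, i):
            if not 0 <= i < len(self):
                raise IndexError(i)
            y, x = divmod(i, self.w)
            return (x, y)

    g = Range2D(3, 2)
    print(list(g))           # __iter__ comes from the Sequence mixin
    print((2, 1) in g)       # so does __contains__: True
    print(g.index((1, 0)))   # and index(): 1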

OO purists will argue that this is not proper OO—but again, practicality beats purity. Compare the various file-like objects in the stdlib, such as the one returned by socket.makefile, in 3.4 vs. 2.5. The modern version is almost always simpler and shorter, and it replaces the ad-hoc and inconsistent notion of "file-like object" with a set of well-specified (and nominally-checkable) interfaces.