Note: This post doesn't talk about Python that much, except as a point of comparison for JavaScript.

Most object-oriented languages out there, including Python, are class-based. But JavaScript is instead prototype-based. Over the years, this has led to a lot of confusion, and even more attempts to resolve that confusion, either by faking classes, or by explaining why prototypes are good. Unfortunately, most of the latter attempts are completely wrong, and only add to the confusion.

A mess?

I'm going to pick on Kyle Simpson's JS Objects: Inherited a Mess (and its two followups)—not because it's particularly bad, but for the opposite reason: it's one of the best-written attempts at clearing things up (and it's frequently linked to).

Simpson's main point is that, ultimately, prototype inheritance is more like delegation than like inheritance, because the inheritance "arrows" go in the opposite direction.

Let's take a language like C++, and write a class Foo, a subclass Bar, and a Bar instance bar1. There's a superclass-to-subclass arrow pointing from Foo to Bar, which represents Bar's vtable being made by copying Foo's, and a class-to-instance arrow pointing from Bar to bar1, representing bar1's vptr being copied from Bar.

In JavaScript, on the other hand, there's no such copying; instead, there are live links. Methods are looked up on the object; if not found, the link is followed to look up the method on its prototype, and so on up the chain. So, he draws an arrow from bar1 to Bar.prototype, which its prototype link, and an arrow from Bar.prototype to Foo.prototype, representing the same thing.

But he's drawing entirely the wrong distinction. When you notice that C++, Java, and C# all do things one way, but JavaScript does things differently, it's a decent guess that the difference is prototypes vs. classes. But as soon as you toss in a few other languages, it's obvious that the guess is wrong. For example, Python and Smalltalk do things the same way as JavaScript, but they're very definitely class-based languages.

Simpson brings up some other differences between JS and other languages in the rest of the series--e.g., the fact that a JS object's constructor doesn't automatically call its prototype's constructor, while a C++ class's constructor does automatically call its base classes' constructors. But here again, Python does the same thing as JS. At any rate, the direction of the arrows is what he focuses on, so let's stick with that.

Dynamic vs. semi-dynamic lookup

If this difference isn't about prototypes vs. classes, what is it?

In Python, method lookup is dynamic. If your source code has a call to foo.spam, the compiler turns that into code that does the following, until one of them works:*
  • Ask the object to handle "spam".
  • Ask the object's __class__ to handle "spam".
  • Walk the class's superclass tree (in C3-linearized order), asking each one in turn to handle "spam".
  • Raise an AttributeError.
In JavaScript, method lookup is dynamic in exactly the same way. The compiler turns a call to foo.spam into this:
  • Ask the object to handle "spam".
  • Walk the object's prototype list, asking each one in turn to handle "spam".
  • Raise a TypeError.
The only difference is that Python has two kinds of links--instance-to-class and class-to-superclass--that JavaScript collapses into one--object-to-prototype. (And that Python has a linearized multiple inheritance tree, but let's ignore that here.)

In C++, method lookup is mostly static, with a hook for a very specific kind of dynamism. The compiler turns a call to foo.spam into this:**
  • Look up foo.vptr[2] and call it.
The compiler has to know that foo is (a reference to) an instance of Foo or some subclass thereof, and it has to have compiled Foo already so that it knows that spam is its third method. The only way anyone outside of Foo can affect what happens here is that a subclass like Bar can override the spam method, so its vtable ends up with a different function pointer in the third slot. If so, if foo is a Bar instance at runtime, then foo.vptr[2] will point to Bar.spam instead of Foo.spam. That's all the late binding you get.

The downsides to C++-style virtual methods instead of fully dynamic lookup are pretty obvious. In JavaScript or Python, you can set a new foo.spam at any time, overriding the behavior of Foo. Or you can set a new Foo.spam, which will affect all existing instances of Foo or its descendants that don't have any overriding implementation. This can be pretty handy for, e.g., building up transparent proxies based on runtime information, or mocking objects for unit tests.

But you don't do those things that often. Pretty often, C++'s dynamic behavior is all the dynamic behavior you need. And it's obviously simpler, and a lot faster, to just follow a pointer, index an array, and follow another pointer than to walk through a linked chain of objects asking each one to do a dictionary lookup for you. And it even opens the door to further optimizations--e.g., if the compiler knows that foo must be exactly a Foo rather than a subclass (e.g., because it's a local allocated on the stack), or it knows that no subclass of Foo overrides the method (e.g., because it's got feedback from a whole-program analyzer), it can just insert a call directly to the Foo.spam implementation.

The problem is that something that works "pretty often" but has no escape hatch only works pretty often. And just as often, the speed of the lookups don't matter anyway. As Alan Kay pointed out back in the 70s, usually you're either doing all your work inside some inner loop so it hardly matters how fast you get to that loop, or spending all your time waiting on a user to interact with the GUI/a client to interact with the reactor/etc. so it hardly matters how fast you do anything. Better to give people features they might need than performance they probably don't. So, that's what Smalltalk did. And Objective-C, Python, Ruby, etc.

* Python offers a wide variety of ways to hook this process, as do many other dynamic OO languages, so this is only the default behavior. But let's not worry about that right now.
** I'm ignoring non-virtual functions here, but they don't add any complexity. I'm also ignoring the screwy way that C++ does multiple inheritance. Be glad. I'm also hiding reference/pointer stuff, which means this is closer to C# or Java than C++, but that doesn't really matter here.

The best of both worlds?

A Smalltalk offshoot called Self was built around the idea that a just-in-time (JIT) optimizer can eliminate most of the costs of dynamic lookup, while preserving all the benefits. It's pretty uncommon that you change out a method at runtime, and very rare that you do so more than once or twice in the run of a program. So, if you cache the lookup and recompile any function that needs the lookup to use the cached value, and insert some guards to invalidate the cache on the rare occasions when you change something, you may end up with millions or billions of faster-than-C++ lookups for every one recompile and slow lookup.

It's not coincidental that Self is also the language that popularized prototype-based inheritance. One major thing classes add in dynamic inheritance is that making that first link special makes it easier to optimize. Also, classes of objects tend to almost always override the same set of methods in the same way, so you might as well label those classes to help the optimizer. And if the language encourages you to think in terms of classes of objects, you'll usually make the optimizer's job easier. But with a smart-enough JIT, that isn't necessary. At any rate, the dynamic lookup was the main reason for the JIT, and that reason applies to dynamic class-based languages just as much as to prototype-based ones.

Plus, of course, it turns out that the idea of a JIT worked pretty well on static code too; Hotshot running compiled Java bytecode is generally faster than PyPy or V8 running Python or JavaScript. But for most uses, PyPy and V8 are more than fast enough. (In fact, for many uses, CPython and Spidermonkey are more than fast enough.)

What's in a name?

Simpson wants to insist that JavaScript's behavior is not dynamic inheritance, it's delegation.

But, if so, Python, Smalltalk, and many other class-based languages aren't doing dynamic inheritance either. So, the difference between class-based and prototype-based still has nothing to do with delegation vs. inheritance, even if you use his terms.

In fact, what he's calling "delegation" really just means dynamic inheritance. Which makes it obvious why claiming that JavaScript doesn't do dynamic inheritance, it does delegation, is wrong.

While we're on the subject of names, you could argue that the confusion about prototypes is from the start caused by other things JS got wrong. Other prototype-based languages, from Self to newer languages like IO, don't use constructor functions and a Java-like new operator, so they don't make prototyping/cloning look misleadingly like instantiation. Maybe if JS didn't do that, people wouldn't have been misled as badly in the first place?

Prototypes

So then, what is the difference between classes and prototypes?

Of course there's the minor difference noted above, that JavaScript only has one kind of link instead of two.* And, if you read the footnotes, there could be a performance difference. But... so what?

Well, getting rid of classes does lose something: In JavaScript, your prototype chain doesn't affect your type. In Python, because the first step of that chain is special, the language can define its notion of type in terms of that link. Of course 95% of your code is duck-typed and EAFP-driven and this doesn't matter. But for that last 5% (and for debugging), you have a coherent and useful notion of type, which can help.

Also, making multiple inheritance work with classes is not that hard (even though C++ gets it wrong, Java punts on it, etc.); making it work with prototypes is harder.

The biggest downside is that cooperative inheritance through a super-like mechanism is hard with prototypes. In particular, composing mixins that don't know about each other is a serious problem. Simpson gets into this in his "Polymorphism redux" section in his third part. But I believe this is actually only a problem if you use the constructor idiom instead of using prototype-chained object literals, so a prototype-based language that didn't encourage the former and make the latter more verbose than necessary may not have this problem.

Anyway, what do you gain in return for these negatives? One thing (although not the motivation behind Self**) is flexible object literals. In JavaScript, creating an object is just like creating a dictionary. If some of the members are callable, they're methods. Sure, this is just syntactic sugar, but it's syntactic sugar around something you often want.

At first glance, that sounds horrible. We have 30000 objects, all anonymous, all locally created in different calls to the same function and therefore closing over different scopes… so much duplication! But again, you're thinking about ahead-of-time optimization; there's no reason the interpreter can't realize that all of those closures have the same body, and guardably-almost-always the same behavior, so with a JIT, you're wasting very little space and even less time. And, even if that weren't true, it probably wouldn't matter anyway. Meanwhile, if you don't want that duplication, you can always move some of those functions into a prototype—exactly the same way as you'd move them into a class in Python.

At second glance, it sounds horrible again. Everyone knows that closures and objects are dual—one is the functional way of encapsulating state, the other is the OO way. So, who wants closures inside objects? But here... well, a Python developer shouldn't be surprised that sometimes practicality beats purity, and adding boilerplate can hurt readability. For a locally-created object that just has a bunch of simple functions that all refer to the local variable username, does it actually make things more readable to create and call a constructor that copies username to self.username and make all those functions call self.username, or is that just boilerplate?

(In fact, you can go farther and explicitly link up multiple closures through manual delegation chains. But I haven't seen many uses of inheritance chains beyond 2-deep in JavaScript, and nearly all of those that I've seen are equivalent to what you'd do in a class-based language. So, let's just stick with the simple case here.)

Really, it's similar to def vs. lambda: for most functions, def is clearly better, but that doesn't mean there's never a good use for lambda. Nobody needs a separate statement, and an explicit name, for the function lambda x: x+1 or lambda cb: cb(result). And similarly, for most objects, using methods in the class or prototype to encapsulate the object's state is clearly better, but that doesn't mean there's never a good use for object literals with object-specific methods.

The big difference is that in JavaScript, both ways are easy; in Python, both ways are possible, but only one is easy—there's no clean and simple way to put the behavior in each object.

Well, that's not quite true; JS's horrible scoping rules and its quirky this mean both ways are actually not easy. But that's not because JS is prototype-based; it's because JS got some other things wrong.

Anyway, the differences in this section are what really matter—and they're the ones that are down to JS being prototype-based.

* Or, if you count metaclass-class as different from class-instance, one instead of three. Simulating metaclasses in a prototype language usually means making use of the distinction between calling the prototype vs. just referencing its .prototype, in JS terms, and there's nothing to protect you from mixing up your "metaprototypes" and prototypes. But then Python has nothing to protect you from inheriting from a metaclass as if it were a normal class either, and that never causes problems in real-life code.
** The inspiration behind prototypes in Self was that it's often hard to define a class hierarchy in advance, and too hard to modify a class hierarchy after you've already written a bunch of code around it. Slightly oversimplifying, prototypes were mainly an attempt to solve the fragile base class problem.

The best of both... wait, deja-vu

Some newer class-based languages, like Scala, have an object-literal and/or singleton-defining syntax. (And it's not that hard to add that to Python, if you want.) If you combine this with a few other features, you can get objects that act like instances and clones at the same time (and hopefully in the ways you want, not the ways you don't).

On the other hand, JS (ES6) is adding classes for when you want them, but object literals will still an anonymously-typed clone of an optionally-specific prototype rather than an instance of a class.

Are we meeting in the middle? I'm not sure.
1

View comments

It's been more than a decade since Typical Programmer Greg Jorgensen taught the word about Abject-Oriented Programming.

Much of what he said still applies, but other things have changed. Languages in the Abject-Oriented space have been borrowing ideas from another paradigm entirely—and then everyone realized that languages like Python, Ruby, and JavaScript had been doing it for years and just hadn't noticed (because these languages do not require you to declare what you're doing, or even to know what you're doing). Meanwhile, new hybrid languages borrow freely from both paradigms.

This other paradigm—which is actually older, but was largely constrained to university basements until recent years—is called Functional Addiction.
5

I haven't posted anything new in a couple years (partly because I attempted to move to a different blogging platform where I could write everything in markdown instead of HTML but got frustrated—which I may attempt again), but I've had a few private comments and emails on some of the old posts, so I decided to do some followups.

A couple years ago, I wrote a blog post on greenlets, threads, and processes.
6

Looking before you leap

Python is a duck-typed language, and one where you usually trust EAFP ("Easier to Ask Forgiveness than Permission") over LBYL ("Look Before You Leap"). In Java or C#, you need "interfaces" all over the place; you can't pass something to a function unless it's an instance of a type that implements that interface; in Python, as long as your object has the methods and other attributes that the function needs, no matter what type it is, everything is good.
1

Background

Currently, CPython’s internal bytecode format stores instructions with no args as 1 byte, instructions with small args as 3 bytes, and instructions with large args as 6 bytes (actually, a 3-byte EXTENDED_ARG followed by a 3-byte real instruction). While bytecode is implementation-specific, many other implementations (PyPy, MicroPython, …) use CPython’s bytecode format, or variations on it.

Python exposes as much of this as possible to user code.
6

If you want to skip all the tl;dr and cut to the chase, jump to Concrete Proposal.

Why can’t we write list.len()? Dunder methods C++ Python Locals What raises on failure? Method objects What about set and delete? Data members Namespaces Bytecode details Lookup overrides Introspection C API Concrete proposal CPython Analysis

Why can’t we write list.len()?

Python is an OO language. To reverse a list, you call lst.reverse(); to search a list for an element, you call lst.index().
8

Many people, when they first discover the heapq module, have two questions:

Why does it define a bunch of functions instead of a container type? Why don't those functions take a key or reverse parameter, like all the other sorting-related stuff in Python? Why not a type?

At the abstract level, it's often easier to think of heaps as an algorithm rather than a data structure.
1

Currently, in CPython, if you want to process bytecode, either in C or in Python, it’s pretty complicated.

The built-in peephole optimizer has to do extra work fixing up jump targets and the line-number table, and just punts on many cases because they’re too hard to deal with. PEP 511 proposes a mechanism for registering third-party (or possibly stdlib) optimizers, and they’ll all have to do the same kind of work.
3

One common "advanced question" on places like StackOverflow and python-list is "how do I dynamically create a function/method/class/whatever"? The standard answer is: first, some caveats about why you probably don't want to do that, and then an explanation of the various ways to do it when you really do need to.

But really, creating functions, methods, classes, etc. in Python is always already dynamic.

Some cases of "I need a dynamic function" are just "Yeah? And you've already got one".
1

A few years ago, Cesare di Mauro created a project called WPython, a fork of CPython 2.6.4 that “brings many optimizations and refactorings”. The starting point of the project was replacing the bytecode with “wordcode”. However, there were a number of other changes on top of it.

I believe it’s possible that replacing the bytecode with wordcode would be useful on its own.
1

Many languages have a for-each loop. In some, like Python, it’s the only kind of for loop:

for i in range(10): print(i) In most languages, the loop variable is only in scope within the code controlled by the for loop,[1] except in languages that don’t have granular scopes at all, like Python.[2]

So, is that i a variable that gets updated each time through the loop or is it a new constant that gets defined each time through the loop?

Almost every language treats it as a reused variable.
4
Blog Archive
About Me
About Me
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.