Most object-oriented languages out there, including Python, are class-based. But JavaScript is instead prototype-based. Over the years, this has led to a lot of confusion, and even more attempts to resolve that confusion, either by faking classes, or by explaining why prototypes are good. Unfortunately, most of the latter attempts are completely wrong, and only add to the confusion.
A mess?
I'm going to pick on Kyle Simpson's JS Objects: Inherited a Mess (and its two followups)—not because it's particularly bad, but for the opposite reason: it's one of the best-written attempts at clearing things up (and it's frequently linked to).Simpson's main point is that, ultimately, prototype inheritance is more like delegation than like inheritance, because the inheritance "arrows" go in the opposite direction.
Let's take a language like C++, and write a class
Foo
, a subclass Bar
, and a Bar instance bar1
. There's a superclass-to-subclass arrow pointing from Foo
to Bar
, which represents Bar
's vtable being made by copying Foo
's, and a class-to-instance arrow pointing from Bar
to bar1
, representing bar1
's vptr being copied from Bar
.In JavaScript, on the other hand, there's no such copying; instead, there are live links. Methods are looked up on the object; if not found, the link is followed to look up the method on its prototype, and so on up the chain. So, he draws an arrow from
bar1
to Bar.prototype
, which its prototype link, and an arrow from Bar.prototype
to Foo.prototype
, representing the same thing.But he's drawing entirely the wrong distinction. When you notice that C++, Java, and C# all do things one way, but JavaScript does things differently, it's a decent guess that the difference is prototypes vs. classes. But as soon as you toss in a few other languages, it's obvious that the guess is wrong. For example, Python and Smalltalk do things the same way as JavaScript, but they're very definitely class-based languages.
Simpson brings up some other differences between JS and other languages in the rest of the series--e.g., the fact that a JS object's constructor doesn't automatically call its prototype's constructor, while a C++ class's constructor does automatically call its base classes' constructors. But here again, Python does the same thing as JS. At any rate, the direction of the arrows is what he focuses on, so let's stick with that.
Dynamic vs. semi-dynamic lookup
If this difference isn't about prototypes vs. classes, what is it?In Python, method lookup is dynamic. If your source code has a call to
foo.spam
, the compiler turns that into code that does the following, until one of them works:*- Ask the object to handle "spam".
- Ask the object's
__class__
to handle "spam". - Walk the class's superclass tree (in C3-linearized order), asking each one in turn to handle "spam".
- Raise an
AttributeError
.
foo.spam
into this:- Ask the object to handle "spam".
- Walk the object's prototype list, asking each one in turn to handle "spam".
- Raise a
TypeError
.
In C++, method lookup is mostly static, with a hook for a very specific kind of dynamism. The compiler turns a call to
foo.spam
into this:**- Look up
foo.vptr[2]
and call it.
foo
is (a reference to) an instance of Foo
or some subclass thereof, and it has to have compiled Foo
already so that it knows that spam
is its third method. The only way anyone outside of Foo
can affect what happens here is that a subclass like Bar
can override the spam
method, so its vtable ends up with a different function pointer in the third slot. If so, if foo
is a Bar
instance at runtime, then foo.vptr[2]
will point to Bar.spam
instead of Foo.spam
. That's all the late binding you get.The downsides to C++-style virtual methods instead of fully dynamic lookup are pretty obvious. In JavaScript or Python, you can set a new
foo.spam
at any time, overriding the behavior of Foo
. Or you can set a new Foo.spam
, which will affect all existing instances of Foo
or its descendants that don't have any overriding implementation. This can be pretty handy for, e.g., building up transparent proxies based on runtime information, or mocking objects for unit tests.But you don't do those things that often. Pretty often, C++'s dynamic behavior is all the dynamic behavior you need. And it's obviously simpler, and a lot faster, to just follow a pointer, index an array, and follow another pointer than to walk through a linked chain of objects asking each one to do a dictionary lookup for you. And it even opens the door to further optimizations--e.g., if the compiler knows that
foo
must be exactly a Foo
rather than a subclass (e.g., because it's a local allocated on the stack), or it knows that no subclass of Foo
overrides the method (e.g., because it's got feedback from a whole-program analyzer), it can just insert a call directly to the Foo.spam
implementation.The problem is that something that works "pretty often" but has no escape hatch only works pretty often. And just as often, the speed of the lookups don't matter anyway. As Alan Kay pointed out back in the 70s, usually you're either doing all your work inside some inner loop so it hardly matters how fast you get to that loop, or spending all your time waiting on a user to interact with the GUI/a client to interact with the reactor/etc. so it hardly matters how fast you do anything. Better to give people features they might need than performance they probably don't. So, that's what Smalltalk did. And Objective-C, Python, Ruby, etc.
* Python offers a wide variety of ways to hook this process, as do many other dynamic OO languages, so this is only the default behavior. But let's not worry about that right now.
** I'm ignoring non-virtual functions here, but they don't add any complexity. I'm also ignoring the screwy way that C++ does multiple inheritance. Be glad. I'm also hiding reference/pointer stuff, which means this is closer to C# or Java than C++, but that doesn't really matter here.
The best of both worlds?
A Smalltalk offshoot called Self was built around the idea that a just-in-time (JIT) optimizer can eliminate most of the costs of dynamic lookup, while preserving all the benefits. It's pretty uncommon that you change out a method at runtime, and very rare that you do so more than once or twice in the run of a program. So, if you cache the lookup and recompile any function that needs the lookup to use the cached value, and insert some guards to invalidate the cache on the rare occasions when you change something, you may end up with millions or billions of faster-than-C++ lookups for every one recompile and slow lookup.It's not coincidental that Self is also the language that popularized prototype-based inheritance. One major thing classes add in dynamic inheritance is that making that first link special makes it easier to optimize. Also, classes of objects tend to almost always override the same set of methods in the same way, so you might as well label those classes to help the optimizer. And if the language encourages you to think in terms of classes of objects, you'll usually make the optimizer's job easier. But with a smart-enough JIT, that isn't necessary. At any rate, the dynamic lookup was the main reason for the JIT, and that reason applies to dynamic class-based languages just as much as to prototype-based ones.
Plus, of course, it turns out that the idea of a JIT worked pretty well on static code too; Hotshot running compiled Java bytecode is generally faster than PyPy or V8 running Python or JavaScript. But for most uses, PyPy and V8 are more than fast enough. (In fact, for many uses, CPython and Spidermonkey are more than fast enough.)
What's in a name?
Simpson wants to insist that JavaScript's behavior is not dynamic inheritance, it's delegation.But, if so, Python, Smalltalk, and many other class-based languages aren't doing dynamic inheritance either. So, the difference between class-based and prototype-based still has nothing to do with delegation vs. inheritance, even if you use his terms.
In fact, what he's calling "delegation" really just means dynamic inheritance. Which makes it obvious why claiming that JavaScript doesn't do dynamic inheritance, it does delegation, is wrong.
While we're on the subject of names, you could argue that the confusion about prototypes is from the start caused by other things JS got wrong. Other prototype-based languages, from Self to newer languages like IO, don't use constructor functions and a Java-like
new
operator, so they don't make prototyping/cloning look misleadingly like instantiation. Maybe if JS didn't do that, people wouldn't have been misled as badly in the first place?Prototypes
So then, what is the difference between classes and prototypes?Of course there's the minor difference noted above, that JavaScript only has one kind of link instead of two.* And, if you read the footnotes, there could be a performance difference. But... so what?
Well, getting rid of classes does lose something: In JavaScript, your prototype chain doesn't affect your type. In Python, because the first step of that chain is special, the language can define its notion of type in terms of that link. Of course 95% of your code is duck-typed and EAFP-driven and this doesn't matter. But for that last 5% (and for debugging), you have a coherent and useful notion of type, which can help.
Also, making multiple inheritance work with classes is not that hard (even though C++ gets it wrong, Java punts on it, etc.); making it work with prototypes is harder.
The biggest downside is that cooperative inheritance through a
super
-like mechanism is hard with prototypes. In particular, composing mixins that don't know about each other is a serious problem. Simpson gets into this in his "Polymorphism redux" section in his third part. But I believe this is actually only a problem if you use the constructor idiom instead of using prototype-chained object literals, so a prototype-based language that didn't encourage the former and make the latter more verbose than necessary may not have this problem.Anyway, what do you gain in return for these negatives? One thing (although not the motivation behind Self**) is flexible object literals. In JavaScript, creating an object is just like creating a dictionary. If some of the members are callable, they're methods. Sure, this is just syntactic sugar, but it's syntactic sugar around something you often want.
At first glance, that sounds horrible. We have 30000 objects, all anonymous, all locally created in different calls to the same function and therefore closing over different scopes… so much duplication! But again, you're thinking about ahead-of-time optimization; there's no reason the interpreter can't realize that all of those closures have the same body, and guardably-almost-always the same behavior, so with a JIT, you're wasting very little space and even less time. And, even if that weren't true, it probably wouldn't matter anyway. Meanwhile, if you don't want that duplication, you can always move some of those functions into a prototype—exactly the same way as you'd move them into a class in Python.
At second glance, it sounds horrible again. Everyone knows that closures and objects are dual—one is the functional way of encapsulating state, the other is the OO way. So, who wants closures inside objects? But here... well, a Python developer shouldn't be surprised that sometimes practicality beats purity, and adding boilerplate can hurt readability. For a locally-created object that just has a bunch of simple functions that all refer to the local variable
username
, does it actually make things more readable to create and call a constructor that copies username
to self.username
and make all those functions call self.username
, or is that just boilerplate?(In fact, you can go farther and explicitly link up multiple closures through manual delegation chains. But I haven't seen many uses of inheritance chains beyond 2-deep in JavaScript, and nearly all of those that I've seen are equivalent to what you'd do in a class-based language. So, let's just stick with the simple case here.)
Really, it's similar to
def
vs. lambda
: for most functions, def
is clearly better, but that doesn't mean there's never a good use for lambda
. Nobody needs a separate statement, and an explicit name, for the function lambda x: x+1
or lambda cb: cb(result)
. And similarly, for most objects, using methods in the class or prototype to encapsulate the object's state is clearly better, but that doesn't mean there's never a good use for object literals with object-specific methods.The big difference is that in JavaScript, both ways are easy; in Python, both ways are possible, but only one is easy—there's no clean and simple way to put the behavior in each object.
Well, that's not quite true; JS's horrible scoping rules and its quirky
this
mean both ways are actually not
easy. But that's not because JS is prototype-based; it's because JS got some other things wrong.Anyway, the differences in this section are what really matter—and they're the ones that are down to JS being prototype-based.
* Or, if you count metaclass-class as different from class-instance, one instead of three. Simulating metaclasses in a prototype language usually means making use of the distinction between calling the prototype vs. just referencing its
.prototype
, in JS terms, and there's nothing to protect you from mixing up your "metaprototypes" and prototypes. But then Python has nothing to protect you from inheriting from a metaclass as if it were a normal class either, and that never causes problems in real-life code.** The inspiration behind prototypes in Self was that it's often hard to define a class hierarchy in advance, and too hard to modify a class hierarchy after you've already written a bunch of code around it. Slightly oversimplifying, prototypes were mainly an attempt to solve the fragile base class problem.
The best of both... wait, deja-vu
Some newer class-based languages, like Scala, have an object-literal and/or singleton-defining syntax. (And it's not that hard to add that to Python, if you want.) If you combine this with a few other features, you can get objects that act like instances and clones at the same time (and hopefully in the ways you want, not the ways you don't).On the other hand, JS (ES6) is adding classes for when you want them, but object literals will still an anonymously-typed clone of an optionally-specific prototype rather than an instance of a class.
Are we meeting in the middle? I'm not sure.
View comments