Note: This post doesn't talk about Python that much, except as a point of comparison for JavaScript.

Most object-oriented languages out there, including Python, are class-based. But JavaScript is instead prototype-based. Over the years, this has led to a lot of confusion, and even more attempts to resolve that confusion, either by faking classes, or by explaining why prototypes are good. Unfortunately, most of the latter attempts are completely wrong, and only add to the confusion.

A mess?

I'm going to pick on Kyle Simpson's JS Objects: Inherited a Mess (and its two followups)—not because it's particularly bad, but for the opposite reason: it's one of the best-written attempts at clearing things up (and it's frequently linked to).

Simpson's main point is that, ultimately, prototype inheritance is more like delegation than like inheritance, because the inheritance "arrows" go in the opposite direction.

Let's take a language like C++, and write a class Foo, a subclass Bar, and a Bar instance bar1. There's a superclass-to-subclass arrow pointing from Foo to Bar, which represents Bar's vtable being made by copying Foo's, and a class-to-instance arrow pointing from Bar to bar1, representing bar1's vptr being copied from Bar.

In JavaScript, on the other hand, there's no such copying; instead, there are live links. Methods are looked up on the object; if not found, the link is followed to look up the method on its prototype, and so on up the chain. So, he draws an arrow from bar1 to Bar.prototype, which its prototype link, and an arrow from Bar.prototype to Foo.prototype, representing the same thing.

But he's drawing entirely the wrong distinction. When you notice that C++, Java, and C# all do things one way, but JavaScript does things differently, it's a decent guess that the difference is prototypes vs. classes. But as soon as you toss in a few other languages, it's obvious that the guess is wrong. For example, Python and Smalltalk do things the same way as JavaScript, but they're very definitely class-based languages.

Simpson brings up some other differences between JS and other languages in the rest of the series--e.g., the fact that a JS object's constructor doesn't automatically call its prototype's constructor, while a C++ class's constructor does automatically call its base classes' constructors. But here again, Python does the same thing as JS. At any rate, the direction of the arrows is what he focuses on, so let's stick with that.

Dynamic vs. semi-dynamic lookup

If this difference isn't about prototypes vs. classes, what is it?

In Python, method lookup is dynamic. If your source code has a call to foo.spam, the compiler turns that into code that does the following, until one of them works:*
  • Ask the object to handle "spam".
  • Ask the object's __class__ to handle "spam".
  • Walk the class's superclass tree (in C3-linearized order), asking each one in turn to handle "spam".
  • Raise an AttributeError.
In JavaScript, method lookup is dynamic in exactly the same way. The compiler turns a call to foo.spam into this:
  • Ask the object to handle "spam".
  • Walk the object's prototype list, asking each one in turn to handle "spam".
  • Raise a TypeError.
The only difference is that Python has two kinds of links--instance-to-class and class-to-superclass--that JavaScript collapses into one--object-to-prototype. (And that Python has a linearized multiple inheritance tree, but let's ignore that here.)

In C++, method lookup is mostly static, with a hook for a very specific kind of dynamism. The compiler turns a call to foo.spam into this:**
  • Look up foo.vptr[2] and call it.
The compiler has to know that foo is (a reference to) an instance of Foo or some subclass thereof, and it has to have compiled Foo already so that it knows that spam is its third method. The only way anyone outside of Foo can affect what happens here is that a subclass like Bar can override the spam method, so its vtable ends up with a different function pointer in the third slot. If so, if foo is a Bar instance at runtime, then foo.vptr[2] will point to Bar.spam instead of Foo.spam. That's all the late binding you get.

The downsides to C++-style virtual methods instead of fully dynamic lookup are pretty obvious. In JavaScript or Python, you can set a new foo.spam at any time, overriding the behavior of Foo. Or you can set a new Foo.spam, which will affect all existing instances of Foo or its descendants that don't have any overriding implementation. This can be pretty handy for, e.g., building up transparent proxies based on runtime information, or mocking objects for unit tests.

But you don't do those things that often. Pretty often, C++'s dynamic behavior is all the dynamic behavior you need. And it's obviously simpler, and a lot faster, to just follow a pointer, index an array, and follow another pointer than to walk through a linked chain of objects asking each one to do a dictionary lookup for you. And it even opens the door to further optimizations--e.g., if the compiler knows that foo must be exactly a Foo rather than a subclass (e.g., because it's a local allocated on the stack), or it knows that no subclass of Foo overrides the method (e.g., because it's got feedback from a whole-program analyzer), it can just insert a call directly to the Foo.spam implementation.

The problem is that something that works "pretty often" but has no escape hatch only works pretty often. And just as often, the speed of the lookups don't matter anyway. As Alan Kay pointed out back in the 70s, usually you're either doing all your work inside some inner loop so it hardly matters how fast you get to that loop, or spending all your time waiting on a user to interact with the GUI/a client to interact with the reactor/etc. so it hardly matters how fast you do anything. Better to give people features they might need than performance they probably don't. So, that's what Smalltalk did. And Objective-C, Python, Ruby, etc.

* Python offers a wide variety of ways to hook this process, as do many other dynamic OO languages, so this is only the default behavior. But let's not worry about that right now.
** I'm ignoring non-virtual functions here, but they don't add any complexity. I'm also ignoring the screwy way that C++ does multiple inheritance. Be glad. I'm also hiding reference/pointer stuff, which means this is closer to C# or Java than C++, but that doesn't really matter here.

The best of both worlds?

A Smalltalk offshoot called Self was built around the idea that a just-in-time (JIT) optimizer can eliminate most of the costs of dynamic lookup, while preserving all the benefits. It's pretty uncommon that you change out a method at runtime, and very rare that you do so more than once or twice in the run of a program. So, if you cache the lookup and recompile any function that needs the lookup to use the cached value, and insert some guards to invalidate the cache on the rare occasions when you change something, you may end up with millions or billions of faster-than-C++ lookups for every one recompile and slow lookup.

It's not coincidental that Self is also the language that popularized prototype-based inheritance. One major thing classes add in dynamic inheritance is that making that first link special makes it easier to optimize. Also, classes of objects tend to almost always override the same set of methods in the same way, so you might as well label those classes to help the optimizer. And if the language encourages you to think in terms of classes of objects, you'll usually make the optimizer's job easier. But with a smart-enough JIT, that isn't necessary. At any rate, the dynamic lookup was the main reason for the JIT, and that reason applies to dynamic class-based languages just as much as to prototype-based ones.

Plus, of course, it turns out that the idea of a JIT worked pretty well on static code too; Hotshot running compiled Java bytecode is generally faster than PyPy or V8 running Python or JavaScript. But for most uses, PyPy and V8 are more than fast enough. (In fact, for many uses, CPython and Spidermonkey are more than fast enough.)

What's in a name?

Simpson wants to insist that JavaScript's behavior is not dynamic inheritance, it's delegation.

But, if so, Python, Smalltalk, and many other class-based languages aren't doing dynamic inheritance either. So, the difference between class-based and prototype-based still has nothing to do with delegation vs. inheritance, even if you use his terms.

In fact, what he's calling "delegation" really just means dynamic inheritance. Which makes it obvious why claiming that JavaScript doesn't do dynamic inheritance, it does delegation, is wrong.

While we're on the subject of names, you could argue that the confusion about prototypes is from the start caused by other things JS got wrong. Other prototype-based languages, from Self to newer languages like IO, don't use constructor functions and a Java-like new operator, so they don't make prototyping/cloning look misleadingly like instantiation. Maybe if JS didn't do that, people wouldn't have been misled as badly in the first place?

Prototypes

So then, what is the difference between classes and prototypes?

Of course there's the minor difference noted above, that JavaScript only has one kind of link instead of two.* And, if you read the footnotes, there could be a performance difference. But... so what?

Well, getting rid of classes does lose something: In JavaScript, your prototype chain doesn't affect your type. In Python, because the first step of that chain is special, the language can define its notion of type in terms of that link. Of course 95% of your code is duck-typed and EAFP-driven and this doesn't matter. But for that last 5% (and for debugging), you have a coherent and useful notion of type, which can help.

Also, making multiple inheritance work with classes is not that hard (even though C++ gets it wrong, Java punts on it, etc.); making it work with prototypes is harder.

The biggest downside is that cooperative inheritance through a super-like mechanism is hard with prototypes. In particular, composing mixins that don't know about each other is a serious problem. Simpson gets into this in his "Polymorphism redux" section in his third part. But I believe this is actually only a problem if you use the constructor idiom instead of using prototype-chained object literals, so a prototype-based language that didn't encourage the former and make the latter more verbose than necessary may not have this problem.

Anyway, what do you gain in return for these negatives? One thing (although not the motivation behind Self**) is flexible object literals. In JavaScript, creating an object is just like creating a dictionary. If some of the members are callable, they're methods. Sure, this is just syntactic sugar, but it's syntactic sugar around something you often want.

At first glance, that sounds horrible. We have 30000 objects, all anonymous, all locally created in different calls to the same function and therefore closing over different scopes… so much duplication! But again, you're thinking about ahead-of-time optimization; there's no reason the interpreter can't realize that all of those closures have the same body, and guardably-almost-always the same behavior, so with a JIT, you're wasting very little space and even less time. And, even if that weren't true, it probably wouldn't matter anyway. Meanwhile, if you don't want that duplication, you can always move some of those functions into a prototype—exactly the same way as you'd move them into a class in Python.

At second glance, it sounds horrible again. Everyone knows that closures and objects are dual—one is the functional way of encapsulating state, the other is the OO way. So, who wants closures inside objects? But here... well, a Python developer shouldn't be surprised that sometimes practicality beats purity, and adding boilerplate can hurt readability. For a locally-created object that just has a bunch of simple functions that all refer to the local variable username, does it actually make things more readable to create and call a constructor that copies username to self.username and make all those functions call self.username, or is that just boilerplate?

(In fact, you can go farther and explicitly link up multiple closures through manual delegation chains. But I haven't seen many uses of inheritance chains beyond 2-deep in JavaScript, and nearly all of those that I've seen are equivalent to what you'd do in a class-based language. So, let's just stick with the simple case here.)

Really, it's similar to def vs. lambda: for most functions, def is clearly better, but that doesn't mean there's never a good use for lambda. Nobody needs a separate statement, and an explicit name, for the function lambda x: x+1 or lambda cb: cb(result). And similarly, for most objects, using methods in the class or prototype to encapsulate the object's state is clearly better, but that doesn't mean there's never a good use for object literals with object-specific methods.

The big difference is that in JavaScript, both ways are easy; in Python, both ways are possible, but only one is easy—there's no clean and simple way to put the behavior in each object.

Well, that's not quite true; JS's horrible scoping rules and its quirky this mean both ways are actually not easy. But that's not because JS is prototype-based; it's because JS got some other things wrong.

Anyway, the differences in this section are what really matter—and they're the ones that are down to JS being prototype-based.

* Or, if you count metaclass-class as different from class-instance, one instead of three. Simulating metaclasses in a prototype language usually means making use of the distinction between calling the prototype vs. just referencing its .prototype, in JS terms, and there's nothing to protect you from mixing up your "metaprototypes" and prototypes. But then Python has nothing to protect you from inheriting from a metaclass as if it were a normal class either, and that never causes problems in real-life code.
** The inspiration behind prototypes in Self was that it's often hard to define a class hierarchy in advance, and too hard to modify a class hierarchy after you've already written a bunch of code around it. Slightly oversimplifying, prototypes were mainly an attempt to solve the fragile base class problem.

The best of both... wait, deja-vu

Some newer class-based languages, like Scala, have an object-literal and/or singleton-defining syntax. (And it's not that hard to add that to Python, if you want.) If you combine this with a few other features, you can get objects that act like instances and clones at the same time (and hopefully in the ways you want, not the ways you don't).

On the other hand, JS (ES6) is adding classes for when you want them, but object literals will still an anonymously-typed clone of an optionally-specific prototype rather than an instance of a class.

Are we meeting in the middle? I'm not sure.
1

View comments

Blog Archive
About Me
About Me
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.