Code for this post can be found at https://github.com/abarnert/lazylist.

The same discussion that brought up lazy tuple unpacking also raised the idea of implementing a full lazy list class in Python.

This is easy to do, but isn't as useful as it sounds at first glance.

Background

If you don't know about cons lists, triggers, lazy evaluation, etc., first go read about lazy cons lists.

The short version is that a lazy cons list automatically provides all of the advantages of an iterator, while still being a (linked) list. If you iterate over it (or walk it recursively) without keeping a reference to the original list, each node becomes garbage as soon as the next node is created, so there's no space cost or up-front time cost to build the list. But, on the other hand, you can use the list as a list if you want to, keeping a reference to the original value, so you can iterate over it multiple times. So, in a language like Haskell, where all lists are lazy, there's no need for separate list comprehension and generator expression syntax; there's just a comprehension that gives you a lazy list, and you can use it either way.
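If it helps to see that in Python terms, here's a minimal sketch of a lazy cons cell (mine, for illustration; not from the post's repo), where the tail is a zero-argument thunk that's forced and cached on first access:
class LazyCons:
    def __init__(self, head, tail_thunk):
        self.head = head
        self._tail_thunk, self._tail = tail_thunk, None
    @property
    def tail(self):
        if self._tail_thunk is not None:
            self._tail = self._tail_thunk()   # force the thunk...
            self._tail_thunk = None           # ...and cache the result
        return self._tail

def count_from(n):
    return LazyCons(n, lambda: count_from(n + 1))

naturals = count_from(0)
print(naturals.head, naturals.tail.head, naturals.tail.tail.head)  # 0 1 2
Walk it by rebinding a name to .tail and each cell becomes garbage behind you; keep a reference to the head and the whole evaluated prefix stays reusable.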

Lazy Python lists

In Python, having the best of an iterator and a cons list in one isn't too useful, because you rarely want a cons list; you want a Python list—that is, a dynamic array that you can, e.g., randomly access in constant time.

Can we wrap things up the same way to get lazy Python lists? Well, not quite the same way, but it's about as easy.

Let's take a little shortcut here. When discussing lazy cons lists, the goal was to make it easy to wrap a function up in something that looked like a linked list. But in Python, the obvious use for lazy lists is going to be wrapping up iterators, not functions. (And you can always turn any function into an iterator trivially if you come up with some other use.) So, I'll change the API to just take an iterable in the initializer. But see below for alternate constructors.

The trick is to keep a list of already-evaluated values, together with the iterator that provides additional values on demand. At first, you'll have an empty list; each time you need to evaluate another value, you just pull it off the iterator and append it to the list:
import collections.abc
from itertools import chain, islice

class LazyList(collections.abc.MutableSequence):
    def __init__(self, iterable):
        self._nonlazy, self._lazy = [], iter(iterable)
    def __len__(self):
        self._delazify()
        return len(self._nonlazy)
    def _delazify(self, index=None):
        # Evaluate values through index (or everything, if index is None),
        # moving them from the iterator into the already-evaluated list.
        if index is None:
            return self._nonlazy.extend(self._lazy)
        if isinstance(index, slice):
            if index.stop is None:
                return self._delazify()
            r = range(index.start or 0, index.stop, index.step or 1)
            if not r:
                return  # empty slice touches nothing
            index = max(r[0], r[-1])  # the highest index the slice touches
        if index < 0:
            raise IndexError('LazyList cannot take negative indexes')
        self._nonlazy.extend(islice(self._lazy, index - len(self._nonlazy) + 1))
    def __getitem__(self, index):
        self._delazify(index)
        return self._nonlazy[index]
    def __delitem__(self, index):
        self._delazify(index)
        del self._nonlazy[index]
    def __setitem__(self, index, value):
        self._delazify(index)
        self._nonlazy[index] = value
    def insert(self, index, value):
        if index:
            self._delazify(index-1)
        self._nonlazy.insert(index, value)
    def __iter__(self):
        yield from self._nonlazy
        for value in self._lazy:
            self._nonlazy.append(value)
            yield value
    def append(self, value):
        self._lazy = chain(self._lazy, (value,))
    def extend(self, values):
        self._lazy = chain(self._lazy, values)
    def clear(self):
        self._nonlazy, self._lazy = [], iter(())
    def __str__(self):
        return '[{}]'.format(', '.join(chain(map(repr, self._nonlazy), ['...'])))
    def __repr__(self):
        return 'LazyList({})'.format(self)
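A quick demonstration of the laziness, assuming the class above:
from itertools import count

lst = LazyList(count())   # an infinite iterator
print(lst[3])             # 3; evaluates only the first four values
print(lst)                # [0, 1, 2, 3, ...]
print(lst[10])            # 10; evaluates seven more values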

Bikeshedding

We have a few choices to make in a real implementation.

How long is a lazy list?

A lazy list obviously doesn't have to be infinite, but it may be, and given that we're initializing with an iterator, there's no way to know whether we're infinite or not. So, what should __len__ do?

The implementation above will eagerly evaluate the entire list—which could loop forever if the iterable is infinite. This means it's pretty easy to pass an infinite sequence to innocent code that isn't expecting one, and hang. On the other hand, any other alternative, like raising an exception (in which case you have to choose carefully between TypeError, ValueError, and maybe NotImplementedError), means we're cheating as a MutableSequence and aren't fully usable as a drop-in replacement for a list. On top of that, methods like __contains__ or count are going to iterate the whole list anyway, and there's no real way around that, short of making them raise as well. Or, of course, you could decide not to make it a MutableSequence at all, but just an Iterable that happens to support many, but not all, MutableSequence methods. That may be a useful tradeoff in some use cases, but I think a list that eagerly evaluates on __len__ is the most general design.

If you were redesigning the entire stdlib (and maybe the language) around lazy lists, you could keep track of which iterables are definitely finite, definitely infinite, or unknown, so you could eagerly evaluate finite lists but raise on infinite lists—although you'd still have to make a choice for unknown lists. (A statically-typed language could even do that at compile time, and refuse to compile code that tried to take the length of an infinite sequence.) But that's obviously too big a change to drop into Python. (You can't even assume that a normal list is finite, since it can contain itself…)

Notice that many of the methods that Sequence and MutableSequence automatically define for us call __len__: __iter__, __reversed__, append, and reverse all call it directly; __contains__, count, and index call __iter__; extend calls append; __iadd__ calls extend. Also, while clear only depends on __delitem__, it will still eagerly evaluate the list just to destroy it. It's easy to implement __iter__ manually; append and friends can be done by chaining the new values to the iterator. For count, __reversed__, and reverse, there's no good way around evaluating everything; for __contains__ and index, you'll automatically get the best behavior (only evaluate until the element is found) just by implementing __iter__. So, that's what I've done above. But be aware that the library reference doesn't actually define which methods depend on which other methods, so at least in theory, that's an implementation artifact of CPython, and it might be safer to define _all_ of these methods manually.

Negative indices

Having implemented __len__, there's no reason we couldn't also implement negative indexing and slicing. In fact, we don't even need to handle it manually (including the whole mess with slice.indices); we know we have to eagerly evaluate the whole list to use any negative index, so just do that, then delegate the index or slice to the list:
    def _delazify(self, index=None):
        if index is None:
            return self._nonlazy.extend(self._lazy)
        if isinstance(index, slice):
            start, stop, step = index.start or 0, index.stop, index.step or 1
            if stop is None or start < 0 or stop < 0:
                # a missing stop or any negative bound means we need the
                # length, so just evaluate everything
                return self._delazify()
            r = range(start, stop, step)
            if not r:
                return  # empty slice touches nothing
            index = max(r[0], r[-1])
        if index < 0:
            return self._delazify()
        self._nonlazy.extend(islice(self._lazy, index - len(self._nonlazy) + 1))
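A hypothetical session showing the difference:
lst = LazyList(iter(range(100)))
lst[3]                       # evaluates elements 0 through 3
print(len(lst._nonlazy))     # 4
lst[-1]                      # any negative index forces full evaluation
print(len(lst._nonlazy))     # 100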

What is and isn't public?

There are some use cases where the user could take advantage of knowing how much of the list has been evaluated so far. We could support that with a nonlazy_length method (or even by making len return that, instead of evaluating the whole list) and letting the caller index accordingly, but it seems even simpler for the caller (and obviously for the class) to just turn the private _nonlazy attribute into a public one. Then, if you want to iterate over all so-far-computed values, it's just for i in lst.nonlazy, not for i in lst[:lst.nonlazy_length()]. But "nonlazy" isn't a very good name from an outside consumer's point of view; maybe "computed" is better. Anyway, I can't see any good argument for keeping this hidden, and in at least some use cases there's a reason to expose it, so making it public would probably be the better design.
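Whether that public accessor should be an attribute or a method is yet another bikeshed. Since the recursive primes example later in this post calls lst.computed(), here's that spelling, as a minimal sketch to add to the class above (the name and the snapshot-copy behavior are my choices, not settled API):
    def computed(self):
        # snapshot of the values evaluated so far; evaluates nothing new
        return list(self._nonlazy)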

We could also expose the _delazify method, but that doesn't seem as necessary. Wanting to explicitly force evaluation (either up to a certain point, or of the whole thing) without actually wanting to see those values doesn't seem like it would be very common—and if you really want to, lst[30] or lst[-1] works just fine.

Representation

The str and repr give us another place to make a choice. In most cases, the repr is either a string that can be evaluated to produce the same value, or something that's obviously not evaluatable, like the generic <LazyList at 0x123456789>. But for list, repr includes [...] if the list recursively includes itself, and of course eval('[1, 2, 3, [...]]') evaluates to [1, 2, 3, [Ellipsis]], which is not the same thing you started with. I think that's enough precedent to use ... for lazy lists as well, as I've done above.

We could keep track of the fact that we've exhausted our iterator (with a flag, or just replacing self._lazy with None), which would allow us to drop the ... when the list is no longer lazy, but that seems more work than it's worth for the small benefit.
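If you did want that, a minimal sketch might be a subclass (hypothetical; the name is mine) that sets a flag on full evaluation—note it won't notice when a partial islice or plain iteration happens to drain the iterator:
class TrackedLazyList(LazyList):
    def __init__(self, iterable):
        super().__init__(iterable)
        self._exhausted = False
    def _delazify(self, index=None):
        if index is None:
            self._nonlazy.extend(self._lazy)
            self._exhausted = True   # everything has been evaluated
            return
        super()._delazify(index)
    def __str__(self):
        parts = [repr(v) for v in self._nonlazy]
        if not self._exhausted:
            parts.append('...')
        return '[{}]'.format(', '.join(parts))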

Optimizing Sequences

If the input iterable is a Sequence, we could just copy the sequence to _nonlazy, instead of putting it in _lazy. For some use cases, that might have a significant performance benefit, but in others it won't make any difference, and it makes it a little harder to play with lazy lists, so I haven't done it. If you want to, it's trivial:
    def __init__(self, iterable):
        if isinstance(iterable, collections.abc.Sequence):
            self._nonlazy, self._lazy = list(iterable), iter(())
        else:
            self._nonlazy, self._lazy = [], iter(iterable)

Slicing

We could make slicing return a new LazyList instead of a list. But that's not as easy as it sounds.

The simplest solution is to just return LazyList(islice(self, slice.start, slice.stop, slice.step)), which works great—except that it won't handle negative slice indices. That may be an acceptable tradeoff, but it may not. Of course we could fully evaluate, then return LazyList(self), to handle negative slice indices. (In fact, together with the sequence optimization above, this would be effectively equivalent to what we do now for negative slices, but lazy for positive slices.)
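As a concrete sketch of the simple approach (a hypothetical override, assuming islice is already imported), here's a __getitem__ that slices lazily when every bound is nonnegative or omitted, and falls back to full evaluation otherwise. Note the wart: the lazy path returns a LazyList, while the fallback returns a plain list, which is part of why this isn't as easy as it sounds:
    def __getitem__(self, index):
        if isinstance(index, slice):
            bounds = (index.start, index.stop, index.step)
            if all(b is None or b >= 0 for b in bounds):
                # islice handles None bounds natively; iterating self here
                # also shares evaluation with the parent's _nonlazy cache
                return LazyList(islice(self, *bounds))
        self._delazify(index)
        return self._nonlazy[index]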

Alternatively, we could implement a view class (as dict does for its keys, values, and items, as NumPy does for its arrays, and as I touched on in my posts on map and filter views and lazy tuple unpacking). But that's making a pretty major change to the sequence API, especially once we start dealing with mutating operations. (And it brings up new questions, e.g., should mutating a view mutate the parent, or should it be copy-on-write?)

Finally, as an alternative to a generic copy-on-write view, we could explicitly slice _nonlazy, share _lazy, keep track of the slicing information, and keep track of everyone who needs to be updated whenever _lazy is iterated (similar to what itertools.tee does). This sounds like a lot of extra work for no benefit, but it's a foundation on which we could add further benefits. For example, if we kept weak references to all owners of the shared _lazy, we could stop updating them when they go away.

All of these are potentially good ideas for specific use cases, but probably not appropriate for a general-purpose sequence class.

Benefits of LazyList

So, now we have a complete MutableSequence, which can be used as a drop-in replacement for list, but which doesn't evaluate values until you need them. So, does that give us all the advantages of Haskell's lazy lists?

Unfortunately, it doesn't. While we get the time advantages of an iterator (values aren't generated until you need them), we don't get the space advantages (only one value live at a time).

So, what's the difference? Well, as mentioned above, when you iterate over a cons list (or use it recursively) without keeping a reference to the list, each time you evaluate the next node, the previous one automatically becomes garbage and can be cleaned up. But a lazy Python list can't do that. Even if you drop your reference to the original LazyList object, the iterator still has a reference to it. And you can't magic that away; it needs a reference to at least self._nonlazy, or the list can't be reused after iteration. And of course it needs to keep appending to that self._nonlazy, too. So, by the time you're done iterating, you have the whole list alive in memory.

The last alternative discussed above—a manual copy-on-write slice view with weak references—would allow you to use the LazyList cheaply, but only in recursive slicing algorithms (whether using a[0] and recursing on a[1:], or something else like recursing on a[::2] and a[1::2]). I think in most such use cases, you'd be better off just using an iterator instead of trying to write your code around sequences that may or may not be lazy.

In short, if you need an iterable to be a sequence, you generally don't need it to be lazy; if you need it to be lazy, you generally don't need it to be a sequence; if you need it to be both, there's no convenient way to write your code.

So, lazy lists aren't useless in Python, but they're not as exciting as they first appear.

Let's look at the specific use case that brought up this suggestion: lazy tuple unpacking. The idea was that a, b, *c, d, e = foo could be implemented something like this:
lfoo = LazyList(foo)
a, b, c, d, e = lfoo[0], lfoo[1], lfoo[2:-2], lfoo[-2], lfoo[-1]
But clearly, that's going to evaluate the entire list anyway. If foo is some object that already knows how to slice itself lazily (like range, or np.ndarray), the LazyList is getting in the way here; we would have been better off just using foo[2:-2]. If it's not, the LazyList can't help us, no matter how we implement it, because there's no way it can pop off the last two values without iterating to the end. An iterator, on the other hand, could help us, at least in the case that foo is a Sequence:
a, b, c, d, e = foo[0], foo[1], islice(foo, 2, len(foo)-2), foo[-2], foo[-1]
That will make c a lazy iterator over the slice, rather than a copy of the whole thing. Of course it still doesn't help if foo is an iterator, but there's nothing that can possibly help there anyway.

And I think most of the cases where lazy lists at first seem attractive will turn out the same way: at best the same as iterators (possibly with itertools.tee), and often even less useful.

But of course I'd love to see examples that prove me wrong.

Construction

We want to make it easy to write LazyList-generating functions. That's easy; just write a decorator that you can put on any generator function, and now it returns a LazyList instead of an iterator:
import functools

def lazylist(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return LazyList(func(*args, **kwargs))
    return wrapper
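For example (hypothetical usage):
import itertools

@lazylist
def squares():
    for i in itertools.count():
        yield i * i

s = squares()
print(s[5])   # 25; evaluates only the first six values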
But you can go farther with this. What about a decorator that passes the LazyList itself to the function? Then you can write this:
@recursive_lazylist
def primes(lst):
    yield 2
    for candidate in itertools.count(3):  # start at the next number after 2
        if all(candidate % p for p in lst.computed()):
            yield candidate
That's not quite as easy to write; we can't pass the iterator returned by calling the generator function into the LazyList constructor and also pass the constructed LazyList into the generator function as an argument. But there are a couple different ways around this. We could just construct a LazyList without an iterator, call the function, then assign the iterator after the fact:
def recursive_lazylist(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        lst = LazyList(())
        lst._lazy = func(lst, *args, **kwargs)
        return lst
    return wrapper
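Either way, the primes example works like this (relying on the computed() accessor sketched earlier):
primes_list = primes()
print(primes_list[3])          # 7; evaluates only 2, 3, 5, 7
print(primes_list.computed())  # [2, 3, 5, 7]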
A really clever idea (not mine) is to realize that that first argument is exactly equivalent to the self argument for a method call. So, we can create a new subclass on the fly, like this:
import abc

class RecursiveLazyList(LazyList):
    @abc.abstractmethod
    def _producer(self):
        while False:
            yield
    def __init__(self, *args, **kwds):
        super().__init__(self._producer(*args, **kwds))

def recursive_lazylist(func):
    # build the subclass with the same metaclass as RecursiveLazyList
    return type(RecursiveLazyList)(func.__name__,
                                   (RecursiveLazyList,),
                                   {'_producer': func})
This has the advantage that a reader can use their existing knowledge and intuitions about classes to realize that, at the time the generator function is called to create the iterator, the subclass's __init__ has been entered, but the base class's __init__ hasn't yet run (and the generator's body won't run until values are demanded). Although the need to understand the basics of metaclasses (and dynamic class construction) may reduce the number of people who have useful intuitions here…

By the way, notice that I didn't use an explicit metaclass for RecursiveLazyList. As far as I can tell from the documentation, anything inheriting collections.abc.Sequence is guaranteed to already have some subclass of ABCMeta as its metaclass, so explicitly specifying ABCMeta here would just cause problems if some future version of collections.abc used a proper subclass rather than ABCMeta itself (as CPython 3.4 does), and can't be necessary in any possible implementation. But I may be wrong here; it certainly looks less clear to use abstractmethod when there are no visible metaclasses.

Sidebar: going crazy with LazyList

It may be that to really see the benefits of LazyList, you'd need to cram more support into the language. I don't think that's worth doing—I think if there are benefits to LazyList, you can find them without doing any more than building the class. But in case I'm wrong, you can go a lot farther without too much hacking.

There are many functions in the stdlib that return iterators. Just as many of these functions could instead return a view (as discussed in Swift-style map and filter views), they could also return a lazy list. We can easily create wrappers for them:
import builtins
import itertools

def _wrap(module, name):
    globals()[name] = lazylist(getattr(module, name))
for name in 'map', 'filter', 'zip', 'enumerate':
    _wrap(builtins, name)
for name in ('count', 'cycle', 'repeat', 'accumulate', 'chain',
             'compress', 'dropwhile', 'filterfalse', 'groupby',
             'islice', 'starmap', 'takewhile', 'zip_longest',
             'product', 'permutations', 'combinations',
             'combinations_with_replacement'):
    _wrap(itertools, name)
del _wrap
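After running that, the wrapped names in this module return LazyLists (hypothetical usage):
squares = map(lambda x: x * x, count())
print(squares[4])   # 16; only the first five values are evaluated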
Or, if you really want to, you can monkeypatch the stdlib itself (just replace that globals()[name] = func with setattr(module, name, func)). This is starting to get more dangerous—there could easily be code in the stdlib or a third-party library that expects map to return an iterator, not a LazyList—but if you want to try it, go for it.

While we're going crazy, what if you wanted to automatically wrap every generator function so it became a lazylist function? That should be doable with an import hook. See inline (bytecode) assembly for the basics—or, better, you could probably do this with MacroPy. You could try to determine which FunctionDef nodes are generator functions by looking for any Yield or YieldFrom descendants, and, if found, just add a node to the decorator_list. Or, maybe even hackier, write a safe decorator and just decorate every function. For example:
import ast
import builtins
import functools
import inspect

def lazylist(func):
    # pass non-generator functions through untouched
    if not inspect.isgeneratorfunction(func):
        return func
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return LazyList(func(*args, **kwargs))
    return wrapper

builtins.lazylist = lazylist

class LazifyGenerators(ast.NodeTransformer):
    def visit_FunctionDef(self, node):
        node.decorator_list.insert(0,
            ast.copy_location(ast.Name(id='lazylist', ctx=ast.Load()), node))
        return node
I've left out the code to install a SourceFileLoader import hook and cram the LazifyGenerators transformer in at the right point, but emptyify has some working (but probably not very good) code for Python 3.4+.

And, while you're hacking up the AST, you could even change list displays and comprehensions (or, if you prefer, generator expressions) to instead build LazyLists:
class LazifyLists(ast.NodeTransformer):
    def visit_List(self, node):
        node = self.generic_visit(node)
        func = ast.copy_location(
            ast.Name(id='lazylist', ctx=ast.Load()),
            node)
        call = ast.copy_location(
            ast.Call(func=func,
                     args=[node], keywords=[],
                     starargs=None, kwargs=None),
            func)
        return call
    visit_ListComp = visit_List
All this without needing to hack up the compiler or interpreter. Of course you can do that too. For CPython, first you need to port LazyList itself to C, and make it available as a builtin, and export C-API-style methods like PyLazyList_New. Then, if you want to, e.g., change list comprehensions to always return LazyLists instead, just change the interpreter code for BUILD_LIST to call PyLazyList_New and PyLazyList_SetItem. If you'd rather add new syntax, that's less radical, but more work—starting with picking the syntax (would angle brackets work? if not, what symbols are left?), then editing the grammar, then changing the compiler to be able to generate a new BUILD_LAZYLIST opcode, and then finally modifying the interpreter the same way as above, except adding a BUILD_LAZYLIST handler instead of modifying the BUILD_LIST handler.
After all that, if you still can't find any benefits for LazyList, then it's probably time to admit that Python doesn't need it. :)