The same discussion that brought up lazy tuple unpacking also raised the idea of implementing a full lazy list class in Python.
This is easy to do, but isn't as useful as it sounds at first glance.
Background
If you don't know about cons lists, triggers, lazy evaluation, etc., first go read about lazy cons lists.

The short version is that a lazy cons list automatically provides all of the advantages of an iterator, while still being a (linked) list. If you iterate over it (or walk it recursively) without keeping a reference to the original list, each node becomes garbage as soon as the next node is created, so there's no space cost or up-front time cost to build the list. But, on the other hand, you can use the list as a list if you want to, keeping a reference to the original value, so you can iterate over it multiple times. So, in a language like Haskell, where all lists are lazy, there's no need for separate list comprehension and generator expression syntax; there's just a comprehension that gives you a lazy list, and you can use it either way.
Lazy Python lists
In Python, having the best of an iterator and a cons list in one isn't too useful, because you rarely want a cons list; you want a Python list—that is, a dynamic array that you can, e.g., randomly access in constant time.

Can we wrap things up the same way to get lazy Python lists? Well, not quite the same way, but it's about as easy.
Let's take a little shortcut here. When discussing lazy cons lists, the goal was to make it easy to wrap a function up in something that looked like a linked list. But in Python, the obvious use for lazy lists is going to be wrapping up iterators, not functions. (And you can always turn any function into an iterator trivially if you come up with some other use.) So, I'll change the API to just take an iterable in the initializer. But see below for alternate constructors.
The trick is to keep a list of already-evaluated values, together with the iterator that provides additional values on demand. At first, you'll have an empty list; each time you need to evaluate another value, you just pull it off the iterator and append it to the list:
```python
import collections.abc
from itertools import chain, islice

class LazyList(collections.abc.MutableSequence):
    def __init__(self, iterable):
        self._nonlazy, self._lazy = [], iter(iterable)
    def __len__(self):
        self._delazify()
        return len(self._nonlazy)
    def _delazify(self, index=None):
        if index is None:
            return self._nonlazy.extend(self._lazy)
        if isinstance(index, slice):
            if index.stop is None:
                return self._nonlazy.extend(self._lazy)
            index = range(index.start or 0, index.stop, index.step or 1)[-1]
        if index < 0:
            raise IndexError('LazyList cannot take negative indexes')
        # max(0, ...) so an already-computed index doesn't pass a
        # negative stop to islice
        self._nonlazy.extend(islice(self._lazy,
                                    max(0, index - len(self._nonlazy) + 1)))
    def __getitem__(self, index):
        self._delazify(index)
        return self._nonlazy[index]
    def __delitem__(self, index):
        self._delazify(index)
        del self._nonlazy[index]
    def __setitem__(self, index, value):
        self._delazify(index)
        self._nonlazy[index] = value
    def insert(self, index, value):
        if index:
            self._delazify(index - 1)
        self._nonlazy.insert(index, value)
    def __iter__(self):
        yield from self._nonlazy
        for value in self._lazy:
            self._nonlazy.append(value)
            yield value
    def append(self, value):
        self._lazy = chain(self._lazy, (value,))
    def extend(self, values):
        self._lazy = chain(self._lazy, values)
    def clear(self):
        self._nonlazy, self._lazy = [], iter(())
    def __str__(self):
        return '[{}]'.format(', '.join(map(repr, self._nonlazy + ['...'])))
    def __repr__(self):
        return 'LazyList({})'.format(self)
```
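To see the trick in action, here's a minimal, stripped-down sketch (a hypothetical `MiniLazy` supporting only indexing, not the full class above), with a generator that records which values have actually been demanded:

```python
from itertools import islice

class MiniLazy:
    # Minimal sketch of the core trick: a list of already-computed
    # values, plus the iterator that holds the rest.
    def __init__(self, iterable):
        self._nonlazy, self._lazy = [], iter(iterable)
    def __getitem__(self, index):
        need = max(0, index - len(self._nonlazy) + 1)
        self._nonlazy.extend(islice(self._lazy, need))
        return self._nonlazy[index]

evaluated = []
def squares():
    for i in range(10):
        evaluated.append(i)  # record that this value was actually produced
        yield i * i

lst = MiniLazy(squares())
print(lst[3])     # 9: pulls exactly the first four values
print(evaluated)  # [0, 1, 2, 3]
print(lst[1])     # 1: already computed, no new evaluation
print(evaluated)  # still [0, 1, 2, 3]
```

Indexing past the computed prefix pulls just enough values off the iterator; indexing within it touches the iterator not at all.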
Bikeshedding
We have a few choices to make in a real implementation.

How long is a lazy list?
A lazy list obviously doesn't have to be infinite, but it may be, and given that we're initializing with an iterator, there's no way to know whether we're infinite or not. So, what should __len__ do?

The implementation above will eagerly evaluate the entire list—which could loop forever if the iterable is infinite. This means that it's pretty easy to write innocent code that isn't expecting an infinite sequence, pass it one, and hang. On the other hand, any other alternative, like raising an exception (in which case you have to choose carefully between TypeError, ValueError, and maybe NotImplementedError), means we're cheating as a MutableSequence, and not fully usable as a drop-in replacement for a list. On top of that, methods like __contains__ or count are going to iterate the whole list anyway, and there's no real way around that, short of making them raise as well. Or, of course, you could decide not to make it a MutableSequence at all, but just an Iterable that happens to support many but not all MutableSequence methods. That may be a useful tradeoff in some use cases, but I think a list that eagerly evaluates on __len__ is the most general design.
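If you want to guard against hanging without changing __len__ itself, you can always probe length from the outside with a bound. A hypothetical helper (not part of the class) might look like:

```python
from itertools import count, islice

def bounded_len(iterable, limit):
    # Hypothetical helper: count elements, but give up past `limit`,
    # so an infinite iterable can't hang us. Returns None to mean
    # "longer than limit" (possibly infinite).
    n = sum(1 for _ in islice(iterable, limit + 1))
    return n if n <= limit else None

print(bounded_len(range(10), 100))  # 10
print(bounded_len(count(), 100))    # None
```

The caller decides the cutoff, which is exactly the kind of choice __len__ can't make on its own.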
If you were redesigning the entire stdlib (and maybe the language) around lazy lists, you could keep track of which iterables are definitely finite, definitely infinite, or unknown, so you could eagerly evaluate finite lists but raise on infinite lists—although you'd still have to make a choice for unknown lists. (A statically-typed language could even do that at compile time, and refuse to compile code that tried to take the length of an infinite sequence.) But that's obviously too big a change to drop into Python. (You can't even assume that a normal list is finite, since it can contain itself…)
Notice that many of the methods that Sequence and MutableSequence automatically define for us call __len__: __iter__, __reversed__, append, and reverse all call it directly; __contains__, count, and index call __iter__; extend calls append; __iadd__ calls extend. Also, while clear only depends on __delitem__, it will still eagerly evaluate the list just to destroy it. It's easy to implement __iter__ manually; append and friends can be done by chaining the new values to the iterator. For count, __reversed__, and reverse, there's no good way around evaluating everything; for __contains__ and index, you'll automatically get the best behavior (only evaluate until the element is found) just by implementing __iter__. So, that's what I've done above. But be aware that the library reference doesn't actually define which methods depend on which other methods, so at least in theory, that's an implementation artifact of CPython, and it might be safer to define _all_ of these methods manually.
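Since the library reference doesn't pin these dependencies down, one way to check your own interpreter is to log calls from a minimal Sequence. For example, you can watch the __reversed__ mixin touch __len__:

```python
from collections.abc import Sequence

calls = []

class Logged(Sequence):
    # Minimal Sequence that records calls to __len__, so we can see
    # which inherited mixin methods depend on it.
    def __init__(self, data):
        self._data = list(data)
    def __len__(self):
        calls.append('__len__')
        return len(self._data)
    def __getitem__(self, index):
        return self._data[index]

result = list(reversed(Logged([1, 2, 3])))  # uses the mixin __reversed__
print(result)  # [3, 2, 1]
print(calls)   # __len__ was called by the mixin
```

The same logging trick works for any other mixin method you're worried about.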
Negative indices
Having implemented __len__, there's no reason we couldn't also implement negative indexing and slicing. In fact, we don't even need to handle it manually (including the whole mess with slice.indices); we know we have to eagerly evaluate the whole list to use any negative index, so just do that, then delegate the index or slice to the list:

```python
def _delazify(self, index=None):
    if index is None:
        return self._nonlazy.extend(self._lazy)
    if isinstance(index, slice):
        # A missing or negative bound means we can't tell how far to
        # evaluate, so evaluate everything
        start, stop = index.start or 0, index.stop
        if start < 0 or stop is None or stop < 0:
            return self._delazify()
        index = range(start, stop, index.step or 1)[-1]
    if index < 0:
        return self._delazify()
    self._nonlazy.extend(islice(self._lazy,
                                max(0, index - len(self._nonlazy) + 1)))
```
What is and isn't public?
There are some use cases where the user could take advantage of knowing how much of the list has been evaluated so far. We could do that by having a nonlazy_length method (or even making len return that, instead of evaluating the whole list) and letting the caller index accordingly, but it seems even simpler for the caller (and obviously for the class) to just turn the private _nonlazy attribute into a public one. Then, if you want to iterate over all so-far-computed values, it's just for i in lst.nonlazy, not for i in lst[:lst.nonlazy_length()]. But "nonlazy" isn't a very good name from an outside consumer's point of view; maybe "computed" is better. Anyway, I can't see any good argument for keeping this hidden, and at least in some use cases there's a reason not to, so exposing it would probably be a better design.

We could also expose the _delazify method, but that doesn't seem as necessary. Wanting to explicitly force evaluation (either up to a certain point, or of the whole thing) without actually wanting to see those values doesn't seem like it would be very common—and if you really want to, lst[30] or lst[-1] works just fine.
Representation
The str and repr give us another place to make a choice. In most cases, the repr is either a string that can be evaluated to produce the same value, or something that's obviously not evaluatable, like the generic <LazyList at 0x123456789>. But for list, the repr includes [...] if the list recursively contains itself, and of course eval('[1, 2, 3, [...]]') evaluates to [1, 2, 3, [Ellipsis]], which is not the same thing you started with. I think that's enough precedent to use ... for lazy lists as well, as I've done above.

We could keep track of the fact that we've exhausted our iterator (with a flag, or just by replacing self._lazy with None), which would allow us to drop the ... when the list is no longer lazy, but that seems like more work than it's worth for the small benefit.
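That list precedent is easy to check directly:

```python
# A list that contains itself reprs with [...], and evaluating that
# string produces Ellipsis, not the original cyclic list:
a = [1, 2, 3]
a.append(a)
print(repr(a))                   # [1, 2, 3, [...]]
print(eval('[1, 2, 3, [...]]'))  # [1, 2, 3, [Ellipsis]]
```

So the stdlib already accepts a repr that doesn't round-trip when the alternative is worse.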
Optimizing Sequences
If the input iterable is a Sequence, we could just copy the sequence to _nonlazy, instead of putting it in _lazy. For some use cases, that might have a significant performance benefit, but in others it won't make any difference, and it makes it a little harder to play with lazy lists, so I haven't done it. If you want to, it's trivial:

```python
def __init__(self, iterable):
    if isinstance(iterable, collections.abc.Sequence):
        self._nonlazy, self._lazy = list(iterable), iter(())
    else:
        self._nonlazy, self._lazy = [], iter(iterable)
```
Slicing
We could make slicing return a new LazyList instead of a list. But that's not as easy as it sounds.

The simplest solution is to just return LazyList(islice(self, index.start, index.stop, index.step)), which works great—except that it won't handle negative slice indices. That may be an acceptable tradeoff, but it may not. Of course we could fully evaluate, then return LazyList(self), to handle negative slice indices. (In fact, together with the sequence optimization above, this would be effectively equivalent to what we do now for negative slices, but lazy for positive slices.)
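The negative-index limitation comes straight from islice itself, which you can check directly:

```python
from itertools import islice

# Positive slices work fine through islice...
print(list(islice(range(10), 2, 8, 2)))  # [2, 4, 6]

# ...but negative bounds are rejected outright at construction time:
try:
    islice(range(10), -3, None)
except ValueError:
    print('islice rejects negative indices')
```

Since islice can't know how long its input is, there's nothing for a negative index to count back from.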
Alternatively, we could implement a view class (as dict does for its keys, values, and items, and NumPy does for its arrays), as I touched on in my posts on map and filter views and lazy tuple unpacking. But that's making a pretty major change to the sequence API, especially once we start dealing with mutating operations. (And it brings up new questions, e.g., should mutating a view mutate the parent, or should it be copy-on-write?)
Finally, as an alternative to a generic copy-on-write view, we could explicitly slice _nonlazy, share _lazy, keep track of the slicing information, and keep track of everyone who needs to be updated whenever _lazy is iterated (similar to what itertools.tee does). This sounds like a lot of extra work for no benefit, but it's a foundation on which we could add further benefits. For example, if we kept weak references to all owners of the shared _lazy, we could stop updating them when they go away.
All of these are potentially good ideas for specific use cases, but probably not appropriate for a general-purpose sequence class.
Benefits of LazyList
So, now we have a complete MutableSequence, which can be used as a drop-in replacement for list, but which doesn't evaluate values until you need them. So, does that give us all the advantages of Haskell's lazy lists?

Unfortunately, it doesn't. While we get the time advantages of an iterator (values aren't generated until you need them), we don't get the space advantages (only one value live at a time).
So, what's the difference? Well, as mentioned above, when you iterate over a cons list (or use it recursively) without keeping a reference to the list, each time you evaluate the next node, the previous one automatically becomes garbage and can be cleaned up. But a lazy Python list can't do that. Even if you drop your reference to the original LazyList object, the iterator still has a reference to it. And you can't magic that away; it needs a reference to at least self._nonlazy, or the list can't be reused after iteration. And of course it needs to keep appending to that self._nonlazy, too. So, by the time you're done iterating, you have the whole list alive in memory.
The last alternative discussed above—a manual copy-on-write slice view with weak references—would allow you to use the LazyList cheaply, but only in recursive slicing algorithms (whether using a[0] and recursing on a[1:], or something else like recursing on a[::2] and a[1::2]). I think in most such use cases, you'd be better off just using an iterator instead of trying to write your code around sequences that may or may not be lazy.
In short, if you need an iterable to be a sequence, you generally don't need it to be lazy; if you need it to be lazy, you generally don't need it to be a sequence; if you need it to be both, there's no convenient way to write your code.
So, lazy lists aren't useless in Python, but they're not as exciting as they first appear.
Let's look at the specific use case that brought up this suggestion: lazy tuple unpacking. The idea was that a, b, *c, d, e = foo could be implemented something like this:
```python
lfoo = LazyList(foo)
a, b, c, d, e = lfoo[0], lfoo[1], lfoo[2:-2], lfoo[-2], lfoo[-1]
```

But clearly, that's going to evaluate the entire list anyway. If foo is some object that already knows how to slice itself lazily (like range, or np.ndarray), the LazyList is getting in the way here; we would have been better off just using foo[2:-2]. If it's not, the LazyList can't help us, no matter how we implement it, because there's no way it can pop off the last two values without iterating to the end. An iterator, on the other hand, could help us, at least in the case that foo is a Sequence:
```python
a, b, c, d, e = foo[0], foo[1], islice(foo, 2, len(foo) - 2), foo[-2], foo[-1]
```

That will make c a lazy iterator over the slice, rather than a copy of the whole thing. Of course it still doesn't help if foo is an iterator, but there's nothing that can possibly help there anyway.
And I think most of the cases where lazy lists at first seem attractive will turn out the same way: at best the same as iterators (possibly with itertools.tee), and often even less useful.
But of course I'd love to see examples that prove me wrong.
Construction
We want to make it easy to write LazyList-generating functions. That's easy; just write a decorator that you can put on any generator function, and now it returns a LazyList instead of an iterator:

```python
def lazylist(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return LazyList(func(*args, **kwargs))
    return wrapper
```

But you can go farther with this. What about a decorator that passes the LazyList itself to the function? Then you can write this:
```python
@recursive_lazylist
def primes(lst):
    yield 2
    for candidate in itertools.count(3):  # start at next number after 2
        if all(candidate % p for p in lst.computed()):
            yield candidate
```

That's not quite as easy to write; we can't pass the iterator returned by calling the generator function into the LazyList constructor and also pass the constructed LazyList into the generator function as an argument. But there are a couple different ways around this. We could just construct a LazyList without an iterator, call the function, then assign the iterator after the fact:
```python
def recursive_lazylist(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        lst = LazyList(())
        lst._lazy = func(lst, *args, **kwargs)
        return lst
    return wrapper
```

A really clever idea (not mine) is to realize that that first argument is exactly equivalent to the self argument for a method call. So, we can create a new subclass on the fly, like this:
```python
class RecursiveLazyList(LazyList):
    @abc.abstractmethod
    def _producer(self):
        while False:
            yield
    def __init__(self, *args, **kwds):
        super().__init__(self._producer(*args, **kwds))

def recursive_lazylist(func):
    return type(RecursiveLazyList)(func.__name__,
                                   (RecursiveLazyList,),
                                   {'_producer': func})
```

This has the advantage that a reader can use their existing knowledge and intuitions about classes to realize that, at the time the generator function is called, the instance is still being initialized (its __init__ hasn't finished yet). Although the need to understand the basics of metaclasses (and dynamic class construction) may reduce the number of people who have useful intuitions here…
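The on-the-fly subclass trick can be sketched in isolation. Here Base and make_subclass are illustrative stand-ins for RecursiveLazyList and recursive_lazylist, not code from above:

```python
import abc

class Base(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def _producer(self):
        ...

def make_subclass(func):
    # type(Base) is Base's metaclass (ABCMeta here), so calling it
    # builds the subclass the same way a class statement would.
    # Note the dict key must be the *string* '_producer'.
    return type(Base)(func.__name__, (Base,), {'_producer': func})

def answer(self):
    return 42

Sub = make_subclass(answer)
print(Sub.__name__)       # answer
print(Sub()._producer())  # 42: the override satisfies the abstractmethod
```

Base itself stays abstract (instantiating it raises TypeError), while each generated subclass is concrete because its class dict supplies a real _producer.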
By the way, notice that I didn't use an explicit metaclass for RecursiveLazyList. As far as I can tell from the documentation, anything inheriting collections.abc.Sequence is guaranteed to already have some subclass of ABCMeta as its metaclass, so explicitly specifying ABCMeta here would just cause problems if some future version of collections.abc used a proper subclass rather than ABCMeta itself (as CPython 3.4 does), and can't be necessary in any possible implementation. But I may be wrong here; it certainly looks less clear to use abstractmethod when there are no visible metaclasses.
Sidebar: going crazy with LazyList
It may be that to really see the benefits of LazyList, you'd need to cram more support into the language. I don't think that's worth doing—I think if there are benefits to LazyList, you can find them without doing any more than building the class. But in case I'm wrong, you can go a lot farther without too much hacking.

There are many functions in the stdlib that return iterators. Just as many of these functions could instead return a view (as discussed in Swift-style map and filter views), they could also return a lazy list. We can easily create wrappers for them:
```python
def _wrap(module, name):
    globals()[name] = lazylist(getattr(module, name))

for name in 'map', 'filter', 'zip', 'enumerate':
    _wrap(builtins, name)
for name in ('count', 'cycle', 'repeat', 'accumulate', 'chain', 'compress',
             'dropwhile', 'filterfalse', 'groupby', 'islice', 'starmap',
             'takewhile', 'zip_longest', 'product', 'permutations',
             'combinations', 'combinations_with_replacement'):
    _wrap(itertools, name)
del _wrap
```

Or, if you really want to, you can monkeypatch the stdlib itself (just replace that globals()[name] = func with setattr(module, name, func)). This is starting to get more dangerous—there could easily be code in the stdlib or a third-party library that expects map to return an iterator, not a LazyList—but if you want to try it, go for it.
While we're going crazy, what if you wanted to automatically wrap every generator function so it became a lazylist function? That should be doable with an import hook. See inline (bytecode) assembly for the basics—or, better, you could probably do this with MacroPy. You could try to determine which FunctionDef nodes are generator functions by looking for any Yield or YieldFrom descendants, and, if found, just add a node to the decorator_list. Or, maybe even hackier, write a safe decorator and just decorate every function. For example:
```python
def lazylist(func):
    if not inspect.isgeneratorfunction(func):
        return func
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return LazyList(func(*args, **kwargs))
    return wrapper
builtins.lazylist = lazylist

class LazifyGenerators(ast.NodeTransformer):
    def visit_FunctionDef(self, node):
        node.decorator_list.insert(0,
            ast.copy_location(ast.Name(id='lazylist', ctx=ast.Load()), node))
        return node
```

I've left out the code to install a SourceFileLoader import hook and cram the LazifyGenerators transformer in at the right point, but emptyify has some working (but probably not very good) code for Python 3.4+.
And, while you're hacking up the AST, you could even change list displays and comprehensions (or, if you prefer, generator expressions) to instead build LazyLists:
```python
class LazifyLists(ast.NodeTransformer):
    def visit_List(self, node):
        node = self.generic_visit(node)
        func = ast.copy_location(
            ast.Name(id='lazylist', ctx=ast.Load()), node)
        call = ast.copy_location(
            ast.Call(func=func, args=[node], keywords=[],
                     starargs=None, kwargs=None),
            func)
        return call
    visit_ListComp = visit_List
```

All this without needing to hack up the compiler or interpreter. Of course you can do that too. For CPython, first you need to port LazyList itself to C, make it available as a builtin, and export C-API-style methods like PyLazyList_New. Then, if you want to, e.g., change list comprehensions to always return LazyLists instead, just change the interpreter code for BUILD_LIST to call PyLazyList_New and PyLazyList_SetItem. If you'd rather add new syntax, that's less radical, but more work—starting with picking the syntax (would angle brackets work? if not, what symbols are left?), then editing the grammar, then changing the compiler to generate a new BUILD_LAZYLIST opcode, and then finally modifying the interpreter the same way as above, except adding a BUILD_LAZYLIST handler instead of modifying the BUILD_LIST handler.
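As a sanity check that the List-to-Call rewrite actually works end to end, here's a runnable variant. It uses the modern ast.Call signature (starargs/kwargs were removed in 3.5) and a stand-in wrapper in place of a real lazylist, purely so the result is easy to observe:

```python
import ast

class LazifyLists(ast.NodeTransformer):
    # Rewrites each list display [..] into lazylist([..]).
    def visit_List(self, node):
        node = self.generic_visit(node)
        func = ast.copy_location(ast.Name(id='lazylist', ctx=ast.Load()), node)
        return ast.copy_location(
            ast.Call(func=func, args=[node], keywords=[]), func)

tree = ast.parse('result = [1, 2, 3]')
tree = ast.fix_missing_locations(LazifyLists().visit(tree))
ns = {'lazylist': tuple}  # stand-in wrapper, so we can see it was applied
exec(compile(tree, '<demo>', 'exec'), ns)
print(ns['result'])  # (1, 2, 3): the list display went through the wrapper
```

Swapping tuple for a real LazyList (and adding visit_ListComp) gives the full behavior described above.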
After all that, if you still can't find any benefits for LazyList, then it's probably time to admit that Python doesn't need it. :)