This post is meant as background to the following post on lazy Python lists. If you already know all about cons lists, triggers, lazy evaluation, tail sharing, etc., you don't need to read it.

Linked lists

What Python calls "list" is actually a dynamic array. In most other languages, especially functional languages, "list" refers to some sort of linked list. Here's a basic linked list in Python:
import collections.abc

class Hlist(collections.abc.Iterable):
    class Node:
        def __init__(self, value, next):
            self.value, self.next = value, next
        def __repr__(self):
            return '{}({}, {})'.format(type(self).__name__,
                                       self.value, self.next)
    def __init__(self, iterable):
        self.head = last = None
        for value in iterable:
            node = Hlist.Node(value, None)
            if last is None:
                self.head = node
            else:
                last.next = node
            last = node
    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next
    def __repr__(self):
        return '{}({})'.format(type(self).__name__, list(self))
Of course you can implement all of the methods of the MutableSequence ABC on top of this—although many of them will take linear time, instead of the constant time of arrays (like Python's list). For example:
    def __len__(self):
        return sum(1 for _ in self)
There are all kinds of variations on linked lists that I won't get into here—double-linked lists, lists that hold a tail pointer as well as a head, lists with sentinel nodes, etc. If you want to know more, the Wikipedia article linked above should cover everything.

Removing the list handle

If you look at the Hlist class, really all it's holding is that head attribute, which is the first node. What if you just passed around the nodes themselves?

That means, among other things, that if you already have the node at which you want to perform an operation, it takes constant time to do so, because you no longer have to iterate from the start of the list.

This also means that an empty list is no longer the same type as a non-empty list—it's None, rather than a Node. But that's not a problem in a duck-typed language like Python:
class Slist(collections.abc.Iterable):
    def __init__(self, head, tail):
        self.head, self.tail = head, tail
    @classmethod
    def fromiter(cls, iterable):
        first, last = None, None
        for value in iterable:
            node = cls(value, None)
            if last is not None:
                last.tail = node
            if first is None:
                first = node
            last = node
        return first
    def __iter__(self):
        node = self
        while node is not None:
            yield node.head
            node = node.tail
    def __repr__(self):
        return '{}({})'.format(type(self).__name__, list(self))
(This wouldn't be a problem in a statically-typed language with a sufficiently powerful type system, either; you'd use a union type or some other higher-level type equivalent to Node|NoneType, Maybe Node, etc.)

However, this design does run into problems with some of the methods of the MutableSequence ABC. Most notably, insert and __delitem__ can't operate on position 0, because you can't change the head of the list when the only reference you have is to the head of the list.
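To see the problem concretely, here's a minimal sketch (using a bare two-field Node as a stand-in for Slist): deleting *after* a node you hold is a one-line rebinding, but "deleting" position 0 can only rebind your own name, not anyone else's references.

```python
class Node:
    # Stand-in for Slist: just a head and a tail.
    def __init__(self, head, tail=None):
        self.head, self.tail = head, tail

l = Node(0, Node(1, Node(2)))

# Deleting at position 1 needs only the reference we already have:
l.tail = l.tail.tail              # l is now [0, 2]

# But "deleting" position 0 can only rebind *our* name; anyone else
# holding the old first node still sees it:
other = l
l = l.tail                        # l is [2] ...
assert other.head == 0            # ... but other still starts at 0
```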

You can instead write immutable methods that return a new Slist (using chain and islice from itertools):
    def inserted(self, index, value):
        i = iter(self)
        return Slist.fromiter(chain(islice(i, 0, index), [value], i))
… but that obviously has performance implications that may not always be acceptable. A language designed around immutable linked lists (like ML or Haskell) will have ways to optimize away most of the copying costs whenever possible, but that's not going to happen automatically in Python.

Side note: C++ std::list

I won't follow up on this in depth, but C++ (along with a few other languages, like Swift) has a clever way of getting the best of both worlds. The std::list type is itself a list handle, with head and tail pointers, which means you can destructively insert or delete at the head. But it also provides a special kind of opaque pointer (called an iterator, but of course C++ iterators aren't the same kind of thing as Python iterators) to positions within the list, which means that after you (linearly) find some position, you can then perform a bunch of (constant-time, destructive) operations at that position. On top of that, the C++ stdlib is mostly made up of generic functions that work on any such pair of head and tail pointers—whether they come from a linked list, an array, an input stream iterator, or some abstract function that generates values on the fly, as long as they know how to dereference (get the value pointed at), increment (point at the next value), and compare for equality. (In this sense, C++ iterators are related to Python iterators, of course—just as map takes any iterable, no matter what its type or where it came from, std::transform takes any pair of iterators, no matter what their type or where they came from.)

Recursion

Before we go any further, notice that Slist is defined recursively—it's a value and another Slist. And that means in many cases, a recursive definition will be simpler than an iterative one. For example, if we didn't use the iterator protocol (and itertools.islice) above, inserted would look like this:
    def inserted(self, index, value):
        snode, dnode, result = self, None, None
        while True:
            if snode is None:
                if index >= 0:
                    raise IndexError('Slist index out of range')
                return result
            if not index:
                node = Slist(value, None)
            else:
                node = Slist(snode.head, None)
                snode = snode.tail
            if dnode is not None:
                dnode.tail = node
            else:
                result = node
            dnode = node
            index -= 1
Ouch. But a recursive definition looks like this:
    def copy(self):
        if self.tail is None:
            return Slist(self.head, None)
        return Slist(self.head, self.tail.copy())
    def inserted(self, index, value):
        if not index:
            return Slist(value, self.copy())
        if self.tail is None:
            raise IndexError('Slist index out of range')
        return Slist(self.head, self.tail.inserted(index-1, value))

Tail sharing

What if, instead of copying the list to insert a new element, we just shared the tail after the insertion point? Not only does everything get a lot simpler—it can also save a lot of memory.
    def inserted(self, index, value):
        if not index:
            return Slist(value, self)
        if self.tail is None:
            raise IndexError('Slist index out of range')
        return Slist(self.head, self.tail.inserted(index-1, value))
    def extended(self, slist):
        if self.tail is None:
            return Slist(self.head, slist)
        else:
            return Slist(self.head, self.tail.extended(slist))
    # etc.
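A quick demonstration of the sharing, repeating just enough of Slist to stand alone. After the insert, everything past the insertion point is the same object, not a copy:

```python
class Slist:
    def __init__(self, head, tail=None):
        self.head, self.tail = head, tail
    def inserted(self, index, value):
        # Tail-sharing insert: copy only the nodes before the insertion point.
        if not index:
            return Slist(value, self)
        if self.tail is None:
            raise IndexError('Slist index out of range')
        return Slist(self.head, self.tail.inserted(index - 1, value))

l1 = Slist(1, Slist(2, Slist(3)))
l2 = l1.inserted(1, 99)       # l2 is [1, 99, 2, 3]

# The suffix [2, 3] is shared between l1 and l2:
assert l2.tail.tail is l1.tail
```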

Mutation

This tail sharing means that if we use any mutating methods on lists, like calling __setitem__ or assigning to head or tail, things can quickly get confusing. In fact, even without explicit tail-sharing methods, we can see this easily:
>>> l1 = Slist.fromiter([0, 1, 2, 3])
>>> l2 = l1.tail
>>> l1
Slist([0, 1, 2, 3])
>>> l2.head = 100
>>> l1
Slist([0, 100, 2, 3])
This is effectively the same problem Python has when you bind the same list (or other mutable data type) to two different variables and then mutate one—but it can be harder to see when you're only sharing part of the list instead of the whole thing, and when you're using higher-level abstractions like splicing rather than just copying the tail around manually.

Persistence

In the ML/Haskell world, this doesn't come up because all data structures are immutable—if you can't mutate the shared tail, you can't tell whether tails are shared, so it's always safe to do so. But in the Scheme/Racket/Clojure world, where everything is mutable, they solve the problem by making a distinction beyond mutable vs. immutable, to persistent vs. ephemeral. I won't get into the details here (follow the link if you're interested), but basically, any code that uses only persistent operations on its persistent values is effectively referentially transparent, even if it's not purely immutable. Of course this means the stdlib needs naming conventions and/or some other way to make it clear to the reader which functions are non-mutating, which are mutating but safe for ephemeral objects, and which are always destructive, much like Python's reversed vs. list.reverse.
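Python's own reversed vs. list.reverse convention is a small-scale version of this: one name returns a new value and leaves its argument alone, the other mutates in place.

```python
l = [3, 1, 2]

r = list(reversed(l))   # non-mutating: l is untouched
assert l == [3, 1, 2]

l.reverse()             # mutating: reverses l in place, returns None
assert l == [2, 1, 3]
assert r == [2, 1, 3]
```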

Cons lists

In traditional Lisp, the only compound data structure is the cons (named for the assembly instruction originally used to implement it, which is short for "construct"), which is just a pair of values (traditionally called car and cdr, for the assembly instructions used to implement them, short for "contents of address register" and "contents of data register"). Obviously, modern Lisps and derivatives have plenty of other data structures—array-based vectors, hash-based dictionaries, etc.—but the cons is still fundamental to programming in the whole family of languages.

So if Lisp stands for LISt Processing language, where are the lists?

Simple: Just define an empty list as nil (Lisp's name for None), and a non-empty list as a cons of the first element and the rest of the list, and implement functions like insert in the obvious way, and you have a tail-sharing, handle-less, singly-linked list. Of course you can still have conses that aren't lists—a cons is a list iff its tail is a list—but otherwise, this is the exact same thing we've just defined as Slist.

Lisp adds a bit of syntactic sugar to make this nicer: nil can be written as (); cons(1, cons(2, cons(3, nil))) as (1 2 3); cons(1, 2) as (1 . 2); cons(1, cons(2, 3)) as (1 2 . 3). But under the covers (and not very deep under the covers), a list is always a cons, or nil.
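If you want to play with this in Python, a cons is just a pair; here's a sketch using tuples (the names nil/cons/car/cdr are the traditional Lisp ones, not anything from the stdlib):

```python
nil = None

def cons(car_, cdr_):
    # A cons is nothing but a pair of values.
    return (car_, cdr_)

def car(c):
    return c[0]

def cdr(c):
    return c[1]

lst = cons(1, cons(2, cons(3, nil)))   # the Lisp list (1 2 3)
pair = cons(1, 2)                      # the dotted pair (1 . 2)
assert car(cdr(lst)) == 2
```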

Triggers

Iterators

One of the great things about Python is the iterator protocol, a way to create iterables that only generate new items on demand, and don't store them anywhere. As usual, I'll defer to David Beazley's Generator Tricks for Systems Programmers for a detailed illustration of why this is so cool, but the basic appeal is simple: You can represent a very large iterable without actually storing (or computing up-front) anything but the current value, and whatever state is needed to calculate the next one. For a trivial example:
def count(start=0):
    i = start
    while True:
        yield i
        i += 1

def take(n, it):
    return [next(it) for _ in range(n)]

numbers = count()
odds = filter(lambda x: x%2, numbers)
first_10_odds = take(10, odds)
Here we've created an infinite iterable (all the nonnegative integers), filtered it to remove the even numbers, and taken the first 10. Obviously, if we tried to do this by creating a list of all of the nonnegative integers, it would take infinite time. We could create a finite list of just the nonnegative integers that we're going to need—in this case, range(20) would do, because 19 is the 10th odd number—but in most non-toy examples, it's not as easy (often not even possible) to work that out. For example, a program to compute the first million primes can't know how many integers to filter, unless you already know the first million primes (in which case you don't need the program); a program to pull out the first 20 HTTP packets from a pcap stream can't possibly know how many total packets it will need to look at to find 20 HTTP packets; etc.
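The primes case has exactly the same shape; here's a sketch using itertools (with a deliberately naive trial-division generator—the point is that it's an infinite source, not that the algorithm is good):

```python
from itertools import count, islice

def primes():
    # Naive trial division over an *infinite* source of candidates.
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n

# Take as many as you want; nobody had to guess how far to search.
first_10 = list(islice(primes(), 10))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```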

Of course I've been both using and implementing the iterator protocol above, just because it makes everything so much simpler and more compact that it's hard not to… But many languages don't have any support for such a thing.

Doing it manually

In such languages, we can get the same benefit, without the syntactic sugar of the iterator protocol (and, sadly, without the semantic abstraction of generators) by using triggers.

In fact, if you look under the covers of the iterator protocol, an iterator is just an object with a __next__ method that returns the next value (and updates its state to be ready to return the one after that).
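For example, here's the count generator from above, hand-rolled as a class (CountIterator is just an illustrative name): __next__ returns the current value and updates the state for the next call.

```python
class CountIterator:
    def __init__(self, start=0):
        self.i = start
    def __iter__(self):
        return self          # an iterator is its own iterable
    def __next__(self):
        result = self.i
        self.i += 1          # update state ready for the next call
        return result

c = CountIterator()
assert [next(c), next(c), next(c)] == [0, 1, 2]
```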

Triggered lists

You can do something similar with a linked list: instead of storing a value and another linked list, we can store a value and a function that returns another linked list:
class count:
    def __init__(self, start=0):
        self.head = start
    @property
    def tail(self):
        return count(self.head+1)
Or, more generally:
class Tlist:
    def __init__(self, head, trigger):
        self.head, self._trigger = head, trigger
    @property
    def tail(self):
        return Tlist(self._trigger(self.head), self._trigger)

def count(start=0):
    return Tlist(start, lambda value: value+1)
Of course many languages don't have the equivalent of @property either, so you need an explicit function or method call to get the next node—everywhere your code used to say t.tail, it has to say t.tail()—but that's just minor syntactic sugar.
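Used like this (repeating the Tlist definitions from above so the sketch stands alone), each access to tail fires the trigger and builds the next node on demand:

```python
class Tlist:
    def __init__(self, head, trigger):
        self.head, self._trigger = head, trigger
    @property
    def tail(self):
        # Build the next node on demand from the current head.
        return Tlist(self._trigger(self.head), self._trigger)

def count(start=0):
    return Tlist(start, lambda value: value + 1)

c = count(10)
assert (c.head, c.tail.head, c.tail.tail.head) == (10, 11, 12)
```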

In many cases, you need more state than just the current value (or no state at all), so it makes more sense to make the trigger a function of no arguments—which can be a closure over whatever state it needs. Also, to allow finite lazy lists, you need some way for the trigger to signal that it's done—by raising StopIteration, by returning a new Tlist or None rather than just a value, etc. I'll do each of these differently in the next section, for illustration.

Lazy lists

A lazy list is just a triggered list that hides the triggers under the covers—providing the same API as an Slist on top of a Tlist. Thanks to @property, we've almost got that. However, you don't want to have to recalculate the list over and over again; once a node is calculated, the trigger should be replaced by the actual node. Not all languages make it possible to build such a thing, and some have it built in, but for those in the middle, like Python, it's pretty simple:
class Llist:
    def __init__(self, head, trigger):
        self.head, self._trigger, self._tail = head, trigger, None
    @classmethod
    def fromiter(cls, iterable):
        i = iter(iterable)
        return cls(None, i.__next__).tail
    @property
    def tail(self):
        if self._trigger is not None:
            try:
                self._tail = Llist(self._trigger(), self._trigger)
            except StopIteration:
                self._tail = None
            self._trigger = None
        return self._tail
Just as with Slist above, we can expand this into a full Python Sequence by adding the appropriate methods:
    def __iter__(self):
        yield self.head
        if self.tail is not None:
            yield from self.tail
    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(index.start or 0, index.stop, index.step or 1)]
        elif not index:
            return self.head
        elif index < 0:
            raise IndexError('Llist cannot take negative indexes')
        elif self.tail is None:
            raise IndexError('Llist index out of range')
        else:
            return self.tail[index-1]
    # etc.
Of course I cheated by implementing __iter__ as a generator, but in a language without generators, you generally don't have to implement anything like __iter__ (Swift being the notable exception). It's not hard to turn that into an explicit class or closure if you need to, just annoyingly verbose:
    def __iter__(self):
        class Iterator:
            def __init__(self, node):
                self.node = node
            def __iter__(self):
                return self
            def __next__(self):
                if self.node is None:
                    raise StopIteration
                result = self.node.head
                self.node = self.node.tail
                return result
        return Iterator(self)

You may want to implement methods like __repr__ specially, to not iterate the entire list, because that would defeat the purpose of being lazy. For example:
    def _lazyiter(self):
        yield self.head
        if self._trigger is not None:
            yield ...
        elif self._tail is not None:
            yield from self._tail._lazyiter()
    def __repr__(self):
        return 'Llist({})'.format(list(self._lazyiter()))
This is basically what languages like Haskell have built in: an automatically lazy, single-linked, tail-sharing cons list.

Laziness

There are two important things to observe here.

First, calling __getitem__ (or any other function that takes an index) automatically instantiates all nodes up to the index, while leaving the rest unevaluated. So, it's evaluating nodes exactly as fast as necessary, and no faster.
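You can watch this happen by counting trigger calls, with a pared-down Llist (just the memoizing tail and an index-only __getitem__):

```python
class Llist:
    def __init__(self, head, trigger):
        self.head, self._trigger, self._tail = head, trigger, None
    @property
    def tail(self):
        if self._trigger is not None:
            try:
                self._tail = Llist(self._trigger(), self._trigger)
            except StopIteration:
                self._tail = None
            self._trigger = None          # memoize: never fire twice
        return self._tail
    def __getitem__(self, index):
        if not index:
            return self.head
        if self.tail is None:
            raise IndexError('Llist index out of range')
        return self.tail[index - 1]

calls = []
def trigger(it=iter(range(100))):
    value = next(it)
    calls.append(value)                   # record each evaluation
    return value

l = Llist(trigger(), trigger)
assert l[3] == 3
assert calls == [0, 1, 2, 3]              # only nodes 0..3 were computed
assert l[2] == 2
assert calls == [0, 1, 2, 3]              # memoized: no new evaluations
```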

Second, if you just iterate over the Llist, without keeping a reference to the Llist itself around, every time you generate the next node, the previous one becomes garbage and can be cleaned up. So, just like an iterator, only a single value is alive at a time, automatically. (But, just like a list, if you _want_ to keep all the values alive, and iterate over them multiple times, you can do that, too, just by keeping the Llist around.) And if you use it in a typical recursive linked-list algorithm instead of a Python iterator loop, it's again going to free up each node as soon as you get the next one. So, a lazy linked list gives you all the advantages of an iterator, while still being a (linked) list.

Linked lists in Python

There's nothing stopping you from using lazy cons lists, other linked list types, or related types in Python. But they don't fit into the language and stdlib (and third-party libraries) nearly as well as sequences and iterators. The entire Python ecosystem is designed around the iterator, iterable, and sequence protocol, and in particular around looping over iterables, rather than around recursing over recursive data structures.

As Guido explains in his post on tail recursion elimination, this is intentional.

So, is this a limitation in Python? Not really. The iterator protocol and other Pythonic idioms bring most of the benefits of recursive programming to iterative programming. An iterator is the same kind of quasi-mutable data structure as a tail-sharing linked list: using it mutates an ephemeral object you don't care about, but if you write everything else immutably, you still get referential transparency. Stack up a bunch of iterator-transforming expressions, and at the end you get an iterator over some new virtual sequence, without affecting the original sequence, without wasting memory, without computing everything up-front, and with the ability to handle infinite sequences (as long as you truncate the iterator in some way at some point, of course). It's a different way of getting the same benefits, but the fact that Python is a bad language for writing idiomatic Haskell is no more of a problem than the fact that it's a bad language for writing idiomatic Java; it's a great language for writing idiomatic Python.
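That stacking looks like this in practice—each line wraps the previous iterator, and nothing is computed until the final truncating step pulls values through the whole pipeline:

```python
from itertools import count, islice

numbers = count()                           # infinite source
evens = (x for x in numbers if x % 2 == 0)  # transform: nothing computed yet
squares = (x * x for x in evens)            # transform: still nothing
first_5 = list(islice(squares, 5))          # only now are any values pulled
# [0, 4, 16, 36, 64]
```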

So, if you want to use lazy cons lists, go ahead… but don't expect to get much support from the stdlib, third-party libraries, etc.