Along the way to an attempt to port itertools from Python to Apple's new language Swift (see my Stupid Swift Ideas blog), I discovered something interesting: Swift's map and filter functions are in some ways better than Python's. Would the same ideas be worth having in Python? What about for other functions, like the ones in itertools?

Terminology

Swift has the same key types and methods as Python, but uses different names for all of them: Generator for Iterator, Sequence for Iterable, Collection for Sequence, generate() for __iter__(), next() for __next__(), generatorOf() for iter(), etc.

For the most part, everything you know about Python translates over directly. A Python iterator is an iterable that returns itself on __iter__ and is one-pass (looping over it consumes it); a Python sequence is an iterable that can be indexed, does not return itself, and is multi-pass; the exact same things are true for the Swift equivalents.

From here on out, I'll use Python terminology for everything.

map

In Python, map(function, iterable) returns a map object, which is a kind of iterator.

Because it's an iterator, it doesn't generate new values until you ask for them, which means you can map over an large sequence without needing to waste memory storing all the mapped values, or wait until all of the values have been mapped before using them. You can even map over infinite iterators:

    squares = map(lambda x: x*x, itertools.count())

If you don't get why iterators are cool, David Beazley explains it better than I ever could.

On the other hand, a map object is single-pass, it's not so nice for debugging, and it can't be indexed. These are all things people who migrate from Python 2.x (where map returns a list) complain about.

Of course you can get the Python 2.x behavior trivially:

    def map2(*args, **kwargs):
        return list(map(*args, **kwargs))

But Swift actually has something better.

When you call map on an iterator, you get back an iterator. But when you call it on a sequence, you get back a sequence. And not a list, but a special map view sequence. If m = map(func, seq), m[3] gives you func(seq[3]). There may or may not be a permanent or LRU cache on it, I'm not sure.

Anyway, this is pretty easy to get as well:

    class MapSequenceView(collections.abc.Sequence):
        def __init__(self, func, seq): self.func, self.seq = func, seq
        def __getitem__(self, idx): return self.func(self.seq[idx])
        def __len__(self): return len(self.seq)

    def swiftmap(func, seq):
        if isinstance(func, collections.ast.Sequence):
            return MapView(func, seq)
        else:
            return map(func, seq)

This isn't complete code. For example, when __getitem__ is called with a slice, you'll want to call func on each element, not on the whole sliced list. (And if you want it cached, you'll want to use lru_cache or memoize on the individual values only, not on slices, meaning you'll want __getitem__ to delegate to a _getsingle that can be decorated. But I'm assuming we don't want it cached.) But it's not hard to flesh out. (See the documentation on collections.abc.Sequence for exactly what you need to implement to be a proper sequence.)

If you just use the MapSequenceView as an iterable, it works exactly the same as a map, with no extra cost. But if you then want to iterate it again, you can. Or if you want to iterate it in reverse, or jump to index #17, you can, again with no extra cost.

Of course the fact that it's not an iterator means it's not a drop-in replacement. You can't dropwhile a MapSequenceView to get a map starting at the first positive element, or islice it or even next it to skip a value—or, rather, you can, but you'll get a new iterator, leaving the original MapSequenceView unchanged. But that's OK; you wouldn't use this as a drop-in replacement, you'd use this when you want the new behavior. And ideally, dropwhile, islice, etc. should all be built the same way, with their own DropWhileSequenceView, etc. types.

Interestingly, some Swift collections, when sliced, return special FooSlice types that provide a view onto the sliced part of the original collection. But I don't think that's at all necessary, given that an ISliceSequenceView over a MapSequenceView would provide the exact same benefits as a special MapSequenceViewSlice type.

Also, the repr of a Swift map sequence view isn't the mapped sequence, it's essentially the constructor call that shows how to rebuild the same view. I'm not sure whether that's a good thing or a bad thing. (What might be cool is to show something like [f(0), f(1), f(2), f(3)] for map(f, [0, 1, 2, 3])…)

filter

The filter function is even more interesting. When given an iterator, it returns an iterator, but when given a sequence, it returns a sequence. How does that work? How can you know in advance that index #3 in the filtered sequence is index #8 in the original?

You can't. This is where another interesting feature of Swift comes in.

In Swift, sequences aren't necessarily indexed by integers; they're indexed by any kind of object you want, and there's a standard hierarchy of index types: ForwardIndex can't do anything but increment, BidirectionalIndex can also decrement; RandomAccessIndex can also do arithmetic; and Integer is what it sounds like. This idea is borrowed from C++, where the standard library has a bunch of collections that have begin and end methods (which Swift uses a different name for, but I'll ignore that for consistency) that return indexes (which C++ calls iterators, but I'll ignore that again) of the appropriate type, and all of its functions work on those indexes while knowing nothing of the collection itself.

And Swift makes things even more interesting by having the equivalent of range(start, end) for all kinds of indexes rather than just integers, and slice(start, end) as well (in fact, range and slice are the same thing in Swift, because sadly it doesn't support negative indexing). And enumerate returns indexes of the appropriate type too.

So, what good is all that? Well, a FilterSequenceView can return a BidirectionalIndex from its begin and end methods, rather than 0 and len(self), and therefore enumerate returns BidirectionalIndex values. Which means you can look ahead or back from the current position while looping:

    for i, value in enumerate(f):
        if value == "spam":
            if f[i.pred()] == "eggs":
                return "complete breakfast"

It also means the reversed function works on any bidirectional sequence, like a FilterSequenceView, because it's implemented as a ReversedSequenceView that wraps the other sequence and returns indexes that work backward.

Of course this requires ints to have succ and pred methods, so the same code will work whether f is random-access or just bidirectional, but that's a pretty minor thing to add.

(I'm cheating a bit here; while Swift has this capability, at least as of beta 2, its standard library doesn't actually use it. The filter function returns a forward-only sequence, which can't be reversed. Also, both bidirectional and forward sequences are basically useless for anything but iteration because the protocol provides no begin and end methods. Each sequence in the stdlib happens to provide startIndex and endIndex properties, but because Swift has statically-typed generics instead of duck typing, that doesn't help if all you know is that you have a sequence…)

Other methods

Sadly, Swift has almost no other functions that create or transform iterables—beyond map and filter, there's enumerate and partial implementations of zip and repeat. But the same idea can be carried forward to all of the functions that Python has in itertools and elsewhere. For the functions in itertools and builtin:
  • map, zip, count, cycle, repeat, islice, starmap, zip_longest, and product can create return randomly-accessible sequences, bidirectional sequences, forward sequences, or iterators, depending on the weakest of their arguments.
  • filter, chain, compress, filterfalse, dropwhile, takewhile, and permutation and friends can return bidirectional sequences, forward sequences, or iterators, depending on the weakest argument.
  • accumulate and groupby can return a forward sequence or iterator, depending on the weakest argument. (If the function had an inverse, accumulate could return a bidirectional sequence, but you can't even guarantee that for the default + operator for all types…)
The recipes are mostly pretty obvious, since they're just wrappers around other functions. However, some of them have their own complex implementations—e.g., grouper, roundrobin, and partition could return a random-access sequence, bidirectional sequence, forward sequence, or iterator depending on the arguments.

Implementation

It's worth noting that any itertools-like function that wants to return a sequence has to be implemented as a custom class, not a generator. In some cases, a single class can provide the best-index-possible behavior, but in other cases, you'd need as many as four separate classes (or three classes and one generator).

If we added a tower of index types that a single sequence class could use without having to care what kind of index is has, then those three classes could, I think, always be reduced to one.

However, in some cases, knowing that you have a random-access sequence allows for a more efficient implementation. (This is actually why the C++ standard originally came up with its iterator tower.)

It should be pretty easy to factor out the details for a delegating sequence, so the only ones that will be difficult to write are the ones that rely on complex implicit state. (It's worth noting that Swift apparently does not have any such factored-out helpers, and implements the same thing repeatedly, with a whole lot of boilerplate. But I think that's down to limitations in the language, and to the fact that fleshing out and polishing the stdlib hasn't been the highest priority, and the fact that they only have a handful of functions so refactoring would only provide so much benefit…)

See seqview on GitHub for a proof of concept.

An even crazier idea

It might be nice to have syntactic support for a "sequence comprehension" that returns the best kind of sequence it can for a given input (the same as the input type if there are no if clauses, no better than bidirectional if there are), 

Conclusion

I have no idea which parts of this, if any, are good ideas, or at least good enough to be worth implementing, but I can break them down into separate ideas worth considering:
  • map and similar functions returning sequence views if given a sequence
    • all such functions in itertools being updated
    • helper to make it easier to define such sequence-or-iterator functions in Python
      • similar helper for C?
  • bidirectional, and maybe forward, sequences
    • single sequence type with standard index types, instead of new sequence types
    • making reversed work on bidirectional sequences
    • filter and similar functions returning sequence views if given a sequence
      • all such functions in itertools being updated
    • helper function to make it easier to define random-or-bidirectional-or-iterator functions in Python
      • similar helper for C?
  • sequence comprehensions
However, it's worth remembering that the main use for all of these functions is in iterator-pipeline-style programming, where you're going to be mixing generator expressions and custom generator functions in with your itertools calls. As soon as you have one generator anywhere in the pipeline, the whole thing is an iterator. Plus, except for debugging purposes, such code usually has no reason to restart, index, or reverse an iterator anyway. So, this may all just be idle speculation.
1

View comments

It's been more than a decade since Typical Programmer Greg Jorgensen taught the word about Abject-Oriented Programming.

Much of what he said still applies, but other things have changed. Languages in the Abject-Oriented space have been borrowing ideas from another paradigm entirely—and then everyone realized that languages like Python, Ruby, and JavaScript had been doing it for years and just hadn't noticed (because these languages do not require you to declare what you're doing, or even to know what you're doing). Meanwhile, new hybrid languages borrow freely from both paradigms.

This other paradigm—which is actually older, but was largely constrained to university basements until recent years—is called Functional Addiction.
5

I haven't posted anything new in a couple years (partly because I attempted to move to a different blogging platform where I could write everything in markdown instead of HTML but got frustrated—which I may attempt again), but I've had a few private comments and emails on some of the old posts, so I decided to do some followups.

A couple years ago, I wrote a blog post on greenlets, threads, and processes.
6

Looking before you leap

Python is a duck-typed language, and one where you usually trust EAFP ("Easier to Ask Forgiveness than Permission") over LBYL ("Look Before You Leap"). In Java or C#, you need "interfaces" all over the place; you can't pass something to a function unless it's an instance of a type that implements that interface; in Python, as long as your object has the methods and other attributes that the function needs, no matter what type it is, everything is good.
1

Background

Currently, CPython’s internal bytecode format stores instructions with no args as 1 byte, instructions with small args as 3 bytes, and instructions with large args as 6 bytes (actually, a 3-byte EXTENDED_ARG followed by a 3-byte real instruction). While bytecode is implementation-specific, many other implementations (PyPy, MicroPython, …) use CPython’s bytecode format, or variations on it.

Python exposes as much of this as possible to user code.
6

If you want to skip all the tl;dr and cut to the chase, jump to Concrete Proposal.

Why can’t we write list.len()? Dunder methods C++ Python Locals What raises on failure? Method objects What about set and delete? Data members Namespaces Bytecode details Lookup overrides Introspection C API Concrete proposal CPython Analysis

Why can’t we write list.len()?

Python is an OO language. To reverse a list, you call lst.reverse(); to search a list for an element, you call lst.index().
8

Many people, when they first discover the heapq module, have two questions:

Why does it define a bunch of functions instead of a container type? Why don't those functions take a key or reverse parameter, like all the other sorting-related stuff in Python? Why not a type?

At the abstract level, it's often easier to think of heaps as an algorithm rather than a data structure.
1

Currently, in CPython, if you want to process bytecode, either in C or in Python, it’s pretty complicated.

The built-in peephole optimizer has to do extra work fixing up jump targets and the line-number table, and just punts on many cases because they’re too hard to deal with. PEP 511 proposes a mechanism for registering third-party (or possibly stdlib) optimizers, and they’ll all have to do the same kind of work.
3

One common "advanced question" on places like StackOverflow and python-list is "how do I dynamically create a function/method/class/whatever"? The standard answer is: first, some caveats about why you probably don't want to do that, and then an explanation of the various ways to do it when you really do need to.

But really, creating functions, methods, classes, etc. in Python is always already dynamic.

Some cases of "I need a dynamic function" are just "Yeah? And you've already got one".
1

A few years ago, Cesare di Mauro created a project called WPython, a fork of CPython 2.6.4 that “brings many optimizations and refactorings”. The starting point of the project was replacing the bytecode with “wordcode”. However, there were a number of other changes on top of it.

I believe it’s possible that replacing the bytecode with wordcode would be useful on its own.
1

Many languages have a for-each loop. In some, like Python, it’s the only kind of for loop:

for i in range(10): print(i) In most languages, the loop variable is only in scope within the code controlled by the for loop,[1] except in languages that don’t have granular scopes at all, like Python.[2]

So, is that i a variable that gets updated each time through the loop or is it a new constant that gets defined each time through the loop?

Almost every language treats it as a reused variable.
4
Blog Archive
About Me
About Me
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.