Along the way to an attempt to port itertools from Python to Apple's new language Swift (see my Stupid Swift Ideas blog), I discovered something interesting: Swift's map and filter functions are in some ways better than Python's. Would the same ideas be worth having in Python? What about for other functions, like the ones in itertools?

Terminology

Swift has the same key types and methods as Python, but uses different names for all of them: Generator for Iterator, Sequence for Iterable, Collection for Sequence, generate() for __iter__(), next() for __next__(), generatorOf() for iter(), etc.

For the most part, everything you know about Python translates over directly. A Python iterator is an iterable that returns itself on __iter__ and is one-pass (looping over it consumes it); a Python sequence is an iterable that can be indexed, does not return itself, and is multi-pass; the exact same things are true for the Swift equivalents.

From here on out, I'll use Python terminology for everything.

map

In Python, map(function, iterable) returns a map object, which is a kind of iterator.

Because it's an iterator, it doesn't generate new values until you ask for them, which means you can map over an large sequence without needing to waste memory storing all the mapped values, or wait until all of the values have been mapped before using them. You can even map over infinite iterators:

    squares = map(lambda x: x*x, itertools.count())

If you don't get why iterators are cool, David Beazley explains it better than I ever could.

On the other hand, a map object is single-pass, it's not so nice for debugging, and it can't be indexed. These are all things people who migrate from Python 2.x (where map returns a list) complain about.

Of course you can get the Python 2.x behavior trivially:

    def map2(*args, **kwargs):
        return list(map(*args, **kwargs))

But Swift actually has something better.

When you call map on an iterator, you get back an iterator. But when you call it on a sequence, you get back a sequence. And not a list, but a special map view sequence. If m = map(func, seq), m[3] gives you func(seq[3]). There may or may not be a permanent or LRU cache on it, I'm not sure.

Anyway, this is pretty easy to get as well:

    class MapSequenceView(collections.abc.Sequence):
        def __init__(self, func, seq): self.func, self.seq = func, seq
        def __getitem__(self, idx): return self.func(self.seq[idx])
        def __len__(self): return len(self.seq)

    def swiftmap(func, seq):
        if isinstance(func, collections.ast.Sequence):
            return MapView(func, seq)
        else:
            return map(func, seq)

This isn't complete code. For example, when __getitem__ is called with a slice, you'll want to call func on each element, not on the whole sliced list. (And if you want it cached, you'll want to use lru_cache or memoize on the individual values only, not on slices, meaning you'll want __getitem__ to delegate to a _getsingle that can be decorated. But I'm assuming we don't want it cached.) But it's not hard to flesh out. (See the documentation on collections.abc.Sequence for exactly what you need to implement to be a proper sequence.)

If you just use the MapSequenceView as an iterable, it works exactly the same as a map, with no extra cost. But if you then want to iterate it again, you can. Or if you want to iterate it in reverse, or jump to index #17, you can, again with no extra cost.

Of course the fact that it's not an iterator means it's not a drop-in replacement. You can't dropwhile a MapSequenceView to get a map starting at the first positive element, or islice it or even next it to skip a value—or, rather, you can, but you'll get a new iterator, leaving the original MapSequenceView unchanged. But that's OK; you wouldn't use this as a drop-in replacement, you'd use this when you want the new behavior. And ideally, dropwhile, islice, etc. should all be built the same way, with their own DropWhileSequenceView, etc. types.

Interestingly, some Swift collections, when sliced, return special FooSlice types that provide a view onto the sliced part of the original collection. But I don't think that's at all necessary, given that an ISliceSequenceView over a MapSequenceView would provide the exact same benefits as a special MapSequenceViewSlice type.

Also, the repr of a Swift map sequence view isn't the mapped sequence, it's essentially the constructor call that shows how to rebuild the same view. I'm not sure whether that's a good thing or a bad thing. (What might be cool is to show something like [f(0), f(1), f(2), f(3)] for map(f, [0, 1, 2, 3])…)

filter

The filter function is even more interesting. When given an iterator, it returns an iterator, but when given a sequence, it returns a sequence. How does that work? How can you know in advance that index #3 in the filtered sequence is index #8 in the original?

You can't. This is where another interesting feature of Swift comes in.

In Swift, sequences aren't necessarily indexed by integers; they're indexed by any kind of object you want, and there's a standard hierarchy of index types: ForwardIndex can't do anything but increment, BidirectionalIndex can also decrement; RandomAccessIndex can also do arithmetic; and Integer is what it sounds like. This idea is borrowed from C++, where the standard library has a bunch of collections that have begin and end methods (which Swift uses a different name for, but I'll ignore that for consistency) that return indexes (which C++ calls iterators, but I'll ignore that again) of the appropriate type, and all of its functions work on those indexes while knowing nothing of the collection itself.

And Swift makes things even more interesting by having the equivalent of range(start, end) for all kinds of indexes rather than just integers, and slice(start, end) as well (in fact, range and slice are the same thing in Swift, because sadly it doesn't support negative indexing). And enumerate returns indexes of the appropriate type too.

So, what good is all that? Well, a FilterSequenceView can return a BidirectionalIndex from its begin and end methods, rather than 0 and len(self), and therefore enumerate returns BidirectionalIndex values. Which means you can look ahead or back from the current position while looping:

    for i, value in enumerate(f):
        if value == "spam":
            if f[i.pred()] == "eggs":
                return "complete breakfast"

It also means the reversed function works on any bidirectional sequence, like a FilterSequenceView, because it's implemented as a ReversedSequenceView that wraps the other sequence and returns indexes that work backward.

Of course this requires ints to have succ and pred methods, so the same code will work whether f is random-access or just bidirectional, but that's a pretty minor thing to add.

(I'm cheating a bit here; while Swift has this capability, at least as of beta 2, its standard library doesn't actually use it. The filter function returns a forward-only sequence, which can't be reversed. Also, both bidirectional and forward sequences are basically useless for anything but iteration because the protocol provides no begin and end methods. Each sequence in the stdlib happens to provide startIndex and endIndex properties, but because Swift has statically-typed generics instead of duck typing, that doesn't help if all you know is that you have a sequence…)

Other methods

Sadly, Swift has almost no other functions that create or transform iterables—beyond map and filter, there's enumerate and partial implementations of zip and repeat. But the same idea can be carried forward to all of the functions that Python has in itertools and elsewhere. For the functions in itertools and builtin:
  • map, zip, count, cycle, repeat, islice, starmap, zip_longest, and product can create return randomly-accessible sequences, bidirectional sequences, forward sequences, or iterators, depending on the weakest of their arguments.
  • filter, chain, compress, filterfalse, dropwhile, takewhile, and permutation and friends can return bidirectional sequences, forward sequences, or iterators, depending on the weakest argument.
  • accumulate and groupby can return a forward sequence or iterator, depending on the weakest argument. (If the function had an inverse, accumulate could return a bidirectional sequence, but you can't even guarantee that for the default + operator for all types…)
The recipes are mostly pretty obvious, since they're just wrappers around other functions. However, some of them have their own complex implementations—e.g., grouper, roundrobin, and partition could return a random-access sequence, bidirectional sequence, forward sequence, or iterator depending on the arguments.

Implementation

It's worth noting that any itertools-like function that wants to return a sequence has to be implemented as a custom class, not a generator. In some cases, a single class can provide the best-index-possible behavior, but in other cases, you'd need as many as four separate classes (or three classes and one generator).

If we added a tower of index types that a single sequence class could use without having to care what kind of index is has, then those three classes could, I think, always be reduced to one.

However, in some cases, knowing that you have a random-access sequence allows for a more efficient implementation. (This is actually why the C++ standard originally came up with its iterator tower.)

It should be pretty easy to factor out the details for a delegating sequence, so the only ones that will be difficult to write are the ones that rely on complex implicit state. (It's worth noting that Swift apparently does not have any such factored-out helpers, and implements the same thing repeatedly, with a whole lot of boilerplate. But I think that's down to limitations in the language, and to the fact that fleshing out and polishing the stdlib hasn't been the highest priority, and the fact that they only have a handful of functions so refactoring would only provide so much benefit…)

See seqview on GitHub for a proof of concept.

An even crazier idea

It might be nice to have syntactic support for a "sequence comprehension" that returns the best kind of sequence it can for a given input (the same as the input type if there are no if clauses, no better than bidirectional if there are), 

Conclusion

I have no idea which parts of this, if any, are good ideas, or at least good enough to be worth implementing, but I can break them down into separate ideas worth considering:
  • map and similar functions returning sequence views if given a sequence
    • all such functions in itertools being updated
    • helper to make it easier to define such sequence-or-iterator functions in Python
      • similar helper for C?
  • bidirectional, and maybe forward, sequences
    • single sequence type with standard index types, instead of new sequence types
    • making reversed work on bidirectional sequences
    • filter and similar functions returning sequence views if given a sequence
      • all such functions in itertools being updated
    • helper function to make it easier to define random-or-bidirectional-or-iterator functions in Python
      • similar helper for C?
  • sequence comprehensions
However, it's worth remembering that the main use for all of these functions is in iterator-pipeline-style programming, where you're going to be mixing generator expressions and custom generator functions in with your itertools calls. As soon as you have one generator anywhere in the pipeline, the whole thing is an iterator. Plus, except for debugging purposes, such code usually has no reason to restart, index, or reverse an iterator anyway. So, this may all just be idle speculation.
1

View comments

Hybrid Programming
Hybrid Programming
5
Greenlets vs. explicit coroutines
Greenlets vs. explicit coroutines
6
ABCs: What are they good for?
ABCs: What are they good for?
1
A standard assembly format for Python bytecode
A standard assembly format for Python bytecode
6
Unified call syntax
Unified call syntax
8
Why heapq isn't a type
Why heapq isn't a type
1
Unpacked Bytecode
Unpacked Bytecode
3
Everything is dynamic
Everything is dynamic
1
Wordcode
Wordcode
1
For-each loops should define a new variable
For-each loops should define a new variable
4
Views instead of iterators
Views instead of iterators
2
How lookup _could_ work
How lookup _could_ work
2
How lookup works
How lookup works
7
How functions work
How functions work
2
Why you can't have exact decimal math
Why you can't have exact decimal math
2
Can you customize method resolution order?
Can you customize method resolution order?
1
Prototype inheritance is inheritance
Prototype inheritance is inheritance
1
Pattern matching again
Pattern matching again
The best collections library design?
The best collections library design?
1
Leaks into the Enclosing Scope
Leaks into the Enclosing Scope
2
Iterable Terminology
Iterable Terminology
8
Creating a new sequence type is easy
Creating a new sequence type is easy
2
Going faster with NumPy
Going faster with NumPy
2
Why isn't asyncio too slow?
Why isn't asyncio too slow?
Hacking Python without hacking Python
Hacking Python without hacking Python
1
How to detect a valid integer literal
How to detect a valid integer literal
2
Operator sectioning for Python
Operator sectioning for Python
1
If you don't like exceptions, you don't like Python
If you don't like exceptions, you don't like Python
2
Spam, spam, spam, gouda, spam, and tulips
Spam, spam, spam, gouda, spam, and tulips
And now for something completely stupid…
And now for something completely stupid…
How not to overuse lambda
How not to overuse lambda
1
Why following idioms matters
Why following idioms matters
1
Cloning generators
Cloning generators
5
What belongs in the stdlib?
What belongs in the stdlib?
3
Augmented Assignments (a += b)
Augmented Assignments (a += b)
11
Statements and Expressions
Statements and Expressions
3
An Abbreviated Table of binary64 Values
An Abbreviated Table of binary64 Values
1
IEEE Floats and Python
IEEE Floats and Python
Subtyping and Ducks
Subtyping and Ducks
1
Greenlets, threads, and processes
Greenlets, threads, and processes
6
Why don't you want getters and setters?
Why don't you want getters and setters?
8
The (Updated) Truth About Unicode in Python
The (Updated) Truth About Unicode in Python
1
How do I make a recursive function iterative?
How do I make a recursive function iterative?
1
Sockets and multiprocessing
Sockets and multiprocessing
Micro-optimization and Python
Micro-optimization and Python
3
Why does my 100MB file take 1GB of memory?
Why does my 100MB file take 1GB of memory?
1
How to edit a file in-place
How to edit a file in-place
ADTs for Python
ADTs for Python
5
A pattern-matching case statement for Python
A pattern-matching case statement for Python
2
How strongly typed is Python?
How strongly typed is Python?
How do comprehensions work?
How do comprehensions work?
1
Reverse dictionary lookup and more, on beyond z
Reverse dictionary lookup and more, on beyond z
2
How to handle exceptions
How to handle exceptions
2
Three ways to read files
Three ways to read files
2
Lazy Python lists
Lazy Python lists
2
Lazy cons lists
Lazy cons lists
1
Lazy tuple unpacking
Lazy tuple unpacking
3
Getting atomic writes right
Getting atomic writes right
Suites, scopes, and lifetimes
Suites, scopes, and lifetimes
1
Swift-style map and filter views
Swift-style map and filter views
1
Inline (bytecode) assembly
Inline (bytecode) assembly
Why Python (or any decent language) doesn't need blocks
Why Python (or any decent language) doesn't need blocks
18
SortedContainers
SortedContainers
1
Fixing lambda
Fixing lambda
2
Arguments and parameters, under the covers
Arguments and parameters, under the covers
pip, extension modules, and distro packages
pip, extension modules, and distro packages
Python doesn't have encapsulation?
Python doesn't have encapsulation?
3
Grouping into runs of adjacent values
Grouping into runs of adjacent values
dbm: not just for Unix
dbm: not just for Unix
How to use your self
How to use your self
1
Tkinter validation
Tkinter validation
7
What's the deal with ttk.Frame.__init__(self, parent)
What's the deal with ttk.Frame.__init__(self, parent)
1
Does Python pass by value, or by reference?
Does Python pass by value, or by reference?
9
"if not exists" definitions
"if not exists" definitions
repr + eval = bad idea
repr + eval = bad idea
1
Solving callbacks for Python GUIs
Solving callbacks for Python GUIs
Why your GUI app freezes
Why your GUI app freezes
21
Using python.org binary installations with Xcode 5
Using python.org binary installations with Xcode 5
defaultdict vs. setdefault
defaultdict vs. setdefault
1
Lazy restartable iteration
Lazy restartable iteration
2
Arguments and parameters
Arguments and parameters
3
How grouper works
How grouper works
1
Comprehensions vs. map
Comprehensions vs. map
2
Basic thread pools
Basic thread pools
Sorted collections in the stdlib
Sorted collections in the stdlib
4
Mac environment variables
Mac environment variables
Syntactic takewhile?
Syntactic takewhile?
4
Can you optimize list(genexp)
Can you optimize list(genexp)
MISRA-C and Python
MISRA-C and Python
1
How to split your program in two
How to split your program in two
How methods work
How methods work
3
readlines considered silly
readlines considered silly
6
Comprehensions for dummies
Comprehensions for dummies
Sockets are byte streams, not message streams
Sockets are byte streams, not message streams
9
Why you don't want to dynamically create variables
Why you don't want to dynamically create variables
7
Why eval/exec is bad
Why eval/exec is bad
Iterator Pipelines
Iterator Pipelines
2
Why are non-mutating algorithms simpler to write in Python?
Why are non-mutating algorithms simpler to write in Python?
2
Sticking with Apple's Python 2.7
Sticking with Apple's Python 2.7
Blog Archive
About Me
About Me
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.