Along the way to an attempt to port itertools from Python to Apple's new language Swift (see my Stupid Swift Ideas blog), I discovered something interesting: Swift's map and filter functions are in some ways better than Python's. Would the same ideas be worth having in Python? What about for other functions, like the ones in itertools?

Terminology

Swift has the same key types and methods as Python, but uses different names for all of them: Generator for Iterator, Sequence for Iterable, Collection for Sequence, generate() for __iter__(), next() for __next__(), generatorOf() for iter(), etc.

For the most part, everything you know about Python translates over directly. A Python iterator is an iterable that returns itself on __iter__ and is one-pass (looping over it consumes it); a Python sequence is an iterable that can be indexed, does not return itself, and is multi-pass; the exact same things are true for the Swift equivalents.

From here on out, I'll use Python terminology for everything.

map

In Python, map(function, iterable) returns a map object, which is a kind of iterator.

Because it's an iterator, it doesn't generate new values until you ask for them, which means you can map over an large sequence without needing to waste memory storing all the mapped values, or wait until all of the values have been mapped before using them. You can even map over infinite iterators:

    squares = map(lambda x: x*x, itertools.count())

If you don't get why iterators are cool, David Beazley explains it better than I ever could.

On the other hand, a map object is single-pass, it's not so nice for debugging, and it can't be indexed. These are all things people who migrate from Python 2.x (where map returns a list) complain about.

Of course you can get the Python 2.x behavior trivially:

    def map2(*args, **kwargs):
        return list(map(*args, **kwargs))

But Swift actually has something better.

When you call map on an iterator, you get back an iterator. But when you call it on a sequence, you get back a sequence. And not a list, but a special map view sequence. If m = map(func, seq), m[3] gives you func(seq[3]). There may or may not be a permanent or LRU cache on it, I'm not sure.

Anyway, this is pretty easy to get as well:

    class MapSequenceView(collections.abc.Sequence):
        def __init__(self, func, seq): self.func, self.seq = func, seq
        def __getitem__(self, idx): return self.func(self.seq[idx])
        def __len__(self): return len(self.seq)

    def swiftmap(func, seq):
        if isinstance(func, collections.ast.Sequence):
            return MapView(func, seq)
        else:
            return map(func, seq)

This isn't complete code. For example, when __getitem__ is called with a slice, you'll want to call func on each element, not on the whole sliced list. (And if you want it cached, you'll want to use lru_cache or memoize on the individual values only, not on slices, meaning you'll want __getitem__ to delegate to a _getsingle that can be decorated. But I'm assuming we don't want it cached.) But it's not hard to flesh out. (See the documentation on collections.abc.Sequence for exactly what you need to implement to be a proper sequence.)

If you just use the MapSequenceView as an iterable, it works exactly the same as a map, with no extra cost. But if you then want to iterate it again, you can. Or if you want to iterate it in reverse, or jump to index #17, you can, again with no extra cost.

Of course the fact that it's not an iterator means it's not a drop-in replacement. You can't dropwhile a MapSequenceView to get a map starting at the first positive element, or islice it or even next it to skip a value—or, rather, you can, but you'll get a new iterator, leaving the original MapSequenceView unchanged. But that's OK; you wouldn't use this as a drop-in replacement, you'd use this when you want the new behavior. And ideally, dropwhile, islice, etc. should all be built the same way, with their own DropWhileSequenceView, etc. types.

Interestingly, some Swift collections, when sliced, return special FooSlice types that provide a view onto the sliced part of the original collection. But I don't think that's at all necessary, given that an ISliceSequenceView over a MapSequenceView would provide the exact same benefits as a special MapSequenceViewSlice type.

Also, the repr of a Swift map sequence view isn't the mapped sequence, it's essentially the constructor call that shows how to rebuild the same view. I'm not sure whether that's a good thing or a bad thing. (What might be cool is to show something like [f(0), f(1), f(2), f(3)] for map(f, [0, 1, 2, 3])…)

filter

The filter function is even more interesting. When given an iterator, it returns an iterator, but when given a sequence, it returns a sequence. How does that work? How can you know in advance that index #3 in the filtered sequence is index #8 in the original?

You can't. This is where another interesting feature of Swift comes in.

In Swift, sequences aren't necessarily indexed by integers; they're indexed by any kind of object you want, and there's a standard hierarchy of index types: ForwardIndex can't do anything but increment, BidirectionalIndex can also decrement; RandomAccessIndex can also do arithmetic; and Integer is what it sounds like. This idea is borrowed from C++, where the standard library has a bunch of collections that have begin and end methods (which Swift uses a different name for, but I'll ignore that for consistency) that return indexes (which C++ calls iterators, but I'll ignore that again) of the appropriate type, and all of its functions work on those indexes while knowing nothing of the collection itself.

And Swift makes things even more interesting by having the equivalent of range(start, end) for all kinds of indexes rather than just integers, and slice(start, end) as well (in fact, range and slice are the same thing in Swift, because sadly it doesn't support negative indexing). And enumerate returns indexes of the appropriate type too.

So, what good is all that? Well, a FilterSequenceView can return a BidirectionalIndex from its begin and end methods, rather than 0 and len(self), and therefore enumerate returns BidirectionalIndex values. Which means you can look ahead or back from the current position while looping:

    for i, value in enumerate(f):
        if value == "spam":
            if f[i.pred()] == "eggs":
                return "complete breakfast"

It also means the reversed function works on any bidirectional sequence, like a FilterSequenceView, because it's implemented as a ReversedSequenceView that wraps the other sequence and returns indexes that work backward.

Of course this requires ints to have succ and pred methods, so the same code will work whether f is random-access or just bidirectional, but that's a pretty minor thing to add.

(I'm cheating a bit here; while Swift has this capability, at least as of beta 2, its standard library doesn't actually use it. The filter function returns a forward-only sequence, which can't be reversed. Also, both bidirectional and forward sequences are basically useless for anything but iteration because the protocol provides no begin and end methods. Each sequence in the stdlib happens to provide startIndex and endIndex properties, but because Swift has statically-typed generics instead of duck typing, that doesn't help if all you know is that you have a sequence…)

Other methods

Sadly, Swift has almost no other functions that create or transform iterables—beyond map and filter, there's enumerate and partial implementations of zip and repeat. But the same idea can be carried forward to all of the functions that Python has in itertools and elsewhere. For the functions in itertools and builtin:
  • map, zip, count, cycle, repeat, islice, starmap, zip_longest, and product can create return randomly-accessible sequences, bidirectional sequences, forward sequences, or iterators, depending on the weakest of their arguments.
  • filter, chain, compress, filterfalse, dropwhile, takewhile, and permutation and friends can return bidirectional sequences, forward sequences, or iterators, depending on the weakest argument.
  • accumulate and groupby can return a forward sequence or iterator, depending on the weakest argument. (If the function had an inverse, accumulate could return a bidirectional sequence, but you can't even guarantee that for the default + operator for all types…)
The recipes are mostly pretty obvious, since they're just wrappers around other functions. However, some of them have their own complex implementations—e.g., grouper, roundrobin, and partition could return a random-access sequence, bidirectional sequence, forward sequence, or iterator depending on the arguments.

Implementation

It's worth noting that any itertools-like function that wants to return a sequence has to be implemented as a custom class, not a generator. In some cases, a single class can provide the best-index-possible behavior, but in other cases, you'd need as many as four separate classes (or three classes and one generator).

If we added a tower of index types that a single sequence class could use without having to care what kind of index is has, then those three classes could, I think, always be reduced to one.

However, in some cases, knowing that you have a random-access sequence allows for a more efficient implementation. (This is actually why the C++ standard originally came up with its iterator tower.)

It should be pretty easy to factor out the details for a delegating sequence, so the only ones that will be difficult to write are the ones that rely on complex implicit state. (It's worth noting that Swift apparently does not have any such factored-out helpers, and implements the same thing repeatedly, with a whole lot of boilerplate. But I think that's down to limitations in the language, and to the fact that fleshing out and polishing the stdlib hasn't been the highest priority, and the fact that they only have a handful of functions so refactoring would only provide so much benefit…)

See seqview on GitHub for a proof of concept.

An even crazier idea

It might be nice to have syntactic support for a "sequence comprehension" that returns the best kind of sequence it can for a given input (the same as the input type if there are no if clauses, no better than bidirectional if there are), 

Conclusion

I have no idea which parts of this, if any, are good ideas, or at least good enough to be worth implementing, but I can break them down into separate ideas worth considering:
  • map and similar functions returning sequence views if given a sequence
    • all such functions in itertools being updated
    • helper to make it easier to define such sequence-or-iterator functions in Python
      • similar helper for C?
  • bidirectional, and maybe forward, sequences
    • single sequence type with standard index types, instead of new sequence types
    • making reversed work on bidirectional sequences
    • filter and similar functions returning sequence views if given a sequence
      • all such functions in itertools being updated
    • helper function to make it easier to define random-or-bidirectional-or-iterator functions in Python
      • similar helper for C?
  • sequence comprehensions
However, it's worth remembering that the main use for all of these functions is in iterator-pipeline-style programming, where you're going to be mixing generator expressions and custom generator functions in with your itertools calls. As soon as you have one generator anywhere in the pipeline, the whole thing is an iterator. Plus, except for debugging purposes, such code usually has no reason to restart, index, or reverse an iterator anyway. So, this may all just be idle speculation.
1

View comments

It's been more than a decade since Typical Programmer Greg Jorgensen taught the word about Abject-Oriented Programming.

Much of what he said still applies, but other things have changed. Languages in the Abject-Oriented space have been borrowing ideas from another paradigm entirely—and then everyone realized that languages like Python, Ruby, and JavaScript had been doing it for years and just hadn't noticed (because these languages do not require you to declare what you're doing, or even to know what you're doing). Meanwhile, new hybrid languages borrow freely from both paradigms.

This other paradigm—which is actually older, but was largely constrained to university basements until recent years—is called Functional Addiction.

A Functional Addict is someone who regularly gets higher-order—sometimes they may even exhibit dependent types—but still manages to retain a job.

Retaining a job is of course the goal of all programming. This is why some of these new hybrid languages, like Rust, check all borrowing, from both paradigms, so extensively that you can make regular progress for months without ever successfully compiling your code, and your managers will appreciate that progress. After all, once it does compile, it will definitely work.

Closures

It's long been known that Closures are dual to Encapsulation.

As Abject-Oriented Programming explained, Encapsulation involves making all of your variables public, and ideally global, to let the rest of the code decide what should and shouldn't be private.

Closures, by contrast, are a way of referring to variables from outer scopes. And there is no scope more outer than global.

Immutability

One of the reasons Functional Addiction has become popular in recent years is that to truly take advantage of multi-core systems, you need immutable data, sometimes also called persistent data.

Instead of mutating a function to fix a bug, you should always make a new copy of that function. For example:

function getCustName(custID)
{
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

When you discover that you actually wanted fields 2 and 3 rather than 1 and 2, it might be tempting to mutate the state of this function. But doing so is dangerous. The right answer is to make a copy, and then try to remember to use the copy instead of the original:

function getCustName(custID)
{
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

function getCustName2(custID)
{
    custRec = readFromDB("customer", custID);
    fullname = custRec[2] + ' ' + custRec[3];
    return fullname;
}

This means anyone still using the original function can continue to reference the old code, but as soon as it's no longer needed, it will be automatically garbage collected. (Automatic garbage collection isn't free, but it can be outsourced cheaply.)

Higher-Order Functions

In traditional Abject-Oriented Programming, you are required to give each function a name. But over time, the name of the function may drift away from what it actually does, making it as misleading as comments. Experience has shown that people will only keep once copy of their information up to date, and the CHANGES.TXT file is the right place for that.

Higher-Order Functions can solve this problem:

function []Functions = [
    lambda(custID) {
        custRec = readFromDB("customer", custID);
        fullname = custRec[1] + ' ' + custRec[2];
        return fullname;
    },
    lambda(custID) {
        custRec = readFromDB("customer", custID);
        fullname = custRec[2] + ' ' + custRec[3];
        return fullname;
    },
]

Now you can refer to this functions by order, so there's no need for names.

Parametric Polymorphism

Traditional languages offer Abject-Oriented Polymorphism and Ad-Hoc Polymorphism (also known as Overloading), but better languages also offer Parametric Polymorphism.

The key to Parametric Polymorphism is that the type of the output can be determined from the type of the inputs via Algebra. For example:

function getCustData(custId, x)
{
    if (x == int(x)) {
        custRec = readFromDB("customer", custId);
        fullname = custRec[1] + ' ' + custRec[2];
        return int(fullname);
    } else if (x.real == 0) {
        custRec = readFromDB("customer", custId);
        fullname = custRec[1] + ' ' + custRec[2];
        return double(fullname);
    } else {
        custRec = readFromDB("customer", custId);
        fullname = custRec[1] + ' ' + custRec[2];
        return complex(fullname);
    }
}

Notice that we've called the variable x. This is how you know you're using Algebraic Data Types. The names y, z, and sometimes w are also Algebraic.

Type Inference

Languages that enable Functional Addiction often feature Type Inference. This means that the compiler can infer your typing without you having to be explicit:


function getCustName(custID)
{
    // WARNING: Make sure the DB is locked here or
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

We didn't specify what will happen if the DB is not locked. And that's fine, because the compiler will figure it out and insert code that corrupts the data, without us needing to tell it to!

By contrast, most Abject-Oriented languages are either nominally typed—meaning that you give names to all of your types instead of meanings—or dynamically typed—meaning that your variables are all unique individuals that can accomplish anything if they try.

Memoization

Memoization means caching the results of a function call:

function getCustName(custID)
{
    if (custID == 3) { return "John Smith"; }
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

Non-Strictness

Non-Strictness is often confused with Laziness, but in fact Laziness is just one kind of Non-Strictness. Here's an example that compares two different forms of Non-Strictness:

/****************************************
*
* TO DO:
*
* get tax rate for the customer state
* eventually from some table
*
****************************************/
// function lazyTaxRate(custId) {}

function callByNameTextRate(custId)
{
    /****************************************
    *
    * TO DO:
    *
    * get tax rate for the customer state
    * eventually from some table
    *
    ****************************************/
}

Both are Non-Strict, but the second one forces the compiler to actually compile the function just so we can Call it By Name. This causes code bloat. The Lazy version will be smaller and faster. Plus, Lazy programming allows us to create infinite recursion without making the program hang:

/****************************************
*
* TO DO:
*
* get tax rate for the customer state
* eventually from some table
*
****************************************/
// function lazyTaxRateRecursive(custId) { lazyTaxRateRecursive(custId); }

Laziness is often combined with Memoization:

function getCustName(custID)
{
    // if (custID == 3) { return "John Smith"; }
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

Outside the world of Functional Addicts, this same technique is often called Test-Driven Development. If enough tests can be embedded in the code to achieve 100% coverage, or at least a decent amount, your code is guaranteed to be safe. But because the tests are not compiled and executed in the normal run, or indeed ever, they don't affect performance or correctness.

Conclusion

Many people claim that the days of Abject-Oriented Programming are over. But this is pure hype. Functional Addiction and Abject Orientation are not actually at odds with each other, but instead complement each other.
5

View comments

Blog Archive
About Me
About Me
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.