Stack Overflow is full of questions where the answer is to create a "multidict", a dict mapping each key to a list of values.

There are two ways to do this: use defaultdict, or use a regular dict with setdefault. As soon as someone posts an answer using one, someone else comments that they should have used the other, and sometimes it even devolves into an argument about which is better.

Compare these two functions:

    import collections

    def f1(pairs):
        d = {}
        for key, value in pairs:
            d.setdefault(key, []).append(value)
        return d

    def f2(pairs):
        d = collections.defaultdict(list)
        for key, value in pairs:
            d[key].append(value)
        return d

It's hard to argue that either one is unclear, overly verbose, hard to understand, etc.

And, while one or the other is probably faster, it's probably not enough to make a difference in real-life programs.*

So, how do you decide between them?

The answer is simple: This isn't the relevant code for making the decision. You have to look at how the returned value is going to be used in the code. When you later look up a missing key, do you want an empty list, or a KeyError?
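
To make that difference concrete, here is a minimal sketch of what happens on a missing-key lookup in each case. The sample pairs and the variable names (plain, multi, plain_raised, missing) are made up for illustration.

```python
import collections

# Made-up sample data, just for illustration.
pairs = [("a", 1), ("b", 2), ("a", 3)]

# Plain dict built with setdefault: missing keys stay missing.
plain = {}
for key, value in pairs:
    plain.setdefault(key, []).append(value)

# defaultdict: indexing a missing key creates and stores an empty list.
multi = collections.defaultdict(list)
for key, value in pairs:
    multi[key].append(value)

try:
    plain["c"]
    plain_raised = False
except KeyError:
    plain_raised = True   # the plain dict reports the missing key

missing = multi["c"]      # returns [] and, as a side effect, inserts "c"
```

Note that with the defaultdict, the lookup itself adds the key, so a later `"c" in multi` is true and iterating over the dict will include it. Methods like `multi.get("c")` don't trigger the default factory, so they are one way to peek without inserting.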


* From a quick test, setdefault is about 60% slower, so in the rare cases where it matters, if you want a plain dict, it might be worth using defaultdict anyway, then converting at the end.
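
A rough sketch of that "build with defaultdict, convert at the end" approach, assuming a shallow copy via dict() is acceptable. The function name f3 is hypothetical, not something from the functions above.

```python
import collections

def f3(pairs):
    # Hypothetical helper: build with defaultdict for speed,
    # then hand back a plain dict.
    d = collections.defaultdict(list)
    for key, value in pairs:
        d[key].append(value)
    # dict(d) is a shallow copy: same list objects, but no default factory,
    # so missing keys raise KeyError again.
    return dict(d)

result = f3([("a", 1), ("a", 2), ("b", 3)])
```

The caller then gets ordinary dict behavior, including a KeyError for keys that were never seen.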