In database apps, you often want to create tables, views, and indices only if they don't already exist, so the setup work happens the first time but doesn't blow away all of your data on every subsequent run. That's why SQL has a special "IF NOT EXISTS" clause you can add to the various CREATE statements.

Occasionally, you want to do the same thing in Python. For example, this StackOverflow user likes to re-run the same script over and over in the interactive session (e.g., by hitting F5 in the IDE). That's kind of an odd thing to do in general, but it's not hard to imagine cases where it makes sense. For example, you might be expanding or debugging part of the script, and want to use the rest of the script while you do so.

Normally, that wouldn't be a problem, but what if the script created (and then modified) some global variables, or class or function attributes, and so on, and you didn't want those to be overwritten?
That might sound like an anti-pattern, but imagine that you have a function that you've memoized with functools.lru_cache, and it's cached hundreds of expensive values. If you replace it with a new copy of the function, it'll have an empty cache.
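
You can watch this happen with the cache_info() method that functools.lru_cache provides: re-executing a def creates a brand-new function object with a brand-new, empty cache. A minimal sketch, with a made-up slow_square standing in for the expensive function:

```python
from functools import lru_cache

@lru_cache()
def slow_square(x):
    return x * x  # pretend this is expensive

for i in range(100):
    slow_square(i)
old_square = slow_square
print(old_square.cache_info().currsize)  # 100

# Re-running the script re-executes the def, rebinding the global name...
@lru_cache()
def slow_square(x):
    return x * x

print(slow_square.cache_info().currsize)  # 0: all that work is gone
```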

Of course the right thing to do is to factor out the script into separate modules, and have the script import the stable code instead of just including it directly. But you don't always want to take a break from actively hacking on code to refactor it.

The easy (but ugly) way

You can always do this:
    from functools import lru_cache
    @lru_cache()
    def _square(x):
        return x*x
    try:
        square
    except NameError:
        square = _square
And if you only have to do it to one function, maybe that's the best answer. But if you have to do it to a dozen functions, that'll get ugly, and all that repetition is an invitation to copy and paste and screw it up somewhere. So, what you want to do is factor it out into a function.

But how? What you want is something like this:
    create_if_not_exists(square, _square)
In a language like C++, you'd do that by taking a reference to a function variable as the first parameter, but you can't pass a reference to the square variable into the function, because that doesn't make any sense in Python; variables aren't things you can take references to.
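
To see why a plain function can't do it, here's a sketch of the naive version (naive_create_if_not_exists is a hypothetical name). Python passes the object, not the variable, so rebinding the parameter has no effect on the caller; and if the name doesn't exist yet, the call fails before the function even runs.

```python
def naive_create_if_not_exists(var, value):
    # 'var' is a new local name bound to whatever object was passed in;
    # rebinding it changes nothing outside this function.
    var = value

def _square(x):
    return x * x

raised = False
try:
    naive_create_if_not_exists(square, _square)
except NameError as e:
    # Evaluating the argument 'square' fails before the call happens.
    raised = True
    print("NameError:", e)

square = None
naive_create_if_not_exists(square, _square)
print(square)  # still None
```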

You might be able to use some horrible frame hacks to pass the value in and have the function figure out the name from the calling frame, but this is already hacky enough. You might be able to do it with MacroPy, but there are probably cooler ways you can solve the original problem once you're using macros.

Strings as names

The key thing to notice is that ultimately, a variable name is just a string that gets looked up in the appropriate scope. Any frame hack, macro, etc. would just be getting the name as a string and setting its value by name anyway, so why not make that explicit?
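
At module level, that scope is literally a dict, which globals() returns, so a variable name and a string key into that dict are two spellings of the same lookup. A minimal illustration:

```python
answer = 42
# The global variable 'answer' is just the key 'answer' in the module's dict.
print(globals()["answer"] is answer)  # True

# Assigning through the dict creates a real global variable.
globals()["question"] = "six times nine"
print(question)  # six times nine
```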

This is one of those examples that shows that, while usually you don't want to dynamically create variables, occasionally you do.

So, how do you do it?

There are three options.

  • Use exec to declare the variable global or nonlocal and then reassign it.
  • Call setattr on the enclosing scope object.
  • Use the globals dict.

First, using exec for reflection is almost always the wrong answer, so let's rule that out right off the bat.

The setattr solution is more flexible, but in this case I think that's actually a negative. The whole point of what we're trying to do is to modify the global scope by (re-)executing a script. If it doesn't work when you instead execfile the script in the middle of a function… good!
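
For comparison, the setattr option might look like the sketch below. The helper create_via_setattr, the settings module, and the Config class are all hypothetical. For the current module you would typically pass sys.modules[__name__] as the scope; the sketch uses a throwaway module object instead. Note that it works just as well on a class, which is exactly the extra flexibility argued against above.

```python
import types

def create_via_setattr(scope, name, value):
    # Bind scope.name only if the attribute doesn't already exist.
    if not hasattr(scope, name):
        setattr(scope, name, value)

# Works on any module object...
settings = types.ModuleType("settings")  # hypothetical module for illustration
create_via_setattr(settings, "debug", True)
create_via_setattr(settings, "debug", False)  # already exists: left alone
print(settings.debug)  # True

# ...or on a class, which the globals() version deliberately can't do.
class Config:
    pass

create_via_setattr(Config, "retries", 3)
print(Config.retries)  # 3
```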

The way to create a global variable dynamically is:
    def create(name, value):
        globals()[name] = value

The "if not exists" part


Of course create('square', _square) does the exact same thing as just square = _square. We wanted to only bind square if it doesn't exist, not rebind it no matter what.

Once you think of it as dict value assignment, the answer is obvious:
    def create_if_not_exists(name, value):
        globals().setdefault(name, value)
And that's the whole trick.
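
dict.setdefault only stores the value if the key is missing. A minimal sketch of the helper in action, with two throwaway lambdas standing in for the "old" and "new" versions of square:

```python
def create_if_not_exists(name, value):
    # Bind the global only if it isn't already bound.
    globals().setdefault(name, value)

create_if_not_exists("square", lambda x: x * x)
first = square
create_if_not_exists("square", lambda x: x ** 2)  # simulated re-run
print(square is first)  # True: the original binding survived
```

One caveat: globals() inside the helper is the namespace of the module where the helper is *defined*, not the caller's, so this only does what you want when the helper lives in the same script it's protecting.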

Decorators

Except it's not the whole trick; there's one more thing we can do: Turn it into a decorator.
    def create_if_not_exists(name):
        def wrap(func):
            globals().setdefault(name, func)
            return func
        return wrap

Getting the name for free

Remember when I said that you can't get the name of a variable? Well, a decorator is only going to be called on functions or classes, and almost always on one created with a def or class statement (or on the result of decorating such a thing), which means it will have a name built in, as its __name__ attribute.

The problem is, this is the name of the actual implementation function, _square, not the name we want to bind it to, square. But those two names should give you an idea: If you just make that underscore prefix a naming convention (and it already fits in with existing Python naming conventions pretty well), you _can_ get the name to bind to. So:
    def create_if_not_exists(func):
        assert func.__name__[0] == '_'
        globals().setdefault(func.__name__[1:], func)
        return func
And now, all we need to do to prevent the new (decorated) function from overwriting the old one is to attach this decorator:
    @create_if_not_exists
    @lru_cache()
    def _square(x):
        return x*x
And now, we really are done. How much simpler can you get than that?
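
To check that the cache really survives a re-run without an IDE, here's a sketch that simulates hitting F5 by exec-ing the same script source twice in one namespace dict. SCRIPT and ns are stand-ins for your script file and the interactive session's globals.

```python
SCRIPT = '''
from functools import lru_cache

def create_if_not_exists(func):
    assert func.__name__[0] == '_'
    globals().setdefault(func.__name__[1:], func)
    return func

@create_if_not_exists
@lru_cache()
def _square(x):
    return x*x
'''

ns = {}                  # plays the role of the interactive session's globals
exec(SCRIPT, ns)         # first run
for i in range(10):
    ns["square"](i)      # fill the cache
exec(SCRIPT, ns)         # "hit F5" again
print(ns["square"].cache_info().currsize)   # 10: the cache survived
print(ns["_square"].cache_info().currsize)  # 0: the fresh copy is unused
```

Functions defined by exec(SCRIPT, ns) use ns as their globals, which is why globals() inside create_if_not_exists sees the session's existing square.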

When you want to rebind square

Maybe square was supposed to be part of the safe, static code that you don't want to blow away with every re-run, but then you found a bug in it. How do you load the new version?

Simple: either del square before hitting F5, or square = _square after hitting F5.
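
Both escape hatches, in the same simulated-session style (SCRIPT and ns are stand-ins again; in a real session you'd just type del square or square = _square):

```python
SCRIPT = '''
def create_if_not_exists(func):
    assert func.__name__[0] == '_'
    globals().setdefault(func.__name__[1:], func)
    return func

@create_if_not_exists
def _square(x):
    return x*x
'''

ns = {}
exec(SCRIPT, ns)
old = ns["square"]

# Option 1: del square before the re-run, so the next run binds the new one.
del ns["square"]
exec(SCRIPT, ns)
print(ns["square"] is old)  # False

# Option 2: re-run first, then rebind square = _square by hand.
before = ns["square"]
exec(SCRIPT, ns)
ns["square"] = ns["_square"]
print(ns["square"] is before)  # False
```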

A brief discursion on recursion

Renaming functions after they're created doesn't play well with recursive functions. In Python, recursive functions call themselves by global lookup on their own name. So, if you write this:
    @create_if_not_exists
    @lru_cache()
    def _fact(n):
        if n < 2: return 1
        return n * _fact(n-1)
… your original _fact function is recursively calling whatever happens to be named _fact at run time. Which means that, after a re-run, it's going to be calling the new _fact function, with its new and separate cache, which makes the whole cache thing worthless.
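
A sketch of the problem, again simulating a re-run by exec-ing the script twice into one namespace dict (SCRIPT and ns are stand-ins). After the re-run, calling the original fact(10) puts only fact(10) itself in the original cache; every subproblem lands in the new _fact's cache.

```python
SCRIPT = '''
from functools import lru_cache

def create_if_not_exists(func):
    assert func.__name__[0] == '_'
    globals().setdefault(func.__name__[1:], func)
    return func

@create_if_not_exists
@lru_cache()
def _fact(n):
    if n < 2: return 1
    return n * _fact(n-1)   # recursive call by the *private* name
'''

ns = {}
exec(SCRIPT, ns)
exec(SCRIPT, ns)  # re-run: _fact now names a new function with an empty cache
ns["fact"](10)
print(ns["fact"].cache_info().currsize)   # 1: only fact(10) itself
print(ns["_fact"].cache_info().currsize)  # 9: fact(9)..fact(1) went here
```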

The answer is simple: Call yourself by your public name, not your private name.
    @create_if_not_exists
    @lru_cache()
    def _fact(n):
        if n < 2: return 1
        return n * fact(n-1)
Now, your original _fact function, which you've also bound to fact (and, for that matter, the new _fact that you haven't bound to fact) will call fact, which is still the original function. Tada.
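
The same simulated session with the fix applied (SCRIPT and ns are stand-ins, as before). Values computed before the re-run are still cache hits afterward, and the new _fact's cache is never touched.

```python
SCRIPT = '''
from functools import lru_cache

def create_if_not_exists(func):
    assert func.__name__[0] == '_'
    globals().setdefault(func.__name__[1:], func)
    return func

@create_if_not_exists
@lru_cache()
def _fact(n):
    if n < 2: return 1
    return n * fact(n-1)    # recursive call by the *public* name
'''

ns = {}
exec(SCRIPT, ns)
ns["fact"](10)    # caches fact(1)..fact(10) in the original function
exec(SCRIPT, ns)  # re-run
ns["fact"](12)    # computes 12 and 11, then hits the cache at 10
info = ns["fact"].cache_info()
print(info.currsize)  # 12
print(info.hits)      # 1
print(ns["_fact"].cache_info().currsize)  # 0
```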

It's been more than a decade since Typical Programmer Greg Jorgensen taught the world about Abject-Oriented Programming.

Much of what he said still applies, but other things have changed. Languages in the Abject-Oriented space have been borrowing ideas from another paradigm entirely—and then everyone realized that languages like Python, Ruby, and JavaScript had been doing it for years and just hadn't noticed (because these languages do not require you to declare what you're doing, or even to know what you're doing). Meanwhile, new hybrid languages borrow freely from both paradigms.

This other paradigm—which is actually older, but was largely constrained to university basements until recent years—is called Functional Addiction.

A Functional Addict is someone who regularly gets higher-order—sometimes they may even exhibit dependent types—but still manages to retain a job.

Retaining a job is of course the goal of all programming. This is why some of these new hybrid languages, like Rust, check all borrowing, from both paradigms, so extensively that you can make regular progress for months without ever successfully compiling your code, and your managers will appreciate that progress. After all, once it does compile, it will definitely work.

Closures

It's long been known that Closures are dual to Encapsulation.

As Abject-Oriented Programming explained, Encapsulation involves making all of your variables public, and ideally global, to let the rest of the code decide what should and shouldn't be private.

Closures, by contrast, are a way of referring to variables from outer scopes. And there is no scope more outer than global.

Immutability

One of the reasons Functional Addiction has become popular in recent years is that to truly take advantage of multi-core systems, you need immutable data, sometimes also called persistent data.

Instead of mutating a function to fix a bug, you should always make a new copy of that function. For example:

function getCustName(custID)
{
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

When you discover that you actually wanted fields 2 and 3 rather than 1 and 2, it might be tempting to mutate the state of this function. But doing so is dangerous. The right answer is to make a copy, and then try to remember to use the copy instead of the original:

function getCustName(custID)
{
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

function getCustName2(custID)
{
    custRec = readFromDB("customer", custID);
    fullname = custRec[2] + ' ' + custRec[3];
    return fullname;
}

This means anyone still using the original function can continue to reference the old code, but as soon as it's no longer needed, it will be automatically garbage collected. (Automatic garbage collection isn't free, but it can be outsourced cheaply.)

Higher-Order Functions

In traditional Abject-Oriented Programming, you are required to give each function a name. But over time, the name of the function may drift away from what it actually does, making it as misleading as comments. Experience has shown that people will only keep one copy of their information up to date, and the CHANGES.TXT file is the right place for that.

Higher-Order Functions can solve this problem:

function []Functions = [
    lambda(custID) {
        custRec = readFromDB("customer", custID);
        fullname = custRec[1] + ' ' + custRec[2];
        return fullname;
    },
    lambda(custID) {
        custRec = readFromDB("customer", custID);
        fullname = custRec[2] + ' ' + custRec[3];
        return fullname;
    },
]

Now you can refer to these functions by order, so there's no need for names.

Parametric Polymorphism

Traditional languages offer Abject-Oriented Polymorphism and Ad-Hoc Polymorphism (also known as Overloading), but better languages also offer Parametric Polymorphism.

The key to Parametric Polymorphism is that the type of the output can be determined from the type of the inputs via Algebra. For example:

function getCustData(custId, x)
{
    if (x == int(x)) {
        custRec = readFromDB("customer", custId);
        fullname = custRec[1] + ' ' + custRec[2];
        return int(fullname);
    } else if (x.real == 0) {
        custRec = readFromDB("customer", custId);
        fullname = custRec[1] + ' ' + custRec[2];
        return double(fullname);
    } else {
        custRec = readFromDB("customer", custId);
        fullname = custRec[1] + ' ' + custRec[2];
        return complex(fullname);
    }
}

Notice that we've called the variable x. This is how you know you're using Algebraic Data Types. The names y, z, and sometimes w are also Algebraic.

Type Inference

Languages that enable Functional Addiction often feature Type Inference. This means that the compiler can infer your typing without you having to be explicit:


function getCustName(custID)
{
    // WARNING: Make sure the DB is locked here or
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

We didn't specify what will happen if the DB is not locked. And that's fine, because the compiler will figure it out and insert code that corrupts the data, without us needing to tell it to!

By contrast, most Abject-Oriented languages are either nominally typed—meaning that you give names to all of your types instead of meanings—or dynamically typed—meaning that your variables are all unique individuals that can accomplish anything if they try.

Memoization

Memoization means caching the results of a function call:

function getCustName(custID)
{
    if (custID == 3) { return "John Smith"; }
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

Non-Strictness

Non-Strictness is often confused with Laziness, but in fact Laziness is just one kind of Non-Strictness. Here's an example that compares two different forms of Non-Strictness:

/****************************************
*
* TO DO:
*
* get tax rate for the customer state
* eventually from some table
*
****************************************/
// function lazyTaxRate(custId) {}

function callByNameTextRate(custId)
{
    /****************************************
    *
    * TO DO:
    *
    * get tax rate for the customer state
    * eventually from some table
    *
    ****************************************/
}

Both are Non-Strict, but the second one forces the compiler to actually compile the function just so we can Call it By Name. This causes code bloat. The Lazy version will be smaller and faster. Plus, Lazy programming allows us to create infinite recursion without making the program hang:

/****************************************
*
* TO DO:
*
* get tax rate for the customer state
* eventually from some table
*
****************************************/
// function lazyTaxRateRecursive(custId) { lazyTaxRateRecursive(custId); }

Laziness is often combined with Memoization:

function getCustName(custID)
{
    // if (custID == 3) { return "John Smith"; }
    custRec = readFromDB("customer", custID);
    fullname = custRec[1] + ' ' + custRec[2];
    return fullname;
}

Outside the world of Functional Addicts, this same technique is often called Test-Driven Development. If enough tests can be embedded in the code to achieve 100% coverage, or at least a decent amount, your code is guaranteed to be safe. But because the tests are not compiled and executed in the normal run, or indeed ever, they don't affect performance or correctness.

Conclusion

Many people claim that the days of Abject-Oriented Programming are over. But this is pure hype. Functional Addiction and Abject Orientation are not actually at odds with each other, but instead complement each other.