Python doesn't have a way to clone generators.
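
A quick way to see this, assuming CPython 3, is to hand a generator to the copy module. This is only a minimal sketch; count_up is a made-up example function.

```python
import copy

def count_up(n):
    # Hypothetical example generator, not from any library.
    for i in range(n):
        yield i

gen = count_up(3)
next(gen)  # advance once so there is some state worth copying

try:
    copy.copy(gen)
    copy_error = None
except TypeError as exc:
    # Generators refuse to be copied (copy falls back on the pickling protocol).
    copy_error = exc

print(type(copy_error).__name__)
```

copy.deepcopy fails the same way, so there is no shortcut through the standard library.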

At least for a lot of simple cases, however, it's pretty obvious what cloning them should do, and being able to do so would be handy. But for a lot of other cases, it's not at all obvious.

For example, if the generator is using another iterator (one which isn't a generator itself; otherwise the answer would be obvious), like a file, what should happen when you clone it? Does it share the same file iterator? Or get a new iterator that references the same file handle under the covers? Or one that references a dup of the file handle?
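
To make the first option concrete, here is a small sketch of what "sharing the same file iterator" would look like. It uses io.StringIO as a stand-in for a real file, and a second generator over the same object as a stand-in for a clone; numbered is a hypothetical helper.

```python
import io

def numbered(lines):
    # Hypothetical generator that wraps another iterator.
    for n, line in enumerate(lines, 1):
        yield n, line.rstrip("\n")

f = io.StringIO("a\nb\nc\nd\n")  # stand-in for an open file
g1 = numbered(f)
g2 = numbered(f)  # stand-in for a "clone" sharing the same underlying iterator

first = next(g1)   # consumes line "a"
second = next(g2)  # consumes line "b", because g1 already used up "a"
print(first, second)
```

Each generator keeps its own counter, but they steal lines from each other, which is probably not what anyone would expect from a clone.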

So, let's look at what it would take to clone a generator, and what the choices are, and how you'd implement them.

What is a generator?

Under the covers, a generator is just storing the current state of a function—much like a thread does.

In particular, it's just a stack frame, plus a copy of the frame's running flag and code object (so these can be accessed after the generator finishes and the frame goes away). This isn't really specified anywhere in the docs, but the docstring for inspect.isgenerator includes this:
    Generator objects provide these attributes:
        __iter__        defined to support iteration over container
        close           raises a new GeneratorExit exception inside the
                        generator to terminate the iteration
        gi_code         code object
        gi_frame        frame object or possibly None once the generator has
                        been exhausted
        gi_running      set to 1 when generator is executing, 0 otherwise
        next            return the next item from the container
        send            resumes the generator and "sends" a value that becomes
                        the result of the current yield-expression
        throw           used to raise an exception inside the generator
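
These attributes can be poked at directly. The following is a sketch assuming CPython 3 (where next is spelled `__next__`); counter is a made-up example.

```python
def counter(n):
    # Hypothetical example generator.
    i = 0
    while i < n:
        yield i
        i += 1

g = counter(2)
next(g)  # run up to the first yield

same_code = g.gi_code is counter.__code__     # the generator keeps the code object
suspended_locals = dict(g.gi_frame.f_locals)  # snapshot of the suspended locals
running = g.gi_running                        # false while suspended

remaining = list(g)       # exhaust the generator
frame_after = g.gi_frame  # the frame goes away once it finishes

print(same_code, suspended_locals, running, remaining, frame_after)
```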

The frame type is actually better documented. Its attributes are listed in a chart in the inspect module docs. But briefly, a frame is a code object; its locals/nonlocals/globals/builtins environment; a next-instruction pointer; its exception state; a pointer back up the stack; and some debugging stuff.
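
A few of those pieces are visible from Python on a suspended generator's frame. This sketch assumes CPython 3; two_steps is a made-up example, and the exact f_lasti values are an implementation detail, so only their ordering is compared.

```python
def two_steps():
    # Hypothetical example generator with two distinct suspension points.
    yield "first"
    yield "second"

g = two_steps()
next(g)
frame = g.gi_frame

code_shared = frame.f_code is two_steps.__code__            # the code object
globals_shared = frame.f_globals is two_steps.__globals__   # the globals environment
lasti_first = g.gi_frame.f_lasti                            # instruction pointer at the first yield

next(g)
lasti_second = g.gi_frame.f_lasti                           # further along at the second yield

print(code_shared, globals_shared, lasti_first < lasti_second)
```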

So, if you think about it, cloning a generator should just mean:

  • Construct a new frame object with a copy of gi_frame.f_locals, and all the other members exactly the same as gi_frame. (Everything but locals is either immutable, like the code object, or something you'd clearly want to share, like the globals.)
  • Construct a new generator object with that new frame object.
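
For the simple cases, the intended semantics are easy to show if the generator is rewritten by hand as an iterator class, so that its "locals" become instance attributes. This is not generator cloning, just a sketch of what a clone should behave like; Counter is a made-up class.

```python
import copy

class Counter:
    """Hand-written iterator equivalent to a simple counting generator."""

    def __init__(self, n):
        self.n = n
        self.i = 0  # plays the role of the generator's local variable

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        value = self.i
        self.i += 1
        return value

c = Counter(5)
next(c)
next(c)

clone = copy.copy(c)  # copies the "locals"; everything else is shared

rest_original = list(c)
rest_clone = list(clone)
print(rest_original, rest_clone)
```

Both iterators resume from the same point and then advance independently, which is the behavior a cloned generator would need to match.
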
But there are a few problems here:
  • The f_locals is the locals dict, but the actual locals environment, not exposed to Python, is a C array of references. Fortunately, you can call the C functions PyFrame_LocalsToFast and PyFrame_FastToLocals to go back and forth between the two, if you don't mind getting your hands dirty with ctypes.pythonapi.
  • The f_locals includes closure variables, and, obviously, not as closure cells, just as their values. So the FastToLocals shuffle is going to unbind any closure variables and turn them into new locals in the clone. If you want to keep them as closures, you have to go under the covers and deal with the variable arrays directly. Or you can just not allow cloning generators whose functions have any free variables (or cell variables, that is, locals that are referenced as free variables by functions defined inside them).
  • The exception state is not actually exposed to Python anywhere.
  • The exception state also includes a borrowed (non-refcounted) reference back to the owning generator, if any. You can't create a borrowed reference from Python, which means that if you clone a generator's frame from Python, you'd be creating a reference cycle and perma-leaking the cloned objects.
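
The closure problem is easy to observe from Python. In this sketch (CPython 3 assumed; make_running_total is a made-up example), the free variable shows up in f_locals as a plain value, with no trace of the cell behind it.

```python
def make_running_total():
    # Hypothetical example: a generator that closes over `total`.
    total = 0

    def gen():
        nonlocal total
        while True:
            total += 1
            yield total

    return gen()

g = make_running_total()
next(g)

free_names = g.gi_code.co_freevars    # names the code treats as closure variables
snapshot = dict(g.gi_frame.f_locals)  # the value only, not the cell

next(g)
live = g.gi_frame.f_locals["total"]   # the real cell has moved on

print(free_names, snapshot, live)
```

A clone built from the snapshot would start with its own private total instead of sharing the cell with the original.
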
And then there's one big giant problem:

Unlike, say, types.CodeType, types.FrameType can't be used to construct a frame object with all its bits. Try it, and you just get "TypeError: cannot create 'frame' instances".
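
The contrast can be checked directly. This sketch assumes CPython 3.8 or later, where code objects have a replace() method that builds a modified copy; f is a made-up function.

```python
import types

def f():
    # Hypothetical example function.
    return 1

# Code objects can be built from Python: replace() returns a modified copy.
renamed = f.__code__.replace(co_name="g")

# Frame objects cannot.
try:
    types.FrameType()
    frame_error = None
except TypeError as exc:
    frame_error = exc

print(renamed.co_name, frame_error)
```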

And if you go under the covers and call PyFrame_New through ctypes, notice that its first argument is a PyThreadState, a type that isn't exposed to Python at all.

So, this is going to take a lot of C API hacking. You essentially have to replace everything PyFrame_New does, and a lot of that can't be done from Python. The most obvious example is setting up the free and cell variables (but again, you could just punt on that). And many things that could be doable from Python in principle are going to require banging on the C structs, because they aren't exposed.


Once you get past that, types.GeneratorType also can't be constructed, but this time, it really is just a simple matter of ctypes; PyGen_New just takes a frame object.
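
The Python-level half of that claim is simple to confirm; the ctypes half is left out here because it needs a real frame object to be safe. A minimal sketch, assuming CPython 3, with one_shot as a made-up example:

```python
import types

def one_shot():
    # Hypothetical example generator.
    yield 1

g = one_shot()
is_generator = isinstance(g, types.GeneratorType)

try:
    types.GeneratorType()
    gen_error = None
except TypeError as exc:
    gen_error = exc  # the type has no Python-level constructor

print(is_generator, gen_error)
```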

I'm curious enough to go off and try to build a frame constructor callable from Python (punting on the closures) to see if this works; I'll edit this post once I've tried it. But it's not going to be trivial.