At least for a lot of simple cases, however, it's pretty obvious what cloning them should do, and being able to do so would be handy. But for a lot of other cases, it's not at all obvious.
For example, if the generator is using another iterator (one which isn't a generator itself; otherwise the answer would be obvious), like a file, what should happen when you clone it? Does it share the same file iterator? Or get a new iterator that references the same file handle under the covers? Or one that references a dup of the file handle?
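Here's a small sketch of the first option. Since generators can't actually be cloned, it fakes a clone by building a second generator over the same underlying iterator. The `numbered` generator is a hypothetical example, and io.StringIO stands in for a real file. The two generators end up stealing lines from each other:

```python
import io

def numbered(lines):
    # Hypothetical example: a generator driven by a non-generator iterator.
    for i, line in enumerate(lines):
        yield i, line.rstrip("\n")

f = io.StringIO("a\nb\nc\nd\n")
original = numbered(f)
clone = numbered(f)    # stand-in for a "clone" that shares f's iterator

print(next(original))  # (0, 'a')
print(next(clone))     # (0, 'b'): the clone consumed a line the original never sees
print(next(original))  # (1, 'c')
```

Each generator keeps its own enumerate counter, but both pull from the same file position. Whether that's what "clone" should mean is exactly the open question.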
So, let's look at what it would take to clone a generator, and what the choices are, and how you'd implement them.
What is a generator?
Under the covers, a generator is just storing the current state of a function, much like a thread does. In particular, it's just a stack frame, plus a copy of the frame's running flag and code object (so these can be accessed after the generator finishes and the frame goes away). This isn't really specified anywhere in the docs, but the docstring for inspect.isgenerator includes this:
    Generator objects provide these attributes:
        __iter__        defined to support iteration over container
        close           raises a new GeneratorExit exception inside the generator to terminate the iteration
        gi_code         code object
        gi_frame        frame object or possibly None once the generator has been exhausted
        gi_running      set to 1 when generator is executing, 0 otherwise
        next            return the next item from the container
        send            resumes the generator and "sends" a value that becomes the result of the current yield-expression
        throw           used to raise an exception inside the generator
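You can poke at these attributes directly. A short sketch (the `countdown` generator is just an example; inspect.getgeneratorstate is a standard-library helper that reads the same state). Note that in Python 3, gi_running is a bool rather than 1/0:

```python
import inspect

def countdown(n):
    while n > 0:
        yield n
        n -= 1

g = countdown(3)
print(inspect.getgeneratorstate(g))     # GEN_CREATED
print(g.gi_code is countdown.__code__)  # True: the generator holds its own code reference
next(g)
print(inspect.getgeneratorstate(g))     # GEN_SUSPENDED
print(g.gi_running)                     # False: only True while the body is executing
print(dict(g.gi_frame.f_locals))        # {'n': 3}
for _ in g:                             # run it to completion
    pass
print(inspect.getgeneratorstate(g))     # GEN_CLOSED
print(g.gi_frame)                       # None: the frame is gone...
print(g.gi_code is countdown.__code__)  # True: ...but the code object survives
```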
The frame type is actually better documented: its attributes are listed in a table in the inspect module docs. Briefly, though, a frame is a code object; its locals/nonlocals/globals/builtins environment; an instruction pointer (f_lasti, the last instruction attempted); its exception state; a pointer back up the stack (f_back); and some debugging stuff.
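Here's a quick look at those pieces on a suspended generator's frame. This is a sketch; `ticker` is just an example generator, and the exact f_lasti and f_lineno values vary between Python versions, so the code only prints them:

```python
def ticker():
    count = 0
    while True:
        count += 1
        yield count

g = ticker()
next(g)
frame = g.gi_frame

print(frame.f_code is ticker.__code__)        # True: the code object
print(dict(frame.f_locals))                   # {'count': 1}: the locals
print(frame.f_globals is ticker.__globals__)  # True: globals shared with the function
print("len" in frame.f_builtins)              # True: the builtins namespace
print(frame.f_lasti, frame.f_lineno)          # instruction pointer and current line
```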
So, if you think about it, cloning a generator should just mean:
- Construct a new frame object with a copy of gi_frame.f_locals, and all the other members exactly the same as gi_frame. (Everything but locals is either immutable, like the code object, or something you'd clearly want to share, like the globals.)
- Construct a new generator object with that new frame object.
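The plan above needs only a handful of frame attributes. Below is a minimal sketch using a hypothetical `clone_recipe` helper. It gathers those attributes, split into what would be shared and what would be copied. It stops short of building anything, because there's no pure-Python way to assemble the pieces into a frame:

```python
def clone_recipe(gen):
    """Hypothetical helper: collect the pieces a cloned frame would need.

    It only gathers ingredients; nothing here can turn them into a new
    frame or generator.
    """
    frame = gen.gi_frame
    if frame is None:
        raise ValueError("cannot clone an exhausted generator")
    return {
        # Immutable or meant to be shared: reuse the very same objects.
        "code": frame.f_code,
        "globals": frame.f_globals,
        "builtins": frame.f_builtins,
        "lasti": frame.f_lasti,  # last instruction attempted
        # The one thing to copy. This is a shallow copy: a mutable value
        # (say, a list) would still be shared between original and clone.
        "locals": dict(frame.f_locals),
    }

def squares(n):
    for i in range(n):
        yield i * i

g = squares(5)
next(g)
next(g)                  # now suspended with i == 1
recipe = clone_recipe(g)
recipe["locals"]["i"] = 99
print(dict(g.gi_frame.f_locals)["i"])  # 1: the original frame is untouched
```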
But there are a few problems here:
- f_locals is the locals dict, but the actual locals environment, which isn't exposed to Python, is a C array of references. Fortunately, you can call the C functions PyFrame_LocalsToFast and PyFrame_FastToLocals to go back and forth between the two, if you don't mind getting your hands dirty with ctypes.pythonapi.
- The f_locals includes closure variables—and, obviously, not as closure cells, just as their values. So the FastToLocals shuffle is going to unbind any closure variables and turn them into new locals in the clone. If you want to keep them as closures, you have to go under the covers and deal with the variable arrays directly. Or you can just not allow cloning generators whose functions have any free variables (or cell variables, locals referenced by free variables in functions defined locally within them).
- The exception state is not actually exposed to Python anywhere.
- The exception state also includes a borrowed (non-refcounted) reference back to the owning generator, if any. You can't create a borrowed reference from Python, which means that if you clone a generator's frame from Python, you'd be creating a reference cycle and perma-leaking the cloned objects.
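Here's a CPython-only sketch of the ctypes.pythonapi shuffle from the first point. It uses a hypothetical `set_generator_local` helper that rebinds a local inside a suspended generator. One assumption beyond the original: since Python 3.13 (PEP 667), f_locals is a write-through proxy, so the explicit PyFrame_LocalsToFast call is only needed on older versions:

```python
import ctypes
import sys

def set_generator_local(gen, name, value):
    """Hypothetical helper: rebind a local variable inside a suspended generator."""
    frame = gen.gi_frame
    if frame is None:
        raise ValueError("generator is exhausted")
    # Before 3.13, reading f_locals runs FastToLocals and returns a dict
    # snapshot; on 3.13+ it returns a proxy that writes straight through.
    frame.f_locals[name] = value
    if sys.version_info < (3, 13):
        # Push the dict back into the C array of fast locals (clear=0).
        ctypes.pythonapi.PyFrame_LocalsToFast(ctypes.py_object(frame), ctypes.c_int(0))

def counter():
    n = 0
    while True:
        yield n
        n += 1

g = counter()
print(next(g))                   # 0
set_generator_local(g, "n", 10)
print(next(g))                   # 11
```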
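To see the closure problem from the second point, look at what f_locals reports for a generator with a free variable. The sketch also includes the "just don't allow it" check as a hypothetical `can_clone_simply` helper, which refuses any generator whose code has free or cell variables:

```python
def make_gen():
    x = 1              # a cell variable of make_gen, a free variable of gen
    def gen():
        y = 2
        yield x + y
    return gen()

g = make_gen()
print(next(g))                    # 3
print(g.gi_code.co_freevars)      # ('x',)
print(dict(g.gi_frame.f_locals))  # contains 'x': 1 as a plain value, not a cell

def can_clone_simply(gen):
    """Hypothetical helper: punt on any generator that uses closures."""
    code = gen.gi_code
    return not (code.co_freevars or code.co_cellvars)

print(can_clone_simply(g))        # False
```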
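And then there's one big giant problem: you can't construct a frame from Python at all. types.FrameType has no constructor, so calling it just raises a TypeError.
And if you go under the covers to ctypes PyFrame_New, notice that its first argument is a PyThreadState, a type that isn't exposed to Python at all.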
So, this is going to take a lot of C API hacking. You essentially have to replace everything PyFrame_New does, and a lot of that can't be done from Python. The most obvious part is setting up the free and cell variables (though, again, you could just punt on that), but many things that could in principle be done from Python are still going to require banging on the C structs, because they aren't exposed.
Once you get past that, types.GeneratorType also can't be constructed, but this time, it really is just a simple matter of ctypes; PyGen_New just takes a frame object.
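A quick check that neither type can be instantiated from Python, while the C-level generator constructor is at least visible to ctypes. This is a CPython-only sketch, and `can_instantiate` is a hypothetical helper that simply calls the type with no arguments. Calling PyGen_New for real is left out, since it needs a frame you can't build yet:

```python
import ctypes
import types

def can_instantiate(cls):
    """Hypothetical helper: call the type with no arguments and see if it refuses."""
    try:
        cls()
    except TypeError:
        return False
    return True

print(can_instantiate(types.FrameType))        # False: "cannot create 'frame' instances"
print(can_instantiate(types.GeneratorType))    # False: "cannot create 'generator' instances"
print(hasattr(ctypes.pythonapi, "PyGen_New"))  # True: the C constructor is exported
```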
I'm curious enough to go off and try to build a frame constructor callable from Python (punting on the closures) to see if this works; I'll edit this post once I've tried it. But it's not going to be trivial.