Python doesn't have a way to clone generators.

At least for a lot of simple cases, however, it's pretty obvious what cloning them should do, and being able to do so would be handy. But for a lot of other cases, it's not at all obvious.

For example, if the generator is using another iterator (one which isn't a generator itself; otherwise the answer would be obvious), like a file, what should happen when you clone it? Does it share the same file iterator? Or get a new iterator that references the same file handle under the covers? Or one that references a dup of the file handle?
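Here's a small sketch that makes the file question concrete. The `numbered` generator is a made-up example, and `itertools.tee` is the nearest stdlib substitute, not a real clone. `copy.copy` refuses to copy a generator at all. `tee` sidesteps the question by cloning nothing: both branches pull from the one original generator, so they also share the one file iterator.

```python
import copy
import io
import itertools

def numbered(lines):
    # A generator driving another, non-generator iterator (here, a file object).
    for i, line in enumerate(lines, 1):
        yield i, line.rstrip("\n")

f = io.StringIO("a\nb\nc\n")
gen = numbered(f)
first = next(gen)
print(first)  # (1, 'a')

# There is no cloning protocol for generators: copy.copy just gives up.
try:
    copy.copy(gen)
    copy_error = None
except TypeError as e:
    copy_error = e
print(type(copy_error).__name__)  # TypeError

# itertools.tee clones no state: both branches pull from the one original
# generator (and therefore the one file iterator), buffering values for
# whichever branch lags behind.
left, right = itertools.tee(gen)
pair = (next(left), next(right))
print(pair)  # ((2, 'b'), (2, 'b'))

# Anyone else holding the file shares it too: reading f directly takes
# the last line away from both branches.
stolen = f.readline()
print(repr(stolen))  # 'c\n'
print(list(left))    # []
```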

So, let's look at what it would take to clone a generator, and what the choices are, and how you'd implement them.

What is a generator?

Under the covers, a generator is just storing the current state of a function—much like a thread does.

In particular, it's just a stack frame, plus a copy of the frame's running flag and code object (so these can be accessed after the generator finishes and the frame goes away). This isn't really specified anywhere in the docs, but the docstring for inspect.isgenerator includes this:
    Generator objects provide these attributes:
        __iter__        defined to support iteration over container
        close           raises a new GeneratorExit exception inside the
                        generator to terminate the iteration
        gi_code         code object
        gi_frame        frame object or possibly None once the generator has
                        been exhausted
        gi_running      set to 1 when generator is executing, 0 otherwise
        next            return the next item from the container
        send            resumes the generator and "sends" a value that becomes
                        the result of the current yield-expression
        throw           used to raise an exception inside the generator
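You can poke at these attributes directly. Here's a quick sketch with a toy `counter` generator, assuming CPython 3, where `gi_running` is a bool rather than the 1/0 the old docstring describes:

```python
import inspect

def counter(n):
    i = 0
    while i < n:
        yield i
        i += 1

g = counter(3)
print(inspect.isgenerator(g))  # True
print(g.gi_code.co_name)       # counter
print(g.gi_running)            # False: it isn't executing right now

before = dict(g.gi_frame.f_locals)
print(before)                  # {'n': 3}: not started yet, so i is unbound
print(next(g))                 # 0
after = dict(g.gi_frame.f_locals)
print(after)                   # {'n': 3, 'i': 0}

rest = list(g)
print(rest)                    # [1, 2]
print(g.gi_frame)              # None: exhausted, so the frame is gone
print(g.gi_code is counter.__code__)  # True: the code object outlives the frame
```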

The frame type is actually better documented: its attributes are listed in a table in the inspect module docs. Briefly, though, a frame holds a code object; its locals/nonlocals/globals/builtins environment; a next-instruction pointer; its exception state; a pointer back up the stack; and some debugging information.
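The "pointer back up the stack" only exists while the generator is actually running. This sketch assumes CPython, since it uses `sys._getframe`, an implementation detail. The function names are made up for illustration.

```python
import sys

def whoami():
    # While running, this frame's f_back points at whoever called next().
    while True:
        yield sys._getframe().f_back

def caller():
    return next(g)

g = whoami()
back_while_running = caller()
print(back_while_running.f_code.co_name)        # caller
print(g.gi_frame.f_back)                        # None: cleared once suspended
print(g.gi_frame.f_code is g.gi_code)           # True: same code object
print(g.gi_frame.f_globals is whoami.__globals__)  # True: globals are shared
```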

So, if you think about it, cloning a generator should just mean:

  • Construct a new frame object with a copy of gi_frame.f_locals, and all the other members exactly the same as gi_frame. (Everything but locals is either immutable, like the code object, or something you'd clearly want to share, like the globals.)
  • Construct a new generator object with that new frame object.
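Here's roughly what the first step would have to collect, written as a hypothetical `clone_ingredients` helper (not a real API). It can't build anything, but it shows which parts get copied and which get shared. It also shows one consequence of "a copy of gi_frame.f_locals": the copy is shallow, so any mutable object in the locals would be shared between the original and the clone.

```python
def clone_ingredients(gen):
    """Hypothetical helper: gather what the two-step plan would need.

    It cannot build a clone (nothing in pure Python can); it only shows
    which parts would be copied and which would be shared.
    """
    frame = gen.gi_frame
    if frame is None:
        raise ValueError("generator is exhausted; there is no frame to clone")
    return {
        "code": frame.f_code,            # immutable, so sharing is safe
        "globals": frame.f_globals,      # deliberately shared
        "builtins": frame.f_builtins,    # deliberately shared
        "locals": dict(frame.f_locals),  # the only part that gets copied
        "lasti": frame.f_lasti,          # where the clone would resume
    }

def accumulate():
    seen = []
    while True:
        seen.append((yield len(seen)))

g = accumulate()
next(g)
g.send("x")
parts = clone_ingredients(g)

# The locals dict is new, but it's a shallow copy: the list inside is the
# very same object, so a clone built this way would share `seen` with g.
print(parts["locals"]["seen"])                                  # ['x']
print(parts["locals"]["seen"] is g.gi_frame.f_locals["seen"])   # True
```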
But there are a few problems here:
  • The f_locals attribute is the locals dict, but the actual locals environment, which isn't exposed to Python, is a C array of references. Fortunately, you can call the C functions PyFrame_LocalsToFast and PyFrame_FastToLocals to copy back and forth between the two, if you don't mind getting your hands dirty with ctypes.pythonapi.
  • The f_locals dict also includes closure variables, and, obviously, not as closure cells, just as their values. So the FastToLocals shuffle is going to unbind any closure variables and turn them into new locals in the clone. If you want to keep them as closures, you have to go under the covers and deal with the variable arrays directly. Or you can just refuse to clone generators whose functions have any free variables (or cell variables, meaning locals that are referenced as free variables by functions defined inside them).
  • The exception state is not actually exposed to Python anywhere.
  • The exception state also includes a borrowed (non-refcounted) reference back to the owning generator, if any. You can't create a borrowed reference from Python, which means that if you clone a generator's frame from Python, you'd be creating a reference cycle and perma-leaking the cloned objects.
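Here's the closure problem in miniature. The sketch also includes the "just punt" option, written as a hypothetical `has_closure_vars` guard that checks the code object's `co_freevars` and `co_cellvars`:

```python
def make_gen():
    step = 10                # a cell variable of make_gen...
    def gen():
        total = 0
        while True:
            total += step    # ...and a free variable of gen
            yield total
    return gen()

def plain():
    yield 1

def has_closure_vars(gen):
    """Hypothetical guard: True if the generator's code uses closure cells."""
    code = gen.gi_code
    return bool(code.co_freevars or code.co_cellvars)

g = make_gen()
next(g)
snapshot = dict(g.gi_frame.f_locals)
print(snapshot["step"], snapshot["total"])  # 10 10: the cell shows up as a plain value
print(g.gi_code.co_freevars)                # ('step',)
print(has_closure_vars(g), has_closure_vars(plain()))  # True False
```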
And then there's one big giant problem:

Unlike, say, types.CodeType, types.FrameType can't be used to construct a frame object with all its bits. Try it, and you just get "TypeError: cannot create 'frame' instances".
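The contrast is easy to see. This assumes Python 3.8+, which added `CodeType.replace`:

```python
import types

def f():
    yield 1

# Code objects can be rebuilt from Python...
renamed = f.__code__.replace(co_name="renamed")
print(renamed.co_name)  # renamed

# ...but there is no way to instantiate the frame type.
frame_error = None
try:
    types.FrameType()
except TypeError as e:
    frame_error = str(e)
print(frame_error)  # cannot create 'frame' instances
```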

And if you go under the covers and call PyFrame_New through ctypes, notice that its first argument is a PyThreadState, a type that isn't exposed to Python at all.

So, this is going to take a lot of C API hacking. You essentially have to replicate everything PyFrame_New does, and a lot of that can't be done from Python. The most obvious example is setting up the free and cell variables (but again, you could just punt on that). Even many things that could in principle be done from Python are going to require banging on the C structs directly, because they aren't exposed.
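About as far as you can safely get from ctypes is an opaque address. This sketch assumes CPython with `ctypes.pythonapi` available. It declares `PyFrame_New`'s signature but deliberately never calls it:

```python
import ctypes

api = ctypes.pythonapi

# PyThreadState_Get returns a pointer to the current thread's PyThreadState.
# There is no Python type for it; ctypes can only hand back a raw address.
api.PyThreadState_Get.restype = ctypes.c_void_p
api.PyThreadState_Get.argtypes = []
tstate = api.PyThreadState_Get()
print(hex(tstate))  # some address; nothing you can inspect from Python

# The C signature is PyFrame_New(tstate, code, globals, locals).
# Declaring it is harmless. Calling it, and then filling in everything else
# a suspended generator frame needs, is where the C-struct banging starts.
PyFrame_New = getattr(api, "PyFrame_New", None)
if PyFrame_New is not None:
    PyFrame_New.argtypes = [ctypes.c_void_p, ctypes.py_object,
                            ctypes.py_object, ctypes.py_object]
    PyFrame_New.restype = ctypes.py_object
```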

Once you get past that, types.GeneratorType also can't be constructed, but this time, it really is just a simple matter of ctypes; PyGen_New just takes a frame object.
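Here's the generator side, as a sketch with a toy `counter`. `types.GeneratorType` refuses to be instantiated. The only Python-level way to get a generator is to call a generator function (here rebuilt with `types.FunctionType` around the same code object), and that always starts a brand-new frame at the top. That's a restart, not a clone.

```python
import types

def counter(n):
    i = 0
    while i < n:
        yield i
        i += 1

gen_error = None
try:
    types.GeneratorType()
except TypeError as e:
    gen_error = str(e)
print(gen_error)  # cannot create 'generator' instances

g = counter(3)
next(g)
next(g)  # g is now suspended with i == 1

# Rebuild a function around g's code object and call it: this is how
# Python code gets a generator, and it starts from the top.
fresh = types.FunctionType(g.gi_code, counter.__globals__)(3)
fresh_first = next(fresh)
print(fresh_first)  # 0: a restart, not a copy of g's state
g_next = next(g)
print(g_next)       # 2: the original carries on unaffected
```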

I'm curious enough to go off and try to build a frame constructor callable from Python (punting on the closures) to see if this works; I'll edit this post once I've tried it. But it's not going to be trivial.

