Background
Currently, CPython’s internal bytecode format stores instructions with no args as 1 byte, instructions with small args as 3 bytes, and instructions with large args as 6 bytes (actually, a 3-byte EXTENDED_ARG followed by a 3-byte real instruction). While bytecode is implementation-specific, many other implementations (PyPy, MicroPython, …) use CPython’s bytecode format, or variations on it.
Python exposes as much of this as possible to user code. For example, you can write a decorator that takes a function’s __code__ member, accesses its co_code bytecode string and other attributes, builds a different code object, and replaces the function’s __code__. Or you can build an import hook that takes the top-level module code returned by compile (and all the code objects stored in co_consts for the top-level object or recursively in any of the nested objects), builds up a different one, and returns that as the module’s actual code.
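As a concrete illustration of the first case, here’s a minimal sketch of such a decorator on current Python (3.8+, where CodeType.replace exists); the 2→3 rewrite is just a toy transformation for the example:

```python
def swap_consts(func):
    """Toy rewrite: replace every constant 2 in func's code with 3.

    CodeType.replace (Python 3.8+) builds a new code object with only the
    given fields changed, which is the safe way to do this kind of surgery.
    """
    code = func.__code__
    new_consts = tuple(3 if c == 2 else c for c in code.co_consts)
    func.__code__ = code.replace(co_consts=new_consts)
    return func

@swap_consts
def two():
    return 2
```

After decoration, calling two() returns 3 instead of 2, even though the source still says return 2.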
And this isn’t just a nifty feature for playing around with Python’s internals without having to hack up the interpreter; there’s real-life production code that depends on doing this kind of stuff to code objects.
Making CPython bytecode format less brittle
There have been a few proposals to change the internal format: e.g., “wordcode” stores most instructions as 2 bytes (using EXTENDED_ARG for 4 or 6 bytes when needed), while “packed ops” steals bits from the opcode to store very small args for certain ops (e.g., LOAD_CONST_2 can be stored as 1 byte instead of 3). Such proposals would obviously break all code that depends on bytecode. Guido suggested on python-dev that it would need two Python versions (so 3.7, if we finished wordcode today) before we could release such a breaking change; I’m not sure even that would be enough time.
Similarly, while some other implementations use essentially the same format as CPython, they’re not all actually compatible. For example, for compactness, MicroPython makes most non-jump instructions 2 bytes instead of 3, which breaks almost any code that even inspects bytecode, much less tries to modify it.
One of the replies to Guido suggested, “Maybe we should have a standard portable assembly format for bytecode”. If we had that, it would remain consistent across most internal changes, even ones as radical as the switch from bytecode to wordcode, and would be portable between CPython and MicroPython. If we could get such a format into 3.6, I think it would definitely be reasonable for a radical change like wordcode to go into 3.7 as Guido suggests—and maybe even for any future changes to take place over a single version instead of two.
Making bytecode processors easier to write
Separately, as part of the discussion on PEP 511, which adds a new API for directly processing code objects, I suggested that it would be a lot simpler to write such processors, and also simpler for the compiler, if it used some “unpacked bytecode” format that’s easier to work with.
The main objection I heard to that idea was that any such format would make Python harder to change in the future—but once you consider that the obvious format to use is effectively a bytecode assembly language, it becomes clear that the opposite is true: we’re making Python easier to change in the future, along with making it simpler to work with.
The other objection is that there are just too many possible designs to choose from, so it’s way too early to put anything in the stdlib. But there really aren’t many possible designs to choose from.
dis
Python already has a standard bytecode assembly language format, as exposed by the dis module: a dis.Bytecode object represents the assembly-language version of a code object, as an iterable of dis.Instruction objects instead of just raw bytes (plus a few extra attributes and methods). It abstracts away things like variable instruction size (including EXTENDED_ARG), mapping indices into co_consts into actual constant values, etc.
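This is easiest to see with today’s real API; for a trivial function (the exact opcode names vary across CPython versions):

```python
import dis

def f(x):
    return x + 1

# Each Instruction gives symbolic opnames and resolved argvals instead of
# raw bytes and indices into co_consts/co_varnames.
for instr in dis.get_instructions(f):
    print(instr.opname, instr.argval)
```

Nothing in that loop cares whether an instruction took 1, 3, or 6 bytes, or whether an EXTENDED_ARG was involved.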
This is almost exactly what we want, for both purposes—but there are two problems with it.
First, there’s no assembler to get from a Bytecode back to a code object. Even if there were, Bytecode is immutable. And, even if it were mutable, Instruction objects have to be constructed out of a bunch of stuff that is either meaningless or needlessly difficult to deal with when you’re modifying or creating code. Most of that stuff is also useless to the assembler and could just be left at None, but the big one is the argval for each jump-like Instruction: it’s an offset into the raw byte string, not something meaningful at all in the Bytecode object. This means that if you add or remove any bytecodes, all the jumps have to be fixed up.
Second, the dis module is written in Python. To make it usable in the compiler, the peephole optimizer, and any C-API bytecode processors, we need something accessible from C. We could, of course, rewrite most of dis in C, move it to builtins, and give it a C API, but that’s a lot of work—and it would probably discourage further improvements to dis, and it might even discourage other Python implementations from including it.
Proposal
As mentioned above, a Bytecode is basically an iterable of tuples. And an iterable of tuples is something that you can already handle from the C API.
Meanwhile, what should jump arguments be? The simplest change is to make them references to other instructions. There are a couple of minor questions on that, which I’ll get to in the appropriate disassembly section.
And that’s really all we need to make the dis format the standard, portable assembly format, and remove the need for most code (including bytecode-based optimizers, code-coverage tools, etc.) to depend on fragile and difficult-to-process raw bytecode.
PyCode_Assemble
PyCode_Assemble(instructions, name, filename, firstlineno)
This function (in Include/code.h and compile.c) takes any iterable of tuples of 2 or more elements each, where the first three elements are opcode, argval, and line, and returns a code object. While assembling, it removes any existing EXTENDED_ARG and NOP instructions (although it will of course insert new EXTENDED_ARG as needed).
Note that a Bytecode is an iterable of Instructions, which are namedtuples, so (assuming we reorganize the attributes of Instruction, and change jump targets to be references to Instructions instead of offsets) you can call this with a Bytecode object. But it’s usually simpler to just call it with a generator.
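For instance, a generator producing the proposed (opcode, argval, line) tuples might look like this. Note that PyCode_Assemble and dis.assemble are the proposed API, not something that exists yet, so the actual assembly call is shown only as a comment:

```python
import dis

# Sketch: build the proposed (opcode, argval, line) tuples for a body
# equivalent to `return 42`. The opcodes here come from the real dis.opmap;
# the assembler itself is the proposal, not an existing function.
def instructions():
    yield (dis.opmap['LOAD_CONST'], 42, 1)
    yield (dis.opmap['RETURN_VALUE'], None, 1)

# Proposed call (does not exist yet):
# code = dis.assemble(instructions(), 'f', '<generated>', 1)
```

Because the assembler resolves argval (here, the constant 42 itself rather than an index into co_consts), the generator never has to manage a consts table.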
The implementation for this new function would be somewhat complex if we were writing it from scratch—but the assemble function in compile.c already does almost exactly what we want, and the small changes to it that we need are not at all complex.
Do we need a corresponding PyCode_Disassemble? It would be pretty simple, but I can’t think of any code in C, either today or in the future, that will ever need to disassemble a code object. (Well, you might want to write a third-party optimizer in C, but in that case, you can import the dis module from your code and use it.) And we already have a nice disassembler in pure Python. So… no C API for disassembly.
PyCode_Optimize
This function currently takes a bytecode bytestring, and in-out lists for consts, names, and lnotab, and returns a new bytestring. The implementation is the peephole optimizer.
PEP 511 proposes to change it to take a code object and a context object (a collection of flags, etc.), and return a new code object. The implementation calls the code_transformer() method of every registered code transformer, each of which has the same interface, and then calls the peephole optimizer.
I propose to instead change it to take an iterable of tuples and return the same, and to have PEP 511 transformer objects also use the same interface, and to rewrite the peephole optimizer to do so as well.
compile.c
Currently: The compiler converts each AST node into a block of instruction objects, where the instructions are pretty similar to the format used by the dis module (opcode and argument value), but with one major difference: each jump target is a pointer to a block, rather than an offset into the (not-yet-existing) bytecode. The compiler flattens this tree of blocks into a linked list, then calls assemble. The assemble function does a few passes over each instruction of each block and ends up with most of the attributes of a code object. It then calls PyCode_Optimize with four of those attributes, and then builds a code object from the result.
Instead, the compiler will turn its linked list of arrays of instruction objects into a flat list of instruction tuples, converting each jump target from a block pointer to an instruction pointer along the way, and introducing the NOP-based label and setlineno pseudo-ops described below (which is very easy—each block starts with a label, and each instruction that starts a block or has a different line from the last instruction is a setlineno). It then calls PyCode_Optimize with that list, and then calls PyCode_Assemble (which replaces the existing assemble) with the result.
dis.Opcode
Currently, the dis module has 101 integer constants, opname and opmap to map back and forth between integer opcodes and their names, and each Instruction has both opcode and opname attributes.
While we’re improving dis, we might as well make an IntEnum type called Opcode to wrap this up. We can even throw has_jrel and so forth on as methods. But for backward compatibility, we’ll keep the two mappings, the extra attribute, and the various lists and sets of raw opcodes too.
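A sketch of what the enum itself could look like, built from the existing dis tables (the has_jrel-style methods are left out, since the underlying jump sets have shifted between CPython versions):

```python
import dis
from enum import IntEnum

# Sketch of the proposed Opcode enum, built from dis's existing opmap
# (a name -> integer-opcode mapping, which the enum functional API accepts).
Opcode = IntEnum('Opcode', dis.opmap)

# Because it's an IntEnum, code that compares raw ints keeps working:
op = Opcode(dis.opmap['LOAD_CONST'])
assert op == dis.opmap['LOAD_CONST']
assert op.name == 'LOAD_CONST'
```

This is why backward compatibility is cheap: the enum members are the integers, so the existing opmap/opname tables and raw-int comparisons all still work.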
dis.assemble
The new dis.assemble function just exports PyCode_Assemble to Python in the obvious way.
dis._get_instructions_bytes
The core of the dis module is a function called _get_instructions_bytes. This takes a bytecode string, plus a bunch of optional parameters for things like the consts and varnames lists, and generates Instruction objects. (If the optional parameters are missing, it creates fake values—e.g., a LOAD_FAST 1 with no varnames gets argval="1". Which is not at all useful for processing, but can occasionally be useful for dumping to the REPL, e.g., when you’re trying to debug a bytecode processor.)
We need to make a few changes to this function.
First, we add a new parameter, raw, both to this function and to all the functions that call it.
If raw is false, all EXTENDED_ARG and NOP opcodes are skipped during disassembly. (If you’re planning to modify and reassemble, they’re at best useless, and can be confusing.)
As mentioned earlier, jump arguments need the target instruction in the argval, not an offset. If raw, this is the actual instruction; otherwise, a new label pseudo-instruction (NOP, labelcount++) is generated for the jump target, and later inserted before the actual instruction.
Also, rather than having an optional starts_line, each instruction now has an optional line, and starts a line iff line is present and different from the previous instruction’s; so we’ll add line to every instruction. Also, if raw is false, we’ll emit pseudo-instructions (NOP, None, line) for each new line. (This allows people to modify instructions in-place without having to worry about whether they’re replacing a line start.)
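To make that pass concrete, here’s a rough sketch of emitting line pseudo-instructions over a stream of (opcode, argval, line) tuples; with_line_pseudo_ops is a hypothetical name for illustration, not part of any API:

```python
import dis

NOP = dis.opmap['NOP']

def with_line_pseudo_ops(instructions):
    """Yield a (NOP, None, line) pseudo-instruction whenever the line
    number changes, before the instruction that starts that line."""
    last_line = None
    for opcode, argval, line in instructions:
        if line is not None and line != last_line:
            yield (NOP, None, line)
            last_line = line
        yield (opcode, argval, line)
```

With the line starts carried by NOPs, replacing any real instruction in-place can’t accidentally destroy a line boundary.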
While we’re at it, it’s almost always simpler not to worry about which instructions start a new line while dealing with bytecode. So, _get_instructions_bytes can generate a (NOP, None, line) for each entry in the lnotab, and then users can leave the line numbers off any real instructions they generate without having to worry about breaking the lnotab. But that too can get in the way for really short functions, so maybe it should also be optional. Or maybe just controlled by the same labels flag.
dis.Instruction
The Instruction class just needs to add a line attribute (which can be None), rearrange its attributes so (opcode, argval, line) come first, and make all attributes after the second optional (with default values of None).
In fact, most of the other existing attributes aren’t useful when assembling, and aren’t useful when processing bytecode for other purposes, unless you’re trying to replace the high-level stringifying functions. But there’s no good reason to get rid of them; as long as people don’t have to specify them when creating or modifying instructions, they don’t get in the way. We could mark them deprecated; I’m not sure whether that’s worth it.
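As a sketch, the reordered namedtuple could look like this (the trailing field names follow the existing dis.Instruction where they overlap, but the exact set and order shown here is illustrative):

```python
from collections import namedtuple

# Sketch of the reordered Instruction: (opcode, argval, line) first,
# everything after the second field optional and defaulting to None.
Instruction = namedtuple('Instruction',
                         ['opcode', 'argval', 'line',
                          'opname', 'arg', 'argrepr', 'offset',
                          'is_jump_target'])
# Defaults apply to the rightmost fields, i.e. line onward.
Instruction.__new__.__defaults__ = (None,) * 6

instr = Instruction(100, 42)   # only opcode and argval required
```

Setting defaults via __new__.__defaults__ (rather than the 3.7+ defaults= keyword) keeps the sketch compatible with the older versions a backport would target.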
dis.Bytecode
Bytecode.assemble(self) just calls assemble, passing itself as the instructions, and the name, filename, and line number (if any) that it stored (directly or indirectly) at construction.
The second public change is to the constructor. The constructor currently looks like this:
Bytecode(x, *, first_line=None, current_offset=None)
That x can be a code, function, method, generator, or str (the last is handled by calling compile(x)). The constructor stores x for later, and digs out the first line number and current offset for later unless overridden by the optional arguments, and also digs out the code object both to store and to disassemble.
We’ll add a couple more optional parameters to override the values pulled from the first argument:
Bytecode(x, *, name=None, filename=None, first_line=None, current_offset=None)
(This means we also need to store the name and filename, as we do with the other two, instead of pulling them out of the appropriate object on demand.)
And we’ll add two more “overloads” for the x parameter: it can be a Bytecode, in which case it’s just a shallow copy, or any (non-str, non-Bytecode) iterable, in which case it just stores [Instruction(Opcode(op), *rest) for (op, *rest) in it].
The third public change is that Bytecode becomes a MutableSequence. I haven’t found any examples where I wanted to mutate a Bytecode; writing a non-mutating generator seems to be always easier, even when I need two passes. However, everyone else who’s written an example of a code generator does it by iterating indexes and mutating something in-place, and I can’t see any good reason to forbid that, so let’s allow it. This means that internally, Bytecode has to store the instructions in a list at construction time, instead of generating them lazily in __iter__.
Examples
Constify
The following function takes an iterable of instruction tuples, and gives you back an iterable of instruction tuples where every LOAD_GLOBAL is replaced by a LOAD_CONST with the current value of that global:
def constify(instructions):
    for opcode, argval, *rest in instructions:
        if opcode == dis.LOAD_GLOBAL:
            yield (dis.LOAD_CONST, eval(argval, globals()), *rest)
        else:
            yield (opcode, argval, *rest)
Or, if you prefer doing it mutably:
def constify(instructions):
    bc = dis.Bytecode(instructions)
    for i, instr in enumerate(bc):
        if instr.opcode == dis.LOAD_GLOBAL:
            bc[i] = (dis.LOAD_CONST, eval(instr.argval, globals()))
    return bc
Either way, you can use it in a decorator:
def constify_func(func):
    code = func.__code__
    instructions = constify(dis.raw_disassemble(code))
    func.__code__ = dis.assemble(instructions,
                                 code.co_name, code.co_filename, code.co_firstlineno)
    return func
… or a PEP 511 processor:
def code_transformer(self, instructions, context):
    if context.some_flag:
        return constify(instructions)
    else:
        return instructions
Dejumpify
For something a little more complicated, here’s the first example everyone gives me of why labels are insufficient: let’s flatten out double-jumps.
This is one of the rare examples that’s easier without labels:
def _dejumpify(code):
    bc = dis.Bytecode(code, raw=True)
    for op, arg, *rest in bc:
        if op.hasjump:
            if arg.opcode in (dis.JUMP_FORWARD, dis.JUMP_ABSOLUTE):
                yield (op, arg.argval, *rest)
                continue
        yield (op, arg, *rest)
def dejumpify(func):
    code = func.__code__
    func.__code__ = dis.assemble(
        _dejumpify(code),
        code.co_name, code.co_filename, code.co_firstlineno)
    return func
… but it’s still not at all hard (or inefficient) with labels:
def _dejumpify(code):
    bc = dis.Bytecode(code)
    indices = {instr: i for i, instr in enumerate(bc)}
    for op, arg, *rest in bc:
        if op.hasjump:
            target = bc[indices[arg]+1]
            if target.opcode in (dis.JUMP_FORWARD, dis.JUMP_ABSOLUTE):
                yield (op, target.argval, *rest)
                continue
        yield (op, arg, *rest)
Reorder blocks
For something a little more complicated, what if we wanted to break the code into blocks, reorder those blocks in some way that minimizes jumps, then relinearize? Surely we’d need blocks then, right? Sure. And here’s how we get them:
def blockify(instructions):
    block = []
    for instruction in instructions:
        if instruction.is_jump_target and block:
            yield block
            block = []
        block.append(instruction)
    if block:
        yield block
(That could be shorter with itertools, but some people seem to find such code confusing, so I wrote it out.)
So:
def minimize_jumps(instructions):
    blocks = list(blockify(instructions))
    # put them in a graph and optimize it or whatever...
    # then flatten them back into a single iterable of instructions
    return itertools.chain.from_iterable(blocks)
Backward compatibility and third-party libs
First, the fact that Instruction.argval is now an instruction rather than an integer offset for jump instructions is a breaking change. I don’t think any code actually processes argval; if you need offsets, you pretty much need the raw arg. But I could be wrong. If I am, maybe it’s better to add a new argument or argument_value attribute, and leave argval with its old meaning (adding it to the deprecated list, if we’re deprecating the old attributes).
If this is added to Python 3.6, we almost certainly want a backport to 3.4–3.5. I don’t know if it’s too important for that backport to export the C API for PyCode_Assemble, but that’s not hard to add, so why not. Backporting to 2.6+/3.2+ would be great, but… is there a backport of the 3.4 version of dis? If so, adding to that shouldn’t be too hard; if not, someone has to write that part first, and I’m not volunteering. :)
Code that only inspects bytecode, without modifying it, like coverage.py, shouldn’t need any change. If that code is already using dis in 3.4+, it can continue to do so—and most such code would then work even if Python 3.7 changed to wordcode. If that code isn’t using dis, well, nothing changes.
Code that significantly modifies bytecode often relies on the third-party module byteplay. We definitely want to allow such code to continue working. Ideally, byteplay will change to just use dis for assembly in 3.6+ (the way it changed to use dis for various other bits in 3.4+), meaning it’ll no longer be a stumbling block to upgrading to new versions of Python. This would be a very easy change to byteplay.
The only other utility module I’ve found for dealing with bytecode that isn’t stuck in Python 2.4 or something is codetransformer. I haven’t looked at it in as much detail as byteplay, and I don’t know whether it could/should be changed to use the dis assembler in 3.6+.
Victor Stinner is in the middle of writing a new module called bytecode, as an attempt at working out the best API for rewriting the peephole optimizer line-by-line in Python. I’m willing to bet that whatever API he comes up with could be trivially built as a wrapper on top of dis, if dis isn’t sufficient in itself. (I’m not sure why you’d want to rewrite the peephole optimizer line-by-line instead of doing the work at a higher level, but presumably the hope is that whatever he comes up with would also be useful for other code.)