Background

Currently, CPython’s internal bytecode format stores instructions with no args as 1 byte, instructions with small args as 3 bytes, and instructions with large args as 6 bytes (actually, a 3-byte EXTENDED_ARG followed by a 3-byte real instruction). While bytecode is implementation-specific, many other implementations (PyPy, MicroPython, …) use CPython’s bytecode format, or variations on it.
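
For concreteness, here's a toy decoder for that pre-3.6 format. The opcode numbers are stand-ins for illustration (real values vary by version), but the encoding rules are exactly as described above:

```python
# Stand-in opcode numbers for illustration only; the real values
# differ between Python versions.
LOAD_CONST = 100      # takes a 2-byte arg
RETURN_VALUE = 83     # takes no arg
EXTENDED_ARG = 145    # holds the high 16 bits of the next instruction's arg
HAVE_ARGUMENT = 90    # opcodes >= this take an argument

def decode(code):
    """Yield (opcode, arg) pairs from a pre-3.6-style bytecode string."""
    i, ext = 0, 0
    while i < len(code):
        op = code[i]
        if op >= HAVE_ARGUMENT:
            # little-endian 2-byte arg, plus any pending EXTENDED_ARG bits
            arg = code[i+1] | (code[i+2] << 8) | (ext << 16)
            ext = 0
            i += 3
            if op == EXTENDED_ARG:
                ext = arg
                continue
            yield op, arg
        else:
            yield op, None
            i += 1

# LOAD_CONST 1; RETURN_VALUE in this encoding:
# list(decode(bytes([100, 1, 0, 83]))) == [(100, 1), (83, None)]
```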

Python exposes as much of this as possible to user code. For example, you can write a decorator that takes a function’s __code__ member, access its co_code bytecode string and other attributes, build a different code object, and replace the function’s __code__. Or you can build an import hook that takes the top-level module code returned by compile (and all the code objects stored in co_consts for the top-level object or recursively in any of the nested objects), builds up a different one, and returns that as the module’s actual code.
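
For example, here's a minimal sketch of such a decorator. It uses the `CodeType.replace()` method added in 3.8 for brevity; in 3.5 you'd have to call the `types.CodeType` constructor with its full ~15-argument list:

```python
def rename_code(func):
    """Toy decorator: rebuild func's code object with a different co_name,
    then swap it in via __code__. (Uses code.replace(), 3.8+.)"""
    code = func.__code__
    func.__code__ = code.replace(co_name=code.co_name + '_patched')
    return func

@rename_code
def add(a, b):
    return a + b

# add still works normally, but its code object has been replaced:
# add(2, 3) == 5, add.__code__.co_name == 'add_patched'
```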

And this isn’t just a nifty feature for playing around with Python’s internals without having to hack up the interpreter; there’s real-life production code that depends on doing this kind of stuff to code objects.

Making CPython bytecode format less brittle

There have been a few proposals to change the internal format: e.g., “wordcode” stores most instructions as 2 bytes (using EXTENDED_ARG for 4 or 6 bytes when needed), while “packed ops” steals bits from the opcode to store very small args for certain ops (e.g., LOAD_CONST_2 can be stored as 1 byte instead of 3). Such proposals would obviously break all code that depends on bytecode. Guido suggested on python-dev that it would need two Python versions (so 3.7, if we finished wordcode today) before we could release such a breaking change; I’m not sure even that would be enough time.

Similarly, while some other implementations use essentially the same format as CPython, they’re not all actually compatible. For example, for compactness, MicroPython makes most non-jump instructions 2 bytes instead of 3, which breaks almost any code that even inspects bytecode, much less tries to modify it.

One of the replies to Guido suggested, “Maybe we should have a standard portable assembly format for bytecode”. If we had that, it would remain consistent across most internal changes, even as radical as bytecode to wordcode, and would be portable between CPython and MicroPython. If we could get such a format into 3.6, I think it would definitely be reasonable for a radical change like wordcode to go into 3.7 as Guido suggests—and maybe even for any future changes to take place over a single version instead of two.

Making bytecode processors easier to write

Separately, as part of the discussion on PEP 511, which adds a new API for directly processing code objects, I suggested that it would be a lot simpler to write such processors, and also simpler for the compiler, if it used some “unpacked bytecode” format that’s easier to work with.

The main objection I heard to that idea was that any such format would make Python harder to change in the future—but once you consider that the obvious format to use is effectively a bytecode assembly language, it becomes clear that the opposite is true: we’re making Python easier to change in the future, along with making it simpler to work with.

The other objection is that there are just too many possible designs to choose from, so it’s way too early to put anything in the stdlib. But there really aren’t many possible designs to choose from.

dis

Python already has a standard bytecode assembly language format, as exposed by the dis module: a dis.Bytecode object represents the assembly-language version of a code object, as an iterable of dis.Instruction objects instead of just raw bytes (plus a few extra attributes and methods). It abstracts away things like variable instruction size (including EXTENDED_ARG), mapping indices into co_const into actual constant values, etc.
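
For example, iterating a Bytecode gives you Instruction objects with the opcode, name, and already-resolved argument value, with no byte-level decoding in sight (the exact instruction sequence varies by Python version):

```python
import dis

def f(x):
    return x + 1

# Each item is a dis.Instruction namedtuple; argval is the resolved
# value (e.g., the actual constant, not an index into co_consts).
for instr in dis.Bytecode(f):
    print(instr.opname, instr.argval)
```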

This is almost exactly what we want, for both purposes—but there are two problems with it.

First, there’s no assembler to get back from Bytecode back to code. Even if there were, Bytecode is immutable. And, even if it were mutable, Instruction objects have to be constructed out of a bunch of stuff that is either meaningless or needlessly difficult to deal with when you’re modifying or creating code. Most of that stuff is also useless to the assembler and could just be left at None, but the big one is the argval for each jump-like Instruction: it’s an offset into the raw byte string, not something meaningful at all in the Bytecode object. This means that if you add or remove any bytecodes, all the jumps have to be fixed up.
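
You can see the offset problem directly today: for jump instructions, argval is just an integer offset into the raw byte string, so inserting or removing a single instruction before the target invalidates it:

```python
import dis

def f(x):
    if x:
        return 1
    return 2

# Every jump's argval is a plain int offset, not a reference to the
# target instruction -- which is exactly what makes editing fragile.
for instr in dis.get_instructions(f):
    if instr.opcode in dis.hasjrel or instr.opcode in dis.hasjabs:
        print(instr.opname, instr.argval)
```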

Second, the dis module is written in Python. To make it usable in the compiler, the peephole optimizer, and any C-API bytecode processors, we need something accessible from C. We could, of course, rewrite most of dis in C, move it to builtins, and give it a C API, but that’s a lot of work—and it would probably discourage further improvements to dis, and it might even discourage other Python implementations from including it.

Proposal

As mentioned above, a Bytecode is basically an iterable of tuples. And an iterable of tuples is something that you can already handle from the C API.

Meanwhile, what should jump arguments be? The simplest change is to make them references to other instructions. There are a couple of minor questions on that, which I’ll get to in the appropriate disassembly section.

And that’s really all we need to make the dis format the standard, portable assembly format, and to free most code (including bytecode-based optimizers, code-coverage tools, etc.) from depending on fragile and difficult-to-process raw bytecode.

PyCode_Assemble

PyCode_Assemble(instructions, name, filename, firstlineno)

This function (in Include/code.h and compile.c) takes any iterable of tuples of 2 or more elements each, where the first three elements are opcode, argval, and line, and returns a code object. While assembling, it removes any existing EXTENDED_ARG and NOP instructions (although it will of course insert new EXTENDED_ARG as needed).
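
The EXTENDED_ARG insertion itself is mechanical. Here's a sketch in Python of encoding one instruction in the 3-byte format described earlier (the EXTENDED_ARG value is a stand-in; the real assembler works on the compiler's internal structures, not bytes-at-a-time):

```python
EXTENDED_ARG = 145  # stand-in opcode number for illustration

def emit(op, arg=None):
    """Encode one instruction, prefixing an EXTENDED_ARG if the
    argument doesn't fit in 2 bytes."""
    if arg is None:
        return bytes([op])
    out = b''
    if arg > 0xFFFF:
        # high 16 bits go in a 3-byte EXTENDED_ARG prefix
        out += bytes([EXTENDED_ARG, (arg >> 16) & 0xFF, (arg >> 24) & 0xFF])
        arg &= 0xFFFF
    return out + bytes([op, arg & 0xFF, (arg >> 8) & 0xFF])
```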

Note that a Bytecode is an iterable of Instructions, which are namedtuples, so (assuming we reorganize the attributes of Instruction, and change jump targets to be references to Instructions instead of offsets) you can call this with a Bytecode object. But it’s usually simpler to just call it with a generator.

The implementation for this new function would be somewhat complex if we were writing it from scratch—but the assemble function in compile.c already does almost exactly what we want, and the small changes to it that we need are not at all complex.

Do we need a corresponding PyCode_Disassemble? It would be pretty simple, but I can’t think of any code in C, either today or in the future, that will ever need to disassemble a code object. (Well, you might want to write a third-party optimizer in C, but in that case, you can import the dis module from your code and use it.) And we already have a nice disassembler in pure Python. So… no C API for disassembly.

PyCode_Optimize

This function currently takes a bytecode bytestring, and in-out lists for consts, names, and lnotab, and returns a new bytestring. The implementation is the peephole optimizer.

PEP 511 proposes to change it to take a code object and a context object (a collection of flags, etc.), and return a new code object. The implementation calls the code_transformer() method of every registered code transformer, which have the same interface, and then calls the peephole optimizer.

I propose to instead change it to take an iterable of tuples and return the same, to have PEP 511 transformer objects use the same interface, and to rewrite the peephole optimizer to do so as well.

compile.c

Currently: The compiler converts each AST node into a block of instruction objects, where the instructions are pretty similar to the format used by the dis module (opcode and argument value), but with one major difference: each jump target is a pointer to a block, rather than an offset into the (not-yet-existing) bytecode. The compiler flattens this tree of blocks into a linked list, then calls assemble. The assemble function does a few passes over each instruction of each block and ends up with most of the attributes of a code object. It then calls PyCode_Optimize with four of those attributes, and then builds a code object from the result.

Instead, the compiler will turn its linked list of arrays of instruction objects into a flat list of instruction tuples, converting each jump target from a block pointer to an instruction pointer along the way, and introducing the NOP label and setlineno pseudo-ops described below (which is very easy—each block starts with a label, and each instruction that starts a block or has a different line from the previous instruction gets a setlineno). It then calls PyCode_Optimize with that list, and then calls PyCode_Assemble (which replaces the existing assemble) with the result.
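
A rough Python model of that flattening step (the real code would be C in compile.c; the tuple layout follows the pseudo-ops described below, and a jump's block index lines up with the label number of that block's NOP):

```python
import dis

NOP = dis.opmap['NOP']

def flatten(blocks):
    """Model of the compiler's flattening pass.

    blocks: list of lists of (opcode, argval, line) tuples.
    Emits a (NOP, label) pseudo-op at the start of each block and a
    (NOP, None, line) setlineno pseudo-op at each line change."""
    out, last_line = [], None
    for label, block in enumerate(blocks):
        out.append((NOP, label))                # label pseudo-op
        for op, argval, line in block:
            if line is not None and line != last_line:
                out.append((NOP, None, line))   # setlineno pseudo-op
                last_line = line
            out.append((op, argval, line))
    return out

blocks = [
    [(dis.opmap['LOAD_CONST'], 0, 1)],
    [(dis.opmap['RETURN_VALUE'], None, 2)],
]
flat = flatten(blocks)
```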

dis.Opcode

Currently, the dis module has 101 integer constants, opname and opmap to map back and forth between integer opcodes and their names, and each Instruction has both opcode and opname attributes.

While we’re improving dis, we might as well make an IntEnum type called Opcode to wrap this up. We can even throw has_jrel and so forth on as methods. But for backward compatibility, we’ll keep the two mappings, the extra attribute, and the various lists and sets of raw opcodes too.
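
A minimal sketch of such an Opcode type, built from the existing opmap (has_jump here is a stand-in for the proposed has_jrel-style helpers, implemented with today's dis sets):

```python
import dis
from enum import IntEnum

# Build the enum from the existing name->opcode mapping.
Opcode = IntEnum('Opcode', sorted(dis.opmap.items(), key=lambda kv: kv[1]))

def has_jump(op):
    """Stand-in for the proposed helper methods on Opcode."""
    return op in dis.hasjrel or op in dis.hasjabs
```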

dis.assemble

The new dis.assemble function just exports PyCode_Assemble to Python in the obvious way.

dis._get_instructions_bytes

The core of the dis module is a function called _get_instructions_bytes. This takes a bytecode string, plus a bunch of optional parameters for things like the consts and varnames lists, and generates Instruction objects. (If the optional parameters are missing, it creates fake values—e.g., a LOAD_FAST 1 with no varnames gets an argval="1". Which is not at all useful for processing, but can occasionally be useful for dumping to the REPL, e.g., when you’re trying to debug a bytecode processor.)

We need to make a few changes to this function.

First, we add a new parameter, raw, both to this function and to all the functions that call it.

If raw is false, all EXTENDED_ARG and NOP opcodes are skipped during disassembly. (If you’re planning to modify and reassemble, they’re at best useless, and can be confusing.)

As mentioned earlier, jump arguments need the target instruction in the argval, not an offset. If raw, this is the actual instruction; otherwise, a new label pseudo-instruction (NOP, labelcount++) is generated for the jump target, and later inserted before the actual instruction.

Also, since each instruction now has an optional line, and starts a line iff line is present and different from the previous instruction, rather than having an optional start_line, we’ll add line to every instruction. Also, if raw is false, we’ll emit pseudo-instructions (NOP, None, line) for each new line. (This allows people to modify instructions in-place without having to worry about whether they’re replacing a line start.)

While we’re at it, it’s almost always simpler not to worry about which instructions start a new line while dealing with bytecode. So, _get_instructions_bytes can generate a (NOP, None, line) for each entry in the lnotab, and then users can leave the line numbers off any real instructions they generate without having to worry about breaking the lnotab. But that too can get in the way for really short functions, so maybe it should also be optional. Or maybe just controlled by the same raw flag.

dis.Instruction

The Instruction class just needs to add a line attribute (which can be None), rearrange its attributes so (opcode, argval, line) come first, and make all attributes after the second optional (with default values of None).
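
A sketch of that reordered Instruction as a plain namedtuple (field order past the first three and the use of namedtuple defaults, which require 3.7+, are my assumptions):

```python
from collections import namedtuple

# (opcode, argval, line) first; everything else optional and
# defaulting to None, so hand-built instructions stay short.
Instruction = namedtuple(
    'Instruction',
    'opcode argval line opname arg argrepr offset starts_line is_jump_target',
    defaults=(None,) * 6)

instr = Instruction(100, 42, 3)
```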

In fact, most of the other existing attributes aren’t useful when assembling, and aren’t useful when processing bytecode for other purposes, unless you’re trying to replace the high-level stringifying functions. But there’s no good reason to get rid of them; as long as people don’t have to specify them when creating or modifying instructions, they don’t get in the way. We could mark them deprecated; I’m not sure whether that’s worth it.

dis.Bytecode

The first public change is an assemble method: Bytecode.assemble(self) just calls assemble, passing itself as the instructions, and the name, filename, and line number (if any) that it stored (directly or indirectly) at construction.

The second public change is to the constructor. The constructor currently looks like this:

Bytecode(x, *, first_line=None, current_offset=None)

That x can be a code, function, method, generator, or str (the last is handled by calling compile(x)). The constructor stores x for later, and digs out the first line number and current offset for later unless overridden by the optional arguments, and also digs out the code object both to store and to disassemble.

We’ll add a couple more optional parameters to override the values pulled from the first:

Bytecode(x, *, name=None, filename=None, first_line=None, current_offset=None)

(This means we also need to store the name and filename, as we do with the other two, instead of pulling them out of the appropriate object on demand.)

And we’ll add two more “overloads” for the x parameter: It can be a Bytecode, in which case it’s just a shallow copy, or any (non-str, non-Bytecode) iterable, in which case it just stores [Instruction(Opcode(op), *rest) for (op, *rest) in it].

The third public change is that Bytecode becomes a MutableSequence. I haven’t found any examples where I wanted to mutate a Bytecode; writing a non-mutating generator seems to be always easier, even when I need two passes. However, everyone else who’s written an example of a code generator does it by iterating indexes and mutating something in-place, and I can’t see any good reason to forbid that, so let’s allow it. This means that internally, Bytecode has to store the instructions in a list at construction time, instead of generating them lazily in __iter__.

Examples

Constify

The following function takes an iterable of instruction tuples, and gives you back an iterable of instruction tuples where every LOAD_GLOBAL is replaced by a LOAD_CONST with the current value of that global:

def constify(instructions):
    for opcode, argval, *rest in instructions:
        if opcode == dis.LOAD_GLOBAL:
            yield (dis.LOAD_CONST, eval(argval, globals()), *rest)
        else:
            yield (opcode, argval, *rest)

Or, if you prefer doing it mutably:

def constify(instructions):
    bc = dis.Bytecode(instructions)
    for i, instr in enumerate(bc):
        if instr.opcode == dis.LOAD_GLOBAL:
            bc[i] = (dis.LOAD_CONST, eval(instr.argval, globals()))
    return bc

Either way, you can use it in a decorator:

def constify_func(func):
    code = func.__code__
    instructions = constify(dis.raw_disassemble(code))
    func.__code__ = dis.assemble(instructions,
                                 code.co_name, code.co_filename, code.co_firstlineno)
    return func

… or a PEP 511 processor:

def code_transformer(self, instructions, context):
    if context.some_flag:
        return constify(instructions)
    else:
        return instructions

Dejumpify

For something a little more complicated, the first example everyone gives me of why labels are insufficient: let’s flatten out double-jumps.

This is one of the rare examples that’s easier without labels:

def _dejumpify(code):
    bc = dis.Bytecode(code, raw=True)
    for op, arg, *rest in bc:
        if op.hasjump:
            if arg.opcode in (dis.JUMP_FORWARD, dis.JUMP_ABSOLUTE):
                yield (op, arg.argval, *rest)
                continue
        yield (op, arg, *rest)

def dejumpify(func):
    code = func.__code__
    func.__code__ = dis.assemble(
        _dejumpify(code),
        code.co_name, code.co_filename, code.co_firstlineno)
    return func

… but it’s still not at all hard (or inefficient) with labels:

def _dejumpify(code):
    bc = dis.Bytecode(code)
    indices = {instr: i for i, instr in enumerate(bc)}
    for op, arg, *rest in bc:
        if op.hasjump:
            target = bc[indices[arg]+1]
            if target.opcode in (dis.JUMP_FORWARD, dis.JUMP_ABSOLUTE):
                yield (op, target.argval, *rest)
                continue
        yield (op, arg, *rest)

Reorder blocks

For something a little more complicated, what if we wanted to break the code into blocks, reorder those blocks in some way that minimizes jumps, then relinearize? Surely we’d need blocks then, right? Sure. And here’s how we get them:

def blockify(instructions):
    block = []
    for instruction in instructions:
        if instruction.is_jump_target:
            yield block
            block = []
        block.append(instruction)
    yield block

(That could be shorter with itertools, but some people seem to find such code confusing, so I wrote it out.)
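
For the curious, here's the itertools version (one minor difference: unlike the loop above, it never yields a leading empty block when the first instruction is a jump target):

```python
import itertools
from collections import namedtuple

def blockify(instructions):
    """Group instructions into blocks, starting a new block at each
    jump target, using itertools.groupby with a stateful key."""
    block_id = 0
    def key(instr):
        nonlocal block_id
        if instr.is_jump_target:
            block_id += 1
        return block_id
    return (list(g) for _, g in itertools.groupby(instructions, key))

# Quick demo with stand-in instructions:
Fake = namedtuple('Fake', 'opname is_jump_target')
blocks = list(blockify([Fake('a', False), Fake('b', False),
                        Fake('c', True), Fake('d', False)]))
```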

So:

def minimize_jumps(instructions):
    blocks = list(blockify(instructions))
    # put them in a graph and optimize it or whatever...
    # put them back in a flat iterable of blocks
    return itertools.chain.from_iterable(blocks)

Backward compatibility and third-party libs

First, the fact that Instruction.argval is now an instruction rather than an integer offset for jump instructions is a breaking change. I don’t think any code actually processes argval; if you need offsets, you pretty much need the raw arg. But I could be wrong. If I am, maybe it’s better to add a new argument or argument_value attribute, and leave argval with its old meaning (adding it to the deprecated list, if we’re deprecating the old attributes).

If this is added to Python 3.6, we almost certainly want a backport to 3.4-5. I don’t know if it’s too important for that backport to export the C API for PyCode_Assemble, but that’s not hard to add, so why not. Backporting to 2.6+/3.2+ would be great, but… is there a backport of the 3.4 version of dis? If so, adding to that shouldn’t be too hard; if not, someone has to write that part first, and I’m not volunteering. :)

Code that only inspects bytecode, without modifying it, like coverage.py, shouldn’t need any change. If that code is already using dis in 3.4+, it can continue to do so—and most such code would then work even if Python 3.7 changed to wordcode. If that code isn’t using dis, well, nothing changes.

Code that significantly modifies bytecode often relies on the third-party module byteplay. We definitely want to allow such code to continue working. Ideally, byteplay will change to just use dis for assembly in 3.6+ (the way it changed to use dis for various other bits in 3.4+), meaning it’ll no longer be a stumbling block to upgrading to new versions of Python. This would be a very easy change to byteplay.

The only other utility module I’ve found for dealing with bytecode that isn’t stuck in Python 2.4 or something is codetransformer. I haven’t looked at it in as much detail as byteplay, and I don’t know whether it could/should be changed to use the dis assembler in 3.6+.

Victor Stinner is in the middle of writing a new module called bytecode, as an attempt at working out what would be the best API for rewriting the peephole optimizer in pure Python. I’m willing to bet that whatever API he comes up with could be trivially built as a wrapper on top of dis, if dis isn’t sufficient in itself. (I’m not sure why you’d want to rewrite the peephole optimizer line-by-line instead of doing the work at a higher level, but presumably the hope is that whatever he comes up with would also be useful for other code.)
