Alternatively, it might be nice if there were a way to do "inline bytecode assembly" in CPython, similar to the way you do inline assembly in many C compilers, so the answer to random's question is just "asm [('BUILD_SET', 0)]" or something similar.
You can follow the rest of the thread from there if you want. I don't really think it's worth pursuing (certainly not for this use case), but it's an interesting idea.
Here's what it would take.
Bytecode assembler
First, you need something that can assemble that inline assembly into actual bytecode. This is the easy part, but I couldn't find any existing projects that worked with Python 3.x, so I wrote one, called cpyasm. There are things that could be improved (see the TODO section in the README), but the basics are there.
If you only want to write entire functions in assembly, cpyasm already gives you everything you need. But that's not what we want here: we want to be able to write most of a function in Python, dropping to assembly for a line or two in the middle.
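To give a feel for what "assembling" means here, the following is a minimal sketch and not cpyasm's actual API. It assumes the two-byte "wordcode" instruction format that CPython has used since 3.6 (older versions used one- or three-byte instructions), and the function name assemble is made up for illustration. It ignores labels, jump targets, and the inline cache entries that newer interpreters expect after some opcodes.

```python
import dis


def assemble(instructions):
    """Assemble (opname, arg) pairs into raw wordcode bytes.

    Hypothetical helper for illustration only; assumes CPython 3.6+,
    where every instruction is one opcode byte plus one argument byte.
    """
    out = bytearray()
    for opname, arg in instructions:
        opcode = dis.opmap[opname]  # raises KeyError for unknown names
        arg = arg or 0
        # Arguments above 255 are spelled with EXTENDED_ARG prefixes,
        # most significant byte first.
        prefix_bytes = []
        high = arg >> 8
        while high:
            prefix_bytes.append(high & 0xFF)
            high >>= 8
        for byte in reversed(prefix_bytes):
            out += bytes([dis.opmap["EXTENDED_ARG"], byte])
        out += bytes([opcode, arg & 0xFF])
    return bytes(out)


raw = assemble([("BUILD_SET", 0)])
```

The opcode numbers come from dis.opmap, so the output is only meaningful for the interpreter version that assembled it.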
Compiler support
The compiler works in three stages: It tokenizes the data, parses the tokens into an AST, and compiles the AST into bytecode.
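You can poke at Python-level counterparts of those three stages from the standard library. This is just a sketch to make the pipeline concrete; the tokenize module is a separate pure-Python tokenizer, not the one the compiler itself uses, and the source string is an arbitrary example.

```python
import ast
import io
import tokenize

source = "x = {1, 2}\n"

# Stage 1: split the source text into tokens.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Stage 2: parse the source into an AST.
tree = ast.parse(source)

# Stage 3: compile the AST into a code object holding the bytecode.
code = compile(tree, "<example>", "exec")

# Running the code object shows the result of the whole pipeline.
namespace = {}
exec(code, namespace)
```

An import hook gets to step in between these stages, which is what the rest of this post relies on.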
The parsing is done according to a GRAMMAR file. Changing CPython's Grammar explains how to edit this file, and then all the other things you have to edit or rebuild after that. The end result is going to be a new kind of AST node, Asm, with appropriate properties.
Fortunately, we're not adding any new bytecodes to the interpreter; the whole point of inline assembly is to provide a different way to generate the existing bytecodes. But compile.c needs to be edited to transform the AST node into a sequence of bytecodes. This is where the actual "assembler" part of the inline assembler goes.
The simplest way to add an asm statement would be to make it a simple statement with one argument, which can be parsed by the existing machinery because it's just a string:
simple_stmt ::= expression_stmt
                | ...
                | asm_stmt
asm_stmt ::= "asm" stringliteral
However, that would mean the parser can't detect any errors in the inline assembly, or even recognize names. On the opposite end, you could make it a compound statement whose suite consists of lines of assembly code, which have their own grammar:
compound_stmt ::= if_stmt
                  | ...
                  | inline_asm_stmt
inline_asm_stmt ::= 'asm' ':' asm_suite
asm_suite ::= asm_statement NEWLINE | NEWLINE INDENT asm_statement+ DEDENT
asm_statement ::= asm_jrel_stmt | asm_jabs_stmt | asm_local_stmt | ...
asm_jrel_stmt ::= ( 'FOR_ITER' | 'JUMP_FORWARD' | ... ) asm_jrel_target
asm_jrel_target ::= asm_label | '+' integer | '-' integer | integer
… and so on
That's a whole lot more work, but it also means the parser does a lot of your work for you (all the cruft I did with regexps in cpyasm).
In between the two, you could use something like a simple statement that takes a list display instead of a string, which would make the parser parse out the list values, which would allow it to recognize identifiers and notice them as names in use, etc.
Import hook
Adding support to the compiler might be fun to do when I have a bit of free time, but it'll be a lot of work.
Also, it's unlikely to get accepted upstream. The best way to get a radically new feature accepted upstream is to package it as a module, put it on PyPI, see how people are using it, and then go back to the core team and show them how useful it is. But you can't package a compiler modification up as a module. So, what can you do?
Instead of modifying the compiler, you can use an import hook. The core of the normal import process for source files is: read the data as bytes, decode the bytes, parse the str into an AST, and compile the AST into bytecode. We can write our own version of the function that does those steps and insert our own code to, say, regexp the source code or transform the tree. And we can install that to handle all source files, or only source files with some new extension like .pya, or whatever we want.
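Here's a rough sketch of what "our own version of the function" might look like, assuming we subclass importlib.machinery.SourceFileLoader and override its source_to_code method. The names PyaLoader, transform_source, transform_tree, and load_pya are all made up for illustration, and the "{/}" spelling is just a stand-in for whatever new syntax you want. To keep the example small, load_pya loads one file directly instead of registering a finder on sys.path_hooks.

```python
import ast
import importlib.machinery
import importlib.util


def transform_source(text):
    # Hypothetical source-level rewrite: a naive textual replacement
    # (it would also rewrite occurrences inside string literals).
    return text.replace("{/}", "set()")


def transform_tree(tree):
    # Hypothetical AST-level hook; a real one would run a NodeTransformer.
    return tree


class PyaLoader(importlib.machinery.SourceFileLoader):
    def source_to_code(self, data, path, *, _optimize=-1):
        # bytes -> str -> (rewritten) str -> AST -> (rewritten) AST -> code
        text = importlib.util.decode_source(data)
        tree = ast.parse(transform_source(text), filename=path)
        tree = transform_tree(tree)
        return compile(tree, path, "exec", dont_inherit=True,
                       optimize=_optimize)


def load_pya(name, path):
    """Load a single file through PyaLoader without touching sys.path_hooks."""
    loader = PyaLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```

A real hook would register a finder that hands the new extension to this loader. Note too that SourceFileLoader caches compiled bytecode, so on later imports source_to_code is skipped unless the source file has changed.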
The importlib docs show all the details on how to do this, although there's really no good overview there, or anywhere else. You can see an example in a module called emptyset, which I think does a good job with the source and AST modification code, but probably a terrible job with the actually hooking part.
But notice that, while we can do whatever we want to the source before parsing it, and do whatever we want to the tree before compiling it, we can't do much to influence how the parsing itself happens. So, our new syntax has to be valid Python syntax, or at least something we can transform into valid Python syntax. And neither "asm:" followed by a suite full of statements like "CALL_FUNCTION 1,2" nor just "asm" followed by a string is valid syntax.
Fortunately, there's a simple trick that will work here: "asm(foo, bar, baz)" is a function call; "CALL_FUNCTION(1,2)" is a function call; etc. So, we either make that our syntax (which kind of sucks; if you wanted your blocks to end with ")" and each line to end in ",", you wouldn't be using Python, you'd be using one of those languages that ends blocks with "}" and lines with ";"…), or we transform our source to convert our chosen syntax into the parseable syntax. I think that's doable with regexps (since there are no recursive asm statements, etc.) even for the compound-statement version, and almost certainly for the simple-statement versions.
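For the simple-statement version, the regexp rewrite could be as small as the sketch below. The pattern and the name rewrite_asm_statements are my own illustration, and it's deliberately naive: it would also rewrite matching lines inside a multi-line string, and it would mangle an assignment like "asm = 1".

```python
import ast
import re

# A line that is just: optional indent, the word asm, whitespace, operand.
ASM_LINE = re.compile(r"^([ \t]*)asm[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def rewrite_asm_statements(source):
    """Turn `asm "..."` lines into parseable `asm("...")` calls."""
    return ASM_LINE.sub(r"\1asm(\2)", source)


original = 'def f():\n    asm "BUILD_SET 0"\n    return 1\n'
rewritten = rewrite_asm_statements(original)

# The rewritten text is ordinary Python as far as the parser is concerned.
tree = ast.parse(rewritten)
```

After this step the parser sees a plain function call, which is the shape the AST transformation works from.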
So now, we've got a parse tree that has a Call node where the func is a Name node whose id is 'asm', and whose args list is either a list of Call nodes, or a single Str node.
The way to handle this is to write an ast.NodeTransformer that has visit_Call and, if appropriate, visit_Str methods. The example in emptyset is pretty self-explanatory, and the docs in the ast module do a great job on this.
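A minimal sketch of such a transformer, assuming the single-string form of asm. The class name AsmCollector and the choice of a None constant as the placeholder are mine, not taken from emptyset. On Python 3.8 and later, string literals parse as ast.Constant nodes rather than ast.Str, so that's what this checks for.

```python
import ast


class AsmCollector(ast.NodeTransformer):
    """Stash the arguments of asm(...) calls and leave a placeholder."""

    def __init__(self):
        self.stashed = []

    def visit_Call(self, node):
        self.generic_visit(node)  # handle any nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "asm":
            # Keep only string-literal arguments (assumption: one
            # instruction per string).
            lines = [
                arg.value
                for arg in node.args
                if isinstance(arg, ast.Constant) and isinstance(arg.value, str)
            ]
            self.stashed.append(lines)
            # Placeholder expression; a real implementation would pick
            # something that compiles to an easy-to-find marker.
            return ast.copy_location(ast.Constant(value=None), node)
        return node
```

You'd run it with something like collector.visit(tree), followed by ast.fix_missing_locations(tree) before compiling.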
But… what do we transform these nodes into? We could try to work out the appropriate AST node for each bytecode, but this could get pretty tricky. So I think what we'd want to do instead is to assemble the bytecode here, then put some kind of marker into the node that will compile to some known, easily-findable bytecode of the right length (maybe just a sequence of NOPs?) and stash the assembled bytecode.
Then, after we've gotten the bytecode back, we'll need to munge it. First, we may need to fix up some references in our code: e.g., if the thing we thought was going to be local #1 is actually local #2, we need to change our LOAD_FAST 1 to a LOAD_FAST 2. The easy way to do this is to just re-assemble the source, now that we have all the info we need (the starting offset, the co_varnames, etc.). Then we just replace the sequence of NOPs with the assembled code, update the co_names etc. with any new names we've added, fix up the co_lnotab, and build a new code object out of the result. This isn't completely trivial, but the cpyasm module shows how all of it works.
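The final "replace the marker and build a new code object" step might look roughly like this. It's a sketch under some loud assumptions: it relies on the code.replace() method available in Python 3.8 and later, it assumes two-byte wordcode, and it requires the replacement to be exactly as long as the marker so that no jump targets or line tables need adjusting. The name splice_bytecode is made up, and none of the reference fixups described above are handled.

```python
def splice_bytecode(code, marker, replacement):
    """Return a copy of `code` with the first aligned `marker` run replaced.

    Hypothetical helper: same-length replacement only, so offsets,
    jump targets, and line-number tables stay valid.
    """
    if len(marker) != len(replacement):
        raise ValueError("replacement must be the same length as the marker")
    raw = code.co_code
    # Only look at even offsets, so we never match across the middle of
    # a two-byte instruction.
    for start in range(0, len(raw) - len(marker) + 1, 2):
        if raw[start:start + len(marker)] == marker:
            patched = raw[:start] + replacement + raw[start + len(marker):]
            return code.replace(co_code=patched)
    raise ValueError("marker not found in bytecode")
```

Swapping in arbitrary bytes can easily crash the interpreter, so anything real would want to validate the assembled code (stack effects, argument ranges) before splicing it in.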
MacroPy
There's one last option to consider here: Li Haoyi's amazing MacroPy library helps you build macros that intercept and transform ASTs without having to do all the boilerplate stuff yourself. And it has support for all kinds of cool things that are a pain to build, like making it possible to generate .pyc files that are pre-transformed so you can distribute your project without requiring users to have MacroPy (or the inline asm module).
I don't think MacroPy has hooks for source and bytecode transformations, and it has only provisional Python 3.x support, but contributing to MacroPy might be better than duplicating all of the work already done. (And of course this would also mean you get support for the earlier versions of import hooks that are needed for each of 2.7, 3.2, and 3.3, instead of just 3.4…)