In many cases (one step in your process briefly uses a ton of memory and you want that memory released to the system, or you want to parallelize part of the process to take advantage of multiple cores), you can just use multiprocessing or concurrent.futures.
But sometimes that doesn't work. Maybe the reason you want to split your code is that you need Python 3.3 features for one part, but a 2.7-only module for another part. Or part of your program needs Java or .NET, but another part needs a C extension module. And so on.
Example
To make this concrete, let's take one example: You've written a cool GUI in Jython, but now you discover that you need to call a function out of a C library. The library is named mylibrary; it's at /usr/local/lib/libmylibrary.so.2, and it defines two functions:

    long unhex(const char *hexstr) {
        return strtol(hexstr, NULL, 16);
    }

    long parse_int(const char *intstr, int base) {
        return strtol(intstr, NULL, base);
    }
You could do this by using JNI to bridge C to Java and then Jython's Java API to bridge that through to your Jython, but let's say you already know how to use ctypes, and you want to use that.
If you were just using CPython or PyPy, you could call unhex like this:
    import ctypes

    mylibrary = ctypes.CDLL('/usr/local/lib/libmylibrary.so.2')
    mylibrary.unhex.argtypes = [ctypes.c_char_p]
    mylibrary.unhex.restype = ctypes.c_long

    if __name__ == '__main__':
        import sys
        for arg in sys.argv[1:]:
            # c_char_p takes bytes in Python 3, so encode the argument
            print(mylibrary.unhex(arg.encode()))
But in Jython, you can't do that, because there's no ctypes.
Fun with subprocess
In this simple case, all you want to do is run one function in a different Python interpreter. As long as the input is just a few strings, and the output is just a string, all you need is subprocess.check_output:

    import subprocess

    def unhex(hexstr):
        return subprocess.check_output(['python', 'mylibrary_wrapper.py', hexstr])
Obviously you can use 'python3' or 'jython' or '/usr/local/bin/pypy' or '/opt/local/custompython/bin/python' or r'C:\CustomPython\Python.exe' or whatever in place of 'python' there.
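Since mylibrary_wrapper.py and the C library themselves are hypothetical, here's the same check_output pattern exercised end to end with an inline stand-in: sys.executable plus python -c takes the place of the wrapper script, and Python's int(s, 16) stands in for the C unhex function.

```python
import subprocess
import sys

def unhex(hexstr):
    # Inline script standing in for the hypothetical mylibrary_wrapper.py;
    # int(s, 16) plays the role of the C unhex function.
    script = 'import sys; print(int(sys.argv[1], 16))'
    out = subprocess.check_output([sys.executable, '-c', script, hexstr])
    # check_output returns bytes, including the trailing newline from print
    return out.decode().strip()
```

For example, unhex('ff') comes back as the string '255'.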
If you need to get back more than one string as output, as long as you can easily encode it into a string, that's pretty easy. For example, let's say you wanted to unhex multiple strings:
    def unhex(*hexstrs):
        # hexstrs is a tuple, so splice it into the argument list
        return subprocess.check_output(
            ['python', 'mylibrary_wrapper.py'] + list(hexstrs)).splitlines()
You can also encode input this way. There are limitations on what you can pass in through command-line arguments, but you can always pass things through stdin. For example, change the above program to:
    for line in sys.stdin:
        print(mylibrary.unhex(line.strip().encode()))
And now you can pass it a whole mess of strings without worrying about the command-line argument limits:
    def unhex(*hexstrs):
        with subprocess.Popen(['python', 'mylibrary_wrapper.py'],
                              stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE) as p:
            # communicate returns (stdout, stderr), and wants bytes in Python 3
            out, _ = p.communicate('\n'.join(hexstrs).encode())
            return out.splitlines()
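As with the single-call version, this batching pattern can be tried out without the hypothetical wrapper script by using an inline python -c script as a stand-in for mylibrary_wrapper.py's stdin loop:

```python
import subprocess
import sys

def unhex_all(*hexstrs):
    # Inline script standing in for mylibrary_wrapper.py's stdin loop;
    # int(line, 16) plays the role of the C unhex function.
    script = 'import sys\nfor line in sys.stdin:\n    print(int(line, 16))'
    with subprocess.Popen([sys.executable, '-c', script],
                          stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE) as p:
        out, _ = p.communicate('\n'.join(hexstrs).encode())
    return out.decode().split()
```

One interpreter startup now handles the whole batch of strings.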
But what if you need to call the function thousands of times, and not all at once? The cost of starting up and shutting down thousands of Python interpreters may be prohibitive.
In that case, the answer is some form of RPC. You kick off a background program that stays running in the background, listening on a socket (or pipe, or whatever). Then, whenever you need to call on it, you send it a message over that socket, and it replies.
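The shape of such a service can be sketched with nothing but the stdlib: a line-oriented protocol where each request is one hex string and each reply is one decimal number. Here int(s, 16) stands in for the C function, and socketserver for the real server program.

```python
import socket
import socketserver
import threading

class UnhexHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One request per line; reply with one decimal number per line.
        while True:
            line = self.rfile.readline()
            if not line:
                break
            value = int(line.strip(), 16)  # stand-in for mylibrary.unhex
            self.wfile.write(('%d\n' % value).encode())

# Bind an ephemeral port and serve in a background thread
server = socketserver.TCPServer(('127.0.0.1', 0), UnhexHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A client call is just: send one line, read one line back
with socket.create_connection(server.server_address) as sock:
    sock.sendall(b'ff\n')
    reply = sock.makefile().readline().strip()

server.shutdown()
```

The server stays up between calls, so thousands of requests cost thousands of round trips, not thousands of interpreter startups.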
Running a service
For really trivial cases, you can build a trivial protocol that runs directly over sockets. For really complicated cases, you may want to build a custom protocol around something like Twisted. But for everything in the middle, it may be simpler to piggyback on a protocol that already exists and has ready-to-go implementations.

For example, let's use JSON-RPC directly over sockets, through the bjsonrpc library.
First, we need to build the server. Take the wrapper script above, leave the ctypes stuff alone, and replace the sys.argv or sys.stdin stuff with:
    import bjsonrpc
    from bjsonrpc.handlers import BaseHandler

    class MyLibraryHandler(BaseHandler):
        def unhex(self, hexstr):
            return mylibrary.unhex(hexstr)

    s = bjsonrpc.createserver(port=12345, handler_factory=MyLibraryHandler)
    s.serve()
Now, in your Jython code, you can do this:
    import subprocess
    import bjsonrpc

    class MyLibraryClient(object):
        def __init__(self):
            self.proc = subprocess.Popen(['python', 'mylibrary_wrapper.py'])
            # In real code you'd want to wait (or retry) until the server
            # is actually listening before connecting.
            self.conn = bjsonrpc.connect(port=12345)
        def close(self):
            self.conn.close()
            self.proc.kill()
        def unhex(self, hexstr):
            return self.conn.call.unhex(hexstr)
And that's it.
If you want to extend this to expose parse_int as well as unhex, you just need to wrap the ctypes function, add another method to the MyLibraryHandler and MyLibraryClient, and you can call it.
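Since the example functions are just thin wrappers around strtol, the ctypes side of that extension can be sketched against the real libc (loaded with CDLL(None) on POSIX) rather than the hypothetical libmylibrary:

```python
import ctypes

# On POSIX, CDLL(None) exposes the symbols already loaded into the
# process, which includes libc's strtol
libc = ctypes.CDLL(None)
libc.strtol.argtypes = [ctypes.c_char_p, ctypes.c_void_p, ctypes.c_int]
libc.strtol.restype = ctypes.c_long

def parse_int(intstr, base):
    # Mirrors the C parse_int(intstr, base) from the example library;
    # the endptr argument is passed as NULL
    return libc.strtol(intstr.encode(), None, base)
```

From here, exposing it over RPC is one more method on the handler (return parse_int(intstr, base)) and one more on the client (return self.conn.call.parse_int(intstr, base)).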
Automating the process
If you're wrapping up 78 functions in 5 different libraries that are under heavy development and keep changing, it will get very tedious (and error-prone and brittle) to repeat the same information in three places. You can make the ctypes stuff a lot easier by replacing it with a custom C extension module using, say, Cython, SWIG, SIP, or Boost.Python, or make it less brittle by using cffi. But what do you do about the server and client code?

Well, first, notice that you don't really need the wrappers in the client. self.conn.call is already a dynamic wrapper around whatever the server happens to export.
And on the server side, you're just delegating calls from self to mylibrary. You can build those delegating methods up at start time, or use your favorite other technique for delegation.
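Building those delegating methods at start time can be sketched as a setattr loop over a list of exported names. Here a plain Python object stands in for the ctypes library handle, and a bare Handler class for the bjsonrpc BaseHandler subclass:

```python
class FakeLibrary:
    """Stand-in for the ctypes handle to the hypothetical mylibrary."""
    def unhex(self, hexstr):
        return int(hexstr, 16)
    def parse_int(self, intstr, base):
        return int(intstr, base)

mylibrary = FakeLibrary()

class Handler:
    """Stand-in for the bjsonrpc BaseHandler subclass."""

def make_delegate(name):
    # Capture name in a closure so each method forwards to the right function
    def method(self, *args):
        return getattr(mylibrary, name)(*args)
    method.__name__ = name
    return method

# Build one delegating method per exported function, at start time
for name in ('unhex', 'parse_int'):
    setattr(Handler, name, make_delegate(name))
```

Adding function number 79 is then one entry in the name list, not a hand-written method.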
If you want to get really crazy, you can write the interface in an IDL dialect and generate the C headers, C implementation stubs, ctypes/cffi/SIP/whatever wrappers, server wrappers, and client wrappers all out of the same source.
Of course you probably don't want to get really crazy, but the point is that you can. You've built an RPC server, and all of the powerful features of RPC and network servers are available if you need them.