BMI Students

Friday, August 25, 2006

Mixing Python and C

There are a bunch of ways to mix Python and C, including Pyrex (nice technology, but non-standard, so hard to distribute), Boost.Python (never tried it, looks ok), SWIG (good, but requires some heavy lifting; best suited to large-scale projects), and PyCXX, which Zach mentioned on this blog before (never tried it).

First, before resorting to C, try the excellent psyco module, which gets you a free speedup for almost no work (import it and call psyco.full()). If you like the cut of its jib, google for PyPy. The only catch is that psyco only runs on 32-bit x86 (i386) machines.

My preferred way to use C with Python is to write the boilerplate C myself. This sounds stupid/hairy, but once the minimal code is in place it is quite easy to extend. That is especially true for what I imagine is the typical Python/C mix: calling a C function from Python with an array to operate on, and getting an array or a number back (e.g. replacing a slow matrix-operation loop). Smith-Waterman would be a good example: write it in Python first, then replace the Smith-Waterman function with C, and verify the C version by comparing its output to the Python version's, which I assume is correct but slow (for instance, it might use human-readable strings). I am also assuming that you are using Numeric/numpy arrays rather than Python lists, which is likely (and advisable) for this kind of number crunching.
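To make the Smith-Waterman suggestion concrete, here is a minimal sketch of the kind of plain C function you might end up slotting in. It is an illustration under stated assumptions, not code from this post: the name sw_score and its parameters are hypothetical, the sequences are assumed to arrive as int arrays of residue codes (as they would from a Numeric array), scoring is a simple match/mismatch scheme with a linear gap penalty, and only the best local score is computed (no traceback).

```c
#include <assert.h>
#include <stdlib.h>

/* Return the larger of two ints. */
static int max2(int x, int y) { return x > y ? x : y; }

/*
 * Hypothetical example: Smith-Waterman local alignment score (no traceback)
 * for two sequences encoded as int arrays.
 *   match    - added when a[i] == b[j] (pass a positive number)
 *   mismatch - added when a[i] != b[j] (pass a negative number)
 *   gap      - subtracted for each gap position (pass a positive penalty)
 * Only two rows of the DP matrix are kept, so memory use is O(m).
 * Returns -1 if memory allocation fails.
 */
int sw_score(const int *a, int n, const int *b, int m,
             int match, int mismatch, int gap)
{
    int i, j, best = 0;
    int *prev, *curr, *tmp;

    assert(n >= 0 && m >= 0);
    prev = calloc((size_t)m + 1, sizeof *prev);  /* row i-1, starts as all 0 */
    curr = calloc((size_t)m + 1, sizeof *curr);  /* row i */
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    for (i = 1; i <= n; i++) {
        curr[0] = 0;  /* local alignment: first column is always 0 */
        for (j = 1; j <= m; j++) {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            int up   = prev[j] - gap;      /* gap in b */
            int left = curr[j - 1] - gap;  /* gap in a */
            int h = max2(0, max2(diag, max2(up, left)));
            curr[j] = h;
            if (h > best)
                best = h;
        }
        tmp = prev;  /* the current row becomes the previous row */
        prev = curr;
        curr = tmp;
    }

    free(prev);
    free(curr);
    return best;
}
```

With match = 2, mismatch = -1 and gap = 1, aligning {1, 2, 3} against itself scores 6. Comparing scores like this against the slow Python version on a batch of random inputs is a cheap way to do the verification step described above.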

In that spirit, and to save others the time I wasted, below is a very small example: a C extension file, a Python program that calls it, and a "setup.py" file that builds the C shared object that Python imports.


First, the C code, which is very simple. The function takes a Numeric/numpy array and its length as arguments (you could instead mark the end of the array with a sentinel value, much like a null-terminated string). The C file needs two headers: Python.h, which should already be on your compiler's include path, and arrayobject.h, a Numeric header that may not be. For testing, you can copy arrayobject.h into a Numeric/ subdirectory next to mintest.c, since the #include below looks for "Numeric/arrayobject.h".
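If you would rather not pass the length, the sentinel alternative looks roughly like the hypothetical helper below (sentinel_length is an illustrative name, not part of Numeric). It only works if the sentinel value can never appear in the real data, which is why passing the length explicitly, as mintest does, is usually the safer choice.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the elements of an int array that ends with
 * a caller-chosen sentinel value (e.g. 0 or -1). This is the "terminate
 * the array" alternative to passing the length explicitly. It assumes the
 * sentinel never occurs in the real data and that it is actually present.
 */
size_t sentinel_length(const int *a, int sentinel)
{
    size_t n = 0;
    assert(a != NULL);
    while (a[n] != sentinel)
        n++;
    return n;
}
```

A Numeric array also records its own length (segs_array->dimensions[0] for a 1-D array), so a third option is to read the length from the array object itself.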

Note how the C array is just the data part of the Numeric array cast to int* (c_segs_array = (int *)segs_array->data;). At the end of the function, a PyArrayObject is built around this C array and returned using Py_BuildValue. Converting between C arrays and Numeric arrays is this easy, and that is what keeps the whole process simple.

Note that c_segs_array must be cast to char* for PyArray_FromDimsAndData, because that function takes its data argument as a char*.

The method table and the init function that follow mintest are boilerplate, and apart from the names they won't change. No doubt some of this C file looks mysterious, but most of it never needs touching: any function that takes a Numeric array or a number as input and returns an array or a number can simply be slotted into the mintest function.




#include "Python.h"
#include "Numeric/arrayobject.h"


static PyObject *
mintest(PyObject *self, PyObject *args, PyObject *kwargs) {

//-----------------------
//List arguments/keywords
//-----------------------
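    static char *kwlist[] = {"py_segs", "num_segs", NULL};

    int i;

    int num_segs;
    int dims[1];

    PyObject *py_segs;
    PyArrayObject *segs_array;
    PyArrayObject *return_array;
    int *c_segs_array;

    //---------------
    //Parse the input
    //---------------
    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "Oi:mintest", kwlist,
                                     &py_segs, &num_segs)) {
        return NULL;
    }

    //-------------------------------------------
    //Make C arrays from my python numeric arrays
    //-------------------------------------------
    // Ask for a contiguous 1-D int array (min and max dimensions both 1).
    // This may be a converted copy; either way it is a new reference we own.
    segs_array = (PyArrayObject *)PyArray_ContiguousFromObject(py_segs, PyArray_INT, 1, 1);
    if (segs_array == NULL) {
        return NULL;
    }
    if (num_segs < 0 || num_segs > segs_array->dimensions[0]) {
        Py_DECREF(segs_array);
        PyErr_SetString(PyExc_ValueError, "num_segs is out of range for this array");
        return NULL;
    }
    c_segs_array = (int *)segs_array->data;

    for (i = 0; i < num_segs; i++) {
        fprintf(stderr, "C testing %d\n", c_segs_array[i]);
    }

    //----------------
    //Return the array
    //----------------
    dims[0] = num_segs;
    return_array = (PyArrayObject *)PyArray_FromDimsAndData(1, dims, PyArray_INT, (char *)c_segs_array);
    if (return_array == NULL) {
        Py_DECREF(segs_array);
        return NULL;
    }
    // return_array does not own c_segs_array, so give it our reference to
    // segs_array: the data then lives exactly as long as return_array does.
    return_array->base = (PyObject *)segs_array;
    // "N" hands our reference to return_array over without adding another.
    return Py_BuildValue("Ni", return_array, num_segs);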
}


static PyMethodDef mintestMethods[] = {
    {"mintest", (PyCFunction)mintest, METH_VARARGS | METH_KEYWORDS,
     "HELP for minimal_test\n"},
    {NULL, NULL, 0, NULL} /* Sentinel -- don't change */
};

PyMODINIT_FUNC
initmintest(void) {
    (void) Py_InitModule("mintest", mintestMethods);
    import_array();
}


Now setup.py. This is simply a distutils file that tells Python how to build the C file. As with any Python module, type "python setup.py build" to build it and "python setup.py install" to install it. For testing, I usually just build it (which creates a build directory), then make a symbolic link in the main directory (ln -s build/lib.linux/mintest.so mintest.so). The directory name under build/ depends on your platform and Python version, so adjust the path to match.



from distutils.core import setup, Extension

module1 = Extension('mintest',
                    sources=['mintest.c'],
                    # extra_compile_args=["-O3"],  # compiler flags go here
                    )

setup(name='mintest',
      version='1.0',
      description='minimum C test',
      ext_modules=[module1])


Finally, the Python program, which is hopefully self-explanatory.


import random
import Numeric as N
import mintest

#Make a 1D array of length 10
pyarray_length = 10
pyarray = N.array([random.randrange(100) for i in range(pyarray_length)])

#Print out the array as Python sees it
print "Python printing array", type(pyarray), pyarray, pyarray_length

#Get the same array after passing it to C and back
carray, carray_length = mintest.mintest(pyarray, pyarray_length)

#Finally print out the returned array
print "Array after going through C", type(carray), carray, carray_length


And that's it! Pretty easy once you know how.

1 Comments:

  • This was really, really helpful. Thanks very much.

    In case anyone else has the small problem I had:
    I'm working on Slackware Linux 12.1, and NumPy doesn't put arrayobject.h on the compiler's standard include path. I don't know why, maybe it's just my computer. Anyway, instead of
    #include "Numeric/arrayobject.h" (or whatever), just 'locate' the file and then put the full path in the #include.

    By Blogger KC, at 2:03 PM  

