BMI Students

Thursday, May 10, 2007


Rate journals with Eigenfactor: link
Blog post about it: link

Wednesday, April 18, 2007

The economist style guide


Tuesday, April 17, 2007

Color tools

Color brewer: link
Adobe Kuler: link
ColorJack Sphere: link

Friday, January 05, 2007

Giant photographs of Boston

Very cool use of google maps


PDF printing in Mac OS X

This was surprising to me, but very useful.

Occasionally I download a PDF and I am not allowed to save it, or even print preview. That's irritating, especially when we have a subscription to that journal -- what exactly did we pay for?

If you go to print, then Preview does give you the option to save as PS. In my experience, this does not work -- I hate PS anyway.

There is a sketchy alternative that worked for me today. Choose print, then Fax PDF, and then choose preview, then save. All this, plus you get the rush of an illicit act, forbidden by a society that could never understand me.

Sunday, December 24, 2006

Network effects

A demonstration of a phase transition in a network -- cool java applet.

Monday, December 04, 2006

evil brain fungus

A David Attenborough clip.

Saturday, December 02, 2006

Postdoc considerations

An article by Phil Bourne on how to choose a postdoc.

Tuesday, September 19, 2006


Apparently a company called the Health Discovery Corporation (HDC) owns patents on applying SVMs to biological data and has sued companies that have violated that patent.

HDC holds a large patent portfolio protecting the use of support vector machines in bioinformatics applications.

Searching the patent database, it looks like the relevant one may be #7062384, but I don't know if in fact that's the one mentioned in the article.

HDC's scientific advisory board includes Vladimir Vapnik, one of the main inventors of the technology, so maybe they do have a moral (in addition to legal) claim on the technology, but it still just feels wrong.

Thursday, September 14, 2006

Nature peer review trial

Nature has started testing its collaborative peer review system.

Monday, September 04, 2006

3d cell animation

This company (XVIVO) worked with Harvard researchers to make this video, which is awesome.

Friday, August 25, 2006

Mixing Python and C

There are a bunch of ways to mix Python and C: Pyrex (nice technology, but non-standard, so hard to distribute), Boost (never tried it, looks OK), SWIG (good, but requires some heavy lifting; better suited to large-scale projects), and PyCXX, which Zach mentioned on this blog before (never tried it either).

First, before resorting to C, try the excellent psyco module, which gets you a free speedup and requires no work (and if you like the cut of its jib, google for PyPy). The only catch is that psyco is i386-specific.

My preferred way to use C with Python is to write the boilerplate C myself. This sounds stupid/hairy, but once you have the minimal code in place it becomes quite easy to extend. This is especially true if you are doing what I imagine to be the typical Python/C mix: calling a C function from Python with an array to operate on, and getting an array or a number in return (e.g. replacing a slow matrix-operation loop). Smith-Waterman would be a good example: write it in Python, then replace the Smith-Waterman function with C, and verify the C version by comparing its output to that of the Python version, which I assume is correct but slow (for instance, it might use strings that are easy for a human to parse). I am also assuming that you are using Numeric/numpy arrays and not Python lists, which is likely/advisable for these kinds of number-crunching tasks.
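As a sketch of the "slow but easy to check" Python reference described above, here is a minimal Smith-Waterman scorer. The function name and the scoring values (match 2, mismatch -1, linear gap penalty -1) are assumptions chosen for illustration, not taken from any particular project, and the code is written for current Python rather than the Python 2 used elsewhere in this post. It returns only the best local alignment score, which is enough to compare against a C replacement.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    # previous holds the previous row of the DP matrix; row 0 is all zeros
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = previous[j - 1] + substitution  # align a[i-1] with b[j-1]
            up = previous[j] + gap                     # gap in b
            left = current[j - 1] + gap                # gap in a
            # Local alignment: a cell never drops below zero
            current[j] = max(0, diagonal, up, left)
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACT"))
```

Once a C version exists, the same inputs can be fed to both and the scores compared.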

In that spirit, and to save others time I have wasted, below is a very small example C program, a python program that calls it, and a "" file to build the C shared object that python imports.

First, the C code. This code is very simple. It takes the Python Numeric/numpy array as an argument, along with its length (you could also null-terminate the array). The C file needs to include two headers: Python.h, which should be on your include path, and arrayobject.h, a Numeric header that may not be (you can copy it into the directory for testing).

Note how the C array is just the data part of the Numeric array cast as int* ( c_segs_array = (int *)segs_array->data; ). At the end of the function a "PyArrayObject" is built from this C array, and returned using "Py_BuildValue". The ease of translation between C arrays and Numeric arrays is key, and simplifies the whole process.

Note that c_segs_array must be cast as "char*" for the "PyArray_FromDimsAndData" function.

The second and third functions are boilerplate, and won't change much. No doubt some of this C file is mysterious, but most of it will not change at all. Any function that takes as input a Numeric array or number and returns an array or number can just be slotted into the mintest function.

#include "Python.h"
#include "Numeric/arrayobject.h"

static PyObject *
mintest(PyObject *self, PyObject *args, PyObject *kwargs) {

    //List arguments/keywords
    static char *kwlist[] = {"py_segs","num_segs",NULL};

    int i;

    int num_segs;
    int dims[1];

    PyObject *py_segs;
    PyArrayObject *segs_array;
    PyArrayObject *return_array;
    int *c_segs_array;

    //Parse the input (the name after the colon is only used in error messages)
    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "Oi:mintest", kwlist,
                                     &py_segs, &num_segs)) {
        return NULL;
    }

    //Make C arrays from my python numeric arrays.
    //The last two arguments are the minimum and maximum number of
    //dimensions, so 1, 1 asks for a 1D array.
    segs_array = (PyArrayObject *)PyArray_ContiguousFromObject(py_segs, PyArray_INT, 1, 1);
    if (segs_array == NULL) {
        return NULL;
    }
    c_segs_array = (int *)segs_array->data;

    for (i = 0; i < num_segs; i++) {
        fprintf(stderr,"C testing %d\n",c_segs_array[i]);
    }

    //Return the array
    dims[0] = num_segs;
    return_array = (PyArrayObject *)PyArray_FromDimsAndData(1,dims,PyArray_INT, (char*)c_segs_array);
    return Py_BuildValue("Oi", return_array, num_segs);
}

static PyMethodDef mintestMethods[] = {
    {"mintest", (PyCFunction)mintest, METH_VARARGS|METH_KEYWORDS,
     "HELP for minimal_test\n"},
    {NULL,NULL,0,NULL} /* Sentinel -- don't change*/
};

PyMODINIT_FUNC
initmintest(void) {
    (void) Py_InitModule("mintest", mintestMethods);
    import_array(); /* must be called before using the Numeric C API */
}

Now the build file. This is simply a distutils file that tells Python how to build the C file. As with any Python module, you type "python build" to build it, and "python install" to install. For testing, I usually just build it (which makes a build directory), then make a symbolic link in the main directory (ln -s build/lib.linux/

from distutils.core import setup,Extension

module1 = Extension('mintest',sources=['mintest.c'])

setup(name = 'mintest',
version = '1.0',
description = 'minimum C test',
ext_modules = [module1])

#extra_compile_args = ["-O4"] # You could put "-O4" etc. here.

Finally, the Python program, which is hopefully self-explanatory.

import os, sys, re
import random
import Numeric as N
import mintest

#Make a 1D array of length 10
pyarray_length = 10
pyarray = N.array([random.randrange(100) for i in range(pyarray_length)])

#Print out the array as Python sees it
print "Python printing array", type(pyarray), pyarray, pyarray_length

#Get the same array after passing it to C and back
carray, carray_length = mintest.mintest(pyarray, pyarray_length)

#Finally print out the returned array
print "Array after going through C", type(carray), carray, carray_length

And that's it! Pretty easy once you know how.

Tuesday, August 22, 2006


Following on from Zach's junk charts post, this infosthetics blog is sweet.

Saturday, August 05, 2006

ANSI Escape Codes in Python

ANSI escape codes are surprisingly useful. In Python, each code starts with "\x1b[". Here is an example loading bar. This is a Unix terminal thing; it won't work on Windows.

sys.stderr.write("\x1b[34mloading[" + " "*10 + "]\x1b[0m\r")
for i in range(10):

Here "\x1b[34m" means "blue foreground" (red would be "\x1b[31m"), "\x1b[0m" resets the colour, and "\x1b[8C" means "move the cursor right 8 spaces".

It prints out something like loading[          ], but with the text in blue.
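For a complete bar that fills up, here is a minimal sketch in current Python. It is not the original loop: the function names, the "=" fill character and the delay are all assumptions, and it redraws the whole line after each step rather than moving the cursor with "\x1b[8C".

```python
import sys
import time

ESC = "\x1b["  # every ANSI escape code used here starts with this


def bar_frame(done, width=10):
    """Return one frame of the bar: blue text, colour reset, carriage return."""
    filled = "=" * done
    empty = " " * (width - done)
    return ESC + "34m" + "loading[" + filled + empty + "]" + ESC + "0m" + "\r"


def show_loading(width=10, delay=0.05, stream=sys.stderr):
    """Draw the bar once per step; "\r" sends the cursor back to column 0."""
    for done in range(width + 1):
        stream.write(bar_frame(done, width))
        stream.flush()
        time.sleep(delay)
    stream.write("\n")


if __name__ == "__main__":
    show_loading()
```

Because each frame ends with a carriage return and no newline, every frame overwrites the previous one on the same terminal line.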

Tuesday, June 13, 2006

Lectures on Interpreting High-Dimensional Data

Eugene Fratkin in Seraphim's lab sent me this link: Mathematical and Statistical Methods for Visualization and Analysis of High Dimensional Data. Each lecture has a link to download the video in various formats.

Nature's "open" peer review trial

Nature is testing the waters of a paradigm-shifting peer review process, where submissions can be displayed for the general public to see and comment on before acceptance. Three cheers for forward thinking where I didn't expect it!

Upon further investigation of PLoS One, I think it's in a similar vein. I've always been excited about this idea but am (pleasantly) surprised that it's actually happening so soon.

Thursday, June 08, 2006


I haven't read the whole thing yet, but PLoS ONE looks like it's going to be very interesting....


PCA and Friends: A useful review

Here is a potentially-useful review of PCA and other "geometric methods for feature extraction and dimensional reduction" (that's the paper title). It's by Chris Burges, who has done some neat machine-learning work at Microsoft Research. The selection of algorithms reviewed is a bit limited by what Burges thinks is cool/useful and what he has used in the past, but hey, that's probably an OK criterion.

[Update: eliminated claim Burges invented SMO when really it was John Platt.]

Wednesday, June 07, 2006


A scary article on pharma and clinical trials.

Money quote:
As Dr. Marcia Angell, a former editor of The New England Journal of Medicine, noted in the Baltimore Sun, "What would be considered a grotesque conflict of interest if a politician or judge did it is somehow not in a physician."


Saturday, June 03, 2006

Humans and chimps

Jimmy: I have a crazy friend who says humans and chimps are related. Is he crazy?
Troy: No, just ignorant. You see, your crazy friend never heard of "The Bible." Just ask this scientician.

Tuesday, May 30, 2006


Slate has an interesting article on the use of bacteriophages to attack infections instead of antibiotics. They also speculate that unfortunately it will be hard to bring the technology to the US.


Saturday, May 20, 2006

Notes on classifiers

I have been testing a bunch of classifiers for a project I am doing. The objective is to classify an intergenic region as ACE1 (or any motif) or not-ACE1, based on features of the intergenic regions. I did this for a number of sets of features, and the results were very consistent. I have done enough tests that I feel comfortable relating some general conclusions....

Random Forests and SVMs always won, with random forests usually commanding a slight lead. SVMs with a polynomial kernel did a bit worse. MaxEnt usually came fourth, and seemed to do better on discrete data (hence the NLP slant of this method). Finally, k-nearest neighbors always lost. Random Forests were slower than SVMs, but apart from that I think they are preferable.

Random forests are just collections of voting decision trees, each trained on bootstrapped data and variables. Someone must have done the same thing for collections of voting SVMs. If I find it I'll add it to this post. Seems like it must win overall.
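The "bootstrap, then vote" construction does not depend on the base classifier, which is why the same trick should carry over to SVMs. Below is a minimal sketch in current Python using only the standard library. The base learner is a deliberately trivial one-feature threshold (a hypothetical stand-in, not a decision tree or an SVM), and all names are made up for illustration. It is also not a real random forest, which samples variables at every split rather than once per voter.

```python
import random
from collections import Counter


class MeanStump:
    """Toy base learner: threshold one feature midway between the class means."""

    def __init__(self, feature):
        self.feature = feature
        self.constant = None

    def fit(self, rows, labels):
        values0 = [r[self.feature] for r, y in zip(rows, labels) if y == 0]
        values1 = [r[self.feature] for r, y in zip(rows, labels) if y == 1]
        # A bootstrap sample can miss a class entirely; then vote a constant.
        if not values0 or not values1:
            self.constant = 0 if values0 else 1
            return self
        mean0 = sum(values0) / len(values0)
        mean1 = sum(values1) / len(values1)
        self.threshold = (mean0 + mean1) / 2.0
        self.high_label = 1 if mean1 >= mean0 else 0
        return self

    def predict(self, row):
        if self.constant is not None:
            return self.constant
        if row[self.feature] > self.threshold:
            return self.high_label
        return 1 - self.high_label


def train_bagged(rows, labels, n_voters=25, seed=0):
    """Train n_voters learners, each on bootstrapped rows and a random feature."""
    rng = random.Random(seed)
    n = len(rows)
    voters = []
    for _ in range(n_voters):
        picks = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(len(rows[0]))         # random variable per voter
        sample_rows = [rows[i] for i in picks]
        sample_labels = [labels[i] for i in picks]
        voters.append(MeanStump(feature).fit(sample_rows, sample_labels))
    return voters


def vote(voters, row):
    """Majority vote over all trained learners."""
    counts = Counter(v.predict(row) for v in voters)
    return counts.most_common(1)[0][0]
```

Swapping MeanStump for any classifier with fit and predict methods, an SVM included, gives the voting ensemble speculated about above.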

Monday, May 15, 2006

Flowers that detect landmines

This kind of thing helps GM food's image no end...

Saturday, May 13, 2006


I use LaTeX a lot, mainly because Word on the mac is so horrible, and I like not worrying about formatting while typing. This article explains some of the small benefits of doing so.


Tuesday, May 02, 2006

Mammalian promoters

Nature Genetics just published a milestone paper from the Fantom/RIKEN consortium compiling an enormous genome-wide collection of transcript start sites (TSSs) in humans and mice. The paper could be a treasure trove for bioinformaticians. They collected TSS tags from many different tissues and mapped them onto the genome. There are several different classes of promoters: some with very well defined TSSs, some with very broad distributions (transcription can start anywhere in a comparatively broad region), some with multiple well-defined sites and some with combinations of the above. The paper claims four classes. I don't know what kind of clustering they used -- but it would be interesting to know more about how distinct their classes are and if four is really the best estimate.

I think that in addition to the analyses they did in the paper, one can try a bunch of correlations quickly -- several possibilities for projects small and large. For example, do promoter classes correlate with alternatively spliced genes? Or are TSSs correlated with transcription units from tiled array experiments (Affy and others)? One could also do some gene ontology correlations, or expression analysis using these data. We know that transcription initiation, splicing and expression (and other things) are all intimately connected, so this might be leveraged in many different directions...

Monday, May 01, 2006

Machine learning videos

There are a bunch of machine learning webcast lectures here. Many of them are tutorials; includes a few biology-focused lectures.


Tuesday, April 25, 2006

Good and bad information design blog

They have some interesting and potentially useful commentary on bad (and good) newspaper (&c.) infographics at Junk Charts.

Thursday, April 20, 2006

Useful Python module for manipulating files

Check out this path manipulation module. It looks quite handy, as it bundles a bunch of path-related things from the python standard library into a convenient class.
Here's a very simple example.

# Using only the standard library
import os
path = '/foo/bar/baz'
files = [f for f in os.listdir(path) if f.endswith('.txt')]
fullpath = os.path.join(path, '')
f = open(fullpath, 'r')
lines = f.readlines()
f.close()

# The same thing using the path module
from path import path
path = path('/foo/bar/baz')
files = path.files('*.txt')
fullpath = path / ''
lines = fullpath.lines()
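Current Python ships a similar object-oriented path class in the standard library, pathlib. Here is a minimal sketch of the same kind of task with it; the function name is hypothetical, and this shows pathlib rather than the path module discussed above.

```python
from pathlib import Path


def read_text_files(directory):
    """Map each *.txt file name in directory to its list of lines."""
    directory = Path(directory)
    result = {}
    for txt_path in sorted(directory.glob("*.txt")):
        # The / operator joins path components, much like the path module
        full_path = directory / txt_path.name
        result[txt_path.name] = full_path.read_text().splitlines()
    return result
```

Calling read_text_files('/foo/bar/baz') would return a dictionary keyed by file name.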

Thursday, April 13, 2006

The Science of Scientific Writing

Some concrete examples on how to write better; concrete!


Wednesday, April 12, 2006

Academic search just launched. It's a citeseer/google scholar-like search by MSN.

Monday, April 10, 2006

On the installation of a proper Python environment

I recently had my hard drive fail, so I have just had the fun of reinstalling my Python environment from scratch. Here are instructions for setting up Python as a proper interactive development and data analysis environment. Some of these instructions are OS X specific (I'll flag those), but the general procedure will work on any *nix-ish platform.

Note that on OS X, I don't really love Fink or Darwinports for building and installing software for me. Especially not software that I depend on, and may need to patch, or use bleeding-edge versions, etc. So, here is how to install the following, all from source:
  • Python -- The best.
  • IPython -- An interactive python shell. Really useful.
  • NumPy and SciPy -- Numerical and scientific computing packages. Key for serious Python data analysis.
  • Gnuplot -- This is the plotting package that I use, and that I actually really like. (Hint: it saves plots as SVG for editing in Illustrator.)
  • -- A Python to Gnuplot bridge.

Here are the installation instructions.

    Before doing anything, make sure that /usr/local/bin is first in the PATH environment variable, because that's where we'll be installing these things. To see your path, type echo $PATH, and you will see a colon-separated list of directory names. This list of directories is searched, in the order specified, for programs to execute when you type in a particular name, like python. Since we are leaving Apple's own old (and crappy) version of Python in /usr/bin, we need to make sure that the new shiny Python we install (in /usr/local/bin) will be the one that is used when we type python into the shell. Hence the need to put /usr/local/bin first on the PATH.
    If you need to add /usr/local/bin to the PATH and you're using the bash shell (the OS X default), you will need to create a file called .profile in your home directory (if it doesn't exist) and add the following line to it: export PATH=/usr/local/bin:$PATH. Here's a good way to do that. (The single quotes matter: with double quotes the shell would expand $PATH on the spot and write your current path into the file.)
    echo 'export PATH=/usr/local/bin:$PATH' >> ~/.profile

    If you're using tcsh (you would know if you are since it's not the default; but you can type echo $SHELL to find out), you would want the following:
    echo 'setenv PATH /usr/local/bin:$PATH' >> ~/.tcshrc
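To make the "searched in the order specified" rule concrete, here is a minimal sketch in current Python of what the shell does with PATH. The function name is hypothetical, and real shells add details (hashing, builtins, aliases) that this ignores.

```python
import os


def find_in_path(program, path_string):
    """Return the first executable called program along a colon-separated path."""
    for directory in path_string.split(":"):
        candidate = os.path.join(directory, program)
        # The first directory holding an executable file with that name wins
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(find_in_path("python", os.environ.get("PATH", "")))
```

With /usr/local/bin listed before /usr/bin, a python in /usr/local/bin is the one returned, which is exactly why the order matters here.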

    Also note that I've only tested this on OS X 10.4. Some of the stuff might not work right on 10.3. Finally, if you're using 10.4, make sure that you have the latest version of the developer tools.

    And now, the directions!
    1. Make a directory for the source code we'll be getting.
      cd ~
      mkdir Developer
      cd Developer

    2. Install GNU Readline. This library allows Python and other programs to use the arrow keys like you expect, and many other goodies. Most *nixes come with a good version of Readline, but Apple doesn't ship OS X with one, probably because it's GPL. We'll install the latest Readline, plus some patches to it that are pretty important to make IPython work.
      curl -O
      tar -xzf readline-5.1.tar.gz
      cd readline-5.1
      curl -O
      curl -O
      curl -O
      curl -O
      cat readline51* | patch
      sudo make install
      cd ..

    3. Now Python 2.4.3 (the latest released version). These instructions are specific for building Python as an OS X framework (the proper way to install Python on OS X.)
      mkdir Python
      cd Python
      curl -O
      tar -xzf Python-2.4.3.tgz
      cd Python-2.4.3
      ./configure --enable-framework
      sudo make frameworkinstall
      cd ../..

      If you're using tcsh, you'll need to type "rehash" so that the shell can find the just-installed python.

    4. Now we need to get Subversion (a CVS-like tool) to check out bleeding-edge versions of IPython, SciPy, and NumPy. (Trust me, the svn versions of these are better than the latest releases, and more bug-free, because I've been actively tracking down OS X bugs for these tools.)
      These instructions show how to (on OS X) download a .dmg disk image containing a .pkg installer, mount the image, install the package, and unmount the image, all from the command line. You could also just do it from the finder with double-clicking, but this shows how hard-core I am!
      curl -O
      hdiutil attach subversion-client-1.3.1.dmg
      sudo installer -pkg /Volumes/Subversion\ Client\ 1.3.1/SubversionClient-1.3.1.pkg -target /
      hdiutil detach /Volumes/Subversion\ Client\ 1.3.1

    5. Now IPython. The "pythonw" part is OS X-specific (see the IPython manual for explanation), on other platforms just use "python".
      cd Python
      svn co ipython
      cd ipython
      sudo pythonw install --install-scripts=/usr/local/bin
      cd ..

      Don't forget to type "rehash" if you're using tcsh, otherwise the shell won't be able to find the newly-installed ipython script.

    6. Now NumPy.
      svn co numpy
      cd numpy
      python build
      sudo python install
      cd ../..

    7. Here's a fun one. Apple ships GCC version 4 with Tiger. GCC 4 is OK, but it changed the standard for linking object files together from how GCC 3 did it. Now, we'll need to link together a lot of C and Fortran code for SciPy (which wraps lots of high-performance numerical libraries, which are mostly written in Fortran). So we'll need to use a single linking style -- that of gcc3 or of gcc4. Now g77 is the GNU fortran compiler that works with gcc3, and gfortran is the one for use with gcc4. Unfortunately, gfortran sort of sucks, in that it is known to generate incorrect code, especially for PPC chips. So, unless you've got an Intel Mac, we will have to use gcc3 and g77. (The gcc3 Apple supplies for Intel macs is known to suck, so on Intel you should use gcc4 and gfortran.)
      Anyhow, this means that we'll need to tell gcc to use version 3 and not version 4 for the code we compile to link with scipy. Skip if on an Intel Mac. Also skip if you're on OS X 10.3, because gcc3 is all you've got in that case.
      sudo gcc_select 3.3

    8. Now we install FFTW (version 2, which is what SciPy needs). FFTW is a library for doing Fourier transforms.
      curl -O
      tar -xzf fftw-2.1.5.tar.gz
      cd fftw-2.1.5
      sudo make install
      cd ..

    9. Now we get the Fortran compiler (this is OS X-specific). We'll just grab a pre-built binary of the compiler, since even I agree that compiling a compiler is overkill.
      For PPC Macs:
      curl -O
      sudo tar -C / -xzf g77v3.4-bin.tar.gz

      For Intel Macs:
      curl -O
      sudo tar -C / -xzf gfortran-intel-bin.tar.gz

    10. Now we compile SciPy.
      cd Python
      svn co scipy
      cd scipy
      python build
      sudo python install
      cd ../..

      If for some reason you have both g77 and gfortran installed, and want to choose which one to use, the build line looks like:
      python config_fc --fcompiler=XXX build

      where XXX is gnu (for g77) or gnu95 for gfortran.

    11. Now we can revert back to the default GCC version. (Skip on Intel Macs. Also skip on Macs running 10.3, since they don't have gcc4 anyway.)
      sudo gcc_select 4.0

    12. Now, install AquaTerm. This is a graphics terminal for Gnuplot to use that is very nice for OS X, and way better than using the X11 terminal, I promise. (It's anti-aliased, for example.)
      curl -O
      hdiutil attach AquaTerm1.0.0.dmg
      sudo installer -pkg /Volumes/AquaTerm/AquaTerm.pkg -target /
      hdiutil detach /Volumes/AquaTerm

    13. Now we grab a CVS version of Gnuplot (the CVS has better OS X support). The Sourceforge CVS servers are sometimes overloaded, so you might need to repeat these commands a few times until they succeed. (Thanks Brian for pointing this out.)
      cvs login
      [press enter to leave a blank password]
      cvs -z3 co -P gnuplot
      cd gnuplot
      sudo make install
      cd ..

    14. Last step! Install the Python-to-Gnuplot bridge. This library is designed for Numeric, which was NumPy's predecessor, so to make things work we'll use a NumPy tool to fix the code, and do some search-and-replace to fix things that the NumPy tool doesn't currently fix. (Ugh!) Note that in an earlier version of these instructions, the python command line was python -c 'import numpy.lib.convertcode; numpy.lib.convertcode.convertall()'. This has changed with recent versions of numpy.
      cd Python
      curl -O
      tar -xzf gnuplot-py-1.7.tar.gz
      cd gnuplot-py-1.7
      python -c 'import numpy.oldnumeric.alter_code1; numpy.oldnumeric.alter_code1.convertall()'
      sed -i -e "s/Float32/float32/g" *.py
      sed -i -e "s/Float64/float64/g" *.py
      sed -i -e "s/Float/float_/g" *.py
      sudo python install
      cd ../..

    OK! Now how to use all of this stuff? Well... that's for a later post. Here's a hint, and a test to make sure everything works.
    Run ipython, and then try the following:
    import numpy, scipy, Gnuplot
    num_steps = 20
    # Named x_values rather than "range" so the builtin is not shadowed
    x_values = numpy.linspace(0, 2 * numpy.pi, num_steps)
    sin_values = numpy.sin(x_values)
    print x_values
    print sin_values

    g = Gnuplot.Gnuplot()
    # Gnuplot expects list of [x, y] pairs, not [x-list, y-list]
    points = numpy.transpose([x_values, sin_values])
    # Feed text straight to Gnuplot to control plotting style.
    g('set data style linespoints')
    # Plot the sampled points
    g.plot(points)

    # Fit a spline to the original data and interpolate the curve with finer spacing
    import scipy.interpolate
    spline = scipy.interpolate.InterpolatedUnivariateSpline(x = x_values, y = sin_values)
    more_steps = 200
    new_range = numpy.linspace(0, 2 * numpy.pi, more_steps)
    interpolated = spline(new_range)
    g.plot(numpy.transpose([new_range, interpolated]))

    # Compare the interpolated curve with the true sine values
    true_values = numpy.sin(new_range)
    error = interpolated - true_values
    g.plot(numpy.transpose([new_range, error]))

    print "RMS Error = ", numpy.sqrt((error**2).mean())

    Tuesday, April 04, 2006

    Folding DNA to create nanoscale shapes and patterns

    oooh this paper is cool...

    Mac shortcuts

    A couple of shortcuts I learned recently.

    Ctrl-command-D while hovering over a word, will give you a dictionary definition.
    Shift-command-4 gets a screenshot of a selection.
    Shift-command-3 gets a screenshot of the whole screen.
    Alt-command-= zooms in.
    Ctrl-alt-command-8 funketizes.

    Monday, March 27, 2006

    DSA keys

    RSA/DSA keys are great because typing in your password every time is so tedious, especially if you do a lot of scping. I do it rarely enough that I always forget how to set it up and have to scour the interweb for it.

    On the computer you are sshing from:
    ssh-keygen -t dsa
    I don't use a passphrase. I don't think it really matters.
    scp .ssh/ brian@other_computer:./.ssh/authorized_keys

    If authorized_keys already exists you'll want to append to that file....

    Idea market

    The NYTimes has a good article on a "stock market" for ideas and how companies are starting to do this. I think Google already does this.

    An interesting company they mention is InnoCentive... a company that posts chemistry and biology problems from companies, with a reward for the group that solves each one.

    Tuesday, March 14, 2006

    Javascript tutorial

    This is actually a pretty good reference. I've been writing stuff in Javascript since last summer and have read two books on it, but there is still a bunch of useful stuff in the tutorial I didn't know about. Which is weird, because Javascript is a pretty small language.

    Friday, March 10, 2006

    Tom Rando's awesome talk

    Tom Rando gave last Wednesday's Frontiers talk and it was really good. He works on muscle growth and regeneration and the title of the talk was "Aging, stem cells, and the challenge of senescent tissue repair". Aging and stem cells -- double sexy based on the title alone, but the content was even better.

    I am not going to summarize the presentation in detail, but will list three of the coolest things I learned during it. First, apparently, one can do parabiotic experiments, which involves connecting two organisms subcutaneously. After a while (days or weeks), the vessels of the two organisms find each other and they start merging their circulatory systems. In Rando's experiments they attached mice from different age groups and studied the effects of young blood on old mice and vice versa. What they found (this is the second cool thing) is that stem cells that are responsible for muscle regeneration after injury are present and are totally fine in the old mice. It's just that the younger mice have some kind of a factor in their blood serum that stimulates the stem cell activation, while the old mice appear to be saddled with inhibitors. Joining a young mouse with an old one restored muscle regeneration in the old mouse completely. They also did some in vitro experiments to further understand what's going on. Pretty neat.

    The last thing, which kind of blew my mind, was this idea first proposed by Cairns in 1979: that through successive rounds of DNA replication the organism remembers which strands are the original ones and which ones are copies. These template strands are segregated together and find themselves in the same cells. This allows the organism to preserve the original code and withstand the mutational load in tissues with a lot of regeneration (most errors come about as a result of synthesis). It seems like initially no one could find any support for this hypothesis, but now Rando and others have presented some pretty convincing evidence based on DNA labeling experiments. I guess you needed to know where to look -- stem cells are the ones that carry the template DNA and there aren't that many of them, relatively speaking.

    I don't know if there is anything informatics-related in what Tom Rando does, but he is at Stanford and is doing some of the coolest work around.

    Friday, February 17, 2006


    "If you put every virus particle on Earth together in a row, they would form a line 10 million light-years long."

    That seems like rather a lot of viruses.

    discover magazine article


    A blog post about the Nature Biotech article that criticized bio-ontologies, featuring our very own Mark Musen.

    Via postgenomic.


    Thursday, February 16, 2006

    New metablog thing

    I don't know what it is exactly, but I like it.

    Postgenomic aggregates posts from life science blogs and then does useful and interesting things with that data.


    Saturday, February 11, 2006

    Uniquifying a list in Python

    I keep needing to do this all the time (given a list, remove all the duplicates). So a one-liner:
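Two common ways to do it in current Python, offered as a sketch and not necessarily the one-liner meant above. The function names are hypothetical, and both versions assume the list items are hashable.

```python
def unique_in_order(items):
    """Drop duplicates, keeping the first occurrence of each item."""
    # dict keys are unique and (since Python 3.7) keep insertion order
    return list(dict.fromkeys(items))


def unique_any_order(items):
    """Drop duplicates when the order of the result does not matter."""
    return list(set(items))


if __name__ == "__main__":
    print(unique_in_order([3, 1, 3, 2, 1]))
```

The set version is the shortest, but it does not promise any particular order in the result.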

    Thursday, February 02, 2006

    Advice for getting published

    10 simple rules from Phil Bourne over at PLOS Comp Bio. They are mostly common sense, but make for some useful reading anyway.

    Wednesday, February 01, 2006

    Trends in Machine Learning

    Some graphs showing trends in the use of SVMs/naive Bayes/expert systems etc.

    Similar for bioinformatics.

    Tuesday, January 31, 2006


    Nice post by Serge.

    For the pythonistas out there

    A very nice reference for Python 2.4. Good to keep handy.

    Monday, January 30, 2006

    More expertise

    Following up on expertise, it seems like throwing darts at a stock ticker is as good as using mutual funds. I wonder if someone has worked out the expected return from using a random fund compared to random stock choices...

    Viruses/Germs vs genes

    A virus makes you fat?
    A virus makes you autistic?
    A germ makes you gay?
    A germ makes you schizophrenic? (Remind me not to go near cats).
    Maybe, maybe not, but viruses/germs are ubiquitous and genes are not the be all and end all.

    Also I would like to take this opportunity to coin the word "germes" for foreign DNA/parasites that have effects that seem indistinguishable from genetic. For good measure, I also coin "germome", and "germetics", and "germealogy" (yes, that's how you spell it).

    Start using these words immediately.

    Wednesday, January 18, 2006

    The best place to work...

    is Genentech, apparently.

    Monday, January 16, 2006

    Brains are Bayesian

    Economist article
    Nature paper


    NASA uses humans to find craters on Mars. I wonder how this would work for tumors, or other cell classification tasks....

    Saturday, December 31, 2005

    13 things that do not make sense

    An article in New Scientist. The first thing that makes no sense is the placebo effect, which may or may not be real.

    Wednesday, December 28, 2005


    An extension to weka for bioinformatics exists. Their added feature list is impressive.


    Tuesday, December 27, 2005


    A fascinating history of wheat in the Economist. Link. Also: Golden rice

    Tuesday, December 13, 2005

    Many real alive people are writing articles about this gift ideas website. I endorse this product.

    Sunday, December 11, 2005


    An interesting New Yorker article on forecasting and ostensible expertise.

    There are also many studies showing that expertise and experience do not make someone a better reader of the evidence. In one, data from a test used to diagnose brain damage were given to a group of clinical psychologists and their secretaries. The psychologists’ diagnoses were no better than the secretaries’.


    Tuesday, December 06, 2005

    Spicy food and cancer

    This article has some interesting statistics about cancer and spicy/Indian food. At least something that's good for cancer tastes good... Good blog too.

    Sunday, December 04, 2005

    NY Times articles

    I recently found out that you can read an NYT article all on one page, by adding ?pagewanted=all to the end of the url. This bookmarklet also does the trick.


    Tuesday, November 29, 2005

    Using Curl for fast downloads

    This is nice, although I am rarely waiting on downloads these days.
    From here.

    For example, suppose you want to download the Mandrake 8.0 ISO from the following three locations:

    The length of the file is 677281792, so initiate three simultaneous downloads using curl's "--range" option:
    bash$ curl -r 0-199999999 -o mdk-iso.part1 $url1 &
    bash$ curl -r 200000000-399999999 -o mdk-iso.part2 $url2 &
    bash$ curl -r 400000000- -o mdk-iso.part3 $url3 &
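
    Once all three background jobs finish, the parts still have to be joined in order. Here is a minimal sketch of that last step. join_parts is a hypothetical helper (not part of curl), and the output name mdk.iso is my own choice. It concatenates the parts and checks the result against the expected length.

    ```shell
    # join_parts OUT EXPECTED_SIZE PART...  (hypothetical helper, not part of curl)
    # Concatenates the parts in the order given and verifies the total byte count.
    join_parts() {
        out=$1
        expected=$2
        shift 2
        cat "$@" > "$out" || return 1
        # wc -c pads its output with spaces on some systems, so strip them.
        actual=$(wc -c < "$out" | tr -d ' ')
        if [ "$actual" -ne "$expected" ]; then
            echo "size mismatch: got $actual bytes, expected $expected" >&2
            return 1
        fi
    }

    # Usage, after starting the three curl jobs above in the same shell:
    #   wait    # block until the background downloads have finished
    #   join_parts mdk.iso 677281792 mdk-iso.part1 mdk-iso.part2 mdk-iso.part3
    ```

    The size check only catches truncated or overlapping ranges. If the mirror publishes a checksum, comparing against it is the stronger test.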

    Monday, November 14, 2005

    Machine learning blog

    At this machine learning blog, they scanned in a number of old papers, including:

    "Why isn't everyone a Bayesian?" by Efron B, American Statistician 1986. Examines reasons why not everybody was a Bayesian, as of 1986, with scorching reply from Lindley.
    "Axioms of Maximum Entropy" by Skilling, MaxEnt 1988 proceedings. Sets up four practically motivated axioms, and uses them to derive maximum entropy as the unique method for picking a single probability distribution from the set of valid probability distributions.

    The other posts are worth a look too.

    Personal Genome Project

    George Church comments on a Personal Genome Project...

    Friday, November 04, 2005

    The FDR

    False Discovery Rate is really important for most of us. This paper (lecture notes, actually) covers most of what you need to know. Gil Chu recently gave a talk on local FDR, so I am linking to that paper too. The first link's server seems to be down right now (hopefully temporarily). Anyone know what the difference between local FDR and PER is?

    Multiple Hypothesis Correction
    Review from Genome Res.
    Local FDR

    Monday, October 24, 2005

    Nature podcast

    The Nature podcast is sweet.

    Nature podcast

    Nature now has a podcast, and it's pretty good.

    Tuesday, October 18, 2005


    Choosing typefaces can be stressful, and if you are not careful you could end up using Comic Sans, or Arial (this is Zach-style font snobbery, the most enjoyable kind). That's why I have made a list of fonts I can refer to. For those unaware, serif is better for long strings of text, particularly printed text; it's less tiring to follow.

    Best serif fonts: Garamond (a common font), Caslon (apparently best for books), Stone, Jaslon. I don't really like serif fonts, but sometimes there is no choice.

    Best for the web: Verdana, it's common enough (unlike, say, Century Schoolbook, which I also like), and it looks nicer than Arial/Helvetica.

    Best sans-serif: Apple knows a lot about nice fonts, so I figure I will copy them. They use Myriad for packaging/ads etc, and it's exceptionally nice. They also use Helvetica Neue sometimes. Finally, Lucida Grande is the "OS X font".

    Thursday, October 13, 2005

    Bioinformatics blog

    I stumbled upon this bioinformatics/genomics blog.

    The 2005 zeitgeist is particularly interesting: it has a graph of hot topics (including machine learning methods) over the past year...

    Monday, October 10, 2005

    Web Tools

    Two useful sets of tools for playing with websites...

    Friday, October 07, 2005

    God Created the Integers

    Check out this new book by Stephen Hawking:

    God Created the Integers

    Wednesday, October 05, 2005

    Einstein and Models

    Here's a neat snippet about Einstein from the KQED web site:

    In 1905, Albert Einstein was a pleasant but unimpressive young man in a patent office until he drafted his Theory of Relativity. Shortly thereafter he wrote a supplement saying, "The idea is amusing and enticing, but whether the Lord is laughing at it and has played a trick on me -- that I cannot know."

    Tuesday, October 04, 2005

    Nanotube RAM

    Check out this blurb from Nature news:

    It seems to have real potential for providing fast, nonvolatile RAM. The company claims it will have its first products out by next year...

    I'm In

    A quick test post to see if I'm really in... ;-D

    Saturday, October 01, 2005

    PhD laws

    If you are doing a PhD, you should be aware of Hofstadter's law, and Parkinson's law. Parkinson's law comes from a very interesting book by Parkinson about the civil service.

    Monday, September 26, 2005


    If it's good enough for Brad Efron (he of bootstrap fame), maybe I should be taking selenium too...

    Dr. Brad Efron, a professor of statistics at Stanford, has a different dietary approach. He does not have prostate cancer, but he had a couple of scares and he has friends who have it. So he is taking selenium, a trace mineral found in plants.

    A study that randomly assigned people to take selenium or not to see whether it protected against skin cancer found that it had no effect on that cancer, but that the men taking it had only a third as many prostate cancers.

    From this NYT article on cancer and diet: here