BMI Students

Thursday, May 10, 2007

Eigenfactor

Rate journals with eigenfactor: link
Blog post about it: link

Wednesday, April 18, 2007

The Economist style guide

link

Tuesday, April 17, 2007

Color tools

Color brewer: link
Adobe Kuler: link
ColorJack Sphere: link

Friday, January 05, 2007

Giant photographs of Boston

Very cool use of Google Maps

link

PDF printing in Mac OS X

This was surprising to me, but very useful.

Occasionally I download a PDF and I am not allowed to save it, or even print preview. That's irritating, especially when we have a subscription to that journal -- what exactly did we pay for?

If you go to print, then Preview does give you the option to save as PS. In my experience, this does not work -- I hate PS anyway.

There is a sketchy alternative that worked for me today. Choose print, then Fax PDF, and then choose preview, then save. All this, plus you get the rush of an illicit act, forbidden by a society that could never understand me.

Sunday, December 24, 2006

Network effects

A demonstration of a phase transition in a network -- cool Java applet.

http://steinbock.org/netlogo/random_buttons.html

Monday, December 04, 2006

evil brain fungus

A David Attenborough clip.
link

Saturday, December 02, 2006

Postdoc considerations

An article by Phil Bourne on how to choose a postdoc.
http://dx.doi.org/10.1371/journal.pcbi.0020121

Thursday, September 14, 2006

Nature peer review trial

Nature has started testing its collaborative peer review system
here

Monday, September 04, 2006

3d cell animation

This company (XVIVO) worked with Harvard researchers to make this video, which is awesome.

Friday, August 25, 2006

Mixing Python and C

There are a bunch of ways to mix Python and C: Pyrex (nice technology, but non-standard, so hard to distribute), Boost (never tried it, looks OK), SWIG (good, but requires some heavy lifting; better suited to large-scale projects), and PyCXX, which Zach mentioned on this blog before (never tried it).

First, before resorting to C, try the excellent psyco module, which gets you a free speedup and requires no work (and if you like the cut of its jib, google for PyPy). The only catch is that psyco is i386-specific.

My preferred way to use C with Python is by actually writing the boilerplate C myself. This sounds stupid/hairy, but once you have the minimal code in place, it becomes quite easy to extend. This is especially true if you are doing what I imagine to be the typical Python/C mix: calling a C function from Python with an array to operate on, and getting an array or a number in return (e.g. replacing a slow matrix-operation loop). Smith-Waterman would be a good example: write it in Python, then replace the Smith-Waterman function with C, and verify it is correct by comparing to the Python output, which I assume is correct but slow (for instance, it might use easy-to-human-parse strings). I am also assuming that you are using Numeric/numpy arrays and not Python lists, which is likely/advisable for these kinds of number-crunching tasks.

In that spirit, and to save others time I have wasted, below is a very small example C program, a python program that calls it, and a "setup.py" file to build the C shared object that python imports.


First, the C code. This code is very simple. It takes the Python Numeric/numpy array as an argument, along with its length (you could also terminate the array with a sentinel value instead). The C file needs to include two headers: Python.h, which should be on your include path, and arrayobject.h, a Numeric file that may not be on your include path (you can copy it into the directory for testing).

Note how the C array is just the data part of the Numeric array cast as int* ( c_segs_array = (int *)segs_array->data; ). At the end of the function a "PyArrayObject" is built from this C array, and returned using "Py_BuildValue". The ease of translation between C arrays and Numeric arrays is key, and simplifies the whole process.

Note that c_segs_array must be cast as "char*" for the "PyArray_FromDimsAndData" function.

The second and third functions are boilerplate, and won't change much. No doubt some of this C file is mysterious, but most of it will not change at all. Any function that takes as input a Numeric array or number and returns an array or number can just be slotted into the mintest function.




#include "Python.h"
#include "Numeric/arrayobject.h"


static PyObject *
mintest(PyObject *self, PyObject *args, PyObject *kwargs) {

    //-----------------------
    //List arguments/keywords
    //-----------------------
    static char *kwlist[] = {"py_segs","num_segs",NULL};

    int i;

    int num_segs;
    int dims[1];

    PyObject *py_segs;
    PyArrayObject *segs_array;
    PyArrayObject *return_array;
    int *c_segs_array;

    //---------------
    //Parse the input
    //---------------
    //The name after the colon is only used in error messages
    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "Oi:mintest", kwlist,
                                     &py_segs, &num_segs)) {
        return NULL;
    }

    //-------------------------------------------
    //Make C arrays from my python numeric arrays
    //-------------------------------------------
    //The last two arguments are the minimum and maximum number of
    //dimensions (1 and 1 for a 1D array), not the array length
    segs_array = (PyArrayObject *)PyArray_ContiguousFromObject(py_segs, PyArray_INT, 1, 1);
    if (segs_array == NULL) {
        return NULL;
    }
    c_segs_array = (int *)segs_array->data;

    for (i = 0; i < num_segs; i++) {
        fprintf(stderr,"C testing %d\n",c_segs_array[i]);
    }

    //----------------
    //Return the array
    //----------------
    //return_array borrows the data owned by segs_array, so segs_array is
    //deliberately not DECREF'd here (a small leak that keeps the data alive)
    dims[0] = num_segs;
    return_array = (PyArrayObject *)PyArray_FromDimsAndData(1,dims,PyArray_INT, (char*)c_segs_array);
    if (return_array == NULL) {
        return NULL;
    }
    //"N" hands our reference to the result tuple ("O" would leak one)
    return Py_BuildValue("Ni", return_array, num_segs);
}


static PyMethodDef mintestMethods[] = {
    {"mintest", (PyCFunction)mintest, METH_VARARGS|METH_KEYWORDS,
     "HELP for minimal_test\n"},
    {NULL,NULL,0,NULL} /* Sentinel -- don't change*/
};

PyMODINIT_FUNC
initmintest(void) {
    (void) Py_InitModule("mintest", mintestMethods);
    import_array();
}


Now setup.py. This is simply a distutils file that tells python how to build the C file. Like with any python module, you type "python setup.py build" to build it, and "python setup.py install" to install. For testing, I usually just build it (which makes a build directory), then make a symbolic link in the main directory (ln -s build/lib.linux/mintest.so mintest.so).



from distutils.core import setup,Extension

module1 = Extension('mintest',sources=['mintest.c'])

setup(name = 'mintest',
version = '1.0',
description = 'minimum C test',
ext_modules = [module1])


#extra_compile_args = ["-O4"] # You could put "-O4" etc. here.


Finally, the Python program, which is hopefully self-explanatory.


import os, sys, re
import random
import Numeric as N
import mintest

#Make a 1D array of length 10
pyarray_length = 10
pyarray = N.array([random.randrange(100) for i in range(pyarray_length)])

#Print out the array as Python sees it
print "Python printing array", type(pyarray), pyarray, pyarray_length

#Get the same array after passing it to C and back
carray, carray_length = mintest.mintest(pyarray, pyarray_length)

#Finally print out the returned array
print "Array after going through C", type(carray), carray, carray_length


And that's it! Pretty easy once you know how.
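The "verify it against the slow Python version" step mentioned earlier could look something like the sketch below. This is not the code from my project: sw_score is a plain reference Smith-Waterman scorer with a linear gap penalty, the scoring values are arbitrary defaults, and check_against_reference is a hypothetical helper name. It is written for Python 3; in practice the "fast" argument would be the function exported by your C extension.

```python
def sw_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (linear gap penalty).

    Slow but easy to read: the reference to compare a C version against.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def check_against_reference(fast, reference, cases):
    """Return the input pairs on which the two implementations disagree."""
    return [(a, b) for a, b in cases if fast(a, b) != reference(a, b)]


if __name__ == "__main__":
    cases = [("GATTACA", "TTAC"), ("AAA", "TTT"), ("", "A")]
    fast = sw_score  # stand-in; swap in the C extension function here
    print(check_against_reference(fast, sw_score, cases))
```

An empty list means the two versions agree on every test case.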

Tuesday, August 22, 2006

infosthetics

Following on from Zach's junk charts post, this infosthetics blog is sweet.
www.infosthetics.com

Saturday, August 05, 2006

ANSI Escape Codes in Python

ANSI escape codes are surprisingly useful. In Python, each code starts with "\x1b[" (the ESC character followed by "["). Here is an example loading bar. This is a Unix thing; it won't work on Windows.



import sys

sys.stderr.write("\x1b[34mloading[" + " "*10 + "]\x1b[0m\r")
sys.stderr.write("\x1b[8C")
for i in range(10):
    sys.stderr.write('.')
sys.stderr.write('\n')

Here "\x1b[34m" sets the foreground colour to blue (31 would be red), "\x1b[0m" resets it, and "\x1b[8C" moves the cursor right 8 columns, just past "loading[".

It prints out something like this, but with "loading[" and "]" in blue:
loading[..........]

http://en.wikipedia.org/wiki/ANSI_escape_code
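If you use this a lot, it may be worth wrapping the codes in a couple of small helpers. Here is a minimal sketch for Python 3; the names CSI, coloured and bar_frame are mine, not from any library, and the colour code 34 (blue) is just the one used above.

```python
import sys

CSI = "\x1b["  # Control Sequence Introducer: ESC followed by "["


def coloured(text, colour_code):
    """Wrap text in an SGR colour code and reset attributes afterwards."""
    return "%s%dm%s%s0m" % (CSI, colour_code, text, CSI)


def bar_frame(done, width=10, label="loading"):
    """Return one frame of the bar; the leading "\r" redraws the same line."""
    done = max(0, min(width, done))
    body = "." * done + " " * (width - done)
    return "\r" + coloured(label + "[", 34) + body + coloured("]", 34)


if __name__ == "__main__":
    # Redraw the bar in place, one more dot per step
    for step in range(11):
        sys.stderr.write(bar_frame(step))
        sys.stderr.flush()
    sys.stderr.write("\n")
```

Redrawing the whole line with "\r" avoids having to count cursor columns by hand.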

Thursday, June 08, 2006

PLoS ONE

I haven't read the whole thing yet, but PLoS ONE looks like it's going to be very interesting....

link

Wednesday, June 07, 2006

Pharma...

A scary article on pharma and clinical trials.

Money quote:
As Dr. Marcia Angell, a former editor of The New England Journal of Medicine, noted in the Baltimore Sun, "What would be considered a grotesque conflict of interest if a politician or judge did it is somehow not in a physician."

link

Saturday, June 03, 2006

Humans and chimps

Jimmy: I have a crazy friend who says humans and chimps are related. Is he crazy?
Troy: No, just ignorant. You see, your crazy friend never heard of "The Bible." Just ask this scientician.
link

Tuesday, May 30, 2006

Phages

Slate has an interesting article on the use of bacteriophages to attack infections instead of antibiotics. They also speculate that unfortunately it will be hard to bring the technology to the US.

link

Saturday, May 20, 2006

Notes on classifiers

I have been testing a bunch of classifiers for a project I am doing. The objective is to classify an intergenic region as ACE1 (or any motif) or not-ACE1, based on features of the intergenic regions. I did this for a number of sets of features, and the results were very consistent. I have done enough tests that I feel comfortable relating some general conclusions....

Random Forests and SVMs always won, with random forests usually commanding a slight lead. SVMs with a polynomial kernel did a bit worse. MaxEnt usually came fourth, and seemed to do better on discrete data (hence the NLP slant of this method). Finally, k-nearest neighbors always lost. Random forests were slower than SVMs; apart from that, I think they are preferable.

Random forests are just collections of voting decision trees, each trained on bootstrapped data and variables. Someone must have done the same thing for collections of voting SVMs. If I find it I'll add it to this post. Seems like it must win overall.
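To make the "voting trees on bootstrapped data and variables" idea concrete, here is a toy sketch in Python 3. It is not what I ran for the tests above: each "tree" is only a one-split decision stump on a single randomly chosen feature, and train_stump, train_forest and forest_predict are made-up names. A real random forest grows full trees and samples variables at every split.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """Pick the threshold on one feature that best splits the sample."""
    values = sorted(set(row[feature] for row in rows))
    majority = Counter(labels).most_common(1)[0][0]
    best = (-1, None, majority, majority)  # (hits, threshold, low, high)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        low = Counter(l for r, l in zip(rows, labels) if r[feature] <= threshold)
        high = Counter(l for r, l in zip(rows, labels) if r[feature] > threshold)
        low_label = low.most_common(1)[0][0]
        high_label = high.most_common(1)[0][0]
        hits = low[low_label] + high[high_label]
        if hits > best[0]:
            best = (hits, threshold, low_label, high_label)
    _, threshold, low_label, high_label = best

    def predict(row):
        if threshold is None:  # only one distinct value: predict the majority
            return low_label
        return low_label if row[feature] <= threshold else high_label

    return predict


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]  # bootstrap the data
        feature = rng.randrange(len(rows[0]))             # sample a variable
        forest.append(train_stump([rows[i] for i in picks],
                                  [labels[i] for i in picks], feature))
    return forest


def forest_predict(forest, row):
    """Majority vote over all the stumps."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]
```

The voting step does not care what the base classifier is, which is why the same trick should carry over to collections of SVMs.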

Monday, May 15, 2006

Flowers that detect landmines

This kind of thing helps GM food's image no end...
link

Saturday, May 13, 2006

LaTeX

I use LaTeX a lot, mainly because Word on the mac is so horrible, and I like not worrying about formatting while typing. This article explains some of the small benefits of doing so.

link

Tuesday, May 02, 2006

Mammalian promoters

Nature Genetics just published a milestone paper from the Fantom/RIKEN consortium compiling an enormous genome-wide collection of transcript start sites (TSSs) in humans and mice. The paper could be a treasure trove for bioinformaticians. They collected TSS tags from many different tissues and mapped them onto the genome. There are several different classes of promoters: some with very well defined TSSs, some with very broad distributions (transcription can start anywhere in a comparatively broad region), some with multiple well-defined sites and some with combinations of the above. The paper claims four classes. I don't know what kind of clustering they used -- but it would be interesting to know more about how distinct their classes are and if four is really the best estimate.

I think that in addition to the analyses they did in the paper, one can try a bunch of correlations quickly -- several possibilities for projects small and large. Like, do promoter classes correlate with alternatively spliced genes? Or are TSSs correlated with transcription units from tiled array experiments (Affy and others)? One can also do some gene ontology correlations, or expression analysis using these data. We know that transcription initiation, splicing and expression (and other things) are all intimately connected, so this might be leveraged in many different directions...

Monday, May 01, 2006

Machine learning videos

There are a bunch of machine learning webcast lectures here. Many of them are tutorials; includes a few biology-focused lectures.

link

Thursday, April 13, 2006

The Science of Scientific Writing

Some concrete examples on how to write better; concrete!

link

Wednesday, April 12, 2006

Academic search

academic.live.com just launched. It's a citeseer/google scholar-like search by MSN.

Tuesday, April 04, 2006

Mac shortcuts

A couple of shortcuts I learned recently.

Ctrl-command-D while hovering over a word will give you a dictionary definition.
Shift-command-4 gets a screenshot of a selection.
Shift-command-3 gets a screenshot of the whole screen.
Alt-command-= zooms in.
Ctrl-alt-command-8 funketizes.

Monday, March 27, 2006

DSA keys

RSA/DSA keys are great because typing in your password every time is so tedious, especially if you do a lot of scping. I do it rarely enough that I always forget how to set it up and have to scour the interweb for it.

On the computer you are sshing from:
ssh-keygen -t dsa
I don't use a passphrase. I don't think it really matters.
Then
scp .ssh/id_dsa.pub brian@other_computer:./.ssh/authorized_keys

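Note that this scp overwrites any authorized_keys file already on the other computer. If the file already exists, you'll want to append to it instead, with something like:
cat .ssh/id_dsa.pub | ssh brian@other_computer 'cat >> .ssh/authorized_keys'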

Tuesday, March 14, 2006

Javascript tutorial

This is actually a pretty good reference. I've been writing stuff in Javascript since last summer and have read two books on it, but there is still a bunch of useful stuff in the tutorial I didn't know about. Which is weird because Javascript is a pretty small language.
link

Friday, March 10, 2006

Tom Rando's awesome talk

Tom Rando gave last Wednesday's Frontiers talk and it was really good. He works on muscle growth and regeneration and the title of the talk was "Aging, stem cells, and the challenge of senescent tissue repair". Aging and stem cells -- double sexy based on the title alone, but the content was even better.

I am not going to summarize the presentation in detail, but will list three of the coolest things I learned during it. First, apparently, one can do parabiotic experiments, which involves connecting two organisms subcutaneously. After a while (days or weeks), the vessels of the two organisms find each other and they start merging their circulatory systems. In Rando's experiments they attached mice from different age groups and studied the effects of young blood on old mice and vice versa. What they found (this is the second cool thing) is that stem cells that are responsible for muscle regeneration after injury are present and are totally fine in the old mice. It's just that the younger mice have some kind of a factor in their blood serum that stimulates the stem cell activation, while the old mice appear to be saddled with inhibitors. Joining a young mouse with an old one restored muscle regeneration in the old mouse completely. They also did some in vitro experiments to further understand what's going on. Pretty neat.

The last thing, which kind of blew my mind, was this idea first proposed by Cairns in 1979, that through successive rounds of DNA replication the organism remembers which strands are the original ones and which ones are copies. These template strands are segregated together and find themselves in the same cells. This allows the organism to preserve the original code and withstand the mutational load in tissues with a lot of regeneration (most errors come about as a result of synthesis). It seems like initially no one could find any support for this hypothesis, but now Rando and others have presented some pretty convincing evidence based on DNA labeling experiments. I guess you needed to know where to look -- stem cells are the ones that carry the template DNA and there aren't that many of them relatively speaking.

I don't know if there is anything informatics-related in what Tom Rando does, but he is at Stanford and is doing some of the coolest work around.

Friday, February 17, 2006

Viruses

"""If you put every virus particle on Earth together in a row, they would form a line 10 million light-years long. """

That seems like rather a lot of viruses.

discover magazine article

Bio-ontologies

A blog post about the Nature Biotech article that criticized bio-ontologies, featuring our very own Mark Musen.

Via postgenomic.

nodalpoint

Thursday, February 16, 2006

New metablog thing

I don't know what it is exactly, but I like it.

Postgenomic aggregates posts from life science blogs and then does useful and interesting things with that data.

postgenomic

Saturday, February 11, 2006

Uniquifying a list in Python

I keep needing to do this all the time (given a list, remove all the duplicates). So a one-liner:
dict().fromkeys(yourlist).keys()
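One caveat with the one-liner is ordering. A small sketch of the two usual variants, assuming Python 3.7 or later (where dicts keep insertion order) and hashable items; the function names are just for illustration.

```python
def unique_in_order(items):
    """Drop duplicates, keeping the first occurrence of each item."""
    # dict keys are unique and (from Python 3.7) keep insertion order
    return list(dict.fromkeys(items))


def unique_any_order(items):
    """Drop duplicates; the order of the result is not guaranteed."""
    return list(set(items))


if __name__ == "__main__":
    print(unique_in_order([3, 1, 3, 2, 1]))
```

If you don't care about the order, the set version is the shortest thing to type.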

Thursday, February 02, 2006

Advice for getting published

10 simple rules from Phil Bourne over at PLOS Comp Bio. They are mostly common sense, but make for some useful reading anyway.

Wednesday, February 01, 2006

Trends in Machine Learning

Some graphs showing trends in the use of SVMs/naive Bayes/expert systems etc.
Link

Similar for bioinformatics.
Link

Tuesday, January 31, 2006

Bagelblog

Nice post by Serge.
Link

For the pythonistas out there

A very nice reference for Python 2.4. Good to keep handy.
link

Monday, January 30, 2006

More expertise

Following up on expertise, it seems like throwing darts at a stock ticker is as good as using mutual funds. I wonder if someone has worked out the expected return from using a random fund compared to random stock choices...
Link

Viruses/Germs vs genes

A virus makes you fat?
A virus makes you autistic?
A germ makes you gay?
A germ makes you schizophrenic? (Remind me not to go near cats).
Maybe, maybe not, but viruses/germs are ubiquitous and genes are not the be all and end all.

Also I would like to take this opportunity to coin the word "germes" for foreign DNA/parasites that have effects that seem indistinguishable from genetic. For good measure, I also coin "germome", and "germetics", and "germealogy" (yes, that's how you spell it).

Start using these words immediately.

Wednesday, January 18, 2006

The best place to work...

is Genentech, apparently.
link

Monday, January 16, 2006

Brains are Bayesian

Economist article
Nature paper

Clickworkers

NASA uses humans to find craters on Mars. I wonder how this would work for tumors, or other cell classification tasks....
Clickworkers

Saturday, December 31, 2005

13 things that do not make sense

An article in New Scientist. The first thing that makes no sense is the placebo effect, which may or may not be real.

Wednesday, December 28, 2005

Bioweka

An extension to weka for bioinformatics exists. Their added feature list is impressive.

Bioweka

Tuesday, December 27, 2005

Wheat

A fascinating history of wheat in the Economist. Link. Also: Golden rice

Tuesday, December 13, 2005

giftornot.com

Many real alive people are writing articles about this gift ideas website. I endorse this product.

Sunday, December 11, 2005

Forecasting

An interesting New Yorker article on forecasting and ostensible expertise.


There are also many studies showing that expertise and experience do not make someone a better reader of the evidence. In one, data from a test used to diagnose brain damage were given to a group of clinical psychologists and their secretaries. The psychologists’ diagnoses were no better than the secretaries’.


link

Tuesday, December 06, 2005

Spicy food and cancer

This article has some interesting statistics about cancer and spicy/Indian food. At least something that's good for cancer tastes good... Good blog too.

Sunday, December 04, 2005

NY Times articles

I recently found out that you can read an NYT article all on one page, by adding ?pagewanted=all to the end of the url. This bookmarklet also does the trick.

javascript:if%20(location.href.indexOf('?')>0){location.href+="&pagewanted=all"}else{location.href+="?pagewanted=all"};
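The same trick is easy to do outside the browser. Below is a rough Python 3 equivalent of the bookmarklet, using only the standard library; single_page_url is a made-up name and the example.com address is a placeholder. Unlike the bookmarklet, it replaces an existing pagewanted value rather than adding a second one.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def single_page_url(url):
    """Return url with pagewanted=all set in its query string."""
    parts = urlsplit(url)
    # Keep every existing parameter except an old pagewanted value
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "pagewanted"]
    query.append(("pagewanted", "all"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(single_page_url("http://example.com/article.html?page=2"))
```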

Tuesday, November 29, 2005

Using Curl for fast downloads

This is nice, although I am rarely waiting on downloads these days.
From here.

For example, suppose you want to download the Mandrake 8.0 ISO from the following three locations:
url1=http://ftp.eecs.umich.edu/pub/linux/mandrake/iso/Mandrake80-inst.iso
url2=http://ftp.rpmfind.net/linux/Mandrake/iso/Mandrake80-inst.iso
url3=http://ftp.wayne.edu/linux/mandrake/iso/Mandrake80-inst.iso

The length of the file is 677281792, so initiate three simultaneous downloads using curl's "--range" option:
bash$ curl -r 0-199999999 -o mdk-iso.part1 $url1 &
bash$ curl -r 200000000-399999999 -o mdk-iso.part2 $url2 &
bash$ curl -r 400000000- -o mdk-iso.part3 $url3 &
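The byte ranges above were picked by hand. A small Python 3 sketch for working them out for any file length and number of mirrors follows; byte_ranges and curl_commands are hypothetical helper names, and the commands are only printed, not run. It relies on nothing from curl beyond the -r and -o options already used above.

```python
def byte_ranges(length, parts):
    """Split [0, length) into up to `parts` inclusive (start, end) ranges."""
    if length <= 0 or parts <= 0:
        raise ValueError("length and parts must be positive")
    chunk = -(-length // parts)  # ceiling division
    return [(start, min(start + chunk, length) - 1)
            for start in range(0, length, chunk)]


def curl_commands(urls, length, prefix="part"):
    """Build one curl command per URL, each fetching a different range."""
    ranges = byte_ranges(length, len(urls))
    return ["curl -r %d-%d -o %s%d %s" % (start, end, prefix, i, url)
            for i, ((start, end), url) in enumerate(zip(ranges, urls), 1)]


if __name__ == "__main__":
    for command in curl_commands(["$url1", "$url2", "$url3"], 677281792):
        print(command)
```

Once all the parts have finished downloading, concatenating them in order (with cat, for instance) gives the complete file.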

Monday, November 14, 2005

Machine learning blog

At this machine learning blog, they scanned in a number of old papers, including:

"Why isn't everyone a Bayesian?" by Efron B, American Statistician 1986. Examines reasons why not everybody was a Bayesian, as of 1986, with scorching reply from Lindley.
"Axioms of Maximum Entropy" by Skilling, MaxEnt 1988 proceedings. Sets up four practically motivated axioms, and uses them to derive maximum entropy as the unique method for picking a single probability distribution from the set of valid probability distributions.

The other posts are worth a look too.
Here

Personal Genome Project

George Church comments on a Personal Genome Project...
Here

Friday, November 04, 2005

The FDR

False Discovery Rate is really important for most of us. This paper (lecture notes, actually) covers most of what you need to know. Gil Chu recently gave a talk on local FDR, so I am linking to that paper too. The first link's server seems to be down right now (hopefully temporarily). Anyone know what the difference between local FDR and PER is?

Multiple Hypothesis Correction
Review from Genome Res.
Local FDR

Monday, October 24, 2005

Nature podcast

Nature now has a podcast, and it's pretty good.
http://www.nature.com/nature/podcast/index.html

Tuesday, October 18, 2005

Typefaces

Choosing typefaces can be stressful, and if you are not careful you could end up using Comic Sans, or Arial (this is Zach-style font snobbery, the most enjoyable kind). That's why I have made a list of fonts I can refer to. For those unaware, serif is better for long strings of text, particularly printed text; it's less tiring to follow.

Best serif fonts (from bamag.com): Garamond (a common font), Caslon (apparently best for books), Stone, Jaslon. I don't really like serif but sometimes there is no choice.

Best for the web: Verdana, it's common enough (unlike, say, Century Schoolbook, which I also like), and it looks nicer than Arial/Helvetica.

Best sans-serif: Apple knows a lot about nice fonts, so I figure I will copy them. They use Myriad for packaging/ads etc, and it's exceptionally nice. They also use Helvetica Neue sometimes. Finally, Lucida Grande is the "OS X font".

Thursday, October 13, 2005

Bioinformatics blog

I stumbled upon this bioinformatics/genomics blog.
http://www.ghastlyfop.com/blog/

The 2005 zeitgeist is particularly interesting, it has a graph of hot topics (including machine learning methods) over the past year...
http://www.ghastlyfop.com/blog/2005/10/bioinformatics-zeitgeist-05.html

Monday, October 10, 2005

Web Tools

Two useful sets of tools for playing with websites...

www.dnsstuff.com

www.faganfinder.com/urlinfo

Saturday, October 01, 2005

PhD laws

If you are doing a PhD, you should be aware of Hofstadter's law, and Parkinson's law. Parkinson's law comes from a very interesting book by Parkinson about the civil service.

Monday, September 26, 2005

Selenium

If it's good enough for Brad Efron (he of bootstrap fame), maybe I should be taking Selenium too...

"""
Dr. Brad Efron, a professor of statistics at Stanford, has a different dietary approach. He does not have prostate cancer, but he had a couple of scares and he has friends who have it. So he is taking selenium, a trace mineral found in plants.

A study that randomly assigned people to take selenium or not to see whether it protected against skin cancer found that it had no effect on that cancer, but that the men taking it had only a third as many prostate cancers.
"""

from this NYT article on cancer and diet: Here

Friday, September 23, 2005

Top ten most cited works from 1976-1983 (via Metafilter)

Nice. I have read all of these and find their ideas intriguing if simple-minded.

1. T.S. Kuhn, The Structure of Scientific Revolutions. 1962
2. J. Joyce, Ulysses. 1922
3. N. Frye, Anatomy of Criticism: Four Essays. 1957
4. L. Wittgenstein, Philosophical Investigations
5. N. Chomsky, Aspects of the Theory of Syntax. 1965
6. M. Foucault, The Order of Things. 1966
7. J. Derrida, Of Grammatology
8. R. Barthes, S/Z. 1970
9. M. Heidegger, Being and Time. 1927
10. E.R. Curtius, European Literature and the Latin Middle Ages. 1948

Tuesday, September 20, 2005

Another Stanford MacArthur genius

Pehr Harbury of the Biochemistry Department. Last year, Daphne Koller and Julie Theriot were among the winners.

link

Tuesday, September 13, 2005

Humanities sucks.

Nice piece in the Guardian about science communication. He'll really
make friends at the Guardian with lines like:

"humanities graduates in the media, who suspect themselves to be
intellectuals, desperately need to reinforce the idea that science is
nonsense: because they've denied themselves access to the most
significant developments in the history of western thought for 200
years, and secretly, deep down, they're angry with themselves over
that."

Here

Monday, September 12, 2005

Practice Blackjack

As I was exploring the potential of Javascript/DHTML in informatics-related applications, I stumbled onto another area where that technology can be put to good use -- gambling! So to practice my Javascript skills and to help beginner to intermediate Blackjack players everywhere, I wrote a small application and put it up at blackjack-bst.com (which stands for Blackjack Basic Strategy Trainer dot com). Check it out and tell me what you think. It is still a work in progress.

Monday, August 29, 2005

PLOS Clinical Trials

Yup, another PLOS journal! Coming later this year, sounds cool.

Tuesday, August 16, 2005

Sean Eddy is moving to Janelia Farm

I had not heard of that place until today, but it sounds pretty amazing. [link]

Wednesday, June 29, 2005

Synthetic biology news from boing boing

http://www.boingboing.net/2005/06/29/craig_venters_new_co.html

Thursday, June 09, 2005

tinyproxy

Tinyproxy is a cool little program that allows you to use your linux/mac box as a proxy with little overhead. This is useful because when you are away, say in Europe, you can access all the Stanford-accessible journals by going through your Stanford box. Like a lot of these little tools, this was introduced to me by Mike.
Tinyproxy

Freecell

Javascript / DHTML are important technologies in bioinformatics. I swear that is why I did this.
Just Freecell