Thursday, May 10, 2007
Wednesday, April 18, 2007
Tuesday, April 17, 2007
Friday, January 05, 2007
PDF printing in Mac OS X
This was surprising to me, but very useful.
Occasionally I download a PDF and I am not allowed to save it, or even print preview. That's irritating, especially when we have a subscription to that journal -- what exactly did we pay for?
If you go to print, then Preview does give you the option to save as PS. In my experience, this does not work -- I hate PS anyway.
There is a sketchy alternative that worked for me today. Choose print, then Fax PDF, and then choose preview, then save. All this, plus you get the rush of an illicit act, forbidden by a society that could never understand me.
Sunday, December 24, 2006
Network effects
A demonstration of a phase transition in a network -- cool java applet.
http://steinbock.org/netlogo/random_buttons.html
Monday, December 04, 2006
Saturday, December 02, 2006
Postdoc considerations
An article by Phil Bourne on how to choose a postdoc.
http://dx.doi.org/10.1371/journal.pcbi.0020121
Thursday, September 14, 2006
Monday, September 04, 2006
Friday, August 25, 2006
Mixing Python and C
There are a bunch of ways to mix Python and C: Pyrex (nice technology, but non-standard, so hard to distribute), Boost (never tried it, looks ok), SWIG (good, but requires some heavy lifting, so it is better suited to large-scale projects), and PyCXX, which Zach mentioned on this blog before (never tried it).
First, before resorting to C, try the excellent psyco module, which gets you a free speedup and requires no work (and if you like the cut of its jib, google for PyPy). The only catch is that psyco is i386-specific.
My preferred way to use C with Python is to write the boilerplate C myself. This sounds stupid/hairy, but once you have the minimal code in place it becomes quite easy to extend. This is especially true if you are doing what I imagine to be the typical Python/C mix: calling a C function from Python with an array to operate on, and getting an array or a number in return (e.g. replacing a slow matrix-operation loop). Smith-Waterman would be a good example: write it in Python, then replace the Smith-Waterman function with C, and verify the C version by comparing its output to that of the Python version, which I assume is correct but slow (for instance, it might use easy-to-human-parse strings). I am also assuming that you are using Numeric/numpy arrays and not Python lists, which is likely/advisable for these kinds of number-crunching tasks.
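As a rough idea of what the "correct but slow" Python side of that comparison could look like, here is a minimal sketch of a Smith-Waterman scorer. The function name, the scoring values (match=2, mismatch=-1) and the linear gap penalty are all assumptions made for illustration, not something taken from a particular project.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)  # scores for the previous row of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

Once a C version exists, feeding both versions the same random sequences and comparing the scores is a cheap way to check the C code.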
In that spirit, and to save others time I have wasted, below is a very small example C program, a python program that calls it, and a "setup.py" file to build the C shared object that python imports.
First, the C code. This code is very simple. It takes the Python Numeric/numpy array as an argument, along with its length (you could also null-terminate the array). The C file needs to include two headers: Python.h, which should be in your include path, and arrayobject.h, a Numeric file that may not be in your include path (you can copy it into the directory for testing).
Note how the C array is just the data part of the Numeric array cast as int* ( c_segs_array = (int *)segs_array->data; ). At the end of the function a "PyArrayObject" is built from this C array, and returned using "Py_BuildValue". The ease of translation between C arrays and Numeric arrays is key, and simplifies the whole process.
Note that c_segs_array must be cast as "char*" for the "PyArray_FromDimsAndData" function.
The second and third functions are boilerplate, and won't change much. No doubt some of this C file is mysterious, but most of it will not change at all. Any function that takes as input a Numeric array or number and returns an array or number can just be slotted into the mintest function.
#include "Python.h"
#include "Numeric/arrayobject.h"

static PyObject *
mintest(PyObject *self, PyObject *args, PyObject *kwargs) {
    //-----------------------
    //List arguments/keywords
    //-----------------------
    static char *kwlist[] = {"py_segs", "num_segs", NULL};
    int i;
    int num_segs;
    int dims[1];
    PyObject *py_segs;
    PyArrayObject *segs_array;
    PyArrayObject *return_array;
    int *c_segs_array;

    //---------------
    //Parse the input
    //---------------
    //The name after the colon is what appears in error messages
    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "Oi:mintest", kwlist,
                                     &py_segs, &num_segs)) {
        return NULL;
    }

    //-------------------------------------------
    //Make C arrays from my python numeric arrays
    //-------------------------------------------
    //The last two arguments are the min and max number of dimensions (1D only)
    segs_array = (PyArrayObject *)PyArray_ContiguousFromObject(py_segs, PyArray_INT, 1, 1);
    if (segs_array == NULL) {
        return NULL; //Conversion failed; the Python exception is already set
    }
    c_segs_array = (int *)segs_array->data;
    for (i = 0; i < num_segs; i++) {
        fprintf(stderr, "C testing %d\n", c_segs_array[i]);
    }

    //----------------
    //Return the array
    //----------------
    //return_array points at segs_array's data rather than copying it,
    //so segs_array is deliberately not DECREF'd here
    dims[0] = num_segs;
    return_array = (PyArrayObject *)PyArray_FromDimsAndData(1, dims, PyArray_INT, (char *)c_segs_array);
    if (return_array == NULL) {
        return NULL;
    }
    //"N" hands our reference to the tuple ("O" would add an extra one and leak)
    return Py_BuildValue("Ni", return_array, num_segs);
}

static PyMethodDef mintestMethods[] = {
    {"mintest", (PyCFunction)mintest, METH_VARARGS|METH_KEYWORDS,
     "HELP for minimal_test\n"},
    {NULL, NULL, 0, NULL} /* Sentinel -- don't change*/
};

PyMODINIT_FUNC
initmintest(void) {
    (void) Py_InitModule("mintest", mintestMethods);
    import_array();
}
Now setup.py. This is simply a distutils file that tells python how to build the C file. Like with any python module, you type "python setup.py build" to build it, and "python setup.py install" to install. For testing, I usually just build it (which makes a build directory), then make a symbolic link in the main directory (ln -s build/lib.linux/mintest.so mintest.so).
from distutils.core import setup, Extension

module1 = Extension('mintest', sources=['mintest.c'])
# You could also pass extra_compile_args=["-O4"] etc. to Extension() above.

setup(name='mintest',
      version='1.0',
      description='minimum C test',
      ext_modules=[module1])
Finally, the Python program, which is hopefully self-explanatory.
import os, sys, re
import random
import Numeric as N
import mintest
#Make a 1D array of length 10
pyarray_length = 10
pyarray = N.array([random.randrange(100) for i in range(pyarray_length)])
#Print out the array as Python sees it
print "Python printing array", type(pyarray), pyarray, pyarray_length
#Get the same array after passing it to C and back
carray, carray_length = mintest.mintest(pyarray, pyarray_length)
#Finally print out the returned array
print "Array after going through C", type(carray), carray, carray_length
And that's it! Pretty easy once you know how.
Tuesday, August 22, 2006
infosthetics
Following on from Zach's junk charts post, this infosthetics blog is sweet.
www.infosthetics.com
www.infosthetics.com
Saturday, August 05, 2006
ANSI Escape Codes in Python
ANSI escape codes are surprisingly useful. In Python, every escape sequence starts with "\x1b[". Here is an example loading bar. This is a Unix thing; it won't work on Windows.
import sys

sys.stderr.write("\x1b[34mloading[" + " "*10 + "]\x1b[0m\r")
sys.stderr.write("\x1b[8C")
for i in range(10):
    sys.stderr.write('.')
sys.stderr.write('\n')
Here "\x1b[34m" is "colour foreground blue" (red would be "\x1b[31m"), "\x1b[0m" resets the colour, and "\x1b[8C" means "move the cursor right 8 columns".
It prints out something like this, but with "loading" and the brackets in blue:
loading[..........]
http://en.wikipedia.org/wiki/ANSI_escape_code
Thursday, June 08, 2006
PLoS ONE
I haven't read the whole thing yet, but PLoS ONE looks like it's going to be very interesting....
link
Wednesday, June 07, 2006
Pharma...
A scary article on pharma and clinical trials.
Money quote:
As Dr. Marcia Angell, a former editor of The New England Journal of Medicine, noted in the Baltimore Sun, "What would be considered a grotesque conflict of interest if a politician or judge did it is somehow not in a physician."
link
Saturday, June 03, 2006
Humans and chimps
Jimmy: I have a crazy friend who says humans and chimps are related. Is he crazy?
Troy: No, just ignorant. You see, your crazy friend never heard of "The Bible." Just ask this scientician.
link
Tuesday, May 30, 2006
Phages
Slate has an interesting article on the use of bacteriophages to attack infections instead of antibiotics. They also speculate that unfortunately it will be hard to bring the technology to the US.
link
Saturday, May 20, 2006
Notes on classifiers
I have been testing a bunch of classifiers for a project I am doing. The objective is to classify an intergenic region as ACE1 (or any motif) or not-ACE1, based on features of the intergenic regions. I did this for a number of sets of features, and the results were very consistent. I have done enough tests that I feel comfortable relating some general conclusions....
Random Forests and SVMs always won, with random forests usually commanding a slight lead. SVMs with a polynomial kernel did a bit worse. MaxEnt usually came fourth, and seemed to do better on discrete data (hence the NLP slant of this method). Finally, k-nearest neighbors always lost. Random Forests were slower than SVMs, but apart from that I think they are preferable.
Random forests are just collections of voting decision trees, each trained on bootstrapped data and variables. Someone must have done the same thing for collections of voting SVMs. If I find it I'll add it to this post. Seems like it must win overall.
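To make the "voting trees on bootstrapped data and variables" idea concrete, here is a toy sketch. It is not the code used for the tests above: it swaps full decision trees for one-split stumps to stay short, and every name in it (train_stump, train_forest, predict) is made up for illustration.

```python
import random
from collections import Counter


def train_stump(rows, labels, features):
    """Find the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        for threshold in sorted(set(row[f] for row in rows)):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            # Each side of the split predicts its majority label.
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def train_forest(rows, labels, num_trees=25, num_features=1, seed=0):
    """Train each stump on a bootstrap sample and a random subset of the variables."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(num_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        features = rng.sample(range(len(rows[0])), num_features)
        forest.append(train_stump([rows[i] for i in sample],
                                  [labels[i] for i in sample],
                                  features))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 9), (9, 8), (8, 8), (9, 9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(rows, labels)
print(predict(forest, (1, 1)), predict(forest, (9, 9)))
```

Swapping train_stump for an SVM trainer would give the "collection of voting SVMs" mentioned above, at least in outline.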
Monday, May 15, 2006
Saturday, May 13, 2006
LaTeX
I use LaTeX a lot, mainly because Word on the mac is so horrible, and I like not worrying about formatting while typing. This article explains some of the small benefits of doing so.
link
Tuesday, May 02, 2006
Mammalian promoters
Nature Genetics just published a milestone paper from the Fantom/RIKEN consortium compiling an enormous genome-wide collection of transcript start sites (TSSs) in humans and mice. The paper could be a treasure trove for bioinformaticians. They collected TSS tags from many different tissues and mapped them onto the genome. There are several different classes of promoters: some with very well defined TSSs, some with very broad distributions (transcription can start anywhere in a comparatively broad region), some with multiple well-defined sites and some with combinations of the above. The paper claims four classes. I don't know what kind of clustering they used -- but it would be interesting to know more about how distinct their classes are and if four is really the best estimate.
I think that in addition to the analyses they did in the paper, one can try a bunch of correlations quickly -- several possibilities for projects small and large. Like, do promoter classes correlate with alternatively spliced genes? Or are TSSs correlated with transcription units from tiled array experiments (Affy and others)? One can also do some gene ontology correlations, or expression analysis using these data. We know that transcription initiation, splicing and expression (and other things) are all intimately connected, so this could be leveraged in many different directions...
Monday, May 01, 2006
Machine learning videos
There are a bunch of machine learning webcast lectures here. Many of them are tutorials; includes a few biology-focused lectures.
link
Thursday, April 13, 2006
Wednesday, April 12, 2006
Tuesday, April 04, 2006
Mac shortcuts
A couple of shortcuts I learned recently.
Ctrl-command-D while hovering over a word will give you a dictionary definition.
Shift-command-4 gets a screenshot of a selection.
Shift-command-3 gets a screenshot of the whole screen.
Alt-command-= zooms in.
Ctrl-alt-command-8 funketizes.
Monday, March 27, 2006
DSA keys
RSA/DSA keys are great because typing in your password every time is so tedious, especially if you do a lot of scping. I set them up rarely enough that I always forget how and have to scour the interweb for it.
On the computer you are sshing from:
ssh-keygen -t dsa
I don't use a passphrase. I don't think it really matters.
Then
scp .ssh/id_dsa.pub brian@other_computer:./.ssh/authorized_keys
If authorized_keys already exists you'll want to append to that file....
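For the append case, one option is to copy the .pub file across under a different name and then add it to authorized_keys on the other computer. A minimal sketch of that second step follows; the function name and the paths in the usage comment are hypothetical, and it assumes one key per line.

```python
import os


def append_public_key(pub_key_path, authorized_keys_path):
    """Append the key in pub_key_path to authorized_keys_path unless it is already there.

    Returns True if the key was added, False if it was already listed.
    """
    with open(pub_key_path) as handle:
        key = handle.read().strip()
    content = ""
    if os.path.exists(authorized_keys_path):
        with open(authorized_keys_path) as handle:
            content = handle.read()
    if key in (line.strip() for line in content.splitlines()):
        return False
    with open(authorized_keys_path, "a") as handle:
        # Start on a fresh line if the existing file lacks a trailing newline.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    return True


# Hypothetical usage on the computer you are sshing to:
# append_public_key(os.path.expanduser("~/id_dsa.pub"),
#                   os.path.expanduser("~/.ssh/authorized_keys"))
```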
Tuesday, March 14, 2006
Javascript tutorial
This is actually a pretty good reference. I've been writing stuff in Javascript since last summer and have read two books on it, but there is still a bunch of useful stuff in the tutorial I didn't know about. Which is weird because Javascript is a pretty small language.
link
Friday, March 10, 2006
Tom Rando's awesome talk
Tom Rando gave last Wednesday's Frontiers talk and it was really good. He works on muscle growth and regeneration and the title of the talk was "Aging, stem cells, and the challenge of senescent tissue repair". Aging and stem cells -- double sexy based on the title alone, but the content was even better.
I am not going to summarize the presentation in detail, but will list three of the coolest things I learned during it. First, apparently, one can do parabiotic experiments, which involve connecting two organisms subcutaneously. After a while (days or weeks), the vessels of the two organisms find each other and they start merging their circulatory systems. In Rando's experiments they attached mice from different age groups and studied the effects of young blood on old mice and vice versa. What they found (this is the second cool thing) is that stem cells that are responsible for muscle regeneration after injury are present and are totally fine in the old mice. It's just that the younger mice have some kind of a factor in their blood serum that stimulates the stem cell activation, while the old mice appear to be saddled with inhibitors. Joining a young mouse with an old one restored muscle regeneration in the old mouse completely. They also did some in vitro experiments to further understand what's going on. Pretty neat.
The last thing that kind of blew my mind was this idea, first proposed by Cairns in 1979, that through successive rounds of DNA replication the organism remembers which strands are the original ones and which ones are copies. These template strands are segregated together and find themselves in the same cells. This allows the organism to preserve the original code and withstand the mutational load in tissues with a lot of regeneration (most errors come about as a result of synthesis). It seems like initially no one could find any support for this hypothesis, but now Rando and others have presented some pretty convincing evidence based on DNA labeling experiments. I guess you needed to know where to look -- stem cells are the ones that carry the template DNA and there aren't that many of them relatively speaking.
I don't know if there is anything informatics-related in what Tom Rando does, but he is at Stanford and is doing some of the coolest work around.
Friday, February 17, 2006
Viruses
"""If you put every virus particle on Earth together in a row, they would form a line 10 million light-years long. """
That seems like rather a lot of viruses.
discover magazine article
Bio-ontologies
A blog post about the Nature Biotech article that criticized bio-ontologies, featuring our very own Mark Musen.
Via postgenomic.
nodalpoint
Thursday, February 16, 2006
New metablog thing
I don't know what it is exactly, but I like it.
postgenomic
Postgenomic aggregates posts from life science blogs and then does useful and interesting things with that data.
Saturday, February 11, 2006
Uniquifying a list in Python
I keep needing to do this all the time (given a list, remove all the duplicates). So a one-liner:
dict().fromkeys(yourlist).keys()
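The one-liner does not promise to keep the original order. If order matters, a slightly longer version does the job. This is a minimal sketch: the name uniquify is made up, and it assumes the items are hashable.

```python
def uniquify(items):
    """Return items with duplicates removed, keeping the first occurrence of each."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(uniquify([3, 1, 3, 2, 1]))  # prints [3, 1, 2]
```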
Thursday, February 02, 2006
Advice for getting published
10 simple rules from Phil Bourne over at PLOS Comp Bio. They are mostly common sense, but make for some useful reading anyway.
Wednesday, February 01, 2006
Tuesday, January 31, 2006
Monday, January 30, 2006
Viruses/Germs vs genes
A virus makes you fat?
A virus makes you autistic?
A germ makes you gay?
A germ makes you schizophrenic? (Remind me not to go near cats).
Maybe, maybe not, but viruses/germs are ubiquitous and genes are not the be all and end all.
Also I would like to take this opportunity to coin the word "germes" for foreign DNA/parasites that have effects that seem indistinguishable from genetic. For good measure, I also coin "germome", and "germetics", and "germealogy" (yes, that's how you spell it).
Start using these words immediately.
Wednesday, January 18, 2006
Monday, January 16, 2006
Clickworkers
NASA uses humans to find craters on Mars. I wonder how this would work for tumors, or other cell classification tasks....
Clickworkers
Saturday, December 31, 2005
13 things that do not make sense
An article in New Scientist. The first thing that makes no sense is the placebo effect, which may or may not be real.
Wednesday, December 28, 2005
Tuesday, December 27, 2005
Tuesday, December 13, 2005
giftornot.com
Many real alive people are writing articles about this gift ideas website. I endorse this product.
Sunday, December 11, 2005
Forecasting
An interesting New Yorker article on forecasting and ostensible expertise.
There are also many studies showing that expertise and experience do not make someone a better reader of the evidence. In one, data from a test used to diagnose brain damage were given to a group of clinical psychologists and their secretaries. The psychologists’ diagnoses were no better than the secretaries’.
link
Tuesday, December 06, 2005
Spicy food and cancer
This article has some interesting statistics about cancer and spicy/Indian food. At least something that's good for cancer tastes good... Good blog too.
Sunday, December 04, 2005
NY Times articles
I recently found out that you can read an NYT article all on one page, by adding ?pagewanted=all to the end of the url. This bookmarklet also does the trick.
javascript:if%20(location.href.indexOf('?')>0){location.href+="&pagewanted=all"}else{location.href+="?pagewanted=all"};
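The same trick outside the browser, as a rough Python sketch. The function name and the example URL are made up; unlike the bookmarklet, it parses the query string instead of just looking for a "?", so it does not add the parameter twice.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def single_page_url(url):
    """Return url with pagewanted=all added to its query string if it is missing."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == "pagewanted" for key, _ in query):
        query.append(("pagewanted", "all"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# example.com stands in for a real article address
print(single_page_url("http://example.com/article.html"))
print(single_page_url("http://example.com/article.html?ref=home"))
```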
Tuesday, November 29, 2005
Using Curl for fast downloads
This is nice, although I am rarely waiting on downloads these days.
From here.
For example, suppose you want to download the Mandrake 8.0 ISO from the following three locations:
url1=http://ftp.eecs.umich.edu/pub/linux/mandrake/iso/Mandrake80-inst.iso
url2=http://ftp.rpmfind.net/linux/Mandrake/iso/Mandrake80-inst.iso
url3=http://ftp.wayne.edu/linux/mandrake/iso/Mandrake80-inst.iso
The length of the file is 677281792, so initiate three simultaneous downloads using curl's "--range" option:
bash$ curl -r 0-199999999 -o mdk-iso.part1 $url1 &
bash$ curl -r 200000000-399999999 -o mdk-iso.part2 $url2 &
bash$ curl -r 400000000- -o mdk-iso.part3 $url3 &
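The ranges above were picked by hand. A small sketch of how one might compute even, non-overlapping ranges for any file length and number of mirrors follows; the function name is made up, and the ranges are inclusive at both ends, as in the commands above.

```python
def byte_ranges(total_length, num_parts):
    """Split bytes 0..total_length-1 into num_parts contiguous (start, end) ranges."""
    if num_parts < 1 or total_length < num_parts:
        raise ValueError("need at least one byte per part")
    base, extra = divmod(total_length, num_parts)
    ranges = []
    start = 0
    for i in range(num_parts):
        # The first `extra` parts get one more byte so nothing is left over.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


for part, (start, end) in enumerate(byte_ranges(677281792, 3), 1):
    print("part %d: -r %d-%d" % (part, start, end))
```

Once all the parts have arrived, they need to be concatenated in order to rebuild the ISO.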
Monday, November 14, 2005
Machine learning blog
At this machine learning blog, they scanned in a number of old papers, including:
"Why isn't everyone a Bayesian?" by Efron B, American Statistician 1986. Examines reasons why not everybody was a Bayesian, as of 1986, with scorching reply from Lindley.
"Axioms of Maximum Entropy" by Skilling, MaxEnt 1988 proceedings. Sets up four practically motivated axioms, and uses them to derive maximum entropy as the unique method for picking a single probability distribution from the set of valid probability distributions.
The other posts are worth a look too.
Here
Friday, November 04, 2005
The FDR
False Discovery Rate is really important for most of us. This paper (lecture notes, actually) covers most of what you need to know. Gil Chu recently gave a talk on local FDR, so I am linking to that paper too. The first link's server seems to be down right now (hopefully temporarily). Anyone know what the difference between local FDR and PER is?
Multiple Hypothesis Correction
Review from Genome Res.
Local FDR
Monday, October 24, 2005
Nature podcast
Nature now has a podcast, and it's pretty good.
http://www.nature.com/nature/podcast/index.html
Tuesday, October 18, 2005
Typefaces
Choosing typefaces can be stressful, and if you are not careful you could end up using Comic Sans, or Arial (this is Zach-style font snobbery, the most enjoyable kind). That's why I have made a list of fonts I can refer to. For those unaware, serif is better for long strings of text, particularly printed text; it's less tiring to follow.
Best serif fonts (from bamag.com): Garamond (a common font), Caslon (apparently best for books), Stone, Jaslon. I don't really like serif but sometimes there is no choice.
Best for the web: Verdana, it's common enough (unlike, say, Century Schoolbook, which I also like), and it looks nicer than Arial/Helvetica.
Best sans-serif: Apple knows a lot about nice fonts, so I figure I will copy them. They use Myriad for packaging/ads etc, and it's exceptionally nice. They also use Helvetica Neue sometimes. Finally, Lucida Grande is the "OS X font".
Thursday, October 13, 2005
Bioinformatics blog
I stumbled upon this bioinformatics/genomics blog.
http://www.ghastlyfop.com/blog/
The 2005 zeitgeist is particularly interesting, it has a graph of hot topics (including machine learning methods) over the past year...
http://www.ghastlyfop.com/blog/2005/10/bioinformatics-zeitgeist-05.html
Monday, October 10, 2005
Saturday, October 01, 2005
PhD laws
If you are doing a PhD, you should be aware of Hofstadter's law, and Parkinson's law. Parkinson's law comes from a very interesting book by Parkinson about the civil service.
Monday, September 26, 2005
Selenium
If it's good enough for Brad Efron (he of bootstrap fame), maybe I should be taking Selenium too...
"""
Dr. Brad Efron, a professor of statistics at Stanford, has a different dietary approach. He does not have prostate cancer, but he had a couple of scares and he has friends who have it. So he is taking selenium, a trace mineral found in plants.
A study that randomly assigned people to take selenium or not to see whether it protected against skin cancer found that it had no effect on that cancer, but that the men taking it had only a third as many prostate cancers.
"""
from this NYT article on cancer and diet: Here
Friday, September 23, 2005
Top ten most cited works from 1976-1983 (via Metafilter)
Nice. I have read all of these and find their ideas intriguing if simple-minded.
1. T.S. Kuhn, The Structure of Scientific Revolutions. 1962
2. J. Joyce, Ulysses. 1922
3. N. Frye, Anatomy of Criticism: Four Essays. 1957
4. L. Wittgenstein, Philosophical Investigations
5. N. Chomsky, Aspects of the Theory of Syntax. 1965
6. M. Foucault, The Order of Things. 1966
7. J. Derrida, Of Grammatology
8. R. Barthes, S/Z. 1970
9. M. Heidegger, Being and Time. 1927
10. E.R. Curtius, European Literature and the Latin Middle Ages. 1948
Tuesday, September 20, 2005
Another Stanford MacArthur genius
Pehr Harbury of the Biochemistry Department. Last year, Daphne Koller and Julie Theriot were among the winners.
link
Tuesday, September 13, 2005
Humanities sucks.
Nice piece in the Guardian about science communication. He'll really
make friends at the Guardian with lines like:
"humanities graduates in the media, who suspect themselves to be
intellectuals, desperately need to reinforce the idea that science is
nonsense: because they've denied themselves access to the most
significant developments in the history of western thought for 200
years, and secretly, deep down, they're angry with themselves over
that."
Here
Monday, September 12, 2005
Practice Blackjack
As I was exploring the potential of JavaScript/DHTML in informatics-related applications, I stumbled onto another area where that technology can be put to good use -- gambling! So to practice my JavaScript skills and to help beginner-to-intermediate Blackjack players everywhere, I wrote a small application and put it up at blackjack-bst.com (which stands for Blackjack Basic Strategy Trainer dot com). Check it out and tell me what you think. It is still a work in progress.
Monday, August 29, 2005
Tuesday, August 16, 2005
Wednesday, June 29, 2005
Synthetic biology news from boing boing
http://www.boingboing.net/2005/06/29/craig_venters_new_co.html
Thursday, June 09, 2005
tinyproxy
Tinyproxy is a cool little program that allows you to use your Linux/Mac box as a proxy with little overhead. This is useful because when you are away, say in Europe, you can access all the Stanford-accessible journals by going through your Stanford box. Like a lot of these little tools, this was introduced to me by Mike.
Tinyproxy
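For what it's worth, here is a rough sketch of how a script could send its requests through such a proxy, using only Python's standard library. The host name and the helper name are made up for illustration, and the port assumes Tinyproxy's stock setting of 8888; adjust both to match your own setup.

```python
import urllib.request


def make_proxy_opener(proxy_host, proxy_port=8888):
    """Build a urllib opener that routes HTTP(S) requests through a proxy.

    proxy_host is a placeholder for your own machine; 8888 is assumed to be
    the port Tinyproxy listens on (its usual default).
    """
    proxy_url = f"http://{proxy_host}:{proxy_port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical usage (not run here, since it needs a live proxy):
#   opener = make_proxy_opener("my-box.example.edu")
#   page = opener.open("http://example.com/").read()
```

Nothing is sent over the network until `open()` is called, so building the opener is safe to do up front. The proxy also has to be configured to accept connections from wherever you are, which this sketch does not cover.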
Freecell
JavaScript/DHTML are important technologies in bioinformatics. I swear that is why I did this.
Just Freecell