BMI Students

Thursday, February 24, 2005

Affy talk

This talk was forwarded to me by Nikesh. It's a pretty interesting look at the evolution of Affymetrix, by Steve Fodor. It's low-bandwidth too so you can listen while you work...
Video

Dealing with huge sequence files

Say you have the human genome sitting on your computer and you need to access a particular region, like chromosome 1, base pairs 8,100,201 through 8,100,438. What I usually do is make a file for each chromosome, stripping it of all non-sequence characters (spaces, carriage returns, etc.), so that base N sits at byte offset N-1. Then to grab the desired bit I do the following in Python:
chrom1_f.seek(8100200)  # offsets are 0-based, so base 8,100,201 sits at offset 8,100,200
seq = chrom1_f.read(8100438 - 8100201 + 1)  # length of the inclusive range
"chrom1_f" is a filehandle to the chromosome file -- I would typically have a bunch of them open if I am working with a whole genome. And of course, random access can be used in any language -- not just Python.

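For what it's worth, here is a minimal sketch of the same trick wrapped in a function. The name fetch_region and its arguments are made up for illustration. It assumes the file holds nothing but sequence characters, one byte per base, and that coordinates are 1-based and inclusive.

```python
def fetch_region(path, start, end):
    """Return bases start..end (1-based, inclusive) from a sequence-only file.

    Assumes the file contains only sequence characters, one byte per base,
    with no header line and no line breaks.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Binary mode, so seek() offsets are plain byte counts.
    with open(path, "rb") as handle:
        handle.seek(start - 1)               # 1-based position -> 0-based offset
        data = handle.read(end - start + 1)  # number of bases in the range
    return data.decode("ascii")
```

Something like fetch_region("chr1.txt", 8100201, 8100438) would then return the 238 bases in question, where chr1.txt is a hypothetical filename for the stripped chromosome 1 file.
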
Tuesday, February 22, 2005

Weka

I've never used Weka; in fact I hadn't heard of it until Maureen mentioned it in her colloquium. But Maureen recommended it and it looks pretty solid based on the description on the website. The screenshots look cool, too. Anyway, going by the website, Weka is a machine learning toolkit with a pretty user interface.

Friday, February 18, 2005

Restaurants

This seems like a solid restaurant list of the PA - San Jose area. It highlights many of my favourites, including Pizza Chicago, Krung Thai and Stacks.....

Tuesday, February 15, 2005

Python stuff

In Python, it's really easy to keep a bunch of variables together. I've started using this a lot recently. Just make a dummy class like so...
class container:
    pass
Then you can fill an instance of it with any data you want to keep together. For instance...
my_gene = container()
my_gene.length = 100
my_gene.name = "Something"
dir(my_gene) #to see what is in there
You can use the same dummy container for any collection of data, if you are not too worried about formalising its contents.
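
Here is a slightly fuller sketch of the same idea, not the only way to do it. The class name Container and the variable other_gene are just for illustration. It also shows types.SimpleNamespace from the standard library (Python 3.3 and later), which does much the same job without needing your own dummy class.

```python
from types import SimpleNamespace


class Container:
    """Empty class whose instances can hold arbitrary attributes."""
    pass


my_gene = Container()
my_gene.length = 100
my_gene.name = "Something"

# vars() returns the attributes set on the instance as a dict,
# which is usually easier to read than the full dir() listing.
print(vars(my_gene))

# Standard-library alternative: attributes are passed as keyword arguments.
other_gene = SimpleNamespace(length=100, name="Something")
print(other_gene.name, other_gene.length)
```

The trade-off is the same either way: nothing checks which attributes exist, so a typo in an attribute name quietly creates a new one.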

Huh huh huh. He said --

An interesting article in PLOS Biology on issues related to sequencing ancient DNA -- the Jurassic Park type of thing. The coolest thing is that it contains this quote: "What we've done is carbon-date a shitload of bison and get DNA out of them".

Monday, February 14, 2005

Paper on tiling arrays

This is a pretty good review of tiling arrays, and the so-called "dark matter" of the genome. It's basically a list of different tiling array experiments, from E. coli up, and what they found. This paper is from Trends in Genetics, which is a good journal to read to see what's going on in genetics/genomics, and that whole area.

Gain and loss of amino acids through evolution

A purely computational (bioinformatic?) paper in Nature. They make the claim that several amino acids are being gained and several are being lost throughout the different branches in the tree of life. The idea is that there was the original subset of amino acids that were in use early on (3.4bn years ago) and the others were added later -- those are the ones that are still being accrued.

Friday, February 04, 2005

CiteULike

CiteULike is a sweet reference collection website, similar to del.icio.us. You can keep a list of all the papers you intend to read there, tag them, and then export them as BibTeX for referencing in papers. There's also a social network aspect to it, since you can see who else has listed your favourite paper and see what else they are reading. I am trying it out. It's pretty gnarly, and has a bookmarklet, which makes it very easy to use.

New blog

This is the new BMI students blog. This is a test post.

More stuff here for testing extended entries.