BMI Students

Thursday, February 24, 2005

Dealing with huge sequence files

Say you have the human genome sitting on your computer and you need to access a particular region, like chromosome 1, base pairs 8,100,201 through 8,100,438. What I usually do is make a file for each chromosome, stripping it of all non-sequence characters (spaces, carriage returns, etc.), so that the base at position N sits at byte offset N - 1. Then to grab the desired bit I do the following in Python:
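chrom1_f.seek(8100201 - 1)
seq = chrom1_f.read(8100438 - 8100201 + 1)
"chrom1_f" is a filehandle to the chromosome file -- I would typically have a bunch of them open if I am working with a whole genome. The seek jumps straight to the zero-based offset of the first base, and the read pulls back just the 238 bases in the region, so the rest of the chromosome is never loaded into memory. And of course, random access can be used in any language -- not just Python.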

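For anyone who wants to try this end to end, here is a minimal sketch of both steps: flattening a chromosome file and then fetching a region by seeking. It assumes the input is a single-record FASTA file whose header lines start with ">", and that coordinates are 1-based and inclusive. The function names flatten_fasta and fetch_region are made up for illustration. The flat file is opened in binary mode because Python 3 only supports seeking to arbitrary byte offsets reliably on binary files.

```python
def flatten_fasta(fasta_path, flat_path):
    """Copy the sequence from a single-record FASTA file into flat_path,
    dropping header lines and all whitespace (spaces, newlines, CRs)."""
    with open(fasta_path) as src, open(flat_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                continue  # skip the FASTA header line
            # split() with no arguments removes every kind of whitespace
            dst.write("".join(line.split()))


def fetch_region(flat_file, start, end):
    """Return bases start..end (1-based, inclusive) from a flat sequence
    file that was opened in binary mode ("rb")."""
    flat_file.seek(start - 1)  # byte offset is zero-based
    return flat_file.read(end - start + 1).decode("ascii")
```

With a flat file built this way, opening it with open(path, "rb") and calling fetch_region(chrom1_f, 8100201, 8100438) would return the same 238 bases as the seek and read above. Keeping one flat file per chromosome means the offset arithmetic stays trivial, at the cost of some extra disk space.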
