BMI Students: Text files and Unix

Sunday, March 06, 2005

Text files and Unix

I often have large text files to deal with. There are three essential Unix utilities for doing this (without resorting to awk).

1. Less: just like more, except better. You can go backwards and use '/' to search (like vi), and it doesn't load the whole file into memory.
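2. Sort: can sort a file according to one of its fields. For instance, "sort -k 10,10 -n file" sorts "file" by the 10th field, numerically (as opposed to alphabetically). Note the ",10": a bare "-k 10" makes the key run from the 10th field to the end of the line. To sort on several fields, give one -k per field: "sort -k 10,10 -k 11,11 -k 12,12 file" sorts by field 10, then 11, then 12. Fields are separated by blanks by default; -t sets a different separator.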
3. Cut: extracts selected columns from each line of a file, either by character position (-c) or by tab-separated field (-f). For instance, "cut -c1-100 file" shows the first 100 characters of each line in "file" (positions start at 1, not 0). If you have a big DNA sequence all on one line, you can easily cut out your region of interest.
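Here is a small sketch of the sort and cut calls above, assuming a GNU-style sort and cut and two made-up files: sample.txt (blank-separated name, group and score columns) and seq.txt (a one-line sequence). Both file names and their contents are hypothetical.

```shell
#!/bin/sh
# Sketch only: sample.txt and seq.txt are hypothetical example files.
set -e
cd "$(mktemp -d)"

# Blank-separated columns: name, group, score
printf 'gene_a 2 15\ngene_b 1 9\ngene_c 2 100\ngene_d 1 42\n' > sample.txt

# Sort on the 3rd field only (-k 3,3), compared numerically (-n):
# scores come out as 9, 15, 42, 100
sort -k 3,3 -n sample.txt

# Without -n the field is compared as text, so "100" lands before "15"
sort -k 3,3 sample.txt

# One -k per key: group first, then score within each group
sort -k 2,2n -k 3,3n sample.txt

# cut counts character positions from 1: the first 6 characters of each line
cut -c1-6 sample.txt

# A one-line "sequence"; pull out characters 5 through 12
printf 'NNNNACGTACGTNNNN\n' > seq.txt
cut -c5-12 seq.txt
```

Any of these can be piped into less when the output is too long for one screen.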

"sort -k 10,10 file | cut -c1-100 | less" = sweeeet....

3 Comments:

Yes, you can actually do quite a bit in Unix with just a few commands. Use them for a while and they become very natural. Other commands I use all the time:
wc
uniq (with sort)
grep (very key)

These less frequently: paste, head, tail

Finally, these are probably good too, but I haven't gotten into the habit of using them: tr, expand, unexpand
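uniq only merges adjacent duplicate lines, which is why it goes with sort. A minimal sketch of that pairing, plus wc, grep, head and tail, on a hypothetical tab-separated file genes.tsv (gene name and chromosome columns; the name and contents are made up):

```shell
#!/bin/sh
# Sketch only: genes.tsv is a hypothetical example file.
set -e
cd "$(mktemp -d)"

# Tab-separated: gene name, chromosome
printf 'geneA\tchr1\ngeneB\tchr2\ngeneC\tchr1\ngeneD\tchr3\ngeneE\tchr1\n' > genes.tsv

# wc -l counts lines (one record per line here)
wc -l < genes.tsv

# grep -c counts matching lines; -w matches whole words,
# so a line containing chr10 would not count as chr1
grep -c -w chr1 genes.tsv

# Without sort, uniq sees no adjacent repeats here and every count is 1
cut -f2 genes.tsv | uniq -c

# Sorted first, identical values sit together and uniq -c counts them
cut -f2 genes.tsv | sort | uniq -c

# The same counts, most frequent first
cut -f2 genes.tsv | sort | uniq -c | sort -rn

# head and tail show the first or last few lines
head -n 2 genes.tsv
tail -n 2 genes.tsv
```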

By serge, at 10:56 AM
My friend Devin also loves cut with his entire body, including his pee-pee. It seems to inspire strong feelings and urges.

By brian, at 1:25 PM
Also a great Unix utility: sdiff. It shows the diff of two files side by side, which is really useful.
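A small sketch of sdiff, assuming GNU diffutils (or a compatible sdiff) and two hypothetical files, old.txt and new.txt. In the side-by-side output a | marks lines that differ, while < and > mark lines found only in the left or right file.

```shell
#!/bin/sh
# Sketch only: old.txt and new.txt are hypothetical example files.
cd "$(mktemp -d)" || exit 1

printf 'chr1 100\nchr2 200\nchr3 300\n' > old.txt
printf 'chr1 100\nchr2 250\nchr3 300\nchr4 400\n' > new.txt

# -w sets the total output width in columns.
# sdiff exits with status 1 when the files differ, which is expected here.
sdiff -w 40 old.txt new.txt || true

# -s suppresses lines that are identical in both files,
# leaving only the changed line (|) and the added one (>)
sdiff -s -w 40 old.txt new.txt || true
```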

By brian, at 4:41 PM
