BMI Students

Sunday, March 06, 2005

Text files and Unix

I often have large text files to deal with. There are three essential Unix utilities for doing this (without resorting to awk).

1. Less: just like more, except better. You can go backwards, use '/' to search (like vi), and it doesn't load the whole file into memory.
2. Sort: sorts a file by one of its fields. For instance, "sort -k 10 -n file" sorts "file" by the 10th field, and does so numerically (as opposed to alphabetically). Note that "-k 10" really means "from field 10 to the end of the line"; to limit the key to fields 10 through 12, write "sort -k 10,12", and to sort on several fields in priority order, repeat the flag: "sort -k 10,10 -k 11,11 -k 12,12".
3. Cut: pulls columns out of a file. For instance, "cut -c1-100 file" shows the first 100 characters of each line in "file" (character positions are numbered from 1, so "-c0-100" is an error). If you have a big DNA sequence, all on one line, then you can cut out your area of interest easily.
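A minimal sketch of the sort and cut commands on a made-up file (the filename, fields, and data below are hypothetical):

```shell
# Made-up whitespace-delimited file: gene, chromosome, numeric score.
printf 'geneA chr1 42\ngeneB chr2 7\ngeneC chr1 105\n' > hits.txt

# Numeric sort on the 3rd field (lowest score first):
sort -k 3 -n hits.txt

# First 10 characters of each line:
cut -c 1-10 hits.txt
```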

"sort -k 10 file |cut -c0-100 |less" = sweeeet....

3 Comments:

  • Yes, you can actually do quite a bit within Unix with just a few commands. Use them for a while and they become second nature. Other commands I use all the time:
    wc
    uniq (with sort)
    grep (very key)

    These less frequently: paste, head, tail

    Finally, these are probably good too, but I haven't gotten into the habit of using: tr, expand, unexpand
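    A quick made-up illustration of the classic sort | uniq -c idiom alongside grep and wc (the filename and data are hypothetical):

    ```shell
    # One chromosome name per line, with repeats.
    printf 'chr1\nchr2\nchr1\nchr1\n' > chroms.txt

    sort chroms.txt | uniq -c     # occurrence count per distinct value
    grep -c 'chr1' chroms.txt     # number of lines matching a pattern
    wc -l chroms.txt              # total line count
    ```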

    By Blogger serge, at 10:56 AM  

  • My friend Devin also loves cut with his entire body, including his pee-pee. It seems to inspire strong feelings and urges.

    By Blogger brian, at 1:25 PM  

  • Also a great unix utility: sdiff. See the diff of two files side by side - this is really useful.
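    For example (two throwaway files, names hypothetical):

    ```shell
    printf 'apple\nbanana\n' > old.txt
    printf 'apple\ncherry\n' > new.txt

    # sdiff prints the two files in side-by-side columns;
    # lines that differ are flagged with a | in the gutter.
    sdiff old.txt new.txt
    ```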

    By Blogger brian, at 4:41 PM  
