Text files and Unix
I often have large text files to deal with. There are three essential Unix utilities for doing this (without resorting to awk).
1. Less: is just like more, except better. You can go backwards, and use '/' to search (like vi). And it doesn't load the whole file into memory.
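2. Sort: can sort a file according to one of its fields. For instance, "sort -k 10,10 -n file" sorts "file" by the 10th field, and does so numerically (as opposed to alphabetically). A bare "-k 10" makes the key run from field 10 to the end of the line, which is rarely what you want. To sort on several fields, give one -k per field: "sort -k 10,10 -k 11,11 -k 12,12 file".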
3. Cut: allows you to look at selected columns of a file. For instance, "cut -c1-100 file" shows the first 100 characters of each line in "file" (positions count from 1; current GNU cut rejects position 0). If you have a big DNA sequence, all on one line, then you can cut out your area of interest easily.
"sort -k 10,10 file | cut -c1-100 | less" = sweeeet....
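Here is a small sketch of the sort and cut usage above, run on made-up data. The file name sample.txt, its three whitespace-separated fields, and the short sequence are all assumptions for illustration, not real data.

```shell
# Build a small whitespace-separated sample file (hypothetical data).
tmp=$(mktemp -d)
printf '%s\n' \
  'chr2 300 geneC' \
  'chr1 1000 geneA' \
  'chr1 20 geneB' > "$tmp/sample.txt"

# Sort by field 1 as text, then by field 2 numerically.
# Each key gets its own -k; "-k2,2n" means "field 2 only, compared as a number".
sorted=$(sort -k1,1 -k2,2n "$tmp/sample.txt")
printf '%s\n' "$sorted"

# Keep only characters 1-4 of each sorted line (cut positions start at 1).
prefix=$(printf '%s\n' "$sorted" | cut -c1-4)
printf '%s\n' "$prefix"

# Pull characters 5-8 out of a one-line sequence (made-up bases).
seq_line='ACGTTTGACA'
region=$(printf '%s\n' "$seq_line" | cut -c5-8)
printf '%s\n' "$region"

rm -r "$tmp"
```

Without the "n" on the second key, "1000" would sort before "20", because the comparison would be character by character.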
3 Comments:
Yes you can actually do quite a bit within unix with just a few commands. Use them some and it becomes very natural. Other commands I use all the time:
wc
uniq (with sort)
grep (very key)
These less frequently: paste, head, tail
Finally, these are probably good too, but I haven't gotten into the habit of using: tr, expand, unexpand
By serge, at 10:56 AM
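A rough sketch of the wc, uniq (with sort), and grep usage mentioned in the comment above. The file name hits.txt and its contents are made up for illustration.

```shell
# Hypothetical input: one label per line, with repeats.
tmp=$(mktemp -d)
printf '%s\n' chr2 chr1 chr2 chr1 chr2 > "$tmp/hits.txt"

# uniq only collapses adjacent duplicate lines, so sort first.
distinct=$(sort "$tmp/hits.txt" | uniq)
printf '%s\n' "$distinct"

# grep -c counts the lines that match the pattern.
matches=$(grep -c 'chr2' "$tmp/hits.txt")
printf '%s\n' "$matches"

# wc -l counts lines; tr -d ' ' strips the padding some wc versions add.
total=$(wc -l < "$tmp/hits.txt" | tr -d ' ')
printf '%s\n' "$total"

rm -r "$tmp"
```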
My friend Devin also loves cut with his entire body, including his pee-pee. It seems to inspire strong feelings and urges.
By brian, at 1:25 PM
Also a great unix utility: sdiff. See the diff of two files side by side - this is really useful.
By brian, at 4:41 PM