Extracting and Rearranging Datafile Columns
Problem
You want to pull out columns from a datafile or rearrange them into a different order.
Solution
Use a utility that can produce columns from a file on demand.
Discussion
cvt_file.pl serves as a tool that converts entire files from one format to another. Another common datafile operation is to manipulate its columns. This is necessary, for example, when importing a file into a program that doesn’t understand how to extract or rearrange input columns for itself. To work around this problem, you can rearrange the datafile instead.
Recall that this chapter began with a description of a scenario involving a 12-column CSV file somedata.csv from which only columns 2, 11, 5, and 9 were needed. You can convert the file to tab-delimited format like this:
% cvt_file.pl --iformat=csv somedata.csv > somedata.txt
But then what? If you just want to knock out a short script to extract those specific four columns, that’s fairly easy: write a loop that reads input lines and writes only the columns you want in the proper order. But that would be a special-purpose script, useful only within a highly limited context. With just a little more effort, it’s possible to write a more general utility yank_col.pl that enables you to extract any set of columns. With such a tool, you’d specify the column list on the command line like this:
% yank_col.pl --columns=2,11,5,9 somedata.txt > tmp.txt
Because the script doesn’t use a hardcoded column list, it can be used to pull ...
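The idea of taking the column list as a parameter instead of hardcoding it can be sketched in a few lines. The following is a Python illustration only, not the source of yank_col.pl: the function names parse_columns and yank_columns are hypothetical, and the sketch assumes tab-delimited input and 1-based column numbers, as the --columns=2,11,5,9 example suggests.

```python
def parse_columns(spec):
    """Turn a 1-based column list such as "2,11,5,9" into 0-based indexes.

    Assumption: columns are numbered from 1 on the command line.
    """
    columns = [int(part) for part in spec.split(",")]
    if any(col < 1 for col in columns):
        raise ValueError("column numbers start at 1")
    return [col - 1 for col in columns]


def yank_columns(lines, spec):
    """Yield each tab-delimited line reduced to the requested columns.

    Columns are emitted in the order given in spec, so the same call both
    extracts and rearranges. A line with too few fields raises IndexError.
    """
    indexes = parse_columns(spec)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields[i] for i in indexes)


# Small demonstration with made-up data: keep columns 3 and 1, in that order.
sample = ["a\tb\tc\td\n", "1\t2\t3\t4\n"]
for row in yank_columns(sample, "3,1"):
    print(row)
```

Because the column list arrives as an argument, the same function handles any selection or ordering. A special-purpose loop would need editing every time the wanted columns change.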