Compressing Whitespace
Problem
You have runs of whitespace in a file (perhaps it is fixed-length and space-padded) and you need to compress the spaces down to a single character or delimiter.
Solution
Use tr or awk as appropriate.
Discussion
If you are trying to compress runs of whitespace down to a single character, you can use tr, but be aware that you may damage the file if it is not well formed. For example, if fields are delimited by multiple whitespace characters but also contain single spaces internally, compressing every run of spaces down to one character removes the distinction between the two. Imagine if the _ characters in the following example were spaces instead. Note that → denotes a literal tab character in the output.
$ cat data_file
Header1          Header2          Header3
Rec1_Field1      Rec1_Field2      Rec1_Field3
Rec2_Field1      Rec2_Field2      Rec2_Field3
Rec3_Field1      Rec3_Field2      Rec3_Field3

$ cat data_file | tr -s ' ' '\t'
Header1 → Header2 → Header3
Rec1_Field1 → Rec1_Field2 → Rec1_Field3
Rec2_Field1 → Rec2_Field2 → Rec2_Field3
Rec3_Field1 → Rec3_Field2 → Rec3_Field3
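The damage described above is easy to reproduce. This is a minimal sketch, not part of the original recipe: the helper name squeeze_to_tabs and the sample records are hypothetical, and each field deliberately contains single internal spaces, as if the _ characters in data_file were spaces. The same tr command is applied, and awk then counts the tab-separated fields on each line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: squeeze each run of spaces into a single tab,
# the same transformation as the tr command above.
squeeze_to_tabs() {
    tr -s ' ' '\t'
}

# Hypothetical records: two fields per line, separated by a run of
# spaces, but each field also contains single internal spaces.
printf '%s\n' 'Rec 1 Field 1     Rec 1 Field 2' \
              'Rec 2 Field 1     Rec 2 Field 2' |
    squeeze_to_tabs |
    awk -F'\t' '{ print NF }'   # count tab-separated fields per line
# prints 8 twice: the internal spaces became field separators too
```

Each line comes out with eight fields instead of the intended two, and nothing in the output records which tabs were real field boundaries.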
If your field delimiter is more than a single character, tr won’t work, because it translates single characters from its first set into the matching single characters in its second set. You can use awk to combine or convert field separators instead. awk’s internal field separator, FS, accepts regular expressions, so you can split on pretty much anything. There is a handy trick to this as well: an assignment to any field causes awk to reassemble the record using the output field separator (OFS) ...
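A minimal sketch of the awk approach, assuming a POSIX awk. The function names runs_to_tabs and colons_to_tabs, the sample input, and the "::" delimiter are all hypothetical illustrations. In the first helper, FS is a regular expression matching two or more spaces, so single spaces inside a field survive; OFS is a tab; and the field assignment $1 = $1 is what forces awk to rebuild each record with OFS. The second helper shows a multi-character delimiter that tr cannot handle.

```shell
#!/usr/bin/env bash
# Hypothetical helper: treat a run of two or more spaces as the field
# separator and rebuild each record with tabs between the fields.
runs_to_tabs() {
    awk 'BEGIN { FS = "  +"; OFS = "\t" }
         {
             sub(/ +$/, "")   # drop trailing padding so it is not an empty last field
             $1 = $1          # assigning any field makes awk rebuild $0 using OFS
             print
         }'
}

# Hypothetical helper: convert a multi-character delimiter such as "::"
# to tabs, which tr cannot do because it maps one character at a time.
colons_to_tabs() {
    awk 'BEGIN { FS = "::"; OFS = "\t" } { $1 = $1; print }'
}

# Single internal spaces survive; only the runs of padding become tabs.
printf '%s\n' 'Rec 1 Field 1     Rec 1 Field 2   ' | runs_to_tabs
printf '%s\n' 'a::b::c' | colons_to_tabs
```

Because FS is a regular expression here rather than the default single space, awk no longer ignores leading separators, so leading padding would produce an empty first field. The sketch strips only trailing padding; adjust it if your records are right-justified.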