Compressing Whitespace

Problem

You have runs of whitespace in a file (perhaps it uses fixed-length, space-padded fields) and you need to compress each run down to a single character or delimiter.

Solution

Use tr or awk as appropriate.

Discussion

To compress runs of whitespace down to a single character, you can use tr, but be aware that you may damage the file if it is not well formed. For example, if fields are delimited by runs of whitespace but also contain single spaces internally, squeezing every run down to one character removes that distinction. Imagine that the _ characters in the following example were spaces instead. Note that → denotes a literal tab character in the output.

$ cat data_file
Header1             Header2              Header3
Rec1_Field1         Rec1_Field2          Rec1_Field3
Rec2_Field1         Rec2_Field2          Rec2_Field3
Rec3_Field1         Rec3_Field2          Rec3_Field3

$ tr -s ' ' '\t' < data_file
Header1 → Header2 → Header3
Rec1_Field1 → Rec1_Field2 → Rec1_Field3
Rec2_Field1 → Rec2_Field2 → Rec2_Field3
Rec3_Field1 → Rec3_Field2 → Rec3_Field3
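
To make the warning above concrete, here is a small sketch of what happens when the fields really do contain spaces. The record and the variable names are hypothetical, made up for illustration; only tr and awk themselves come from the recipe.

```shell
# Hypothetical record: two fields separated by a run of spaces,
# each field containing one internal space.
record='Rec1 Field1         Rec1 Field2'

# Every run of spaces, including the single internal ones, becomes one tab.
damaged=$(printf '%s\n' "$record" | tr -s ' ' '\t')

# Count the tab-separated fields in the result.
field_count=$(printf '%s\n' "$damaged" | awk -F'\t' '{ print NF }')
printf '%s\n' "$field_count"
```

The count comes out as four rather than two, because tr cannot tell an internal space from a delimiter.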

If your field delimiter is more than a single character, tr won’t work, since it translates each single character in its first set into the corresponding single character in the second set. You can use awk to combine or convert field separators. awk’s input field separator, FS, accepts regular expressions, so you can separate on pretty much anything. There is a handy trick to this as well. An assignment to any field causes awk to reassemble the record using the output field separator ...
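
As a rough sketch of the awk approach, assuming fields are delimited by two or more spaces and that a tab is the delimiter you want, something like the following may work. The function name compress_fields and the sample record are hypothetical; FS and OFS are awk's standard input and output field separator variables.

```shell
# Hypothetical helper: reads records on stdin and writes them tab-delimited.
compress_fields() {
    # FS is a regular expression here: two or more spaces end a field,
    # so a single space inside a field is left alone.
    # Assigning $1 to itself changes nothing in the field, but it makes
    # awk rebuild the record with OFS between the fields.
    awk 'BEGIN { FS = "  +"; OFS = "\t" } { $1 = $1; print }'
}

printf '%s\n' 'Rec1 Field1         Rec1 Field2          Rec1 Field3' | compress_fields
```

Unlike the tr version, this keeps the space inside each field and puts a single tab between fields. It still relies on the assumption that no field contains two consecutive spaces.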
