9.12. Extract CSV Fields from a Specific Column
You want to extract every field (record item) from the third column of a CSV file.
The regular expressions from Recipe 9.11 can be reused here to iterate over each field in a CSV subject string. With a bit of extra code, you can count the number of fields from left to right in each row, or record, and extract the fields at the position you’re interested in.
The following regular expression (shown with and without the free-spacing option) matches a single CSV field and its preceding delimiter in two separate capturing groups. Since line breaks can appear within double-quoted fields, it would not be accurate to simply search from the beginning of each line in your CSV string. By matching and stepping past fields one by one, you can easily determine which line breaks appear outside of double-quoted fields and therefore start a new record.
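Before looking at the regex itself, here is a minimal Python illustration of why line-by-line processing is unreliable. The sample string is made up for this sketch. Its first record has a double-quoted second field that contains a line break, so splitting on line breaks produces more "records" than the file really holds.

```python
# A CSV subject with two records. The second field of the first record
# is double-quoted and contains a line break.
csv_text = 'a,"line one\nline two",c\nd,e,f'

# Naive approach: treat every physical line as a record.
naive_records = csv_text.split("\n")

# Three "records" come out, but the CSV holds only two:
# the quoted field has been cut in half.
print(len(naive_records))  # 3
print(naive_records[0])    # a,"line one
```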
The regular expressions in this recipe are designed to work correctly with valid CSV files only, according to the format rules discussed in Comma-Separated Values (CSV).
(,|\r?\n|^)([^",\r\n]+|"(?:[^"]|"")*")?
|Regex options: None|
( , | \r?\n | ^ )      # Capture the leading field delimiter to backref 1
(                      # Capture a single field to backref 2:
  [^",\r\n]+           #   Unquoted field
|                      #  Or:
  " (?:[^"]|"")* "     #   Quoted field (may contain escaped double quotes)
)?                     # The group is optional because fields may be empty
|Regex options: Free-spacing|
|Regex flavors: ...|
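The field counting described at the start of this recipe can be sketched in Python with the compact form of the regex and `re.finditer`. This is a minimal sketch, not the recipe's own code. The function name `extract_column` and its 1-based `column` parameter are hypothetical. It assumes valid CSV input and returns quoted fields exactly as matched, surrounding quotes and doubled `""` escapes included. A comma in backreference 1 moves to the next field. The start of the string or a line break matched outside quotes starts a new record.

```python
import re

# Compact form of the recipe's regex. Group 1 is the leading delimiter
# (comma, line break, or start of string); group 2 is the field itself.
FIELD_RE = re.compile(r'(,|\r?\n|^)([^",\r\n]+|"(?:[^"]|"")*")?')

def extract_column(csv_text, column):
    """Return field number `column` (1-based) from every record.

    Hypothetical helper for illustration. Quoted fields keep their
    surrounding double quotes; records too short to have the requested
    field contribute nothing.
    """
    result = []
    field_index = 0
    for m in FIELD_RE.finditer(csv_text):
        if m.group(1) == ",":
            field_index += 1   # comma: next field in the same record
        else:
            field_index = 1    # start of string or line break: new record
        if field_index == column:
            # Group 2 is None when the field is empty.
            result.append(m.group(2) or "")
    return result

sample = 'a,"line one\nline two",c\nd,e,f'
print(extract_column(sample, 3))  # ['c', 'f']
print(extract_column(sample, 2))  # ['"line one\nline two"', 'e']
```

Because the quoted field is consumed as a whole, the line break inside it never shows up as a delimiter in group 1, so it does not reset the field count. One limitation of this sketch: a trailing line break at the very end of the subject is matched with an empty field and therefore counts as an extra, empty record.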