Parsing Comma-Separated Data


You have a string or a file of lines containing comma-separated values (CSV) that you need to read in. Many MS-Windows-based spreadsheets and some databases use CSV to export data.


Use my CSV class or a regular expression (see Chapter 4).
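For a rough idea of the regular-expression route, the following sketch matches one field at a time with java.util.regex. It is not the regular expression from Chapter 4. It assumes that a doubled quote ("") inside a quoted field stands for one literal quote, which is how many spreadsheets write it. The class name CSVRegexSketch and the sample line are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: split one CSV line with a regex, one field at a time. */
public class CSVRegexSketch {
    // A field is either a quoted string (group 1; "" inside stands for one ")
    // or a possibly empty run of characters that are not commas.
    private static final Pattern FIELD =
        Pattern.compile("\"((?:[^\"]|\"\")*)\"|[^,]*");

    public static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        int pos = 0;
        while (true) {
            // Match one field starting exactly at pos. Because [^,]* can
            // match the empty string, lookingAt() always succeeds here.
            m.region(pos, line.length());
            m.lookingAt();
            if (m.group(1) != null) {
                // Quoted field: drop the delimiters and collapse "" to "
                fields.add(m.group(1).replace("\"\"", "\""));
            } else {
                fields.add(m.group());
            }
            pos = m.end();
            // Continue only if a comma separator follows the field.
            if (pos < line.length() && line.charAt(pos) == ',') {
                pos++;
            } else {
                break;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample line with a comma inside a quoted field.
        for (String field : parse("\"apple\",3,\"red, sweet\"")) {
            System.out.println(field);
        }
    }
}
```

Since the unquoted alternative can match an empty string, the loop advances with region( ) and lookingAt( ) instead of repeated find( ) calls, so an empty first field, as in ",a", is not lost. The sketch does not report malformed input; it simply stops at the first field that is not followed by a comma.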


CSV is deceptive. It looks simple at first glance, but the values may be quoted or unquoted. A quoted value may contain commas that are part of the data rather than field separators, and it may also contain escaped quotes. This far exceeds the capabilities of the StringTokenizer class (Section 3.3). Either considerable Java coding or the use of regular expressions is required. I’ll show both ways.
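To see the StringTokenizer problem concretely, this small sketch splits a line on commas (the class name TokenizerPitfall and the sample line are hypothetical). The quoted field "red, sweet" comes back as two tokens with the quote characters still attached. In addition, StringTokenizer treats consecutive delimiters as one, so an empty field such as the middle one in a,,b disappears entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical demo of why StringTokenizer is not enough for CSV. */
public class TokenizerPitfall {
    public static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        // Split on every comma; StringTokenizer knows nothing about quotes.
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Three logical fields, but the quoted one contains a comma.
        for (String token : tokenize("\"apple\",3,\"red, sweet\"")) {
            System.out.println(token);
        }
    }
}
```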

First, a Java program. Assume for now that we have a class called CSV that has a no-argument constructor, and a method called parse( ) that takes a string representing one line of the input file. The parse( ) method returns a list of fields. For flexibility, this list is returned as an Iterator (see Section 7.5). I simply use the Iterator’s hasNext( ) method to control the loop, and its next( ) method to get the next object.
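The CSV class itself is not listed here. Purely for experimentation, a minimal stand-in with the interface just described (a no-argument constructor and a parse( ) method that returns an Iterator) might look like the following. It is a hand-written character loop, not the author's class, and it assumes that a doubled quote ("") encodes a literal quote inside a quoted field.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical stand-in for the CSV class described in the text (not the
 * author's implementation). Assumes "" inside quotes means one literal ".
 */
public class CSV {
    public CSV() {
    }

    /** Splits one line into fields and returns them as an Iterator. */
    public Iterator<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // "" becomes one literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas are data inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(field.toString()); // comma ends the field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // last field has no trailing comma
        return fields.iterator();
    }

    public static void main(String[] args) {
        Iterator<String> it = new CSV().parse("\"apple\",3,\"red, sweet\"");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Saved as CSV.java, this is enough to compile and run the CSVSimple demo. It is deliberately lenient: a stray quote in the middle of an unquoted value simply switches into quoted mode instead of raising an error.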

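import java.util.Iterator;

/* Simple demo of CSV parser class. */
public class CSVSimple {
    public static void main(String[] args) {
        CSV parser = new CSV();
        // Hypothetical sample line; the original input line is not shown here.
        // Each \" puts a literal double quote into the string.
        Iterator it = parser.parse("\"apple\",3,\"red, sweet\"");
        // Print each field on its own line.
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}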

Once the compiler has processed the escaped quotes, the string being parsed is actually the following:


Running CSVSimple yields the following output:

> java ...
