Apache Mahout Clustering Designs by Ashish Gupta

Working with CSV files

A common problem when using Mahout algorithms is how to work with files in CSV, TSV, or a similar format. Here, again, the main challenge is converting the files into vector format. Once that is done, the rest of the process is the same as described previously. Let's look at code that takes a CSV file and writes it out in the vector format that Mahout can use:

public String getSeqFile(String inputLocation) throws Exception {
    String outputPath = "<output path>"; // Location where you want to save the output
    FileSystem fs = null;
    SequenceFile.Writer writer;
    fs = FileSystem.get(getConfiguration());
    Path vecoutput = new Path(outputPath);
    writer = new SequenceFile.Writer(fs, getConfiguration(), vecoutput, Text.class, ...
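
The listing above is cut off in this excerpt, so the part that turns CSV rows into vectors is not shown. As a rough sketch of that parsing step only, the following plain-Java example converts delimited text lines into arrays of doubles. The class name CsvToVectors and the methods parseLine and parseLines are hypothetical names chosen for illustration and are not taken from the book. The sketch assumes every field is numeric and that fields are not quoted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the book): parses numeric CSV/TSV rows.
class CsvToVectors {

    // Splits one line on the delimiter and parses each field as a double.
    // The delimiter is treated as a regular expression by String.split.
    // A non-numeric or empty field raises NumberFormatException.
    static double[] parseLine(String line, String delimiter) {
        String[] fields = line.split(delimiter, -1);
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    // Parses all lines, optionally skipping a header row and ignoring blank lines.
    static List<double[]> parseLines(List<String> lines, String delimiter, boolean hasHeader) {
        List<double[]> rows = new ArrayList<>();
        for (int i = hasHeader ? 1 : 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            rows.add(parseLine(line, delimiter));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Small in-memory sample; a real program would read lines from the input file.
        List<String> lines = Arrays.asList("x,y,z", "1.0,2.5,3", "4,5,6.25");
        for (double[] row : parseLines(lines, ",", true)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In a Mahout job, each resulting array would typically be wrapped in a Mahout vector (for example, a DenseVector inside a VectorWritable) and appended to the SequenceFile.Writer. How the book's own listing does this is not visible in the excerpt, so treat that step as an assumption.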