Chapter 16. Data Mining with Weka

Popular books such as Moneyball, Freakonomics, and Competing on Analytics have increased interest in using analytics to gain a competitive edge. This chapter explains how some of the more popular analytical techniques work and how they can be applied in real-life scenarios.

Business analysts and BI professionals are accustomed to reporting on organizational performance. By now, most people are familiar with using BI tools and OLAP to report, identify exceptions, and answer basic questions. The challenge for many people is that new questions require new ways of looking at data. Reporting and OLAP techniques work well when the types of questions are well established, and for explaining past or current activity. They are not suited to understanding complex relationships, exploring large volumes of detailed data, or predicting future activity.

Data mining (including visualization and text analytics) provides the means to accomplish tasks that aren't possible with standard BI tools. These advanced analytics often go unused because of their assumed complexity and cost. The truth is that many techniques can be applied simply, and often with relatively inexpensive (sometimes free) tools. One of the more popular tools is Pentaho Data Mining (PDM), better known as Weka (rhymes with "Mecca"), which is the subject of this chapter.

Although data mining is a familiar term used to denote the subject at hand, some people prefer to call it machine ...

