Book Description
Make more of your data using Clojure and this brilliant cookbook full of real-world recipes. From creating revealing graphs to using data analysis libraries, you’ll learn both the basics and advanced techniques.
 Get a handle on the torrent of data the modern Internet has created
 Recipes for every stage from collection to analysis
 A practical approach to analyzing data to help you make informed decisions
In Detail
Data is everywhere, and it's increasingly important to be able to gain insights that we can act on. This book shows you how to use Clojure for data collection and analysis, and how to gain fresh insights and perspectives from your data with an essential collection of practical, structured recipes.
"The Clojure Data Analysis Cookbook" presents recipes for every stage of the data analysis process. Whether scraping data off a web page, performing data mining, or creating graphs for the web, this book has something for the task at hand.
You'll learn how to acquire data, clean it up, and transform it into useful graphs which can then be analyzed and published to the Internet. Coverage includes advanced topics like processing data concurrently, applying powerful statistical techniques like Bayesian modelling, and even data mining algorithms such as K-means clustering, neural networks, and association rules.
Table of Contents

Clojure Data Analysis Cookbook
 Credits
 About the Author
 About the Reviewers
 www.PacktPub.com
 Preface

1. Importing Data for Analysis
 Introduction
 Creating a new project
 Reading CSV data into Incanter datasets
 Reading JSON data into Incanter datasets
 Reading data from Excel with Incanter
 Reading data from JDBC databases
 Reading XML data into Incanter datasets
 Scraping data from tables in web pages
 Scraping textual data from web pages
 Reading RDF data
 Reading RDF data with SPARQL
 Aggregating data from different formats

2. Cleaning and Validating Data
 Introduction
 Cleaning data with regular expressions
 Maintaining consistency with synonym maps
 Identifying and removing duplicate data
 Normalizing numbers
 Rescaling values
 Normalizing dates and times
 Lazily processing very large data sets
 Sampling from very large data sets
 Fixing spelling errors
 Parsing custom data formats
 Validating data with Valip

3. Managing Complexity with Concurrent Programming
 Introduction
 Managing program complexity with STM
 Managing program complexity with agents
 Getting better performance with commute
 Combining agents and STM
 Maintaining consistency with ensure
 Introducing safe side effects into the STM
 Maintaining data consistency with validators
 Tracking processing with watchers
 Debugging concurrent programs with watchers
 Recovering from errors in agents
 Managing input with sized queues

4. Improving Performance with Parallel Programming
 Introduction
 Parallelizing processing with pmap
 Parallelizing processing with Incanter
 Partitioning Monte Carlo simulations for better pmap performance
 Finding the optimal partition size with simulated annealing
 Parallelizing with reducers
 Generating online summary statistics with reducers
 Harnessing your GPU with OpenCL and Calx
 Using type hints
 Benchmarking with Criterium

5. Distributed Data Processing with Cascalog
 Introduction
 Distributed processing with Cascalog and Hadoop
 Querying data with Cascalog
 Distributing data with Apache HDFS
 Parsing CSV files with Cascalog
 Complex queries with Cascalog
 Aggregating data with Cascalog
 Defining new Cascalog operators
 Composing Cascalog queries
 Handling errors in Cascalog workflows
 Transforming data with Cascalog
 Executing Cascalog queries in the Cloud with Pallet

6. Working with Incanter Datasets
 Introduction
 Loading Incanter's sample datasets
 Loading Clojure data structures into datasets
 Viewing datasets interactively with view
 Converting datasets to matrices
 Using infix formulas in Incanter
 Selecting columns with $
 Selecting rows with $
 Filtering datasets with $where
 Grouping data with $groupby
 Saving datasets to CSV and JSON
 Projecting from multiple datasets with $join

7. Preparing for and Performing Statistical Data Analysis with Incanter
 Introduction
 Generating summary statistics with $rollup
 Differencing variables to show changes
 Scaling variables to simplify variable relationships
 Working with time series data with Incanter Zoo
 Smoothing variables to decrease noise
 Validating sample statistics with bootstrapping
 Modeling linear relationships
 Modeling nonlinear relationships
 Modeling multimodal Bayesian distributions
 Finding data errors with Benford's law

8. Working with Mathematica and R
 Introduction
 Setting up Mathematica to talk to Clojuratica for Mac OS X and Linux
 Setting up Mathematica to talk to Clojuratica for Windows
 Calling Mathematica functions from Clojuratica
 Sending matrices to Mathematica from Clojuratica
 Evaluating Mathematica scripts from Clojuratica
 Creating functions from Mathematica
 Processing functions in parallel in Mathematica
 Setting up R to talk to Clojure
 Calling R functions from Clojure
 Passing vectors into R
 Evaluating R files from Clojure
 Plotting in R from Clojure

9. Clustering, Classifying, and Working with Weka
 Introduction
 Loading CSV and ARFF files into Weka
 Filtering and renaming columns in Weka datasets
 Discovering groups of data using K-means clustering
 Finding hierarchical clusters in Weka
 Clustering with SOMs in Incanter
 Classifying data with decision trees
 Classifying data with the Naive Bayesian classifier
 Classifying data with support vector machines
 Finding associations in data with the Apriori algorithm

10. Graphing in Incanter
 Introduction
 Creating scatter plots with Incanter
 Creating bar charts with Incanter
 Graphing non-numeric data in bar charts
 Creating histograms with Incanter
 Creating function plots with Incanter
 Adding equations to Incanter charts
 Adding lines to scatter charts
 Customizing charts with JFreeChart
 Saving Incanter graphs to PNG
 Using PCA to graph multidimensional data
 Creating dynamic charts with Incanter

11. Creating Charts for the Web

Index
Product Information
 Title: Clojure Data Analysis Cookbook
 Author(s):
 Release date: March 2013
 Publisher(s): Packt Publishing
 ISBN: 9781782162643