Book Description
Explore intuitive data analysis techniques and powerful machine learning methods using over 130 practical recipes
In Detail
This book will take you on a voyage through all the steps involved in data analysis. It pairs Haskell with data modeling through carefully chosen examples that feature some of the most popular machine learning techniques.
You will begin with how to obtain and clean data from various sources. You will then learn how to use various data structures such as trees and graphs. The meat of data analysis occurs in the topics involving statistical techniques, parallelism, concurrency, and machine learning algorithms, along with various examples of visualizing and exporting results. By the end of the book, you will be empowered with techniques to maximize your potential when using Haskell for data analysis.
What You Will Learn
 Obtain and analyze raw data from various sources including text files, CSV files, databases, and websites
 Implement practical tree and graph algorithms on various datasets
 Apply statistical methods such as moving average and linear regression to understand patterns
 Fiddle with parallel and concurrent code to speed up and simplify time-consuming algorithms
 Find clusters in data using some of the most popular machine learning algorithms
 Manage results by visualizing or exporting data
Table of Contents

Haskell Data Analysis Cookbook
 Credits
 About the Author
 About the Reviewers
 www.PacktPub.com
 Preface

1. The Hunt for Data
 Introduction
 Harnessing data from various sources
 Accumulating text data from a file path
 Catching I/O code faults
 Keeping and representing data from a CSV file
 Examining a JSON file with the aeson package
 Reading an XML file using the HXT package
 Capturing table rows from an HTML page
 Understanding how to perform HTTP GET requests
 Learning how to perform HTTP POST requests
 Traversing online directories for data
 Using MongoDB queries in Haskell
 Reading from a remote MongoDB server
 Exploring data from a SQLite database

2. Integrity and Inspection
 Introduction
 Trimming excess whitespace
 Ignoring punctuation and specific characters
 Coping with unexpected or missing input
 Validating records by matching regular expressions
 Lexing and parsing an email address
 Deduplication of nonconflicting data items
 Deduplication of conflicting data items
 Implementing a frequency table using Data.List
 Implementing a frequency table using Data.MultiSet
 Computing the Manhattan distance
 Computing the Euclidean distance
 Comparing scaled data using the Pearson correlation coefficient
 Comparing sparse data using cosine similarity

3. The Science of Words
 Introduction
 Displaying a number in another base
 Reading a number from another base
 Searching for a substring using Data.ByteString
 Searching a string using the Boyer-Moore-Horspool algorithm
 Searching a string using the Rabin-Karp algorithm
 Splitting a string on lines, words, or arbitrary tokens
 Finding the longest common subsequence
 Computing a phonetic code
 Computing the edit distance
 Computing the Jaro-Winkler distance between two strings
 Finding strings within one-edit distance
 Fixing spelling mistakes

4. Data Hashing
 Introduction
 Hashing a primitive data type
 Hashing a custom data type
 Running popular cryptographic hash functions
 Running a cryptographic checksum on a file
 Performing fast comparisons between data types
 Using a high-performance hash table
 Using Google's CityHash hash functions for strings
 Computing a Geohash for location coordinates
 Using a bloom filter to remove unique items
 Running MurmurHash, a simple but speedy hashing algorithm
 Measuring image similarity with perceptual hashes

5. The Dance with Trees
 Introduction
 Defining a binary tree data type
 Defining a rose tree (multiway tree) data type
 Traversing a tree depth-first
 Traversing a tree breadth-first
 Implementing a Foldable instance for a tree
 Calculating the height of a tree
 Implementing a binary search tree data structure
 Verifying the order property of a binary search tree
 Using a self-balancing tree
 Implementing a min-heap data structure
 Encoding a string using a Huffman tree
 Decoding a Huffman code

6. Graph Fundamentals
 Introduction
 Representing a graph from a list of edges
 Representing a graph from an adjacency list
 Conducting a topological sort on a graph
 Traversing a graph depth-first
 Traversing a graph breadth-first
 Visualizing a graph using Graphviz
 Using Directed Acyclic Word Graphs
 Working with hexagonal and square grid networks
 Finding maximal cliques in a graph
 Determining whether any two graphs are isomorphic

7. Statistics and Analysis
 Introduction
 Calculating a moving average
 Calculating a moving median
 Approximating a linear regression
 Approximating a quadratic regression
 Obtaining the covariance matrix from samples
 Finding all unique pairings in a list
 Using the Pearson correlation coefficient
 Evaluating a Bayesian network
 Creating a data structure for playing cards
 Using a Markov chain to generate text
 Creating n-grams from a list
 Creating a neural network perceptron

8. Clustering and Classification
 Introduction
 Implementing the k-means clustering algorithm
 Implementing hierarchical clustering
 Using a hierarchical clustering library
 Finding the number of clusters
 Clustering words by their lexemes
 Classifying the parts of speech of words
 Identifying key words in a corpus of text
 Training a parts-of-speech tagger
 Implementing a decision tree classifier
 Implementing a k-Nearest Neighbors classifier
 Visualizing points using Graphics.EasyPlot

9. Parallel and Concurrent Design
 Introduction
 Using the Haskell Runtime System options
 Evaluating a procedure in parallel
 Controlling parallel algorithms in sequence
 Forking I/O actions for concurrency
 Communicating with a forked I/O action
 Killing forked threads
 Parallelizing pure functions using the Par monad
 Mapping over a list in parallel
 Accessing tuple elements in parallel
 Implementing MapReduce to count word frequencies
 Manipulating images in parallel using Repa
 Benchmarking runtime performance in Haskell
 Using the criterion package to measure performance
 Benchmarking runtime performance in the terminal

10. Real-time Data
 Introduction
 Streaming Twitter for real-time sentiment analysis
 Reading IRC chat room messages
 Responding to IRC messages
 Polling a web server for latest updates
 Detecting real-time file directory changes
 Communicating in real time through sockets
 Detecting faces and eyes through a camera stream
 Streaming camera frames for template matching

11. Visualizing Data
 Introduction
 Plotting a line chart using Google's Chart API
 Plotting a pie chart using Google's Chart API
 Plotting bar graphs using Google's Chart API
 Displaying a line graph using gnuplot
 Displaying a scatter plot of two-dimensional points
 Interacting with points in a three-dimensional space
 Visualizing a graph network
 Customizing the looks of a graph network diagram
 Rendering a bar graph in JavaScript using D3.js
 Rendering a scatter plot in JavaScript using D3.js
 Diagramming a path from a list of vectors

12. Exporting and Presenting
 Index
Product Information
 Title: Haskell Data Analysis Cookbook
 Author(s):
 Release date: June 2014
 Publisher(s): Packt Publishing
 ISBN: 9781783286331