
Big Data Visualization

Name: Big Data Visualization
ISBN: 9781785281945

by Dalong Chen, James D. Miller

Published: February 2017

Level: Beginner to intermediate

Pages: 304

Estimated reading time: 6h 3m

Language: English

Publisher: Packt Publishing

Big Data Visualization
Credits
About the Author
About the Reviewer
www.PacktPub.com
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
  Downloading the example code
  Downloading the color images of this book
  Errata
  Piracy
  Questions
1. Introduction to Big Data Visualization
An explanation of data visualization
  Conventional data visualization concepts
  Training options
Challenges of big data visualization
  Big data
  Using Excel to gauge your data
  Pushing big data higher
  The 3Vs
  Volume
  Velocity
  Variety
  Categorization
  Such are the 3Vs
  Data quality
  Dealing with outliers
  Meaningful displays
  Adding a fourth V
  Visualization philosophies
  More on variety
  Velocity
  Volume
  All is not lost
Approaches to big data visualization
  Access, speed, and storage
  Entering Hadoop
  Context
  Quality
  Displaying results
  Not a new concept
  Instant gratifications
  Data-driven documents
  Dashboards
  Outliers
  Investigation and adjudication
  Operational intelligence
Summary
2. Access, Speed, and Storage with Hadoop
About Hadoop
  What else but Hadoop?
  IBM too!
Log files and Excel
  An R scripting example
  Points to consider
Hadoop and big data
  Entering Hadoop
  AWS for Hadoop projects
Example 1
  Defining the environment
  Getting started
  Uploading the data
  Manipulating the data
  A specific example
  Conclusion
Example 2
  Sorting
  Parsing the IP
Summary
3. Understanding Your Data Using R
Definitions and explanations
  Comparisons
  Contrasts
  Tendencies
  Dispersion
Adding context
About R
R and big data
Example 1
Digging in with R
Example 2
  Definitions and explanations
  No looping
  Comparisons
  Contrasts
  Tendencies
  Dispersion
Summary
4. Addressing Big Data Quality
Data quality categorized
DataManager
DataManager and big data
Some examples
  Some reformatting
  A little setup
  Selecting nodes
  Connecting the nodes
  The work node
  Adding the script code
  Executing the scene
  Other data quality exercises
  What else is missing?
  Status and relevance
  Naming your nodes
More examples
  Consistency
  Reliability
  Appropriateness
  Accessibility
  Other Output nodes
Summary
5. Displaying Results Using D3
About D3
D3 and big data
Some basic examples
  Getting started with D3
  A little down time
  Visual transitions
  Multiple donuts
More examples
  Another twist on bar chart visualizations
  One more example
  Adopting the sample
Summary
6. Dashboards for Big Data - Tableau
About Tableau
Tableau and big data
Example 1 - Sales transactions
  Adding more context
  Wrangling the data
  Moving on
  A Tableau dashboard
  Saving the workbook
  Presenting our work
  More tools
Example 2
  What's the goal? - purpose and audience
  Sales and spend
  Sales v Spend and Spend as % of Sales Trend
  Tables and indicators
  All together now
Summary
7. Dealing with Outliers Using Python
About Python
Python and big data
Outliers
  Options for outliers
  Delete
  Transform
  Outliers identified
Some basic examples
  Testing slot machines for profitability
  Into the outliers
  Handling excessive values
  Establishing the value
  Big data note
  Setting outliers
  Removing Specific Records
  Redundancy and risk
  Another point
  If Type
  Reused
  Changing specific values
  Setting the Age
  Another note
  Dropping fields entirely
  More to drop
More examples
  A themed population
  A focused philosophy
Summary
8. Big Data Operational Intelligence with Splunk
About Splunk
  Splunk and big data
Splunk visualization - real-time log analysis
  IBM Cognos
  Pointing Splunk
  Setting rows and columns
  Finishing with errors
  Splunk and processing errors
Splunk visualization - deeper into the logs
  New fields
  Editing the dashboard
  More about dashboards
Summary

Chapter 2. Access, Speed, and Storage with Hadoop

This chapter targets the challenge of storing and accessing large volumes and varieties of data (structured or unstructured), offering working examples that demonstrate effective solutions to these issues.

Since you are expected to be somewhat familiar with Hadoop, this chapter starts with only a brief overview of the technology. It does not try to cover every detail, because the goal is to demonstrate how Hadoop can be used to address the challenge of storing and accessing big data.

In addition, for completeness, we'll touch on possible alternatives to Hadoop, such as Apache Spark and even a simple scripting solution.

By the end of ...