Big Data Visualization

Book Description

Learn effective tools and techniques to separate big data into manageable and logical components for efficient data visualization

About This Book

  • This unique guide teaches you how to visualize large, cluttered volumes of big data with ease
  • It is rich with ample options and solid use cases for big data visualization, and is a must-have book for your shelf
  • Improve your decision-making by visualizing your big data the right way

Who This Book Is For

This book is for data analysts or those with a basic knowledge of big data analysis who want to learn big data visualization in order to make their analysis more useful. You need sufficient knowledge of big data platform tools such as Hadoop and also some experience with programming languages such as R. This book will be great for those who are familiar with conventional data visualizations and now want to widen their horizons by exploring big data visualizations.

What You Will Learn

  • Understand how basic analytics is affected by big data
  • Deep dive into effective and efficient ways of visualizing big data
  • Get to know various approaches (using various technologies) to address the challenges of visualizing big data
  • Comprehend the concepts and models used to visualize big data
  • Know how to visualize big data in real time and for different use cases
  • Understand how to integrate popular dashboard visualization tools such as Splunk and Tableau
  • Get to know the value and process of integrating visual big data with BI tools such as Tableau
  • Make sense of the visualization options for big data, based upon the best-suited visualization techniques

In Detail

When it comes to big data, regular data visualization tools with basic features become insufficient. This book covers the concepts and models used to visualize big data, with a focus on efficient visualizations.

This book centers on big data visualization and addresses its characteristic challenges, such as speed of access, understanding and adding context to the data, improving data quality, displaying results, and handling outliers. We focus on the most popular libraries for big data visualization tasks and explore "big data oriented" tools such as Hadoop and Tableau. We will show you how data changes with different variables and for different use cases, with step-through topics such as importing data into something like Hadoop and basic analytics.

The choice of visualizations depends on the most suited techniques for big data, and we will show you the various options for big data visualizations based upon industry-proven techniques. You will then learn how to integrate popular visualization tools with graphing databases to see how huge amounts of certain data. Finally, you will find out how to display the integration of visual big data with BI using Cognos BI.

Style and approach

With the help of insightful real-world use cases, we'll tackle visualization in the world of big data. The scale and sheer volume of the data make big data visualizations different from normal data visualizations, and this book addresses the difficulties professionals encounter while visualizing their big data.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to receive the code files.

Table of Contents

  1. Big Data Visualization
    1. Big Data Visualization
    2. Credits
    3. About the Author
    4. About the Reviewer
    5. www.PacktPub.com
      1. Why subscribe?
    6. Customer Feedback
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Downloading the color images of this book
        3. Errata
        4. Piracy
        5. Questions
    8. 1. Introduction to Big Data Visualization
      1. An explanation of data visualization
        1. Conventional data visualization concepts
        2. Training options
      2. Challenges of big data visualization
        1. Big data
        2. Using Excel to gauge your data
        3. Pushing big data higher
        4. The 3Vs
          1. Volume
          2. Velocity
          3. Variety
        5. Categorization
          1. Such are the 3Vs
          2. Data quality
          3. Dealing with outliers
          4. Meaningful displays
          5. Adding a fourth V
        6. Visualization philosophies
          1. More on variety
          2. Velocity
          3. Volume
          4. All is not lost
      3. Approaches to big data visualization
        1. Access, speed, and storage
        2. Entering Hadoop
        3. Context
        4. Quality
          1. Displaying results
          2. Not a new concept
          3. Instant gratifications
          4. Data-driven documents
          5. Dashboards
          6. Outliers
          7. Investigation and adjudication
          8. Operational intelligence
      4. Summary
    9. 2. Access, Speed, and Storage with Hadoop
      1. About Hadoop
        1. What else but Hadoop?
        2. IBM too!
      2. Log files and Excel
        1. An R scripting example
        2. Points to consider
      3. Hadoop and big data
        1. Entering Hadoop
        2. AWS for Hadoop projects
      4. Example 1
        1. Defining the environment
        2. Getting started
        3. Uploading the data
        4. Manipulating the data
          1. A specific example
        5. Conclusion
      5. Example 2
        1. Sorting
        2. Parsing the IP
      6. Summary
    10. 3. Understanding Your Data Using R
      1. Definitions and explanations
        1. Comparisons
        2. Contrasts
        3. Tendencies
        4. Dispersion
      2. Adding context
      3. About R
        1. R and big data
      4. Example 1
      5. Digging in with R
      6. Example 2
        1. Definitions and explanations
        2. No looping
        3. Comparisons
        4. Contrasts
        5. Tendencies
        6. Dispersion
      7. Summary
    11. 4. Addressing Big Data Quality
      1. Data quality categorized
      2. DataManager
      3. DataManager and big data
      4. Some examples
        1. Some reformatting
          1. A little setup
          2. Selecting nodes
          3. Connecting the nodes
          4. The work node
          5. Adding the script code
          6. Executing the scene
          7. Other data quality exercises
          8. What else is missing?
          9. Status and relevance
          10. Naming your nodes
      5. More examples
        1. Consistency
        2. Reliability
        3. Appropriateness
        4. Accessibility
        5. Other Output nodes
      6. Summary
    12. 5. Displaying Results Using D3
      1. About D3
      2. D3 and big data
      3. Some basic examples
        1. Getting started with D3
        2. A little down time
        3. Visual transitions
        4. Multiple donuts
      4. More examples
        1. Another twist on bar chart visualizations
        2. One more example
        3. Adopting the sample
      5. Summary
    13. 6. Dashboards for Big Data - Tableau
      1. About Tableau
      2. Tableau and big data
      3. Example 1 - Sales transactions
        1. Adding more context
        2. Wrangling the data
        3. Moving on
        4. A Tableau dashboard
        5. Saving the workbook
        6. Presenting our work
        7. More tools
      4. Example 2
        1. What's the goal? - purpose and audience
        2. Sales and spend
        3. Sales v Spend and Spend as % of Sales Trend
        4. Tables and indicators
        5. All together now
      5. Summary
    14. 7. Dealing with Outliers Using Python
      1. About Python
      2. Python and big data
      3. Outliers
        1. Options for outliers
          1. Delete
          2. Transform
        2. Outliers identified
      4. Some basic examples
        1. Testing slot machines for profitability
          1. Into the outliers
          2. Handling excessive values
          3. Establishing the value
          4. Big data note
          5. Setting outliers
          6. Removing Specific Records
          7. Redundancy and risk
          8. Another point
            1. If Type
            2. Reused
          9. Changing specific values
            1. Setting the Age
            2. Another note
          10. Dropping fields entirely
          11. More to drop
      5. More examples
        1. A themed population
        2. A focused philosophy
      6. Summary
    15. 8. Big Data Operational Intelligence with Splunk
      1. About Splunk
        1. Splunk and big data
      2. Splunk visualization - real-time log analysis
        1. IBM Cognos
        2. Pointing Splunk
        3. Setting rows and columns
        4. Finishing with errors
        5. Splunk and processing errors
      3. Splunk visualization - deeper into the logs
        1. New fields
        2. Editing the dashboard
        3. More about dashboards
      4. Summary