Analytics for the Internet of Things (IoT)

Book Description

Break through the hype and learn how to extract actionable intelligence from the flood of IoT data

About This Book

  • Make better business decisions and acquire greater control of your IoT infrastructure
  • Learn techniques to solve unique problems associated with IoT and examine and analyze data from your IoT devices
  • Uncover the business potential generated by data from IoT devices and bring down business costs

Who This Book Is For

This book targets developers, IoT professionals, and data scientists who are trying to solve business problems with IoT devices and want to analyze IoT data. IoT enthusiasts, managers, and entrepreneurs who want to make the most of IoT will find it equally useful. Prior knowledge of IoT is helpful but not necessary. Some prior programming experience is useful.

What You Will Learn

  • Overcome the challenges IoT data brings to analytics
  • Understand the variety of transmission protocols for IoT along with their strengths and weaknesses
  • Learn how data flows from the IoT device to the final data set
  • Develop techniques to wring value from IoT data
  • Apply geospatial analytics to IoT data
  • Use machine learning as a predictive method on IoT data
  • Implement the best strategies to get the most from IoT analytics
  • Master the economics of IoT analytics in order to optimize business value

In Detail

We start with the perplexing task of extracting value from huge amounts of barely intelligible data. The data takes a convoluted route just to be on the servers for analysis, but insights can emerge through visualization and statistical modeling techniques. You will learn to extract value from IoT big data using multiple analytic techniques.

Next, we review how IoT devices generate data and how the information travels over networks. You’ll learn strategies to collect and store the data in ways that maximize its potential for analytics, along with strategies to handle data quality concerns.

Cloud resources are a great match for IoT analytics, so Amazon Web Services, Microsoft Azure, and PTC ThingWorx are reviewed in detail next. Geospatial analytics is then introduced as a way to leverage location information. Combining IoT data with environmental data is also discussed as a way to enhance predictive capability. We’ll also review the economics of IoT analytics and you’ll discover ways to optimize business value.

By the end of the book, you’ll know how to handle scale for both data storage and analytics, how Apache Spark can be leveraged to handle scalability, and how R and Python can be used for analytic modeling.

Style and approach

This book follows a step-by-step, practical approach that combines the power of analytics and IoT to help you get results quickly.

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to receive the code files.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Downloading the color images of this book
      3. Errata
      4. Piracy
      5. Questions
  2. Defining IoT Analytics and Challenges
    1. The situation
    2. Defining IoT analytics
      1. Defining analytics
      2. Defining the Internet of Things
      3. The concept of constrained
    3. IoT analytics challenges
      1. The data volume
      2. Problems with time
      3. Problems with space
      4. Data quality
      5. Analytics challenges
    4. Business value concerns
    5. Summary
  3. IoT Devices and Networking Protocols
    1. IoT devices
      1. The wild world of IoT devices
        1. Healthcare
        2. Manufacturing
        3. Transportation and logistics
        4. Retail
        5. Oil and gas
        6. Home automation or monitoring
        7. Wearables
      2. Sensor types
    2. Networking basics
    3. IoT networking connectivity protocols
      1. Connectivity protocols (when the available power is limited)
        1. Bluetooth Low Energy (also called Bluetooth Smart)
        2. 6LoWPAN
        3. ZigBee
          1. Advantages of ZigBee
          2. Disadvantages of ZigBee
          3. Common use cases
        4. NFC
          1. Common use cases
        5. Sigfox
      2. Connectivity protocols (when power is not a problem)
        1. Wi-Fi
          1. Common use cases
        2. Cellular (4G/LTE)
          1. Common use cases
    4. IoT networking data messaging protocols
      1. Message Queue Telemetry Transport (MQTT)
        1. Topics
        2. Advantages to MQTT
        3. Disadvantages to MQTT
        4. QoS levels
          1. QoS 0
          2. QoS 1
          3. QoS 2
        5. Last Will and Testament (LWT)
        6. Tips for analytics
        7. Common use cases
      2. Hyper-Text Transport Protocol (HTTP)
        1. Representational State Transfer (REST) principles
        2. HTTP and IoT
        3. Advantages to HTTP
        4. Disadvantages to HTTP
      3. Constrained Application Protocol (CoAP)
        1. Advantages to CoAP
        2. Disadvantages to CoAP
        3. Message reliability
        4. Common use cases
      4. Data Distribution Service (DDS)
        1. Common use cases
    5. Analyzing data to infer protocol and device characteristics
    6. Summary
  4. IoT Analytics for the Cloud
    1. Building elastic analytics
      1. What is cloud infrastructure?
    2. Elastic analytics concepts
      1. Design with the endgame in mind
    3. Designing for scale
      1. Decouple key components
        1. Encapsulate analytics
        2. Decoupling with message queues
      2. Distributed computing
        1. Avoid containing analytics to one server
        2. When to use distributed and when to use one server
      3. Assuming that change is constant
      4. Leverage managed services
      5. Use Application Programming Interfaces (API)
    4. Cloud security and analytics
      1. Public/private keys
      2. Public versus private subnets
      3. Access restrictions
      4. Securing customer data
    5. The AWS overview
      1. AWS key concepts
        1. Regions
        2. Availability Zones
        3. Subnet
        4. Security groups
      2. AWS key core services
        1. Virtual Private Cloud (VPC)
        2. Identity and Access Management (IAM)
        3. Elastic Compute (EC2)
        4. Simple Storage Service (S3)
      3. AWS key services for IoT analytics
        1. Amazon Simple Queue Service (SQS)
        2. Amazon Elastic Map Reduce (EMR)
        3. AWS machine learning
        4. Amazon Relational Database Service (RDS)
        5. Amazon Redshift
    6. Microsoft Azure overview
      1. Azure Data Lake Store
      2. Azure Analysis Services
      3. HDInsight
        1. The R server option
    7. The ThingWorx overview
      1. ThingWorx Core
      2. ThingWorx Connection Services
      3. ThingWorx Edge
      4. ThingWorx concepts
        1. Thing templates
        2. Things
        3. Properties
        4. Services
        5. Events
        6. Thing shapes
        7. Data shapes
        8. Entities
    8. Summary
  5. Creating an AWS Cloud Analytics Environment
    1. The AWS CloudFormation overview
    2. The AWS Virtual Private Cloud (VPC) setup walk-through
      1. Creating a key pair for the NAT and bastion instances
      2. Creating an S3 bucket to store data
      3. Creating a VPC for IoT Analytics
        1. What is a NAT gateway?
        2. What is a bastion host?
        3. Your VPC architecture
        4. The VPC Creation walk-through
    3. How to terminate and clean up the environment
    4. Summary
  6. Collecting All That Data - Strategies and Techniques
    1. Designing data processing for analytics
      1. Amazon Kinesis
      2. AWS Lambda
      3. AWS Athena
      4. The AWS IoT platform
      5. Microsoft Azure IoT Hub
    2. Applying big data technology to storage
      1. Hadoop
        1. Hadoop cluster architectures
          1. What is a Node?
          2. Node types
        2. Hadoop Distributed File System
        3. Parquet
        4. Avro
        5. Hive
        6. Serialization/Deserialization (SerDe)
        7. Hadoop MapReduce
        8. Yet Another Resource Negotiator (YARN)
      2. HBase
      3. Amazon DynamoDB
      4. Amazon S3
    3. Apache Spark for data processing
      1. What is Apache Spark?
      2. Spark and big data analytics
      3. Thinking about a single machine versus a cluster of machines
      4. Using Spark for IoT data processing
    4. To stream or not to stream
      1. Lambda architectures
    5. Handling change
    6. Summary
  7. Getting to Know Your Data - Exploring IoT Data
    1. Exploring and visualizing data
      1. The Tableau overview
      2. Techniques to understand data quality
        1. Look at your data - au naturel
        2. Data completeness
        3. Data validity
        4. Assessing Information Lag
        5. Representativeness
      3. Basic time series analysis
        1. What is meant by time series?
        2. Applying time series analysis
      4. Get to know categories in the data
      5. Bring in geography
    2. Look for attributes that might have predictive value
    3. R (the pirate's language...if he was a statistician)
      1. Installing R and RStudio
      2. Using R for statistical analysis
    4. Summing it all up
    5. Solving industry-specific analysis problems
      1. Manufacturing
      2. Healthcare
      3. Retail
    6. Summary
  8. Decorating Your Data - Adding External Datasets to Innovate
    1. Adding internal datasets
      1. Which ones and why?
        1. Customer information
        2. Production data
        3. Field services
        4. Financial
    2. Adding external datasets
      1. External datasets - geography
        1. Elevation
          1. SRTM elevation
          2. National Elevation Dataset (NED)
        2. Weather
        3. Geographical features
          1. Planet.osm
          2. Google Maps API
          3. USGS national transportation datasets
      2. External datasets - demographic
        1. The U.S. Census Bureau
        2. CIA World Factbook
      3. External datasets - economic
        1. Organization for Economic Cooperation and Development (OECD)
        2. Federal Reserve Economic Data (FRED)
    3. Summary
  9. Communicating with Others - Visualization and Dashboarding
    1. Common mistakes when designing visuals
    2. The Hierarchy of Questions method
      1. The Hierarchy of Questions method overview
        1. Developing question trees
      2. Pulling together the data
      3. Aligning views with question flows
    3. Designing visual analysis for IoT data
      1. Using layout positioning to convey importance
      2. Use color to highlight important data
        1. The impact of using a single color to communicate importance
        2. Be consistent across visuals
      3. Make charts easy to interpret
    4. Creating a dashboard with Tableau
      1. The dashboard walk-through
        1. Hierarchy of Questions example
        2. Aligning visuals to the thought process
        3. Creating individual views
        4. Assembling views into a dashboard
    5. Creating and visualizing alerts
      1. Alert principles
      2. Organizing alerts using a Tableau dashboard
    6. Summary
  10. Applying Geospatial Analytics to IoT Data
    1. Why do you need geospatial analytics for IoT?
    2. The basics of geospatial analysis
      1. Welcome to Null Island
      2. Coordinate Reference Systems
        1. The Earth is not a ball
    3. Vector-based methods
      1. The bounding box
      2. Contains
      3. Buffer
        1. Dilation and erosion
      4. Simplify
      5. Vector summary
    4. Raster-based methods
    5. Storing geospatial data
      1. File formats
      2. Spatial extensions for relational databases
      3. Storing geospatial data in HDFS
      4. Spatial indexing
        1. R-tree
    6. Processing geospatial data
      1. Geospatial analysis software
        1. ArcGIS
        2. QGIS
        3. ogr2ogr
      2. PostGIS spatial functions
      3. Geospatial analysis in the big data world
    7. Solving the pollution reporting problem
    8. Summary
  11. Data Science for IoT Analytics
    1. Machine learning (ML)
      1. What is machine learning?
        1. Representation
        2. Evaluation
        3. Optimization
      2. Generalization
      3. Feature engineering with IoT data
        1. Dealing with missing values
        2. Centering and scaling
        3. Time series handling
      4. Validation methods
        1. Cross-validation
        2. Test set
        3. Precision, recall, and specificity
      5. Understanding the bias–variance tradeoff
        1. Bias
        2. Variance
        3. Trade-off and complexity
      6. Comparing different models to find the best fit using R
        1. ROC curves
        2. Area Under the Curve (AUC)
      7. Random forest models using R
        1. Random forest key concepts
        2. Random forest R examples
      8. Gradient Boosting Machines (GBM) using R
        1. GBM key concepts
        2. The Gradient Boosting Machines R example
        3. Ensemble
    2. Anomaly detection using R
    3. Forecasting using ARIMA
      1. Using R to forecast time series IoT data
    4. Deep learning
      1. Use cases for deep learning with IoT data
      2. A Nickel Tour of deep learning
      3. Setting up TensorFlow on AWS
    5. Summary
  12. Strategies to Organize Data for Analytics
    1. Linked Analytical Datasets
      1. Analytical datasets
        1. Building analytic datasets
      2. Linking together datasets
    2. Managing data lakes
      1. When data lakes turn into data swamps
      2. Data refineries
      3. Developing a progression process
    3. The data retention strategy
      1. Goals
      2. Retention strategies for IoT data
        1. Reducing accessibility
        2. Reducing the number of fields
        3. Reduce the number of records
        4. The retention strategy example
    4. Summary
  13. The Economics of IoT Analytics
    1. The economics of cloud computing and open source
      1. Variable versus fixed costs
      2. The option to quit
      3. Cloud costs can escalate quickly
        1. Monitoring cloud billing closely
      4. Open source economics
        1. Intellectual property considerations
        2. Scale
        3. Support
    2. Cost considerations for IoT analytics
      1. Cloud services costs
      2. Expected usage considerations
    3. Thinking about revenue opportunities
    4. The economics of predictive maintenance example
      1. Situation
      2. The value formula
      3. An example of making a value decision
    5. Summary
  14. Bringing It All Together
    1. Review
      1. The IoT data flow
      2. IoT exploratory analytics
      3. IoT data science
      4. Building revenue from IoT analytics
    2. A sample project
    3. Summary