Big Data Architect's Handbook

Book description

A comprehensive end-to-end guide that gives hands-on practice in big data and artificial intelligence.

About This Book
  • Learn to build and run a big data application with sample code
  • Explore examples to implement activities that a big data architect performs
  • Use Machine Learning and AI for structured and unstructured data

Who This Book Is For

Big Data Architect's Handbook is for you if you are an aspiring data professional, developer, or IT enthusiast who aims to be an all-round architect in big data. This book is a one-stop resource for building your knowledge and carrying out the activities, from simple to complex, required to become a big data architect.

What You Will Learn
  • Learn Hadoop Ecosystem and Apache projects
  • Understand and compare NoSQL databases and essential software architecture
  • Weigh cloud infrastructure design considerations for big data
  • Explore application scenarios of big data tools for daily activities
  • Learn to analyze and visualize results to uncover valuable insights
  • Build and run a big data application with sample code from end to end
  • Apply Machine Learning and AI to perform big data intelligence
  • Practice the daily activities performed by big data architects

In Detail

Big data architects are the “masters” of data and hold high value in today's market. Handling big data, be it of good or bad quality, is not an easy task. The prime job of any big data architect is to build an end-to-end big data solution that integrates data from different sources and analyzes it to find useful, hidden insights.

Big Data Architect's Handbook takes you through developing a complete, end-to-end big data pipeline, laying the foundation and providing the knowledge required to be an architect in big data. From understanding the design considerations to implementing a solid, efficient, and scalable data pipeline, this book walks you through all the essential aspects of big data. It also gives you an overview of how you can leverage various big data tools, such as Apache Hadoop and Elasticsearch, and bring them together to build an efficient big data solution.

By the end of this book, you will be able to build your own design system that integrates, maintains, visualizes, and monitors your data. In addition, you will have a smooth design flow in each process, putting insights into action.

Style and approach

A comprehensive guide that blends theory, examples, and implementation of real-world use cases.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Big Data Architect's Handbook
  3. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  4. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Reviews
  6. Why Big Data?
    1. What is big data?
    2. Characteristics of big data
    3. Volume
    4. Velocity
    5. Variety
    6. Veracity
    7. Variability
    8. Value
    9. Solution-based approach for data
    10. Data – the most valuable asset
    11. Traditional approaches to data storage
    12. Clustered computing
    13. High availability
    14. Resource pooling
    15. Easy scalability
    16. Big data – how does it make a difference?
    17. Big data solutions – cloud versus on-premises infrastructure
    18. Cost
    19. Security
    20. Current capabilities
    21. Scalability
    22. Big data glossary
    23. Big data
    24. Batch processing
    25. Cluster computing
    26. Data warehouse
    27. Data lake
    28. Data mining
    29. ETL
    30. Hadoop
    31. In-memory computing
    32. Machine learning
    33. MapReduce
    34. NoSQL
    35. Stream processing
    36. Summary
  7. Big Data Environment Setup
    1. Oracle VM VirtualBox installation
    2. Ubuntu installation
    3. Hadoop prerequisite installation
    4. Java installation
    5. SSH installation and configuration
    6. Hadoop system user
    7. Apache Hadoop installation
    8. Hadoop configuration
    9. Path configuration for Hadoop commands
    10. Hadoop server start and stop
    11. Summary
  8. Hadoop Ecosystem
    1. Apache Hadoop
    2. Hadoop Distributed File System
    3. HDFS hands-on
    4. Creating a directory in HDFS
    5. Copying files from a local file system to HDFS
    6. Copying files from HDFS to a local file system
    7. Deleting files and folders in HDFS
    8. Hadoop MapReduce
    9. Job Tracker and Task Tracker
    10. The execution flow of MapReduce 
    11. Mapper
    12. Shuffle and Sort
    13. Reducer
    14. Example program
    15. Preparing the data file for analysis
    16. Program code
    17. Driver program
    18. Mapper program
    19. Reducer program
    20. Observations and results
    21. YARN
    22. Resource Manager
    23. Node Manager
    24. Container
    25. Application Master
    26. Apache Projects related to big data
    27. Apache Zookeeper
    28. Apache Kafka
    29. Apache Flume
    30. Apache Cassandra
    31. Apache HBase
    32. Apache Spark
    33. Summary
  9. NoSQL Database
    1. What is NoSQL?
    2. Benefits of NoSQL databases
    3. NoSQL versus RDBMS
    4. The CAP theorem
    5. The ACID properties
    6. Data models in NoSQL
    7. Key-value data stores
    8. Document store
    9. Column stores
    10. Graph stores
    11. Apache Cassandra
    12. Installation
    13. Starting Cassandra
    14. The Cassandra Query Language – CQL
    15. The help command
    16. Basic commands
    17. Data manipulation
    18. Creating, altering, and deleting a keyspace
    19. Creating, altering, and deleting tables
    20. Inserting, updating, and deleting data
    21. The MongoDB database
    22. Installing MongoDB
    23. Starting MongoDB
    24. Working on MongoDB
    25. The help command
    26. Basic commands
    27. Data manipulation
    28. Creating and deleting databases
    29. Creating and deleting collections
    30. The create, retrieve, update, delete operations
    31. Neo4j database
    32. Installing Neo4j
    33. Starting Neo4j
    34. The cypher query language
    35. Help
    36. Basic operations in Cypher
    37. Creating nodes, relationships, and properties
    38. Updating nodes, relationships, and properties
    39. Deleting nodes, relationships, and properties
    40. Reading nodes, relationships, and properties
    41. Summary
  10. Off-the-Shelf Commercial Tools
    1. Microsoft Azure
    2. Building a practical application
    3. Microsoft Azure account
    4. The Azure Event Hub
    5. IoT simulation application
    6. Setting up an Azure Stream Analytics job
    7. Input
    8. Query
    9. Output
    10. Dashboard in Power BI
    11. Summary
  11. Containerization
    1. Virtualization
    2. Hypervisors
    3. Hardware-based hypervisors
    4. Software-based hypervisors
    5. What is containerization?
    6. Benefits of containers
    7. Docker
    8. Docker workflow
    9. Installation
    10. Basic commands
    11. Docker images
    12. Building a Docker image
    13. Running and verifying Docker images
    14. Importing and exporting Docker images
    15. Docker Swarm
    16. Setting up Docker Swarm
    17. Creating service containers
    18. Replicating containers
    19. Removing container services
    20. Kubernetes
    21. Key components
    22. Pods
    23. ReplicaSets
    24. Deployments
    25. PetSets
    26. Installation
    27. Deployment
    28. Kubernetes Dashboard
    29. Summary
  12. Network Infrastructure
    1. Network
    2. Local area networks
    3. Metropolitan area networks
    4. Wide area networks
    5. Network connectivity
    6. Wired
    7. Wireless
    8. Network visualization
    9. Gephi
    10. Installation
    11. Java installation
    12. First run
    13. Practical example
    14. Summary
  13. Cloud Infrastructure
    1. Companies moving to cloud
    2. Driving factors
    3. Infrastructure
    4. Locality of data
    5. Requirements
    6. Design considerations
    7. Open source versus commercial
    8. Commodity hardware versus purpose build
    9. Cloud versus on-premises
    10. Scale up and down
    11. Application architecture
    12. Cost decision
    13. Summary
  14. Security and Monitoring
    1. Simple Network Management Protocol
    2. Benefits of SNMP
    3. Security
    4. Agents and Traps
    5. Netflow
    6. Nagios
    7. Key benefits
    8. Security Onion
    9. Deployment scenarios
    10. The Standalone model
    11. The Server-Sensor model
    12. Hybrid model
    13. Preconfigured tools
    14. Wireshark
    15. Key features
    16. Summary
  15. Frontend Architecture
    1. React JS
    2. Key concepts
    3. Node.js
    4. JSX
    5. Unidirectional dataflow
    6. Getting started with ReactJS
    7. Single page application
    8. React application project
    9. React app directory structure
    10. Components
    11. Properties
    12. Event handling
    13. State
    14. Redux
    15. Architecture of Redux
    16. Key concepts
    17. Single store
    18. Action
    19. Reducers
    20. Guestbook application
    21. Installation
    22. Create a store
    23. Setting up Reducer
    24. Setting up Dispatcher
    25. Connect function
    26. Setting up Subscribers
    27. Final output
    28. Summary
  16. Backend Architecture
    1. API
    2. RESTful API
    3. HTTP request methods
    4. GET
    5. POST
    6. PUT
    7. DELETE
    8. Authentication
    9. Basic authentication
    10. JSON Web Token
    11. Header
    12. Payload
    13. Signature
    14. Practical
    15. RESTful web service
    16. Java client
    17. Redis
    18. Installation
    19. Redis server
    20. Redis client
    21. Working with Redis
    22. Redis data types and structures
    23. String
    24. HashMap
    25. List
    26. Set
    27. Redis Publish/Subscribe
    28. Common key operations
    29. Summary
  17. Machine Learning
    1. Machine learning
    2. Types of algorithms
    3. Parametric algorithms
    4. Non-parametric algorithms
    5. Supervised learning
    6. The classification model
    7. Binary classification
    8. Multi-class classification
    9. The regression model
    10. Linear regression
    11. Polynomial regression
    12. Unsupervised learning
    13. Clustering, k-means
    14. Neural networks
    15. Feedforward neural network
    16. Recurrent neural network
    17. Symmetrically connected neural network
    18. Deep neural networks
    19. Decision tree classifiers
    20. Summary
  18. Artificial Intelligence
    1. Artificial intelligence
    2. Convolutional neural networks
    3. Deep learning using TensorFlow
    4. TensorFlow
    5. Installation
    6. TensorFlow program
    7. Uninstalling TensorFlow
    8. TensorBoard
    9. Program
    10. Launching TensorBoard
    11. TensorBoard graph
    12. Object detection using YOLO
    13. Installation
    14. Compiling YOLO library
    15. Trained weights
    16. Detecting objects in an image
    17. Summary
  19. Elasticsearch
    1. Installing Elasticsearch
    2. Starting the Elasticsearch server
    3. Auto starting the Elasticsearch service
    4. Stopping the Elasticsearch server
    5. Uninstalling Elasticsearch
    6. Kibana
    7. Installation
    8. Starting Kibana
    9. Uninstalling Kibana
    10. Security
    11. Securing Elasticsearch
    12. Securing Kibana
    13. Understanding queries – CRUD commands
    14. Creating
    15. Reading
    16. Updating
    17. Deleting
    18. Summary
  20. Structured Data
    1. Data analysis
    2. Installing MySQL
    3. Importing data
    4. Analyzing the data model
    5. HBase
    6. Installation
    7. Starting an HBase instance
    8. Stopping a HBase instance
    9. Preparing an HBase for migration
    10. Sqoop
    11. Installation
    12. Verifying the installation
    13. MySQL JDBC driver
    14. Importing data
    15. Verifying the imported data
    16. Summary
  21. Unstructured Data
    1. Moving data into Hadoop
    2. Downloading Flume
    3. Environment configuration
    4. Configuring agent and sink
    5. Running Apache Flume
    6. Transferring a log file
    7. Converting images into text for analysis
    8. Tesseract OCR
    9. Installing Tesseract
    10. Practical example
    11. Complete code
    12. Program execution
    13. Summary
  22. Data Visualization
    1. Matplotlib
    2. Installing Matplotlib
    3. Line chart
    4. Bar charts
    5. Stack charts
    6. Scatter charts
    7. Pie charts
    8. Geographic projections
    9. D3.js
    10. Installation
    11. Practical example
    12. Output
    13. Summary
  23. Financial Trading System
    1. What is algorithmic trading?
    2. Benefits of algorithmic trading
    3. Big data in the financial market
    4. Algorithmic trading strategies
    5. Building an Expert Advisor
    6. MetaTrader
    7. Downloading and setting up MetaTrader
    8. MetaQuotes language
    9. Trading bot objective
    10. Practical
    11. Trading pattern – moving average
    12. Decision time: buy or sell
    13. Complete program
    14. Backtesting in MetaTrader 4
    15. Summary
  24. Retail Recommendation System
    1. Types of recommendation system
    2. Collaborative filtering
    3. Content-based filtering
    4. Demographic-based system
    5. Utility-based system
    6. Knowledge-based system
    7. Hybrid model
    8. Commercial tools
    9. Barilliance
    10. Softcube
    11. Strands
    12. Monetate
    13. Nosto
    14. Book recommendation system
    15. Dataset
    16. Directory structure
    17. Code
    18. Reading the dataset
    19. Verifying the dataset
    20. Data analysis
    21. Age group
    22. Commutative rating
    23. Algorithms
    24. Top-rated books
    25. Popular books
    26. Demographic-based recommendation
    27. Useful resources
    28. Summary
  25. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Big Data Architect's Handbook
  • Author(s): Syed Muhammad Fahad Akhtar
  • Release date: June 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781788835824