Book description
Many corporations are finding that the size of their data sets is outgrowing the capability of their systems to store and process them. The data is becoming too big to manage and use with traditional tools. The solution: implementing a big data system.
As Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset shows, Apache Hadoop offers a scalable, fault-tolerant system for storing and processing data in parallel. It has a very rich toolset that allows for storage (Hadoop), configuration (YARN and ZooKeeper), collection (Nutch and Solr), processing (Storm, Pig, and MapReduce), scheduling (Oozie), moving (Sqoop and Avro), monitoring (Chukwa, Ambari, and Hue), testing (Bigtop), and analysis (Hive).
The problem is that the Internet offers IT pros wading into big data many versions of the truth and some outright falsehoods born of ignorance. What is needed is a book just like this one: a wide-ranging but easily understood set of instructions to explain where to get Hadoop tools, what they can do, how to install them, how to configure them, how to integrate them, and how to use them successfully. And you need an expert who has worked in this area for a decade—someone just like author and big data expert Mike Frampton.
Big Data Made Easy approaches the problem of managing massive data sets from a systems perspective. It explains the roles on each project (such as architect and tester) and shows how the Hadoop toolset can be used at each system stage. Through numerous examples, it shows in an easily understood manner how to use each tool. The book also describes the sliding scale of tools available depending on data size, and when and how to use them. Big Data Made Easy shows developers and architects, as well as testers and project managers, how to:
Big Data Made Easy also explains the best part, which is that this toolset is free. Anyone can download it and—with the help of this book—start to use it within a day. With the skills this book will teach you under your belt, you will add value to your company or client immediately, not to mention your career.
Table of contents
- Cover
- Title
- Copyright
- Dedication
- Contents at a Glance
- Contents
- About the Author
- About the Technical Reviewer
- Acknowledgments
- Introduction
- Chapter 1: The Problem with Data
- Chapter 2: Storing and Configuring Data with Hadoop, YARN, and ZooKeeper
- Chapter 3: Collecting Data with Nutch and Solr
- Chapter 4: Processing Data with Map Reduce
- Chapter 5: Scheduling and Workflow
- Chapter 6: Moving Data
- Chapter 7: Monitoring Data
- Chapter 8: Cluster Management
- Chapter 9: Analytics with Hadoop
- Chapter 10: ETL with Hadoop
- Chapter 11: Reporting with Hadoop
- Index
Product information
- Title: Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset
- Author(s): Mike Frampton
- Release date: December 2014
- Publisher(s): Apress
- ISBN: 9781484200940