Book description
This IBM® Redbooks® publication helps the technical community take advantage of the resilience, scalability, and performance of the IBM Power Systems™ platform to implement or integrate an IBM Data Engine for Hadoop and Spark solution that accesses, manages, and analyzes data sets to improve business outcomes.
This book documents topics that demonstrate and take advantage of the analytics strengths of the IBM POWER8® platform, the IBM analytics software portfolio, and selected third-party tools to help meet customers' data analytics workload requirements. It describes how to plan, prepare, install, integrate, manage, and use the IBM Data Engine for Hadoop and Spark solution to run analytic workloads on IBM POWER8. In addition, this publication delivers documentation that complements available IBM analytics solutions to help meet your data analytics needs.
This publication strengthens the position of IBM analytics and big data solutions with a well-defined and documented deployment model within an IBM POWER8 virtualized environment so that customers have a planned foundation for security, scaling, capacity, resilience, and optimization for analytics workloads.
This book is targeted at technical professionals (analytics consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering analytics solutions and support on IBM Power Systems.
Table of contents
- Front cover
- Notices
- IBM Redbooks promotions
- Preface
- Chapter 1. Introduction to IBM Data Engine for Hadoop and Spark
- Chapter 2. Solution reference architecture
- Chapter 3. Use case scenario for the IBM Data Engine for Hadoop and Spark
- 3.1 When to use IBM Data Engine for Hadoop and Spark
- 3.2 When to use Hadoop and what workloads are suitable for it
- 3.3 When to use Apache Spark and what workloads are suitable for it
- 3.4 Greater resource utilization by using IBM Spectrum Symphony
- 3.5 Comparing Hadoop Distributed File System and IBM Spectrum Scale
- 3.6 Using the analytic capabilities of IBM Open Platform
- Chapter 4. Operational guidelines
- 4.1 Introduction
- 4.2 Adding a compute node
- 4.2.1 Identifying the networks
- 4.2.2 Defining the Central Electronics Complex group
- 4.2.3 Updating the server firmware
- 4.2.4 Installing the base operating system
- 4.2.5 Configuring the host name, users, and groups
- 4.2.6 Installing and configuring IBM Spectrum Scale
- 4.2.7 Installing software with Ambari
- 4.3 Configuring the Apache Spark UI
- 4.4 Deployment and operation tools
- Chapter 5. Multitenancy
- 5.1 Introduction to multitenancy
- 5.2 IBM Spectrum Computing resource manager
- 5.3 Configuring multitenancy for MapReduce workloads
- 5.3.1 Monitoring MapReduce jobs by using IBM Spectrum Symphony
- 5.3.2 Creating an application profile
- 5.3.3 Adding users or groups to an existing application profile
- 5.3.4 Configuring the share ratio between application profiles
- 5.3.5 Configuring slot mapping
- 5.3.6 Configuring the priority for running jobs
- Appendix A. Ordering the solution
- Appendix B. Script to clone partitions
- Related publications
- Back cover
Product information
- Title: IBM Data Engine for Hadoop and Spark
- Author(s):
- Release date: August 2016
- Publisher(s): IBM Redbooks
- ISBN: 9780738441931