Configuring Your First Big Data Environment


  • Getting Started
  • Finding the Installation Tools
  • Running the Installation
  • Validating Your New Cluster
  • Learning Post Install Tasks

In this chapter, you learn the steps necessary to get Hortonworks Data Platform (HDP) and HDInsight Service installed and configured for your use. You'll first walk through the install of HDP on a local Windows Server. Next you'll walk through installing HDInsight on Windows Azure. You'll then follow up on some basic steps on verifying your installs by analyzing log files. Finally, you'll load some data into the Hadoop Distributed File System (HDFS) and run some queries against it using Hive and Pig. This chapter will introduce you to and prepare you for many of the big data features you will be using throughout the rest of the book.

Getting Started

This chapter covers two common scenarios: a single-node Hadoop cluster for simple testing and kicking-the-tires Hadoop; and then we configure an HDInsight cluster in Windows Azure with four nodes to understand the vagaries of the cloud environment. This chapter assumes that the on-premise cluster is being built in a Hyper-V environment or other similar virtualization technology for your initial development environment. (Later in this book, Chapter 16, “Operational Big Data Management,” shows what an enterprise class cluster may ...

Get Microsoft Big Data Solutions now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.