Book description
Build, process and analyze large-scale graph data effectively with Spark
About This Book
 Find solutions for every stage of data processing from loading and transforming graph data to
 Improve the scalability of your graphs with a variety of real-world applications with complete Scala code.
 A concise guide to processing large-scale networks with Apache Spark.
Who This Book Is For
This book is for data scientists and big data developers who want to learn how to process and analyze graph datasets at scale. Basic programming experience with Scala and basic knowledge of Spark are assumed.
What You Will Learn
 Write, build and deploy Spark applications with the Scala Build Tool.
 Build and analyze large-scale network datasets
 Analyze and transform graphs using RDD and graph-specific operations
 Implement new custom graph operations tailored to specific needs.
 Develop iterative and efficient graph algorithms using message aggregation and the Pregel abstraction
 Extract subgraphs and use them to discover common clusters
 Analyze graph data and solve various data science problems using real-world datasets.
In Detail
Apache Spark is the next standard of open-source cluster-computing engines for processing big data. Many practical computing problems concern large graphs, such as the Web graph and various social networks. The scale of these graphs (in some cases billions of vertices and trillions of edges) poses challenges to their efficient processing. The Apache Spark GraphX API combines the advantages of both data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark data-parallel framework.
This book teaches you graph programming in Apache Spark and explains the entire process of graph data analysis. You will journey through the creation of graphs, their uses, and their exploration and analysis, and finally cover the conversion of graph elements into graph structures.
This book begins with an introduction to the Spark system, its libraries and the Scala Build Tool. Using a hands-on approach, it quickly teaches you how to install Spark and use it interactively on the command line and in a standalone Scala program. It then presents the methods for building Spark graphs using illustrative network datasets. Next, it walks you through the process of exploring, visualizing and analyzing different network characteristics. You will also learn how to transform raw datasets into a usable form, along with powerful operations for transforming graph elements and graph structures. The book further shows how to create efficient custom graph operations tailored to specific needs. The later chapters cover more advanced topics such as clustering graphs, implementing graph-parallel iterative algorithms and learning methods from graph data.
Style and approach
A step-by-step guide that will walk you through the key ideas and techniques for processing big graph data at scale, with practical examples that will ensure an overall understanding of the concepts of Spark.
Table of contents

Apache Spark Graph Processing
 Credits
 Foreword
 About the Author
 About the Reviewer
 www.PacktPub.com
 Preface
 1. Getting Started with Spark and GraphX
 2. Building and Exploring Graphs
 3. Graph Analysis and Visualization
 4. Transforming and Shaping Up Graphs to Your Needs
 5. Creating Custom Graph Aggregation Operators
 6. Iterative Graph-Parallel Processing with Pregel
 7. Learning Graph Structures
 A. References
 Index
Product information
 Title: Apache Spark Graph Processing
 Author(s):
 Release date: September 2015
 Publisher(s): Packt Publishing
 ISBN: 9781784391805