Big Data

Name: Big Data
ISBN: 9780128093467

by Rajkumar Buyya, Rodrigo N. Calheiros, Amir Vahid Dastjerdi

June 2016

Beginner to intermediate

494 pages

17h 52m

English

Morgan Kaufmann

Cover image
Title page
Table of Contents
Copyright
List of contributors
About the Editors
Preface
Organization of the Book; Part I: Big Data Science; Part II: Big Data Infrastructures and Platforms; Part III: Big Data Security and Privacy; Part IV: Big Data Applications
Acknowledgments
Part I: Big Data Science
Chapter 1: Big Data Analytics = Machine Learning + Cloud Computing
Abstract; 1.1 Introduction; 1.2 A Historical Review of Big Data; 1.3 Historical Interpretation of Big Data; 1.4 Defining Big Data From 3Vs to 3²Vs; 1.5 Big Data Analytics and Machine Learning; 1.6 Big Data Analytics and Cloud Computing; 1.7 Hadoop, HDFS, MapReduce, Spark, and Flink; 1.8 ML + CC → BDA and Guidelines; 1.9 Conclusion

Chapter 2: Real-Time Analytics
Abstract; 2.1 Introduction; 2.2 Computing Abstractions for Real-Time Analytics; 2.3 Characteristics of Real-Time Systems; 2.4 Real-Time Processing for Big Data — Concepts and Platforms; 2.5 Data Stream Processing Platforms; 2.6 Data Stream Analytics Platforms; 2.7 Data Analysis and Analytic Techniques; 2.8 Finance Domain Requirements and a Case Study; 2.9 Future Research Challenges
Chapter 3: Big Data Analytics for Social Media
Abstract; Acknowledgments; 3.1 Introduction; 3.2 NLP and Its Applications; 3.3 Text Mining; 3.4 Anomaly Detection
Chapter 4: Deep Learning and Its Parallelization
Abstract; 4.1 Introduction; 4.2 Concepts and Categories of Deep Learning; 4.3 Parallel Optimization for Deep Learning; 4.4 Discussions
Chapter 5: Characterization and Traversal of Large Real-World Networks
Abstract; Acknowledgments; 5.1 Introduction; 5.2 Background; 5.3 Characterization and Measurement; 5.4 Efficient Complex Network Traversal; 5.5 k-Core-Based Partitioning for Heterogeneous Graph Processing; 5.6 Future Directions; 5.7 Conclusions
Part II: Big Data Infrastructures and Platforms
Chapter 6: Database Techniques for Big Data
Abstract; 6.1 Introduction; 6.2 Background; 6.3 NoSQL Movement; 6.4 NoSQL Solutions for Big Data Management; 6.5 NoSQL Data Models; 6.6 Future Directions; 6.7 Conclusions
Chapter 7: Resource Management in Big Data Processing Systems
Abstract; 7.1 Introduction; 7.2 Types of Resource Management; 7.3 Big Data Processing Systems and Platforms; 7.4 Single-Resource Management in the Cloud; 7.5 Multiresource Management in the Cloud; 7.6 Related Work on Resource Management; 7.7 Open Problems; 7.8 Summary
Chapter 8: Local Resource Consumption Shaping: A Case for MapReduce
Abstract; 8.1 Introduction; 8.2 Motivation; 8.3 Local Resource Shaper; 8.4 Evaluation; 8.5 Related Work; 8.6 Conclusions; Appendix CPU Utilization With Different Slot Configurations and LRS
Chapter 9: System Optimization for Big Data Processing
Abstract; 9.1 Introduction; 9.2 Basic Framework of the Hadoop Ecosystem; 9.3 Parallel Computation Framework: MapReduce; 9.4 Job Scheduling of Hadoop; 9.5 Performance Optimization of HDFS; 9.6 Performance Optimization of HBase; 9.7 Performance Enhancement of Hadoop System; 9.8 Conclusions and Future Directions
Chapter 10: Packing Algorithms for Big Data Replay on Multicore
Abstract; 10.1 Introduction; 10.2 Performance Bottlenecks; 10.3 The Big Data Replay Method; 10.4 Packing Algorithms; 10.5 Performance Analysis; 10.6 Summary and Future Directions
Part III: Big Data Security and Privacy
Chapter 11: Spatial Privacy Challenges in Social Networks
Abstract; Acknowledgments; 11.1 Introduction; 11.2 Background; 11.3 Spatial Aspects of Social Networks; 11.4 Cloud-Based Big Data Infrastructure; 11.5 Spatial Privacy Case Studies; 11.6 Conclusions
Chapter 12: Security and Privacy in Big Data
Abstract; 12.1 Introduction; 12.2 Secure Queries Over Encrypted Big Data; 12.3 Other Big Data Security; 12.4 Privacy on Correlated Big Data; 12.5 Future Directions; 12.6 Conclusions
Chapter 13: Location Inferring in Internet of Things and Big Data
Abstract; Acknowledgements; 13.1 Introduction; 13.2 Device-based Sensing Using Big Data; 13.3 Device-free Sensing Using Big Data; 13.4 Conclusion
Part IV: Big Data Applications
Chapter 14: A Framework for Mining Thai Public Opinions
Abstract; Acknowledgments; 14.1 Introduction; 14.2 XDOM; 14.3 Implementation; 14.4 Validation; 14.5 Case Studies; 14.6 Summary and Conclusions
Chapter 15: A Case Study in Big Data Analytics: Exploring Twitter Sentiment Analysis and the Weather
Abstract; Acknowledgments; 15.1 Background; 15.2 Big Data System Components; 15.3 Machine-Learning Methodology; 15.4 System Implementation; 15.5 Key Findings; 15.6 Summary and Conclusions
Chapter 16: Dynamic Uncertainty-Based Analytics for Caching Performance Improvements in Mobile Broadband Wireless Networks
Abstract; 16.1 Introduction; 16.2 Background; 16.3 Related Work; 16.4 VoD Architecture; 16.5 Overview; 16.6 Data Generation; 16.7 Edge and Core Components; 16.8 INCA Caching Algorithm; 16.9 QoE Estimation; 16.10 Theoretical Framework; 16.11 Experiments and Results; 16.12 Synthetic Dataset; 16.13 Conclusions and Future Directions
Chapter 17: Big Data Analytics on a Smart Grid: Mining PMU Data for Event and Anomaly Detection
Abstract; Acknowledgments; 17.1 Introduction; 17.2 Smart Grid With PMUs and PDCs; 17.3 Improving Traditional Workflow; 17.4 Characterizing Normal Operation; 17.5 Identifying Unusual Phenomena; 17.6 Identifying Known Events; 17.7 Related Efforts; 17.8 Conclusion and Future Directions
Chapter 18: eScience and Big Data Workflows in Clouds: A Taxonomy and Survey
Abstract; 18.1 Introduction; 18.2 Background; 18.3 Taxonomy and Review of eScience Services in the Cloud; 18.4 Resource Provisioning for eScience Workflows in Clouds; 18.5 Open Problems; 18.6 Summary
Index

Content preview from Big Data

Chapter 10

Packing Algorithms for Big Data Replay on Multicore

M. Zhanikeev

Abstract

This chapter discusses optimization in a new environment created as an alternative to Hadoop/MapReduce. The core idea is to bring the bulk data from the now-passive shard nodes to a dedicated machine and replay it locally while a large number of jobs run on multicore hardware. The chapter covers optimization methods for machines with many cores and many processing jobs, and shows how the new architecture can readily accommodate advanced Big Data statistics, namely streaming algorithms.
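
The replay idea in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the names `replay`, `RunningMean`, and `RunningMax` are hypothetical, the jobs are toy streaming statistics chosen for illustration, and everything runs in a single thread rather than across cores. It only shows the general shape of reading the data once and letting many jobs consume the same pass.

```python
from typing import Iterable, List, Optional, Protocol


class Job(Protocol):
    """Anything that can consume one record at a time (hypothetical interface)."""

    def update(self, value: float) -> None: ...


class RunningMean:
    """Streaming job: mean of all values seen so far, in constant memory."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> None:
        self.count += 1
        # Incremental mean update; no record is stored.
        self.mean += (value - self.mean) / self.count


class RunningMax:
    """Streaming job: largest value seen so far."""

    def __init__(self) -> None:
        self.value: Optional[float] = None

    def update(self, value: float) -> None:
        if self.value is None or value > self.value:
            self.value = value


def replay(records: Iterable[float], jobs: List[Job]) -> None:
    """Read the records exactly once and hand each record to every job."""
    for record in records:
        for job in jobs:
            job.update(record)


if __name__ == "__main__":
    mean_job, max_job = RunningMean(), RunningMax()
    replay(iter([1.0, 2.0, 3.0, 4.0]), [mean_job, max_job])
    print(mean_job.mean, max_job.value)
```

Because each job keeps only a small running state, adding another job to the list does not require another pass over the data, which is the property that makes streaming algorithms a natural fit for a replay-style design.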

Keywords

Packing algorithms; Big Data replay method; Massively multicore; Hadoop; MapReduce; Data streaming

10.1 Introduction

This chapter discusses ...
