Principles of Big Data

Name: Principles of Big Data
Author: Jules J. Berman
ISBN: 9780124047242

May 2013

Intermediate to advanced

288 pages

11h 51m

English

Morgan Kaufmann

Cover image
Title page
Table of Contents
Copyright
Dedication
Acknowledgments
Author Biography
Preface
Introduction
  Definition of Big Data
  Big Data Versus Small Data
  Whence Comest Big Data?
  The Most Common Purpose of Big Data is to Produce Small Data
  Opportunities
  Big Data Moves to the Center of the Information Universe
Chapter 1. Providing Structure to Unstructured Data
  Background
  Machine Translation
  Autocoding
  Indexing
  Term Extraction
  References

Chapter 2. Identification, Deidentification, and Reidentification
  Background
  Features of an Identifier System
  Registered Unique Object Identifiers
  Really Bad Identifier Methods
  Embedding Information in an Identifier: Not Recommended
  One-Way Hashes
  Use Case: Hospital Registration
  Deidentification
  Data Scrubbing
  Reidentification
  Lessons Learned
  References
Chapter 3. Ontologies and Semantics
  Background
  Classifications, the Simplest of Ontologies
  Ontologies, Classes with Multiple Parents
  Choosing a Class Model
  Introduction to Resource Description Framework Schema
  Common Pitfalls in Ontology Development
  References
Chapter 4. Introspection
  Background
  Knowledge of Self
  eXtensible Markup Language
  Introduction to Meaning
  Namespaces and the Aggregation of Meaningful Assertions
  Resource Description Framework Triples
  Reflection
  Use Case: Trusted Time Stamp
  Summary
  References
Chapter 5. Data Integration and Software Interoperability
  Background
  The Committee to Survey Standards
  Standard Trajectory
  Specifications and Standards
  Versioning
  Compliance Issues
  Interfaces to Big Data Resources
  References
Chapter 6. Immutability and Immortality
  Background
  Immutability and Identifiers
  Data Objects
  Legacy Data
  Data Born from Data
  Reconciling Identifiers across Institutions
  Zero-Knowledge Reconciliation
  The Curator’s Burden
  References
Chapter 7. Measurement
  Background
  Counting
  Gene Counting
  Dealing with Negations
  Understanding Your Control
  Practical Significance of Measurements
  Obsessive-Compulsive Disorder: The Mark of a Great Data Manager
  References
Chapter 8. Simple but Powerful Big Data Techniques
  Background
  Look At the Data
  Data Range
  Denominator
  Frequency Distributions
  Mean and Standard Deviation
  Estimation-Only Analyses
  Use Case: Watching Data Trends with Google Ngrams
  Use Case: Estimating Movie Preferences
  References
Chapter 9. Analysis
  Background
  Analytic Tasks
  Clustering, Classifying, Recommending, and Modeling
  Data Reduction
  Normalizing and Adjusting Data
  Big Data Software: Speed and Scalability
  Find Relationships, Not Similarities
  References
Chapter 10. Special Considerations in Big Data Analysis
  Background
  Theory in Search of Data
  Data in Search of a Theory
  Overfitting
  Bigness Bias
  Too Much Data
  Fixing Data
  Data Subsets in Big Data: Neither Additive nor Transitive
  Additional Big Data Pitfalls
  References
Chapter 11. Stepwise Approach to Big Data Analysis
  Background
  Step 1. A Question Is Formulated
  Step 2. Resource Evaluation
  Step 3. A Question Is Reformulated
  Step 4. Query Output Adequacy
  Step 5. Data Description
  Step 6. Data Reduction
  Step 7. Algorithms Are Selected, If Absolutely Necessary
  Step 8. Results Are Reviewed and Conclusions Are Asserted
  Step 9. Conclusions Are Examined and Subjected to Validation
  References
Chapter 12. Failure
  Background
  Failure Is Common
  Failed Standards
  Complexity
  When Does Complexity Help?
  When Redundancy Fails
  Save Money; Don’t Protect Harmless Information
  After Failure
  Use Case: Cancer Biomedical Informatics Grid, a Bridge too Far
  References
Chapter 13. Legalities
  Background
  Responsibility for the Accuracy and Legitimacy of Contained Data
  Rights to Create, Use, and Share the Resource
  Copyright and Patent Infringements Incurred by Using Standards
  Protections for Individuals
  Consent
  Unconsented Data
  Good Policies Are a Good Policy
  Use Case: The Havasupai Story
  References
Chapter 14. Societal Issues
  Background
  How Big Data Is Perceived
  The Necessity of Data Sharing, Even When It Seems Irrelevant
  Reducing Costs and Increasing Productivity with Big Data
  Public Mistrust
  Saving Us from Ourselves
  Hubris and Hyperbole
  References
Chapter 15. The Future
  Background
  Last Words
  References
Glossary
References
Index

Overview

Principles of Big Data helps readers avoid the common mistakes that endanger all Big Data projects. By stressing simple, fundamental concepts, this book teaches readers how to organize large volumes of complex data, and how to achieve data permanence when the content of the data is constantly changing. General methods for data verification and validation, as specifically applied to Big Data resources, are stressed throughout the book. The book demonstrates how adept analysts can find relationships among data objects held in disparate Big Data resources, when the data objects are endowed with semantic support (i.e., organized in classes of uniquely identified data objects). Readers will learn how their data can be integrated with data from other resources, and how the data extracted from Big Data resources can be used for purposes beyond those imagined by the data creators.

Learn general methods for specifying Big Data in a way that is understandable to humans and to computers
Avoid the pitfalls in Big Data design and analysis
Understand how to create and use Big Data safely and responsibly within the laws, regulations and ethical standards that apply to the acquisition, distribution and integration of Big Data resources

Publisher Resources

ISBN: 9780124045767
