Preface
If you’re reading this book, you already know that there have been dramatic shifts in the data management landscape in recent years. We’ve seen a shift from third-party, proprietary solutions to new, open source distributed data systems. Of course, the common term used to refer to these newer solutions is “big data” (a term we find to be less and less useful), but it’s important to note that many of the earlier proprietary systems use distributed architectures that can store and process large volumes of data. Although we can apply these proprietary solutions and the newer open source solutions to solve many of the same problems, there are some distinct differences that have contributed to the growth of the newer systems. These include not just the economies of the open source approach, but also technology approaches that facilitate the implementation of many applications that are challenging with previous solutions.
Along with the growth of these systems, we’ve seen a corresponding growth in books, articles, training, conferences, and so on dedicated to helping you, the practitioner, use these systems. So it’s reasonable to ask: why yet another book on this “big data” stuff? To quote a cliché, we think the answer is that it becomes easy to miss the forest for the trees. Most of these materials focus on low-level details such as implementing applications using distributed processing engines like MapReduce or Spark or applying advanced algorithms to perform data analysis. ...