Chapter 2. Tools and Architecture for Big Data
In this chapter, Evan Chan performs a storage and query cost analysis on various analytics applications, and describes how Apache Cassandra stacks up in terms of ad hoc, batch, and time-series analysis. Next, Federico Castanedo discusses how using distributed frameworks to scale R can help solve the problem of storing large and ever-growing data sets in RAM. Daniel Whitenack then explains how Go, a new programming language from Google, could help data science teams overcome common obstacles such as integrating data science into an engineering organization. Whitenack also details the many tools, packages, and resources that allow users to perform data cleansing, visualization, and even machine learning in Go. Finally, Nicolas Seyvet and Ignacio Mulas Viela describe how the telecom industry is navigating the current data analytics environment. In their use case, they apply both Kappa architecture and a Bayesian anomaly detection model to a high-volume data stream originating from a cloud monitoring system.
Apache Cassandra for Analytics: A Performance and Storage Analysis
You can read this post on oreilly.com here.
This post is about using Apache Cassandra for analytics. Think time series, IoT, data warehousing, and writing and querying large swaths of data, not so much transactions or shopping carts. Users thinking of Cassandra as an event store and source/sink for machine learning/modeling/classification would also benefit greatly ...