Designing application architectures for real-time decisions.
Big Data Tools and Pipelines
Ideas and resources related to data tools.
There are advantages to having a search engine built into a database.
Tools from maps to drones respond to crises with increasing speed and accuracy.
Transform your basemaps using CARTO and PostGIS.
How to use the wordcount example as a starting point (and you thought you’d escape the wordcount example).
Introducing the solar correlation map, and how to easily create your own.
Learn how to ship, parse, store, and analyze logs.
Performing business analytics on the data lake using next-gen open source tools.
Python and R are widely accepted as logical languages for data science—but what about Go?
Word embedding in natural language processing.
Apache Arrow makes it possible to use multiple languages and heterogeneous data infrastructure.
An analytics database can offer performance and scalability advantages.
Early methods to integrate machine learning using Naive Bayes and custom sinks.
Leading data-driven organizations point out five common pitfalls.
Assessing cost, performance, and run time of a typical Spark workload.
On October 4-5, 2016, join Thomas Nield for a hands-on beginners' course on core database and SQL fundamentals.
Learn about the basics of how Hadoop works, why it's such an important technology, and how you should be using it without getting mired in the details.
Mark Grover and Ted Malaska offer an overview of projects for streaming applications, including Kafka, Flume, and Spark Streaming, and discuss the available architectural patterns, such as Lambda and Kappa.
You’ve got three options: scaling up, scaling out, or using R as an abstraction layer.
Calvin Jia presents an in-depth overview of Alluxio and its role in the big data ecosystem. In this segment, he reviews examples that show how Alluxio complements Spark and S3 to enable fast data access.
Near-real-time processing yields increased efficiency and an opportunity for unified architecture.
Transparent matching of Spark portability with GPU performance.
Addressing the challenge of delivering big data analytics to the masses.
In this excerpt, Karthik Ramasamy and Sijie Guo of Twitter discuss the operational experience of DistributedLog and Heron, two powerful real-time analytics tools that were open sourced by the company in early 2016.