The Path to Predictive Analytics and Machine Learning
by Conor Doherty, Steven Camina, Kevin White, Gary Orenstein
Introduction
An Anthropological Perspective
If you believe that communication advanced our evolution and position as a species, let us take a quick look at how it has progressed: from cave paintings, to scrolls, to the printing press, to the modern-day data storage industry.
Beginning with the invention of disk drives in the 1950s, data storage broadly advanced information sharing. We could now record, copy, and share bits of information digitally. From there emerged superior CPUs, more powerful networks, the Internet, and a dizzying array of connected devices.
Today, every piece of digital technology is constantly sharing, processing, analyzing, discovering, and propagating an endless stream of zeros and ones. This web of devices tells us more about ourselves and each other than ever before.
Of course, keeping pace with these developments in information sharing requires tools across the board: faster devices, faster networks, faster central processing, and software that helps us discover and harness new opportunities.
Often, it is fine to wait an hour, a day, or sometimes even a week for the information that enriches our digital lives. But increasingly, it is imperative to operate in the now.
In late 2014, we saw growing interest in, and adoption of, multiple in-memory, distributed architectures for building real-time data pipelines. In particular, the adoption of a message queue like Kafka, transformation engines like Spark, and persistent databases like MemSQL opened up a new world of capabilities for fast ...