Years ago, the forward thinkers in software engineering predicted a time when building applications would become less complex. It’s safe to say that future is still a distant mirage—if anything, complexity has only increased. Where we used to build applications for more efficient business processes, organizations now rely on software just to stay in business. Terms like “digital transformation” and “customer experience” drive the need for more data-driven applications simply to stay competitive and relevant.
The explosion of tools and infrastructure products hasn’t made it easier for builders trying to make it work. However, a cluster of products has risen to be the dominant back-end stack when building data-driven applications. Thankfully, this stack has an easy acronym to remember: SMACK.
The SMACK name represents the individual parts of the collection: Spark, Mesos, Akka, Cassandra, and Kafka. Each has a distinct job, but in combination they give you a well-rounded back-end infrastructure that holds up to today’s most demanding workloads. Each component is built on distributed-systems principles, and each scales horizontally when needed. To tackle large-scale data processing, you need to break down the responsibilities into three main areas: collect, process, and store.
Collecting data at high speed is enough of a challenge, but you also need order in a potentially chaotic data stream. Apache Kafka was purpose-built to decouple data pipelines and organize streams of data. Kafka can guarantee that data from producers is seen at least once by a back-end consumer process, in the order it was received within each partition. Using topic-based queues, you can further organize your data as it is collected.
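To make the at-least-once and ordering guarantees concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Kafka client API: the `ToyTopic` class, its methods, and the `sensor-1` key are all hypothetical names chosen for illustration. It assumes records are routed to a partition by key and that a consumer only advances its offset after it has finished processing.

```python
from collections import defaultdict
from zlib import crc32


class ToyTopic:
    """In-memory stand-in for a partitioned, append-only topic (not the Kafka API)."""

    def __init__(self, partitions=2):
        self.logs = [[] for _ in range(partitions)]
        self.committed = defaultdict(int)  # partition -> next offset to read

    def send(self, key, value):
        # Records with the same key always land in the same partition,
        # so their relative order is preserved.
        partition = crc32(key.encode()) % len(self.logs)
        self.logs[partition].append((key, value))
        return partition

    def poll(self, partition):
        # Return every record at or after the last committed offset.
        return self.logs[partition][self.committed[partition]:]

    def commit(self, partition, count):
        # Advance the offset only after the records were processed.
        self.committed[partition] += count


topic = ToyTopic()
for reading in (20.5, 20.7, 21.0):
    p = topic.send("sensor-1", reading)

first_try = topic.poll(p)    # consumer "crashes" before committing
second_try = topic.poll(p)   # so the same records are delivered again
topic.commit(p, len(second_try))
after_commit = topic.poll(p)  # nothing left to redeliver
```

Because the offset moves only on `commit`, a consumer that fails mid-batch sees the same records again, in the same order. That is the trade-off behind at-least-once delivery: nothing is lost, but downstream processing has to tolerate duplicates.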
Not long ago, batch processing your large volumes of data was analytics enough. Today’s competitive landscape requires that you use your data immediately or fall behind. That takes a combination of processing techniques: handling discrete data points with Akka, stream micro-batches with Spark Streaming, and large-scale batch jobs with Apache Spark. Together these make up the processing layer we use at DataStax, with near-real-time processing to create immediate context and batch jobs to combine and enrich historical data.
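The difference between the three processing styles is easier to see side by side. The sketch below uses only the Python standard library and does not call Akka or Spark; the `micro_batches` helper, the one-second interval, and the sample click counts are assumptions made up for illustration.

```python
from collections import defaultdict


def micro_batches(events, interval_s):
    """Group (timestamp, value) events into fixed-width time windows."""
    batches = defaultdict(list)
    for timestamp, value in events:
        window_start = int(timestamp // interval_s) * interval_s
        batches[window_start].append(value)
    return dict(sorted(batches.items()))


# Hypothetical click counts as (seconds since start, count) pairs.
events = [(0.2, 3), (0.9, 1), (1.4, 4), (2.1, 2), (2.8, 5)]

# Per-event handling: react to each data point as it arrives.
alerts = [value for _, value in events if value >= 4]

# Micro-batch handling: aggregate each one-second window.
per_window = {
    start: sum(values) for start, values in micro_batches(events, 1).items()
}

# Batch handling: one pass over the full history.
total = sum(value for _, value in events)
```

Here `alerts` is `[4, 5]`, `per_window` is `{0: 4, 1: 4, 2: 7}`, and `total` is `15`. The same data yields an immediate signal, a rolling view, and a historical aggregate, depending on how it is grouped before processing.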
To keep up with the volumes and velocity of data, you need a database designed to scale when you need it: that is where we rely on Apache Cassandra for storage. Conceived as a cloud-first database, it was designed with the workloads of data-rich applications in mind. Just like the other parts of the SMACK stack, Cassandra scales by simply adding more nodes, which means it’s ready when you need it. More importantly, the always-on, multi-datacenter (and multi-cloud) architecture means that your most important asset, your data, is protected. The tight integration with Apache Spark and Akka combines processing and storage for new and old data.
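Scaling by adding nodes works because data is placed on a hash ring, so a new node takes over only a slice of the existing keys. The following is a rough sketch of that idea in plain Python, not Cassandra's implementation: the `Ring` class, the MD5-based `token` function, the node names, and the choice of 64 virtual nodes are all assumptions for illustration, and replication is left out entirely.

```python
import bisect
import hashlib


def token(text):
    # Stable hash so placement is the same on every run.
    return int(hashlib.md5(text.encode()).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring with virtual nodes (no replication)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.tokens = []  # sorted list of (token, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.tokens, (token(f"{node}#{i}"), node))

    def owner(self, key):
        # First token clockwise from the key's hash, wrapping around.
        i = bisect.bisect_left(self.tokens, (token(key), ""))
        return self.tokens[i % len(self.tokens)][1]


keys = [f"user-{n}" for n in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add("node-d")  # scale out by one node
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to `node-d`, and the rest stay where they were. With four equally weighted nodes you would expect roughly a quarter of the keys to move, which is why adding capacity does not require reshuffling the whole data set.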
Finally, with all that horizontally scaling infrastructure, you need to have it under control. That’s where Apache Mesos steps in to save the day. With rapidly changing workloads, resource contention can be unmanageable without the right tool. Originally designed to manage the dynamic workloads of Apache Spark, Mesos has evolved into a resource management tool for all your data infrastructure. Beyond the ability to deploy and manage services, you can also control CPU, memory, and disk allocation per system. These capabilities allow potentially conflicting workloads to run in isolation from each other, ensuring more efficient use of infrastructure. Apache Mesos also scales horizontally and is resilient to the same types of failure as the rest of the SMACK stack.
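As a rough picture of what per-workload resource allocation means, the sketch below places tasks on machines only when enough CPU and memory remain. It is a simplified first-fit model in plain Python and does not use the Mesos API or model how Mesos negotiates resources with frameworks; the `Agent` class, the `place` function, and all of the sizes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    cpus: float
    mem_mb: int
    tasks: list = field(default_factory=list)


def place(task, cpus, mem_mb, agents):
    """First-fit placement: reserve resources on the first agent with room."""
    for agent in agents:
        if agent.cpus >= cpus and agent.mem_mb >= mem_mb:
            agent.cpus -= cpus
            agent.mem_mb -= mem_mb
            agent.tasks.append(task)
            return agent.name
    return None  # nothing fits; the task has to wait


agents = [
    Agent("agent-1", cpus=4, mem_mb=8192),
    Agent("agent-2", cpus=8, mem_mb=16384),
]

placements = {
    "spark-executor": place("spark-executor", 3, 6144, agents),
    "cassandra": place("cassandra", 4, 8192, agents),
    "kafka-broker": place("kafka-broker", 2, 4096, agents),
    "spark-batch": place("spark-batch", 6, 4096, agents),
}
```

In this run the first three tasks are placed and `spark-batch` gets `None`, because no single agent has six free CPUs left. Since each task's share is reserved up front, a heavy batch job queues instead of starving the database or the message broker.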
You have a lot of choices in data infrastructure, and many of them are confusing or incomplete. Good ideas tend to rise to the top: that’s how DataStax, as a data application community, discovered SMACK. You can substitute different systems for the individual parts, but the further you stray from the stack, the further you are from the mainstream community. If you are having problems keeping up with the waves of data in your application, give SMACK a try.
This post is a collaboration between Mesosphere and O’Reilly. See our statement of editorial independence.