Scale
Scaling IoT analytics on open source software can be challenging. You will probably use R or Python to explore your data and to put your ML models and other analytics into production. Both scale only as far as a single compute instance allows. R, for example, keeps everything in memory, so the limit is the amount of memory available on that instance.
Neither language is natively distributed (few are), so scaling beyond a single compute instance requires additional frameworks and adds design complexity. Handling this is one of the benefits of using Apache Spark. If your analytics can be run in parallel, Spark can manage the distributed computations for you. This is the same map and reduce concept introduced ...
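
The map and reduce pattern can be illustrated with a minimal sketch in plain Python. It is not Spark code; it only mimics the pattern on one machine. The temperature readings, the partition layout, and the function names summarize and combine are hypothetical, chosen only for illustration. Each partition stands in for the slice of data that one worker node would hold.

```python
from functools import reduce

# Hypothetical temperature readings, already split into partitions the way a
# distributed framework would spread them across worker nodes.
partitions = [
    [21.5, 22.0, 21.8],
    [19.9, 20.4],
    [23.1, 22.7, 22.9, 23.3],
]


def summarize(partition):
    """Map step: shrink one partition to a small (sum, count) pair."""
    return (sum(partition), len(partition))


def combine(left, right):
    """Reduce step: merge two partial results into one (sum, count) pair."""
    return (left[0] + right[0], left[1] + right[1])


# Here the map step runs sequentially; in a distributed framework each
# partition would be summarized on the node that holds it.
partials = map(summarize, partitions)

# Only the small partial results need to be brought together and merged.
total, count = reduce(combine, partials)
mean_temperature = total / count

print(f"{count} readings, mean temperature {mean_temperature:.2f}")
```

The mean is carried as a (sum, count) pair rather than as a per-partition average because partitions can differ in size, and averaging the averages would weight them incorrectly. Since the combine step gives the same answer regardless of the order in which partial results arrive, the work can be spread across machines.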