Chapter 17

Exploring the Importance of Open Source


Understanding the importance of open source in data science

Explaining open source programming languages

Exploring open source frameworks and tools

Deciding what to select

Many of the biggest names in data science are open source, and several of them belong to the same Apache family: Spark, Hadoop, Kafka, and Cassandra. Though closed source databases are still incredibly popular, open source alternatives are growing at a rapid pace. If that growth continues, closed source databases may not stay as popular for much longer. This chapter explains why open source is important in data science and gives you an overview of popular tools and frameworks.

Exploring the Role of Open Source

The popularity of open source systems in data science is growing for a number of reasons. First, open source principles are based on the sharing of assets, an approach that allows people in different areas to work together effectively. When companies share their work and allow others ...
