Chapter 17

Exploring the Importance of Open Source

IN THIS CHAPTER

• Understanding the importance of open source in data science

• Explaining open source programming languages

• Exploring open source frameworks and tools

• Deciding what to select

Many of the biggest names in data science are open source, and several of them belong to the same Apache family: Spark, Hadoop, Kafka, and Cassandra. Though closed source databases are still incredibly popular, open source alternatives are growing at a rapid pace. If that growth continues, closed source databases may not stay as popular for much longer. This chapter explains why open source is important in data science and gives you an overview of popular tools and frameworks.

Exploring the Role of Open Source

The popularity of open source systems in data science is growing for a number of reasons. First, open source principles are based on the sharing of assets, an approach that allows people in different areas to work together effectively. When companies share their work and allow others ...
