Exploring the Importance of Open Source
IN THIS CHAPTER
Understanding the importance of open source in data science
Explaining open source programming languages
Exploring open source frameworks and tools
Deciding what to select
Many of the biggest names in data science are open source, and several of them belong to the same (open source) Apache family: Spark, Hadoop, Kafka, and Cassandra. Though closed source databases are still incredibly popular, open source alternatives are growing at a rapid pace. If that growth continues, closed source databases are likely to lose much of their popularity. This chapter explains why open source is important in data science and gives you an overview of popular tools and frameworks.
Exploring the Role of Open Source
The popularity of open source systems in data science is growing for a number of reasons. First, open source principles are based on the sharing of assets, an approach that allows people in different areas to work together effectively. When companies share their work and allow others ...