One of the most obvious things that separate data scientists from traditional business analysts and (to a lesser degree) statisticians is that they spend much of their time writing code in a more-or-less normal programming language, as software engineers do. Sometimes it's a statistically oriented language such as R, but even that is a far cry from a spreadsheet like Excel or a graphical package like Tableau.
This chapter will discuss why that is and give a brief survey of some of the more popular languages. It will then dive into the weeds of Python, my personal language of choice and the most popular option among data scientists. If you already know Python and its technical libraries, feel free to skim. If not, this chapter will give you the foundation in Python you need to understand the example code in the rest of the book.
3.1 Why Use a Programming Language? What Are the Other Options?
To date, I have never worked on a data science project that could be done completely within a graphical package such as Excel or Tableau. There is always something – a weird formatting issue that requires coding up the edge cases, a dataset that's too large to fit into memory, an unconventional feature that I want to extract, or something else – that forces me to roll up my sleeves and write some code.
This will be your experience too, almost certainly. To put it glibly, data science is Turing complete. Many data scientists (like me) find it's more ...