Chapter 15. Software Engineering Best Practices

In my experience, the most important skill that data scientists often lack is the ability to write decent code. I'm not talking about writing highly optimized numerical routines, designing fancy libraries, or anything like that: just keeping a few hundred lines of code clear and manageable over the course of a project is a learned skill. I've seen many brilliant data scientists from fields such as physics and math who lack this skill because they never had to write anything longer than a few dozen lines, or never had to go back to their code and update it. There's nothing worse than seeing a mathematical genius's data science project go up in flames because their 200-line script was so illegible that they couldn't debug it.

That's one reason this chapter exists: to pass on the message that everybody who writes code is responsible for making sure that their code is clear.

The other reason for this chapter is that, in practice, data scientists are often called on to do far more than keep their code readable. Some companies keep their data scientists focused strictly on analytics work. In many cases, though, it falls to data scientists to turn their one-off scripts into reusable data analysis packages that take on a life of their own. At other times, data scientists function as junior members of a software engineering team, writing large pieces of production code that implement their ideas in a real-time product. This chapter ...
