Chapter 2. Developing on Databricks
Databricks has grown significantly in popularity in recent years, and many data scientists and machine learning engineers struggle to find a good way to develop their code on the platform.
Notebooks are first-class citizens on the Databricks platform and the go-to tool for data scientists during the experimentation phase. They are well suited to that purpose and have contributed significantly to Python becoming the number one language on GitHub, but they do little to promote software engineering best practices. As a result, code written by data scientists needs extensive refactoring afterwards:
- Creating functions, classes, and modules
- Separating tasks
- Separating out configuration that may change in the future
- Adding unit tests
- Adding logging and documentation
- Packaging the project’s code
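As a rough illustration of several items in this list (a function instead of loose cell code, configuration separated from logic, logging, a docstring, and a unit test), the sketch below refactors a hypothetical notebook cell that filters price records. The names `CleaningConfig` and `clean_prices`, and the price-filtering task itself, are invented for illustration. It uses plain Python lists rather than Spark DataFrames so that it runs anywhere, and it is a sketch under these assumptions, not a prescribed project structure.

```python
# Hypothetical example: a notebook cell refactored into testable, configurable code.
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class CleaningConfig:
    """Settings that may change later, kept separate from the transformation logic."""
    min_price: float = 0.0
    currency: str = "EUR"


def clean_prices(rows: list[dict], config: CleaningConfig) -> list[dict]:
    """Drop rows below config.min_price and tag each kept row with config.currency."""
    cleaned = [
        {**row, "currency": config.currency}  # copy the row, then add the currency tag
        for row in rows
        if row["price"] >= config.min_price
    ]
    logger.info("Kept %d of %d rows", len(cleaned), len(rows))
    return cleaned


def test_clean_prices_drops_cheap_rows():
    # A unit test that exercises the function without any notebook or cluster.
    config = CleaningConfig(min_price=10.0, currency="USD")
    rows = [{"price": 5.0}, {"price": 12.5}]
    assert clean_prices(rows, config) == [{"price": 12.5, "currency": "USD"}]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(clean_prices([{"price": 1.0}, {"price": 3.0}], CleaningConfig(min_price=2.0)))
```

In a packaged project, `CleaningConfig` and `clean_prices` would live in a module, and the test function would move to a separate test file, where a runner such as pytest discovers it by its `test_` prefix.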
In my career, I have seen too ...