Preparing tools and datasets

As introduced in the previous chapters, scikit-learn is the Python package that takes the lion's share of machine learning work. In this chapter, we will also use XGBoost, LightGBM, and CatBoost; you'll find the instructions in the relevant sections.

There are multiple motivations for using scikit-learn, which was developed at Inria, the French Institute for Research in Computer Science and Automation. At this point, it is worth mentioning the most important reasons why scikit-learn contributes to the success of your data science project:

  • A consistent API (fit, predict, transform, and partial_fit) across models that naturally helps to correctly implement data science procedures working on data organized in NumPy arrays ...
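To make the idea of a consistent API concrete, here is a minimal sketch, not taken from the book. It assumes a small synthetic dataset held in NumPy arrays and picks StandardScaler and SGDClassifier purely for illustration; the variable names (X, y, scaler, clf, online_clf) and the batch size of 50 are likewise our own choices.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 200 samples, 3 features
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Transformers expose fit and transform
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

# Predictors expose fit and predict
clf = SGDClassifier(random_state=0)
clf.fit(X_scaled, y)
predictions = clf.predict(X_scaled)

# Estimators that support incremental learning also expose partial_fit;
# the full set of class labels has to be passed on the first call
online_clf = SGDClassifier(random_state=0)
classes = np.unique(y)
for start in range(0, len(X_scaled), 50):
    batch = slice(start, start + 50)
    online_clf.partial_fit(X_scaled[batch], y[batch], classes=classes)
online_predictions = online_clf.predict(X_scaled)
```

Because the method names stay the same from one estimator to the next, replacing SGDClassifier with another classifier should leave the fit and predict calls untouched. Note that only some estimators implement partial_fit.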

