Python Data Science Essentials - Third Edition by Luca Massaron, Alberto Boschetti

Accessing other data formats

So far, we have worked only with CSV files. The pandas package offers similar functions for loading MS Excel, HDF5, SQL, JSON, HTML, and Stata datasets. Since most of these formats are not used routinely in data science, learning how to load and handle each of them is mostly left to you; the documentation available on the pandas website (http://pandas.pydata.org/pandas-docs/version/0.16/io.html) covers them. Here, we will only demonstrate the essentials of using your disk space to store and retrieve information for machine learning algorithms quickly and efficiently. For this purpose, you can leverage an SQLite database (https://www.sqlite.org/index.html ...
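As a minimal sketch of the SQLite idea mentioned above, the following stores a small DataFrame in an SQLite database and reads it back with pandas. The table name `iris_sample`, the column names, and the toy values are hypothetical and chosen only for illustration; an in-memory database is assumed so the example leaves no file behind.

```python
import sqlite3

import pandas as pd

# Hypothetical toy data for illustration only (not taken from the book)
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "target": ["setosa", "setosa", "virginica"],
})

# ":memory:" keeps the database in RAM; pass a file path instead
# (for example "example.db") to persist the data on disk
conn = sqlite3.connect(":memory:")
try:
    # Write the DataFrame as a table, replacing it if it already exists
    df.to_sql("iris_sample", conn, index=False, if_exists="replace")

    # Read the whole table back into a new DataFrame
    restored = pd.read_sql_query("SELECT * FROM iris_sample", conn)

    # Let the database do the filtering, using a bound parameter
    subset = pd.read_sql_query(
        "SELECT * FROM iris_sample WHERE sepal_length > ?",
        conn,
        params=(5.0,),
    )
finally:
    conn.close()

print(restored)
print(subset)
```

Filtering inside the SQL query, rather than loading everything and filtering in pandas, is one way to keep memory use low when the stored table is larger than what you need at once.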
