Accessing other data formats

So far, we have worked on CSV files only. The pandas package offers similar functionality (and functions) in order to load MS Excel, HDFS, SQL, JSON, HTML, and Stata datasets. Since most of these formats are not used routinely in data science, the understanding of how one can load and handle each of them is mostly left to you, who can refer to the documentation available on the pandas website (http://pandas.pydata.org/pandas-docs/version/0.16/io.html). Here, we will only demonstrate the essentials on how to effectively use your disk space to store and retrieve information for machine learning algorithms in a fast and efficient way. In such a case, you can leverage an SQLite database (https://www.sqlite.org/index.html ...

Get Python Data Science Essentials - Third Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.