Chapter 2
Exploring Big Data
IN THIS CHAPTER
Using NumPy for data science
Using Pandas for fast data analysis
Learning from your first data science project
Visualizing with MatPlotLib in Python
In this chapter, you discover some of the tools and processes that data scientists use to format, process, and query data.
A number of tools and languages for data science are available (R, for example), but we decided to use NumPy for three reasons. First, it is one of the two most popular Python tools for data science. Second, many AI-oriented projects use NumPy (such as the one in the last chapter). And third, Pandas, the highly useful Python data science package, is built on NumPy.
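To give a feel for why NumPy suits this kind of work, here is a minimal sketch of vectorized arithmetic, a summary statistic, and boolean-mask filtering on a small array. The temperature values are made up for illustration and do not come from a real dataset.

```python
import numpy as np

# Illustrative temperature readings in Celsius (made-up values for this sketch)
temperatures = np.array([21.5, 23.0, 19.8, 25.1, 22.4])

# Vectorized arithmetic: the expression is applied to every element, no loop needed
fahrenheit = temperatures * 9 / 5 + 32

# Summary statistic computed over the whole array
mean_c = temperatures.mean()

# Boolean-mask filtering: keep only readings above the mean
above_mean = temperatures[temperatures > mean_c]

print(fahrenheit)
print(round(mean_c, 2))
print(above_mean)
```

Working on whole arrays at once, rather than writing Python loops, is what makes NumPy fast enough for large datasets.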
Pandas is turning out to be an important package in data science. It encapsulates data in a more abstract way, making it easier to manipulate, document, and understand the transformations you make to the base datasets.
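The following minimal sketch shows what that abstraction looks like in practice. Data is addressed by column names instead of raw positions, and each transformation reads like a description of what it does. The sales records are hypothetical. The last step shows the NumPy foundation: `to_numpy()` returns a column's values as a NumPy array.

```python
import pandas as pd

# Hypothetical sales records, labeled by column name rather than position
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "units": [10, 4, 7, 12, 6],
    "price": [2.5, 3.0, 2.5, 1.75, 3.0],
})

# A derived column, expressed in terms of named columns
sales["revenue"] = sales["units"] * sales["price"]

# Group and aggregate; the result is a labeled Series indexed by region
revenue_by_region = sales.groupby("region")["revenue"].sum()

# Retrieve a column's values as a plain NumPy array
units_array = sales["units"].to_numpy()

print(revenue_by_region)
print(type(units_array))
```

Because each step names the columns it uses, the chain of transformations doubles as documentation of how the derived figures were produced.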
MatPlotLib is a good Python-centric package for visualizing the results of big data analysis, but it has a steep learning curve. However, this has been ameliorated to some degree by new add-on ...