IN THIS CHAPTER
Manipulating data streams
Working with flat and unstructured files
Interacting with relational and NoSQL databases
Interacting with web-based data
“Real data is a reality check.”
— NATE SILVER
Data science applications require data by definition. It would be nice if you could simply go to a data store somewhere, purchase the data you need in an easy-open package, and then write an application to access that data. However, data is messy. It appears in all sorts of places, in many different forms, and you can interpret it in many different ways. Every organization views data differently and stores it differently as well. Even when two companies use the same data management system, the chances are slim that their data will appear in the same format or even use the same data types. In short, before you can do any data science work, you must discover how to access the data in all its myriad forms. Real data requires a lot of work in order to use it, ...