Chapter 14. Data Analysis Examples
Now that we’ve reached the end of this book’s main chapters, we’re going to take a look at a number of real-world datasets. For each dataset, we’ll use the techniques presented in this book to extract meaning from the raw data. The demonstrated techniques can be applied to all manner of other datasets, including your own, and the miscellaneous example datasets collected here are meant for practice with the tools in this book.
The example datasets are found in the book’s accompanying GitHub repository.
14.1 1.USA.gov Data from Bitly
In 2011, URL shortening service Bitly partnered with the US government website USA.gov to provide a feed of anonymous data gathered from users who shorten links ending with .gov or .mil. At that time, a live feed as well as hourly snapshots were available as downloadable text files. The service has been shut down as of this writing (2017), but we preserved one of the data files for the book’s examples.
In the case of the hourly snapshots, each line in each file contains a common form of web data known as JSON, which stands for JavaScript Object Notation. For example, if we read just the first line of a file we may see something like this:
In [5]: path = 'datasets/bitly_usagov/example.txt'

In [6]: open(path).readline()
Out[6]: '{ "a": "Mozilla\\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\\/535.11
(KHTML, like Gecko) Chrome\\/17.0.963.78 Safari\\/535.11", "c": "US", "nk": 1,
"tz": "America\\/New_York", "gr" ...
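Each such line is a self-contained JSON object, so one way to turn it into a Python dictionary is the standard library's json module. The following is a minimal sketch rather than a definitive approach: sample_line is a hypothetical, abbreviated record that uses only the "a", "c", "nk", and "tz" fields visible in the output above, and parse_json_lines is an illustrative helper name that does not appear in the original.

```python
import json

# Hypothetical, abbreviated record modeled on the line shown above.
# Only the "a", "c", "nk", and "tz" fields visible in the output are used;
# the real record contains more fields that are elided here.
sample_line = r'{ "a": "Mozilla\/5.0", "c": "US", "nk": 1, "tz": "America\/New_York" }'


def parse_json_lines(lines):
    """Parse an iterable of JSON-encoded lines into a list of dicts.

    Blank lines are skipped so that a trailing newline in a file
    does not raise a decoding error.
    """
    return [json.loads(line) for line in lines if line.strip()]


records = parse_json_lines([sample_line])

# The JSON escape "\/" decodes to a plain forward slash.
print(records[0]["tz"])  # America/New_York
```

Because a file object yields its lines when iterated, the same helper could presumably be applied to the preserved data file by opening path and passing the file object in place of the list.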