Data analysis and preparation

Let's get a feel for the data, starting with the users:

julia> using DataFrames 
julia> describe(users, stats = [:min, :max, :nmissing, :nunique, :eltype])  

The output is as follows:

We chose a few key stats: the minimum and maximum values, the number of missing and unique values, and the element type of each column. Unsurprisingly, the User-ID column, which is the table's primary key, starts at 1 and goes all the way up to 278858 with no missing values. The Age column, however, shows a clear sign of data errors: the maximum age is 244 years! Let's see what we have there by plotting the data with Gadfly:

julia> using Gadfly
julia> ...
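
The suspicious maximum can also be checked numerically, independently of any plot. The following is a minimal sketch rather than the book's own code. It assumes a recent DataFrames release (1.x), where describe takes the statistics as positional arguments instead of the stats keyword. It uses a small made-up users table in place of the real dataset, and the cutoff of 100 years is an arbitrary choice for illustration:

```julia
using DataFrames

# Hypothetical stand-in for the real users table (values are made up)
users = DataFrame("User-ID" => 1:6,
                  "Age" => [18, missing, 34, 244, 0, 52])

# Same statistics as in the text, passed positionally (DataFrames 1.x style)
stats_table = describe(users, :min, :max, :nmissing, :nunique, :eltype)

# Inspect the Age column directly, skipping missing values
ages = users[!, "Age"]
max_age = maximum(skipmissing(ages))
n_missing = count(ismissing, ages)

# Assumed plausibility cutoff; adjust to taste
age_cutoff = 100

# Keep only the rows whose age is present and above the cutoff
suspicious = filter(row -> !ismissing(row.Age) && row.Age > age_cutoff, users)

println(stats_table)
println("max age: ", max_age, ", missing ages: ", n_missing)
println(suspicious)
```

Guarding with ismissing before the comparison matters here: comparing missing with a number yields missing, which filter cannot use as a Boolean.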
