Data analysis and preparation

Let's get a feel for the data, starting with the users:

julia> using DataFrames 
julia> describe(users, stats = [:min, :max, :nmissing, :nunique, :eltype])  

The output is as follows:

We chose a few key stats—the minimum and maximum values, the number of missing and unique values, and the type of data. Unsurprisingly, the User-ID column, which is the table's primary key, starts at 1 and goes all the way up to 278858 with no missing values. However, the Age column shows a clear sign of data errors—the maximum age is 244 years! Let's see what we have there by plotting the data with Gadfly:

julia> using Gadfly
julia> ...
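
Alongside the plot, the suspicious ages can also be checked numerically. The following is only a minimal sketch, not the book's code. It assumes a recent DataFrames release (1.x), where describe takes the statistics as positional arguments rather than through the stats keyword. The toy_users table and its values are invented stand-ins for the real users data, and the "plausible" age range of 5 to 100 is an arbitrary assumption chosen for illustration.

```julia
using DataFrames

# Hypothetical stand-in for the real users table; the values are invented.
toy_users = DataFrame(
    "User-ID" => 1:6,
    "Age"     => [18, 34, missing, 244, 0, 52],
)

# Same statistics as in the text, passed positionally (DataFrames 1.x style).
summary_stats = describe(toy_users, :min, :max, :nmissing, :nunique, :eltype)

# Keep only rows whose age is present but outside the assumed plausible
# range of 5 to 100 years. Missing ages are skipped, not flagged.
suspicious = filter(:Age => a -> !ismissing(a) && (a < 5 || a > 100), toy_users)

println(summary_stats)
println("Rows with implausible ages: ", nrow(suspicious))
```

On the toy table this flags two rows, the ages 244 and 0. Against the real data the same filter could be applied to users directly, with the bounds adjusted to whatever range seems reasonable.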
