Let's get a feel for the data, starting with the users:
julia> using DataFrames
julia> describe(users, stats = [:min, :max, :nmissing, :nunique, :eltype])
The output is as follows:
We chose a few key stats: the minimum and maximum values, the number of missing and unique values, and the element type of each column. Unsurprisingly, the User-ID column, which is the table's primary key, starts at 1 and goes all the way up to 278858 with no missing values. However, the Age column shows a clear sign of data errors: the maximum age is 244 years! Let's see what we have there by plotting the data with Gadfly:
julia> using Gadfly
julia> ...
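Since the real users table isn't reproduced here, the following is a minimal sketch on a made-up, six-row stand-in. The name `users_sample`, its values, and the age cutoff of 100 are all assumptions for illustration. It first repeats the summary with the statistics passed as positional arguments, which is the form recent DataFrames releases expect (the `stats =` keyword used above comes from older versions). It then pulls out the rows whose age exceeds the cutoff, which is one simple way to inspect suspicious values before plotting them.

```julia
using DataFrames

# Hypothetical toy stand-in for the users table; these values are made up
users_sample = DataFrame(
    "User-ID" => 1:6,
    "Age" => [25, missing, 244, 17, 103, 41],
)

# Recent DataFrames versions take the statistics as positional arguments;
# min and max are computed over the non-missing values of each column
stats_summary = describe(users_sample, :min, :max, :nmissing, :eltype)
println(stats_summary)

# Ages above this cutoff are flagged; 100 is an arbitrary, assumed threshold
max_plausible_age = 100

# coalesce turns a missing age into 0, so the predicate always returns a Bool
suspicious = filter(row -> coalesce(row.Age, 0) > max_plausible_age, users_sample)
println(suspicious)
```

On the real data, the same filter would show how many rows carry implausible ages, which helps decide whether to drop them or treat them as missing.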