December 2018
Beginner to intermediate
500 pages
12h 10m
English
A common strategy in machine learning implementations is to split the data into training (some 80-90%) and testing (the remaining 10-20%) datasets. First, we'll initialize two empty DataFrames to store this data:
julia> training_data = DataFrame(UserID = Int[], ISBN = String[], Rating = Int[]) 0×3 DataFrame julia> test_data = DataFrame(UserID = Int[], ISBN = String[], Rating = Int[]) 0×3 DataFrame
Next, we'll iterate through our top_ratings and put the contents into the corresponding DataFrame. We'll go with 10% of data for testing—so with each iteration, we'll generate a random integer between 1 and 10. The chances of getting a 10 are, obviously, one in ten, so when we get it, we put the corresponding row into ...
Read now
Unlock full access