November 2017
Intermediate to advanced
374 pages
10h 19m
English
Let's walk through how scikit-learn produces the regression dataset by looking at the source code (with some modifications for clarity). Any undefined variables are assumed to have the default values of make_regression.
It's actually surprisingly simple to follow. First, a random array is generated with the size specified when the function is called:
X = np.random.randn(n_samples, n_features)
Given the basic dataset, the ground-truth coefficients that define the target are then generated:
ground_truth = np.zeros((n_features, n_targets))
ground_truth[:n_informative, :] = 100*np.random.rand(n_informative, n_targets)
The dot product of X and ground_truth is taken to get the final target values. Bias, if any, is added at this time:
y = np.dot(X, ground_truth) ...
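The steps above can be put together in a short, runnable sketch. This is not the scikit-learn source. The function name sketch_regression, the seeded RandomState, and the small sizes are assumptions chosen for illustration, and the bias is added as a plain scalar. It only shows that the features beyond the first n_informative columns contribute nothing to y.

```python
import numpy as np


def sketch_regression(n_samples=100, n_features=10, n_informative=5,
                      n_targets=1, bias=0.0, seed=0):
    """Hypothetical helper mirroring the steps described in the text."""
    rng = np.random.RandomState(seed)  # seeded so the result is repeatable

    # Step 1: random feature matrix of the requested size.
    X = rng.randn(n_samples, n_features)

    # Step 2: coefficient matrix; only the first n_informative rows are nonzero.
    ground_truth = np.zeros((n_features, n_targets))
    ground_truth[:n_informative, :] = 100 * rng.rand(n_informative, n_targets)

    # Step 3: targets are a linear combination of the features, plus bias.
    y = np.dot(X, ground_truth) + bias
    return X, y, ground_truth


X, y, ground_truth = sketch_regression()
print(X.shape, y.shape, ground_truth.shape)
```

Because the rows of ground_truth after the first n_informative are zero, the same y can be recovered from the informative columns alone: np.dot(X[:, :5], ground_truth[:5]).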