
Effective Amazon Machine Learning by Alexis Perrier


Creating our own non-linear dataset

A good way to create a non-linear dataset is to mix sinusoids with different frequencies. The dataset we will work with in this chapter is created with the following Python script and exported to a CSV file:

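    import numpy as np

    n_samples = 1000
    # Target function: sum of two cosines with different frequencies
    de_linearize = lambda X: np.cos(1.5 * np.pi * X) + np.cos(5 * np.pi * X)
    # Predictor: sorted uniform samples scaled to [0, 2)
    X = np.sort(np.random.rand(n_samples)) * 2
    # Outcome: target function plus Gaussian noise (standard deviation 0.1)
    y = de_linearize(X) + np.random.randn(n_samples) * 0.1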

As usual, X is the predictor and y is the outcome. You can vary this script to generate other non-linear datasets. Note that we have used a lambda function, which is Python's way of declaring a small anonymous function inline where it is needed. Then we shuffle the dataset by sorting randomly (np.random.rand(n_samples) ...
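
The script above stops before the CSV export, and the shuffling expression is cut off in this excerpt. The following is a minimal sketch of how the generation, a shuffle, and the export could fit together, not the author's exact code. The function names make_nonlinear_dataset and export_csv, the file name nonlinear.csv, the seed arguments, and the use of a row permutation for the shuffle are all assumptions made for illustration.

```python
import numpy as np


def make_nonlinear_dataset(n_samples=1000, freqs=(1.5, 5.0), noise=0.1, seed=0):
    """Return (X, y), where y is a sum of cosines of X plus Gaussian noise.

    Hypothetical helper: freqs generalizes the two frequencies (1.5 and 5)
    used in the original script so that other datasets can be generated.
    """
    rng = np.random.default_rng(seed)
    # Sorted uniform samples scaled to [0, 2), as in the original script
    X = np.sort(rng.random(n_samples)) * 2
    # Sum one cosine term per frequency
    y = sum(np.cos(f * np.pi * X) for f in freqs)
    # Add Gaussian noise with standard deviation `noise`
    y = y + rng.normal(scale=noise, size=n_samples)
    return X, y


def export_csv(X, y, path, shuffle=True, seed=0):
    """Write the (X, y) rows to a CSV file with an 'X,y' header line."""
    rows = np.column_stack([X, y])
    if shuffle:
        # Assumed stand-in for the shuffle step: permute the rows,
        # keeping each X value paired with its y value
        rows = np.random.default_rng(seed).permutation(rows)
    # comments="" keeps the header line free of the default '# ' prefix
    np.savetxt(path, rows, delimiter=",", header="X,y", comments="")


X, y = make_nonlinear_dataset()
export_csv(X, y, "nonlinear.csv")
```

Passing a different freqs tuple, for example freqs=(0.5, 3.0, 7.0), gives a different non-linear curve while keeping the rest of the pipeline unchanged. Fixing the seed makes the generated file reproducible between runs.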
