Mastering Python for Data Science by Samir Madhavan

Training and testing a model

Let's take the data and divide it into training and test sets:

>>> # Note: sklearn.cross_validation was removed in scikit-learn 0.20;
>>> # on newer versions, import model_selection instead.
>>> from sklearn import (linear_model, cross_validation,
...                      feature_selection, preprocessing)
>>> import statsmodels.formula.api as sm
>>> from statsmodels.tools.eval_measures import mse
>>> from statsmodels.tools.tools import add_constant
>>> from sklearn.metrics import mean_squared_error

>>> X = b_data.values.copy()
>>> X_train, X_valid, y_train, y_valid = (
...     cross_validation.train_test_split(X[:, :-1], X[:, -1],
...                                       train_size=0.80))

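We first convert the data frame into an array by calling values.copy() on b_data. We then use the train_test_split function from scikit-learn's cross_validation module to divide the data into training and test sets, with 80% of the rows used for training. The slice X[:, :-1] passes every column except the last as the features, and X[:, -1] passes the last column as the target. ...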
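To make the shape of the split concrete, here is a minimal sketch on synthetic data. It rests on a few assumptions that are not in the original text: the random array is a hypothetical stand-in for b_data.values, the import comes from sklearn.model_selection (where train_test_split lives in scikit-learn 0.20 and later), and random_state is fixed only so that the split is repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for b_data.values: 100 rows, with 4 feature
# columns followed by the target in the last column.
rng = np.random.RandomState(0)
X = rng.rand(100, 5)

# Features are every column but the last; the target is the last column.
# train_size=0.80 keeps 80% of the rows for training and leaves the rest
# for validation. random_state is an assumption added for repeatability.
X_train, X_valid, y_train, y_valid = train_test_split(
    X[:, :-1], X[:, -1], train_size=0.80, random_state=0
)

print(X_train.shape, X_valid.shape)
print(y_train.shape, y_valid.shape)
```

The rows are shuffled before the split, so without a fixed random_state each run would place different rows in the training and validation sets.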
