September 2015
Beginner to intermediate
454 pages
10h 49m
English
In this section, we will train a logistic regression model to classify the movie reviews as positive or negative. First, we will divide the DataFrame of cleaned text documents into 25,000 documents for training and 25,000 documents for testing:
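>>> # .loc slices include the end label, so (assuming the default 0-based
>>> # integer index) stop at 24999 to get exactly 25,000 training rows
>>> X_train = df.loc[:24999, 'review'].values
>>> y_train = df.loc[:24999, 'sentiment'].values
>>> X_test = df.loc[25000:, 'review'].values
>>> y_test = df.loc[25000:, 'sentiment'].values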
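As a small illustration of how the split boundaries behave, the following sketch uses a hypothetical 10-row stand-in for the review DataFrame (the column names match the text; the data and the `split` variable are assumptions made for the example). It shows that label-based `.loc` slicing includes the end label, while position-based `.iloc` slicing excludes the end position, which is one way to obtain two equal, non-overlapping halves:

```python
import pandas as pd

# Hypothetical 10-row stand-in for the 50,000-row review DataFrame.
df = pd.DataFrame({
    'review': [f'review text {i}' for i in range(10)],
    'sentiment': [i % 2 for i in range(10)],
})

split = len(df) // 2  # 5 rows per half in this toy example

# .loc slices by label and INCLUDES the end label (rows 0 through 5).
n_loc = len(df.loc[:split, 'review'])

# .iloc slices by position and EXCLUDES the end position (rows 0 through 4).
X_train = df.iloc[:split]['review'].values
y_train = df.iloc[:split]['sentiment'].values
X_test = df.iloc[split:]['review'].values
y_test = df.iloc[split:]['sentiment'].values

print(n_loc, len(X_train), len(X_test))
```

With a default integer index, the `.loc` slice picks up one more row than the `.iloc` slice with the same bound, so the choice of accessor determines whether the boundary row lands in the training set, the test set, or both.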
Next, we will use a GridSearchCV object to find the optimal set of parameters for our logistic regression model using 5-fold stratified cross-validation:
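>>> # GridSearchCV lives in sklearn.model_selection in current scikit-learn;
>>> # the older sklearn.grid_search module was removed in version 0.20
>>> from sklearn.model_selection import GridSearchCV
>>> from sklearn.pipeline import Pipeline
>>> ...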
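Because the listing above is truncated, the following is only a minimal sketch of what a grid search over a text-classification pipeline can look like, not the book's actual configuration. The toy corpus, the step names `vect` and `clf`, and the small parameter grid are all assumptions chosen so that the example runs quickly on its own:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the movie reviews:
# 10 positive (label 1) and 10 negative (label 0) documents.
docs = ([f'great film number {i}, loved it' for i in range(10)] +
        [f'terrible film number {i}, hated it' for i in range(10)])
labels = [1] * 10 + [0] * 10

# Chain feature extraction and the classifier so that the vectorizer
# is re-fit on the training portion of every cross-validation fold.
pipe = Pipeline([
    ('vect', TfidfVectorizer()),
    ('clf', LogisticRegression(solver='liblinear', random_state=0)),
])

# Illustrative grid only; parameters are addressed as <step>__<param>.
param_grid = {
    'vect__ngram_range': [(1, 1), (1, 2)],
    'clf__C': [1.0, 10.0],
}

# With an integer cv and a classifier, GridSearchCV uses stratified folds.
gs = GridSearchCV(pipe, param_grid, scoring='accuracy', cv=5)
gs.fit(docs, labels)

print('Best parameter set:', gs.best_params_)
print('CV accuracy: %.3f' % gs.best_score_)
```

After fitting, `gs.best_estimator_` holds the pipeline refit on all of the data passed to `fit` with the best parameter combination, so it can be scored directly on held-out test documents.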