Training a logistic regression model for document classification
In this section, we will train a logistic regression model to classify the movie reviews as positive or negative. First, we will divide the DataFrame of cleaned text documents into 25,000 documents for training and 25,000 documents for testing:
>>> # .loc slices by label and includes the end label, so stop at 24999
>>> X_train = df.loc[:24999, 'review'].values
>>> y_train = df.loc[:24999, 'sentiment'].values
>>> X_test = df.loc[25000:, 'review'].values
>>> y_test = df.loc[25000:, 'sentiment'].values
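Because .loc slices by label and includes the end label, it is worth checking that the training and test halves have the intended sizes and do not overlap. The following is a minimal sketch on a hypothetical 10-row DataFrame with a default integer index; the names toy and n_train are illustrative and do not appear in the text:

```python
import pandas as pd

# Hypothetical 10-row stand-in for the 50,000-row review DataFrame
# (assumes a default 0..n-1 integer index).
toy = pd.DataFrame({
    "review": [f"review {i}" for i in range(10)],
    "sentiment": [i % 2 for i in range(10)],
})

n_train = 5

# Label-based slicing with .loc includes the end label,
# so this selects labels 0..5, i.e. 6 rows rather than 5.
loc_train = toy.loc[:n_train, "review"].values

# Position-based slicing with .iloc excludes the end position,
# so the two halves are exactly 5 rows each and disjoint.
X_train = toy.iloc[:n_train]["review"].values
y_train = toy.iloc[:n_train]["sentiment"].values
X_test = toy.iloc[n_train:]["review"].values
y_test = toy.iloc[n_train:]["sentiment"].values

print(len(loc_train), len(X_train), len(X_test))
```

Using .iloc (or stopping the .loc slice one label earlier) keeps the split independent of the inclusive-endpoint behavior.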
Next, we will use a GridSearchCV object to find the optimal set of parameters for our logistic regression model using 5-fold stratified cross-validation:
>>> from sklearn.model_selection import GridSearchCV
>>> from sklearn.pipeline import Pipeline
...
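As a rough illustration of how a grid search over a text-classification pipeline fits together, the following sketch runs GridSearchCV on a tiny hypothetical corpus. The step names vect and clf, the two-parameter grid, the liblinear solver, and the toy documents are all assumptions chosen for illustration, not the settings used in this section:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the movie reviews
# (10 positive and 10 negative documents).
positive = ["great film", "loved the acting", "wonderful story",
            "excellent and moving", "a real delight"]
negative = ["terrible film", "hated the acting", "boring story",
            "awful and dull", "a real mess"]
docs = (positive + negative) * 2
labels = ([1] * 5 + [0] * 5) * 2

# Chain the vectorizer and the classifier so that each
# cross-validation fold fits the vectorizer on its own training part.
pipe = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", LogisticRegression(solver="liblinear", random_state=0)),
])

# Illustrative grid: parameters are addressed as <step name>__<parameter>.
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [1.0, 10.0],
}

# With an integer cv and a classifier, scikit-learn uses
# stratified k-fold splitting by default.
gs = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=5)
gs.fit(docs, labels)

print(gs.best_params_)
print(gs.best_score_)
```

After fitting, gs.best_estimator_ holds the pipeline refit on all of the supplied training data with the best parameter combination, so it can be used directly to score a held-out test set.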