Getting ready

We start by importing the required libraries:

import os
import glob
import pandas as pd

We set our working folder as follows:

os.chdir("/.../Chapter 11/CS - IMDB Classification")
os.getcwd()

We set our path variable and iterate through the .txt files in the folders. 

Note that we have a subfolder, /txt_sentoken/pos, which holds the TXT files for the positive reviews. Similarly, we have a subfolder, /txt_sentoken/neg, which holds the TXT files for the negative reviews.

The TXT files for the positive reviews are read, and each review is appended to a list. We then use that list to create a DataFrame, df_pos.

path = "/.../Chapter 11/CS - IMDB Classification/txt_sentoken/pos/*.txt"
files = glob.glob(path)
text_pos = []
for p in files:
    file_read ...
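Because the listing above is cut off, the read-and-append step can be sketched as follows. This is a minimal illustration, not the book's code: the helper name load_reviews, the column names text and label, and the UTF-8 encoding with replacement of undecodable bytes are all assumptions made for the example.

```python
import glob

import pandas as pd


def load_reviews(pattern, label):
    """Read every file matching `pattern` into a DataFrame.

    Hypothetical helper: the name, the column names, and the encoding
    handling are assumptions, not taken from the book.
    """
    texts = []
    # glob.glob returns matches in arbitrary order, so sort for reproducibility.
    for file_path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the loop from failing on stray non-UTF-8 bytes.
        with open(file_path, "r", encoding="utf-8", errors="replace") as handle:
            texts.append(handle.read())
    # One row per review; the same label is repeated for every row.
    return pd.DataFrame({"text": texts, "label": [label] * len(texts)})
```

Calling load_reviews with the pos pattern shown above and a label such as "pos" would give a frame playing the role of df_pos; the same call with the neg pattern would cover the negative reviews.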
