Getting ready

We start by importing the required libraries:

import os
import glob
import pandas as pd

We set our working folder as follows:

os.chdir("/.../Chapter 11/CS - IMDB Classification")
os.getcwd()

We set our path variable and iterate through the .txt files in the folders. 

Note that we have a subfolder, /txt_sentoken/pos, which holds the TXT files for the positive reviews. Similarly, we have a subfolder, /txt_sentoken/neg, which holds the TXT files for the negative reviews.

The TXT files for the positive reviews are read one by one, and each review is appended to a list, text_pos. We then use that list to create a DataFrame, df_pos.

path = "/.../Chapter 11/CS - IMDB Classification/txt_sentoken/pos/*.txt"
files = glob.glob(path)
text_pos = []
for p in files:
    file_read ...
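The listing above is cut off inside the loop, so the following is only a minimal sketch of the read-and-append pattern the text describes, not the book's own code. The helper name load_reviews, the column name "review", and the temporary demo folder are assumptions made for illustration; in practice the pattern argument would point at the txt_sentoken/pos folder.

```python
import glob
import os
import tempfile

import pandas as pd


def load_reviews(pattern):
    """Read every file matching `pattern` into a one-column DataFrame.

    Hypothetical helper: the name and the "review" column are assumptions.
    """
    texts = []
    # sorted() gives a stable row order; glob.glob alone makes no ordering promise
    for file_path in sorted(glob.glob(pattern)):
        with open(file_path, "r", encoding="utf-8") as handle:
            texts.append(handle.read())
    return pd.DataFrame({"review": texts})


# Demo on a throwaway folder that mimics the txt_sentoken/pos layout
with tempfile.TemporaryDirectory() as tmp_dir:
    pos_dir = os.path.join(tmp_dir, "txt_sentoken", "pos")
    os.makedirs(pos_dir)
    samples = [("a.txt", "a great film"), ("b.txt", "loved it")]
    for name, text in samples:
        with open(os.path.join(pos_dir, name), "w", encoding="utf-8") as handle:
            handle.write(text)

    df_pos = load_reviews(os.path.join(pos_dir, "*.txt"))

print(df_pos)
```

The same helper could be called with a pattern for the neg subfolder to build a matching DataFrame of negative reviews.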
