- Begin by loading the saved text data for pre-processing with the help of the load_data function:
def load_data():
    """ Loading Data """
    input_file = os.path.join(TEXT_SAVE_DIR)
    with open(input_file, "r") as f:
        data = f.read()
    return data
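As a quick sanity check, the function can be called right after it is defined. The snippet below is a minimal sketch: TEXT_SAVE_DIR is assumed to have been set earlier in the chapter to the location of the saved text, and the path shown here is only a placeholder:

import os  # needed by load_data for os.path.join

TEXT_SAVE_DIR = "path/to/saved_text_data.txt"  # placeholder; use the path where the text was saved earlier

text = load_data()
print("Loaded {} characters".format(len(text)))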
- Implement define_tokens, as described in the Pre-processing the data section of this chapter. This will help us create a dictionary that maps punctuation symbols to their corresponding tokens:
def define_tokens():
    """
    Generate a dict to turn punctuation into a token.
    Note that Sym before each text denotes Symbol.
    :return: Tokenize dictionary where the key is the punctuation and the value is the token
    """
    dict = {'.': '_Sym_Period_',
            ',': '_Sym_Comma_',
            '"': '_Sym_Quote_',