Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Loading dialog datasets in the QA format

As described in the previous section, we need to convert the dialog data from line-by-line conversation turns into (facts, question, answer) tuples, one for each turn of the dialog. For this purpose, we need to write a method that reads lines from the raw dialog corpus and returns the tuples used to train a memory network.
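The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the book's implementation: the function name parse_dialogs, the assumed line layout (a turn number, the user utterance, a tab, then the bot response, with a blank line between dialogs), and the whitespace-only word splitting are all assumptions made for the example.

```python
def parse_dialogs(lines):
    """Turn raw dialog lines into (facts, question, answer) tuples.

    Assumed (hypothetical) line format:
        "<turn_id> <user utterance>\t<bot response>"
    with a blank line separating one dialog from the next.
    """
    data = []
    facts = []  # running memory of earlier utterances in the current dialog
    for line in lines:
        line = line.strip()
        if not line:
            facts = []  # blank line: a new dialog starts, so reset the memory
            continue
        # Drop the leading turn number; the rest is the turn text
        _turn_id, _, text = line.partition(" ")
        if "\t" not in text:
            # Assumption: a line with no response is kept as a fact only
            facts.append(text.split())
            continue
        user, bot = text.split("\t", 1)
        question = user.split()
        answer = bot.split()
        # Copy the fact list so later turns do not alter earlier tuples
        data.append((list(facts), question, answer))
        # Both sides of this turn become facts for the following turns
        facts.append(question)
        facts.append(answer)
    return data


raw = [
    "1 hi\thello what can i help you with today",
    "2 can you book a table\ti'm on it",
    "",
    "1 good morning\thello what can i help you with today",
]
for facts, question, answer in parse_dialogs(raw):
    print(len(facts), question, answer)
```

Each tuple carries every earlier utterance of the same dialog as its facts, so the first turn of a dialog has an empty fact list and later turns accumulate context.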

Since we will be using word vectors as inputs to our model, we first need to define a tokenize function that converts a sentence into a list of words, dropping special symbols and common stop words:

def tokenize(sent):
    stop_words = {"a", "an", "the"}
    sent = sent.lower()
    if sent == '<silence>':
        return [sent]
    # Convert sentence to tokens
    result ...
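The listing is cut off before the tokenization step, so the original remainder is not reproduced here. As a self-contained sketch of one way such a function could behave, the hypothetical tokenize_sketch below reuses the visible opening lines and then assumes a regular-expression split on non-word characters followed by stop-word filtering. That second half is an assumption for illustration, not the book's code.

```python
import re


def tokenize_sketch(sent):
    """Illustrative tokenizer; the splitting rule is an assumption."""
    stop_words = {"a", "an", "the"}
    sent = sent.lower()
    # The special silence marker is kept as a single token
    if sent == '<silence>':
        return [sent]
    # Assumption: split on runs of non-word characters, which drops punctuation
    words = re.split(r"\W+", sent)
    # Discard empty strings left by the split, and the stop words
    return [w for w in words if w and w not in stop_words]


print(tokenize_sketch("Book a table at the Italian restaurant!"))
print(tokenize_sketch("<SILENCE>"))
```

One consequence of this particular splitting rule is that contractions are broken at the apostrophe, so "i'm" becomes the two tokens "i" and "m". A different pattern would be needed if contractions should stay intact.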
