Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Data description

Each record has six fields: id, qid1, qid2, question1, question2, and is_duplicate. The id field identifies the training pair, qid1 and qid2 identify the individual questions, and question1 and question2 hold the full text of each question. The is_duplicate field is the target label: it is set to 1 if the two questions are duplicates (that is, they have the same meaning) and 0 if they are not. The training data contains approximately 404,000 labeled question pairs.
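
The record layout described above can be sketched in Python as follows. This is a minimal illustration and not the book's own loading code: it assumes the data arrives as CSV with a header row, and the QuestionPair class, the load_pairs helper, and the two sample rows are hypothetical names and values made up for the example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class QuestionPair:
    """One training record, mirroring the six fields described in the text."""
    id: int
    qid1: int
    qid2: int
    question1: str
    question2: str
    is_duplicate: bool


def load_pairs(handle):
    """Parse CSV rows from an open text handle into QuestionPair objects."""
    reader = csv.DictReader(handle)
    return [
        QuestionPair(
            id=int(row["id"]),
            qid1=int(row["qid1"]),
            qid2=int(row["qid2"]),
            question1=row["question1"],
            question2=row["question2"],
            # The label is stored as 1 (duplicate) or 0 (not duplicate).
            is_duplicate=row["is_duplicate"] == "1",
        )
        for row in reader
    ]


# Hypothetical sample rows in the described layout, not real dataset entries.
SAMPLE_CSV = (
    "id,qid1,qid2,question1,question2,is_duplicate\n"
    '0,1,2,"How do I learn Python?","What is the best way to learn Python?",1\n'
    '1,3,4,"What is the capital of France?","How do I cook rice?",0\n'
)

pairs = load_pairs(io.StringIO(SAMPLE_CSV))
duplicate_count = sum(pair.is_duplicate for pair in pairs)
```

To read the real training file, the same helper could be given an open file handle in place of the in-memory sample, for example one created with open(path, newline="", encoding="utf-8"), where path is wherever the file is stored.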
