Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Data preparation

First, we will read the source text and the target text, which are in French and English, respectively:

import pandas as pd

# Read the French (source) and English (target) sentences, one per line.
frdata = []
endata = []
with open('data/train_fr_lines.txt') as frfile:
    for li in frfile:
        frdata.append(li)
with open('data/train_en_lines.txt') as enfile:
    for li in enfile:
        endata.append(li)

# Pair the sentences and record each one's length in space-separated tokens.
mtdata = pd.DataFrame({'FR': frdata, 'EN': endata})
mtdata['FR_len'] = mtdata['FR'].apply(lambda x: len(x.split(' ')))
mtdata['EN_len'] = mtdata['EN'].apply(lambda x: len(x.split(' ')))
print(mtdata['FR'].head(2).values)
print(mtdata['EN'].head(2).values)

Output:

['Voici Bill Lange. Je suis Dave Gallo.\n' 'Nous allons vous raconter quelques histoires de la mer en vidéo.\n']
["This is Bill Lange. I'm Dave Gallo.\n" "And we're going to tell you some stories ...
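As the printed output shows, each sentence keeps its trailing newline, because iterating over a file yields lines with the line terminator attached. The following is a minimal sketch of one way to strip it while loading and to check that the two files are line-aligned. It is not the book's code: the helper name read_parallel is hypothetical, the in-memory io.StringIO objects stand in for the two data files, and the second sentence pair is invented for illustration.

```python
import io

import pandas as pd


def read_parallel(fr_file, en_file):
    """Read two line-aligned files into a DataFrame with token counts.

    Hypothetical helper, not part of the original listing.
    """
    # rstrip('\n') drops the newline that file iteration leaves on each line.
    fr_lines = [line.rstrip('\n') for line in fr_file]
    en_lines = [line.rstrip('\n') for line in en_file]
    if len(fr_lines) != len(en_lines):
        raise ValueError('source and target must have the same number of lines')

    df = pd.DataFrame({'FR': fr_lines, 'EN': en_lines})
    # str.split() with no argument splits on any run of whitespace.
    df['FR_len'] = df['FR'].str.split().str.len()
    df['EN_len'] = df['EN'].str.split().str.len()
    return df


# In-memory stand-ins for the two files. The first pair comes from the
# output above; the second pair is made up for this sketch.
fr_sample = io.StringIO("Voici Bill Lange. Je suis Dave Gallo.\nMerci beaucoup.\n")
en_sample = io.StringIO("This is Bill Lange. I'm Dave Gallo.\nThank you very much.\n")

sample = read_parallel(fr_sample, en_sample)
print(sample[['FR_len', 'EN_len']])
```

With real data, the same helper could be called on the two file objects opened in the listing above. Whether to strip the newline at this stage is a design choice; the original listing keeps it.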
