Data preparation

First, we will read the source text and the target text, which are in French and English, respectively:

    import pandas as pd

    frdata = []
    endata = []

    # Read the French (source) sentences, one per line
    with open('data/train_fr_lines.txt') as frfile:
        for li in frfile:
            frdata.append(li)

    # Read the English (target) sentences, one per line
    with open('data/train_en_lines.txt') as enfile:
        for li in enfile:
            endata.append(li)

    # Pair the sentences by line position and record token counts
    mtdata = pd.DataFrame({'FR': frdata, 'EN': endata})
    mtdata['FR_len'] = mtdata['FR'].apply(lambda x: len(x.split(' ')))
    mtdata['EN_len'] = mtdata['EN'].apply(lambda x: len(x.split(' ')))

    print(mtdata['FR'].head(2).values)
    print(mtdata['EN'].head(2).values)

Output:

    ['Voici Bill Lange. Je suis Dave Gallo.\n'
     'Nous allons vous raconter quelques histoires de la mer en vidéo.\n']
    ["This is Bill Lange. I'm Dave Gallo.\n"
     "And we're going to tell you some stories ...
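The listing above keeps the trailing newline on every sentence and splits on single spaces only. As a minimal sketch of a slightly stricter variant (not the book's code), the following strips the newline, checks that both sides have the same number of lines, and counts tokens with a whitespace split. The helper name load_parallel and the toy sentences are assumptions made for illustration; in-memory buffers stand in for the two data files so the sketch runs on its own.

```python
import io

import pandas as pd


def load_parallel(fr_handle, en_handle):
    """Read two line-aligned text streams into a DataFrame with token counts.

    Hypothetical helper for illustration; it is not part of the original listing.
    """
    fr_lines = [line.rstrip("\n") for line in fr_handle]
    en_lines = [line.rstrip("\n") for line in en_handle]

    # Line i of the source must correspond to line i of the target.
    if len(fr_lines) != len(en_lines):
        raise ValueError("source and target have different line counts")

    df = pd.DataFrame({"FR": fr_lines, "EN": en_lines})

    # split() with no argument splits on any run of whitespace,
    # so repeated spaces do not create empty tokens.
    df["FR_len"] = df["FR"].apply(lambda s: len(s.split()))
    df["EN_len"] = df["EN"].apply(lambda s: len(s.split()))
    return df


# Toy sentences invented for this sketch (not taken from the training data).
fr_toy = io.StringIO("Bonjour le monde.\nMerci  beaucoup.\n")
en_toy = io.StringIO("Hello world.\nThank you very much.\n")

toy = load_parallel(fr_toy, en_toy)
print(toy)
```

With real files, the same helper could be called on the handles returned by open() for the two training files. Whether stripping the newline is desirable depends on how later tokenization steps expect the text to look.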
