To pass the input and output data to our model, we preprocess the datasets as follows:
- Import the relevant packages and dataset:
import pandas as pd
import numpy as np
import string
from string import digits
import matplotlib.pyplot as plt
%matplotlib inline
import re
from sklearn.model_selection import train_test_split
from keras.models import Model
from keras.layers import Input, LSTM, Dense
$ wget https://www.dropbox.com/s/2vag8w6yov9c1qz/english%20to%20french.txt
lines = pd.read_table('english to french.txt', names=['eng', 'fr'])
- Given that there are more than 140,000 sentences in the dataset, let's consider only the first 50,000 sentence-translation pairs to build the model:
lines = lines[0:50000] ...
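The loading and truncation steps above can be tried without downloading the file. The following is a minimal sketch: the four sentence pairs and the cutoff of 2 are made-up stand-ins for the real dataset and the 50,000-row limit, and the file is assumed to hold one tab-separated English-French pair per line.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'english to french.txt':
# one tab-separated English/French pair per line (assumed layout).
sample = "Go.\tVa !\nRun!\tCours !\nWow!\tÇa alors !\nFire!\tAu feu !\n"

# read_table uses a tab separator by default; passing names= supplies the
# column labels, so the first row is read as data rather than as a header.
pairs = pd.read_table(io.StringIO(sample), names=['eng', 'fr'])

# Keep only the first n pairs (n = 2 here; the text uses 50,000).
n = 2
subset = pairs[0:n]

print(subset.shape)
print(subset['eng'].tolist())
```

Slicing a DataFrame with `[0:n]` selects rows by position, so the subset keeps both columns and the original row order.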