July 2018
Beginner to intermediate
312 pages
8h 31m
English
First, we will load the original news file and the summaries into a pandas DataFrame:
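```python
import gzip

import pandas as pd

# Line i of news.txt.gz pairs with line i of summary.txt.gz.
# gzip.open() defaults to binary mode, so every line is read as bytes.
titledata = []
artdata = []
with gzip.open('data/news.txt.gz') as artfile:
    for li in artfile:
        artdata.append(li)
with gzip.open('data/summary.txt.gz') as titlefile:
    for li in titlefile:
        titledata.append(li)

news = pd.DataFrame({'Text': artdata, 'Summary': titledata})
# Keep a random 10% of the article/summary pairs to speed up experiments
news = news.sample(frac=0.1)
# Word counts (whitespace-separated tokens) for each article and summary
news['Text_len'] = news.Text.apply(lambda x: len(x.split()))
news['Summary_len'] = news.Summary.apply(lambda x: len(x.split()))
```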
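Because the articles and summaries are paired purely by line position, a mismatch in line counts would silently misalign the data (or make `pd.DataFrame` fail with a generic length error). The following is a minimal sketch of the same loading step wrapped in a hypothetical helper, `load_parallel_gz` (our name, not the book's), that checks the counts first and adds the same word-count columns. The demo writes two tiny made-up gzip files to a temporary directory so it runs without the real `data/` files.

```python
import gzip
import os
import tempfile

import pandas as pd


def load_parallel_gz(text_path, summary_path):
    """Hypothetical helper: read two line-aligned gzip files into a DataFrame."""
    with gzip.open(text_path) as f:      # binary mode: each line is bytes
        texts = f.readlines()
    with gzip.open(summary_path) as f:
        summaries = f.readlines()
    if len(texts) != len(summaries):
        raise ValueError(f"line counts differ: {len(texts)} vs {len(summaries)}")
    df = pd.DataFrame({'Text': texts, 'Summary': summaries})
    # bytes.split() with no argument splits on runs of ASCII whitespace,
    # so the trailing b'\n' on each line is not counted as a word
    df['Text_len'] = df.Text.apply(lambda x: len(x.split()))
    df['Summary_len'] = df.Summary.apply(lambda x: len(x.split()))
    return df


# Demo with made-up toy data written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, 'news.txt.gz')
    summary_path = os.path.join(tmp, 'summary.txt.gz')
    with gzip.open(text_path, 'wb') as f:
        f.write(b'the cat sat on the mat today\nstocks rose sharply on monday\n')
    with gzip.open(summary_path, 'wb') as f:
        f.write(b'cat sits\nstocks rise\n')
    demo = load_parallel_gz(text_path, summary_path)

print(demo[['Text_len', 'Summary_len']])
```

The explicit check costs almost nothing and turns a misaligned dataset into an immediate, readable error instead of a subtle training problem later.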
Let's look at a couple of sample news texts and their summaries:
```python
print(news['Text'].head(2).values)
print(news['Summary'].head(2).values)
```

Output:

```
[b'chinese president hu jintao said here monday that china will work with romania to promote bilateral ...
```
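The `b'...'` prefix in the output shows that the columns hold `bytes` rather than `str`, because `gzip.open()` opens files in binary mode by default. If plain strings are more convenient for later text processing, one option (our assumption, not necessarily the book's approach) is to decode each value. This minimal sketch assumes the files are UTF-8 encoded and uses two made-up rows in place of the real data:

```python
import pandas as pd

# Made-up rows standing in for the bytes lines read from the gzip files
sample = pd.DataFrame({
    'Text': [b'stocks rose sharply on monday\n', b'the cat sat on the mat today\n'],
    'Summary': [b'stocks rise\n', b'cat sits\n'],
})


def to_str(col):
    # Decode each bytes value (assumed UTF-8) and drop the trailing newline
    return col.apply(lambda b: b.decode('utf-8').strip())


sample['Text'] = to_str(sample['Text'])
sample['Summary'] = to_str(sample['Summary'])
print(sample['Text'].head(2).values)
```

Alternatively, opening the files with `gzip.open(path, 'rt', encoding='utf-8')` yields `str` lines directly. For ASCII text the word counts in `Text_len` and `Summary_len` are the same either way, since `str.split()` and `bytes.split()` both split on whitespace.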