Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Data preparation

First, we will load the original news file and the summaries into a pandas DataFrame:

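import gzip
import pandas as pd

titledata = []
artdata = []

# Read the articles; gzip.open defaults to binary mode, so each line is a bytes object
with gzip.open('data/news.txt.gz') as artfile:
    for li in artfile:
        artdata.append(li)

# Read the summaries the same way, one per line
with gzip.open('data/summary.txt.gz') as titlefile:
    for li in titlefile:
        titledata.append(li)

news = pd.DataFrame({'Text': artdata, 'Summary': titledata})

# Keep a random 10% sample of the rows
news = news.sample(frac=0.1)

# Word counts for each article and each summary
news['Text_len'] = news.Text.apply(lambda x: len(x.split()))
news['Summary_len'] = news.Summary.apply(lambda x: len(x.split()))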
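The sampling and word-count steps can be tried in isolation on a small in-memory DataFrame. The following is a minimal sketch, not the book's code: the toy rows and the name toy are hypothetical, and random_state is added here only to make the random draw repeatable.

```python
import pandas as pd

# Hypothetical toy data standing in for the news files; byte strings
# mimic what gzip.open() yields in its default binary mode.
toy = pd.DataFrame({
    'Text': [b'one two three four\n'] * 20,
    'Summary': [b'one two\n'] * 20,
})

# frac=0.1 keeps a random 10% of the rows; random_state (an addition,
# not in the original code) makes the draw repeatable.
sampled = toy.sample(frac=0.1, random_state=0)

# bytes.split() with no argument splits on ASCII whitespace, so the
# trailing newline does not produce an extra token.
sampled['Text_len'] = sampled.Text.apply(lambda x: len(x.split()))
sampled['Summary_len'] = sampled.Summary.apply(lambda x: len(x.split()))

print(len(sampled))
print(sampled[['Text_len', 'Summary_len']].head())
```

Because the lengths are counted by splitting on whitespace, they are word counts rather than character counts.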

Next, we print the first two entries of the Text and Summary columns:

print(news['Text'].head(2).values)
print(news['Summary'].head(2).values)

Output:

[b'chinese president hu jintao said here monday that china will work with romania to promote bilateral ...
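The b prefix in the output shows that each entry is a bytes object, which is what gzip.open returns when the file is opened in its default binary mode. If plain strings are preferred, the file can be opened in text mode instead. The sketch below assumes the data is UTF-8 encoded, which the source does not state, and it writes a small hypothetical sample file so that it can run on its own.

```python
import gzip
import os
import tempfile

# Hypothetical sample file, created here only so the sketch is runnable.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt.gz')
with gzip.open(path, 'wt', encoding='utf-8') as f:
    f.write('first line of text\nsecond line\n')

# Default mode is binary ('rb'): each line is a bytes object.
with gzip.open(path) as f:
    raw_lines = list(f)

# Text mode ('rt') decodes while reading; UTF-8 is an assumption here.
with gzip.open(path, 'rt', encoding='utf-8') as f:
    text_lines = [line.rstrip('\n') for line in f]

print(raw_lines[0])
print(text_lines[0])
```

Either form works with the word counts above, since both bytes and str provide a split() method.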
