Classifying news articles by topic using a CNN

For this example, we will use a dataset of references to news web pages collected by a news aggregator. The dataset covers four categories: science and technology, business, entertainment, and health. The complete Jupyter Notebook for this example can be found at Chapter05/03_example.ipynb in this book's code repository.

We will first load the dataset, shuffle its rows, and look at a sample of the data:

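import pandas as pd

# The file is tab-separated and has no header row, so column names are supplied explicitly
news_df = pd.read_csv('data/newsCorpora.csv', delimiter='\t', header=None,
                      names=['ID', 'TITLE', 'URL', 'PUBLISHER', 'CATEGORY', 'STORY', 'HOSTNAME', 'TIMESTAMP'])
news_df = news_df.sample(frac=1.0)  # shuffle all rows
news_df.head(5)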
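
To make the loading step easier to follow without the data file at hand, here is a minimal sketch that runs the same calls on a tiny in-memory table. Everything in `sample_tsv` is a made-up placeholder (titles, URLs, publishers, timestamps, and the single-letter category codes), and the names `sample_tsv`, `toy_df`, `shuffled_df`, and `category_counts` are hypothetical. Passing `random_state` is an addition here so that the shuffle is repeatable.

```python
import io

import pandas as pd

# Hypothetical tab-separated rows standing in for data/newsCorpora.csv.
# All values below are placeholders invented for illustration.
sample_tsv = (
    "1\tHeadline one\thttp://example.com/1\tPublisher A\tb\tstory-1\texample.com\t1000\n"
    "2\tHeadline two\thttp://example.com/2\tPublisher B\tt\tstory-2\texample.com\t2000\n"
    "3\tHeadline three\thttp://example.com/3\tPublisher C\te\tstory-3\texample.com\t3000\n"
    "4\tHeadline four\thttp://example.com/4\tPublisher D\tm\tstory-4\texample.com\t4000\n"
)

columns = ['ID', 'TITLE', 'URL', 'PUBLISHER', 'CATEGORY', 'STORY', 'HOSTNAME', 'TIMESTAMP']

# header=None tells pandas the first line is data, not column names;
# names= supplies the column labels instead.
toy_df = pd.read_csv(io.StringIO(sample_tsv), delimiter='\t', header=None, names=columns)

# sample(frac=1.0) returns every row exactly once in random order, i.e. a shuffle.
# random_state (an assumption added for this sketch) fixes the order across runs.
shuffled_df = toy_df.sample(frac=1.0, random_state=0)

# Count how many articles fall under each category code.
category_counts = shuffled_df['CATEGORY'].value_counts()

print(shuffled_df.head(5))
print(category_counts)
```

Shuffling keeps the original index labels, so calling `reset_index(drop=True)` afterwards may be convenient if positional row numbers are needed later.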

The dataset is represented in table format as follows:
