December 2018
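We will now read a larger set of 2,225 BBC News articles (see GitHub for data source details). The articles belong to five categories and are stored as individual text files, with one sub-directory per category. To load them, we use pathlib's Path.glob() method to find all .txt files below the data directory and iterate over the results. For each file, we skip the first line (the headline), join the remaining stripped lines into a single string, and append that string to a list: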
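from pathlib import Path

# recursively find all article files below the BBC data directory
files = Path('..', 'data', 'bbc').glob('**/*.txt')
bbc_articles = []
for file in files:
    # path parts: '..', 'data', 'bbc', <topic>, <file name>
    _, _, _, topic, file_name = file.parts
    with file.open(encoding='latin1') as f:
        lines = f.readlines()
    # drop the first line (headline) and join the stripped body lines
    body = ' '.join([l.strip() for l in lines[1:]]).strip()
    bbc_articles.append(body)

len(bbc_articles)  # 2225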
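The loop above only runs if the dataset is present at ../data/bbc. The following minimal sketch of the same logic can run anywhere. It assumes the same layout: one sub-folder per category, with the headline on the first line of each file. The sketch builds a tiny corpus of made-up text in a temporary directory and also keeps each article's category, taken from its parent folder name. The helper name load_articles and the sample texts are hypothetical and not part of the original code.

```python
from collections import Counter
from pathlib import Path
import tempfile


def load_articles(root):
    """Hypothetical helper: read every .txt file one level below root.

    Assumes the layout root/<topic>/<file>.txt, where the first line of
    each file is the headline and the remaining lines form the body.
    Returns two parallel lists: topics and article bodies.
    """
    topics, bodies = [], []
    for file in sorted(Path(root).glob('*/*.txt')):
        with file.open(encoding='latin1') as f:
            lines = f.readlines()
        topics.append(file.parent.name)  # category = name of the parent folder
        # skip the headline, strip each remaining line, join with spaces
        bodies.append(' '.join(l.strip() for l in lines[1:]).strip())
    return topics, bodies


# Build a tiny stand-in corpus (made-up text) in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        ('business', '001.txt'): 'Headline A\n\nFirst paragraph.\nSecond line.\n',
        ('business', '002.txt'): 'Headline B\n\nBody B.\n',
        ('sport', '001.txt'): 'Headline C\n\nBody C.\n',
    }
    for (topic, name), text in samples.items():
        path = Path(tmp, topic, name)
        path.parent.mkdir(exist_ok=True)
        path.write_text(text, encoding='latin1')
    topics, bodies = load_articles(tmp)

print(Counter(topics))  # Counter({'business': 2, 'sport': 1})
print(bodies[0])        # First paragraph. Second line.
```

Using file.parent.name instead of unpacking file.parts keeps the topic lookup independent of how deep the data directory sits. The '*/*.txt' pattern matches only files exactly one level below the root, so any stray file directly in the root is ignored. On the real data, counting the labels is a quick way to check that all five categories were read.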