August 2014
Beginner to intermediate
304 pages
7h 10m
English
In this chapter, we will cover the following recipes:
This chapter covers parsing specific kinds of data, focusing primarily on dates, times, and HTML. Luckily, there are a number of useful libraries to accomplish this, so we don't have to delve into tricky and overly complicated regular expressions. These libraries can be great complements to NLTK:
dateutil provides datetime parsing and timezone conversion
lxml and BeautifulSoup can parse, clean, and convert HTML
charade and UnicodeDammit ...
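As a rough illustration of the first item, here is a minimal sketch of datetime parsing and timezone conversion with dateutil. It is not taken from the chapter's recipes: the sample date string, the fixed UTC-5 offset, and the variable names are assumptions chosen for illustration, and the code assumes the python-dateutil package is installed.

```python
from datetime import datetime, timedelta

from dateutil import parser, tz

# Parse a free-form date string without writing a regular expression.
# The sample string is an illustrative assumption, not from the chapter.
naive = parser.parse("Thu Sep 25 10:36:28 2003")

# The parsed value has no timezone, so label it as UTC explicitly.
utc_dt = naive.replace(tzinfo=tz.tzutc())

# Convert to a fixed offset five hours behind UTC (hypothetical target zone).
fixed_zone = tz.tzoffset("UTC-5", -5 * 3600)
local_dt = utc_dt.astimezone(fixed_zone)

print(naive.isoformat())
print(local_dt.isoformat())
```

A fixed offset keeps the sketch independent of the system's timezone database; for named zones that observe daylight saving time, `tz.gettz()` is the usual alternative.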