Unit 13: Processing HTML Files
The first type of structured text document you’ll look at is HTML, a markup language commonly used on the web to present information in human-readable form. An HTML document consists of text and predefined tags (enclosed in angle brackets, <>) that control how the text is presented and interpreted. Tags may have attributes. The following table shows some HTML tags and their attributes.
Tag | Attributes | Purpose |
---|---|---|
HTML | | Whole HTML document |
HEAD | | Document header |
TITLE | | Document title |
BODY | background, bgcolor | Document body |
H1, H2, H3, etc. | | Section headers |
I, EM | | Emphasis |
B, STRONG | | Strong emphasis |
PRE | | Preformatted text |
P, SPAN, DIV | | Paragraph, span, division ... |
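
To make the idea of tags and attributes concrete, here is a minimal sketch that lists every opening tag in a small HTML document together with its attributes. It uses Python's standard-library `html.parser` module purely for illustration, and other HTML-processing libraries would work as well. The sample document and the names `SAMPLE_HTML`, `TagCollector`, and `collect_tags` are made up for this example.

```python
from html.parser import HTMLParser

# A small, made-up HTML document that uses several tags from the table.
SAMPLE_HTML = """<html>
<head><title>Sample page</title></head>
<body bgcolor="white">
<h1>Heading</h1>
<p>Some <em>emphasized</em> and <strong>strong</strong> text.</p>
</body>
</html>"""


class TagCollector(HTMLParser):
    """Record every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []  # list of (tag_name, {attribute: value}) pairs

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        self.tags.append((tag, dict(attrs)))


def collect_tags(html_text):
    """Return the opening tags of html_text in document order."""
    parser = TagCollector()
    parser.feed(html_text)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    for tag, attributes in collect_tags(SAMPLE_HTML):
        print(tag, attributes)
```

In this sample only the `body` tag carries an attribute (`bgcolor`), so every other tag is reported with an empty dictionary.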