Chapter 24. Extracting data with regular expressions
In most information management domains, such as resource management, sales, production, and accounting, natural language texts are rarely used as a source of information. In other domains, such as content and document management, natural language texts are the principal or even the sole source of information.
Before this information can be used, it must be extracted from the source texts. A person can do this manually by reading the source, identifying the individual pieces of data, and copying and pasting them into a data entry application. Fortunately, the operation is highly deterministic, so it can be automated. ...
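As a rough illustration of what such automation can look like, the following minimal sketch uses Python's standard `re` module to do what the manual reader does: find the individual pieces of data in a text and copy them into a structured record. The sample sentence, the `INV-` invoice number format, the ISO-style dates, and the `extract` function are all assumptions made for this example; they are not taken from the chapter.

```python
import re

# Hypothetical source text; the invoice and date formats are assumptions.
SAMPLE = "Invoice INV-1042 was issued on 2021-03-15 and paid on 2021-04-02."

# One pattern per kind of data; the named group labels the part we keep.
INVOICE_RE = re.compile(r"\bINV-(?P<number>\d+)\b")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract(text):
    """Return the invoice number and all dates found in the text."""
    invoice = INVOICE_RE.search(text)
    return {
        # None signals that no invoice number was present in the text.
        "invoice_number": invoice.group("number") if invoice else None,
        # findall returns every non-overlapping match, in order of appearance.
        "dates": DATE_RE.findall(text),
    }


record = extract(SAMPLE)
print(record)
```

The patterns here only check the shape of the data, not its validity: a string such as `2021-99-99` would still match the date pattern, so a real extraction process would likely need a validation step after matching.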