Building the Unstructured Data Warehouse: Architecture, Analysis, and Design by Krish Krishnan, W. H. Inmon

Another, similar aspect of Textual ETL processing is the removal of punctuation. If a query is matched literally against the text, punctuation can get in the way. For this reason, Textual ETL processing removes punctuation from consideration. As an example, suppose a query looks for “Harper’s Ferry”. If the query searches for “Harper’s” with an apostrophe, a match may or may not be made. (If the text spells Harpers without an apostrophe, the search won’t find it.) A surer approach is to look for “Harpers Ferry”, so that a hit is made once punctuation has been removed. To find a wider and more varied set of hits, punctuation is best removed.
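A minimal Python sketch of the idea follows, assuming a simple case-insensitive substring search. The helper names `remove_punctuation` and `matches` are hypothetical and are not taken from any Textual ETL product. Punctuation, including typographic apostrophes and quotes such as ’, is deleted from both the query and the text before comparison, so “Harper’s Ferry” and “Harpers Ferry” normalize to the same string.

```python
import string

# Hypothetical normalization table: delete every ASCII punctuation character
# plus the common typographic quotes (‘ ’ “ ”), which string.punctuation lacks.
_TYPOGRAPHIC_QUOTES = "\u2018\u2019\u201c\u201d"
_STRIP_TABLE = str.maketrans("", "", string.punctuation + _TYPOGRAPHIC_QUOTES)


def remove_punctuation(text: str) -> str:
    """Delete punctuation so that "Harper's" becomes "Harpers"."""
    return text.translate(_STRIP_TABLE)


def matches(query: str, text: str) -> bool:
    """Case-insensitive substring match after removing punctuation from both sides."""
    normalized_query = remove_punctuation(query).casefold()
    normalized_text = remove_punctuation(text).casefold()
    return normalized_query in normalized_text


if __name__ == "__main__":
    doc = "The report mentions Harper\u2019s Ferry twice."
    print("Harpers Ferry" in doc)         # literal search misses: False
    print(matches("Harpers Ferry", doc))  # normalized search hits: True
    print(matches("Harper's Ferry", doc)) # apostrophe in the query is harmless: True
```

Deleting punctuation, rather than replacing it with a space, is what lets “Harper’s” collapse to “Harpers”. The trade-off is that a plain substring test can also match inside longer words, so a fuller sketch would compare whole tokens.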

Figure 4.7 shows how Textual ETL removes punctuation. ...