Recipe 5 – extracting named entities
Reconciliation works great for those fields in your dataset that contain single terms, such as names of people, countries, or works of art. However, if your column contains running text, then reconciliation cannot help you, since it can only search for single terms in the datasets it uses. Fortunately, another technique called named-entity extraction
can help us. An extraction algorithm searches texts for named entities which are text elements, such as names of persons, locations, values, organizations, and other widely-known things. In addition to just extracting the terms, most algorithms also try to perform disambiguation. For instance, if the algorithm finds Washington in a text, it will try to determine ...
Get Using OpenRefine now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.