Paweł Chrząszcz

Extraction of Polish Multiword Expressions

Abstract: Natural language processing for fusional languages often requires using inflection dictionaries which lack multiword expressions (MWEs) – proper names and other sequences of words that have new properties as a whole. This paper describes an effort to extract such MWEs from Polish text. The search is not limited to any particular domain, and there are no predefined categories, gazetteers or manually defined rules, which makes it different from named entity recognition (NER). As there are no Polish linguistic resources containing MWEs, we cannot use supervised learning techniques. Instead, Wikipedia content and link structure are used to create syntactic patterns that are recognised in ...
