Conclusion
Manual corpus annotation is now at the heart of NLP. It is where linguistics resides, in a way that remains ill-explored. There is a blatant need for manual annotation engineering (in the sense of a precisely formalized process), and this book aims at providing a first step toward a holistic methodology, with a global view of annotation.
Annotating is interpreting, and this interpretation has to come from a consensus, built collaboratively. From this point of view, annotation is by definition collaborative and marks a shift away from the vision in which one specialist (a linguist) can make irrevocable judgments. There are many ways in which this collaboration can be fostered during the annotation process, and we detailed some of them in this book.
However, annotation methodology as such remains little explored, and many issues stay open. In particular, the annotation of relations and sets has to be analyzed more closely, as they cover a wide variety of situations and complexities. Better solutions also still need to be found to evaluate them.
As we showed in Chapter 2, crowdsourcing is one of the most effective ways to lower the cost of manual annotation. GWAPs (games with a purpose) generate astonishing results for language data production, both in quantity and in quality. However, designing and developing games requires specific gamification skills and technical expertise that are not accessible to everybody. Moreover, the necessary communication ...