Natural Language Annotation for Machine Learning

by James Pustejovsky, Amber Stubbs
October 2012
Beginner to intermediate
342 pages
9h 55m
English
O'Reilly Media, Inc.

Chapter 5. Applying and Adopting Annotation Standards

Now that you’ve created the spec for your annotation goal, you’re almost ready to start annotating your corpus. Before you do, however, you need to consider what form your annotated data will take: you know what you want your annotators to do, but you still have to decide how you want them to do it. In this chapter we’ll examine the different formats annotation can take and discuss the pros and cons of each one by answering the following questions:

  • What does annotation look like?

  • Are different types of tasks represented differently? If so, how?

  • How can you ensure that your annotation can be used by other people and in conjunction with other tasks?

  • What considerations go into deciding on an annotation environment and data format, both for the annotators and for machine learning?

Before getting into the details of how to apply your spec to your corpus, you need to understand what annotation actually looks like when it has been applied to a document or text. So now let’s look at the spec examples from Chapter 4 and see how they can be applied to an actual corpus.

There are many different ways to represent information about a corpus. The examples we show you won’t be exhaustive, but they will give you an overview of some of the different formats that annotated data can take.
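
To make the idea of "different formats" concrete, here is a minimal sketch of one annotation stored two ways: as stand-off tags that point into the text by character offsets, and as inline XML-style tags wrapped around the text. The sentence, the PERSON tag name, and the to_inline helper are all hypothetical, invented only for this illustration, and the helper assumes the tags do not overlap and the text contains no characters that need XML escaping.

```python
# Hypothetical sentence and tag name, chosen only for illustration.
text = "Amber annotated the corpus."

# Stand-off form: the text is left untouched, and each tag points
# into it by character offsets (start inclusive, end exclusive).
standoff = [
    {"tag": "PERSON", "start": 0, "end": 5},
]


def to_inline(text, annotations):
    """Render stand-off annotations as inline XML-style tags.

    Assumes the annotations do not overlap (an assumption of this sketch).
    """
    pieces = []
    pos = 0
    for ann in sorted(annotations, key=lambda a: a["start"]):
        # Copy the untagged text that precedes this annotation.
        pieces.append(text[pos:ann["start"]])
        # Wrap the annotated span in an opening and closing tag.
        span = text[ann["start"]:ann["end"]]
        pieces.append("<{0}>{1}</{0}>".format(ann["tag"], span))
        pos = ann["end"]
    # Copy whatever text follows the last annotation.
    pieces.append(text[pos:])
    return "".join(pieces)


inline = to_inline(text, standoff)
print(inline)  # <PERSON>Amber</PERSON> annotated the corpus.
```

Both forms carry the same information. The stand-off version leaves the source text unmodified, while the inline version is easier to read at a glance; which one suits your project depends on your annotators and your tools.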

Note

Keep your data accessible. Your annotation project will be much easier to manage if you choose a format for your data that’s ...


Publisher Resources

ISBN: 9781449332693