Chapter 3. Labeling in Action

The previous chapter introduced labeling function decorators, which convert Python functions into weak classifiers for the Snorkel framework. In this chapter, we will use those labeling functions to build labeling strategies and to label one text dataset and one image dataset.

As mentioned in previous chapters, weak supervision and data programming are all about bringing together information from different sources and extracting information from data of various shapes. To label the text dataset, we will generate fake/real labels from signals like the following:

  • Inspecting images embedded in article review websites whose color (red, green, yellow) indicates the level of veracity of the article under review

  • Summarizing online articles reviewing the news, and extracting their sentiment about the article

  • Aggregating agreement among crowdsourced decisions

Because each article has one or more of the preceding signals, we will use Snorkel to reconcile those signals into a single label.
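The idea of reaching agreement among several signals can be illustrated with a simple majority vote. The following is a minimal sketch in plain Python and not the chapter's actual pipeline: the label constants, the `majority_vote` helper, and the toy `label_matrix` are all hypothetical, and a tie or a row of abstentions yields an abstain. Snorkel provides its own aggregators (for example, `MajorityLabelVoter` and `LabelModel` in `snorkel.labeling.model`); this sketch only mimics the simplest of those ideas.

```python
from collections import Counter

# Hypothetical label encoding; -1 for "abstain" follows the Snorkel convention.
ABSTAIN, REAL, FAKE = -1, 0, 1


def majority_vote(votes):
    """Return the most common non-abstain label, or ABSTAIN on a tie or no votes."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # the top two labels are tied, so no decision is made
    return ranked[0][0]


# Toy data: one row per article, one column per signal
# (badge color, review sentiment, crowdsourced vote).
label_matrix = [
    [FAKE, FAKE, ABSTAIN],
    [REAL, ABSTAIN, REAL],
    [FAKE, REAL, ABSTAIN],
    [ABSTAIN, ABSTAIN, ABSTAIN],
]

labels = [majority_vote(row) for row in label_matrix]
print(labels)  # [1, 0, -1, -1]
```

A plain majority vote treats every signal as equally reliable, which is why a learned aggregator that weighs signals by estimated accuracy is often preferred when the signals differ in quality.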

For the image dataset, we will put together labeling functions that vote on whether each image shows an outdoor or indoor scene by running small image classifiers over the data. These classifiers aim to recognize elements like the sky and the grass, and we will also use image recognition techniques to describe the images. We will then use text classifiers to generate a second opinion about whether each respective ...
