Chapter 13

Cross-Company Learning

Handling The Data Drought


In this part of the book, Data Science for Software Engineering: Sharing Data and Models, we show that sharing all data is less useful than sharing just the relevant data. Several methods are useful for finding those relevant data regions, including simple nearest-neighbor (kNN) algorithms; clustering (to optimize subsequent kNN); and pruning away “bad” regions. We also show that, with clustering, it is possible to repair missing data in project records.
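To make the nearest-neighbor idea concrete, the following is a minimal sketch of relevancy filtering, not the chapter's own implementation. It assumes that every project record is a tuple of numeric features already scaled to comparable ranges, and that Euclidean distance is an acceptable similarity measure. The names `euclidean`, `relevancy_filter`, `local_rows`, `cross_rows`, and `k` are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relevancy_filter(local_rows, cross_rows, k=3):
    """Keep only the cross-company rows that are among the k nearest
    neighbors of at least one local row (assumed pre-normalized)."""
    keep = set()
    for local in local_rows:
        # Rank cross-company row indices by distance to this local row.
        ranked = sorted(
            range(len(cross_rows)),
            key=lambda i: euclidean(local, cross_rows[i]),
        )
        keep.update(ranked[:k])
    # Return the selected rows in their original order, without duplicates.
    return [cross_rows[i] for i in sorted(keep)]


# Hypothetical data: one local project record and four cross-company records.
local_rows = [(1.0, 1.0)]
cross_rows = [(0.9, 1.1), (5.0, 5.0), (1.2, 0.8), (9.0, 9.0)]

relevant = relevancy_filter(local_rows, cross_rows, k=2)
print(relevant)  # [(0.9, 1.1), (1.2, 0.8)]
```

Only the two cross-company records closest to the local record survive the filter; a learner would then be trained on that subset rather than on all of the shared data.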

In summary, this chapter proposes the following data analysis pattern:

Name: Relevancy filtering
Also known as: Transfer learning [352].
Intent: Software defect prediction, when there is insufficient local information ...
