Sharing Data and Models in Software Engineering by Fayola Peters, Leandro Minku, Burak Turhan, Ekrem Kocaguneli, Tim Menzies

Chapter 15

Sharing Less Data (Is a Good Thing)

Abstract

In this part of the book Data Science for Software Engineering: Sharing Data and Models, we show that sharing all data is less useful than sharing just the relevant data. There are several useful methods for finding those relevant regions of the data, including simple nearest-neighbor (kNN) algorithms; clustering (to optimize subsequent kNN); and pruning away “bad” regions. We also show that clustering makes it possible to repair missing data in project records.
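To make the kNN idea concrete, here is a minimal sketch of nearest-neighbor relevancy filtering. It is not necessarily the chapter's exact procedure. It assumes all rows are numeric feature vectors of equal length (ideally normalized), uses Euclidean distance, and keeps any shared row that is among the k nearest neighbors of at least one local row. The function name `knn_relevancy_filter` and the parameter `k` are hypothetical.

```python
import math
from typing import List, Sequence

Row = Sequence[float]


def euclidean(a: Row, b: Row) -> float:
    """Euclidean distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_relevancy_filter(shared: List[Row], local: List[Row], k: int = 10) -> List[int]:
    """Hypothetical helper: return the sorted indices of the shared rows that are
    among the k nearest neighbours of at least one local row.
    Assumes features are already normalized so no single column dominates."""
    keep = set()
    for row in local:
        # Rank every shared row by its distance to this local row.
        ranked = sorted(range(len(shared)), key=lambda i: euclidean(shared[i], row))
        # Keep only the k closest; everything else is treated as irrelevant.
        keep.update(ranked[:k])
    return sorted(keep)


if __name__ == "__main__":
    shared = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [10.0, 10.0]]
    local = [[0.05, 0.1], [5.0, 5.1]]
    print(knn_relevancy_filter(shared, local, k=2))  # → [0, 1, 2, 3]
```

In this toy example, the shared row at index 4 lies far from every local row, so it is dropped. Only the regions of the shared data near the local project survive.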

Name: PEEKER
Intent: Find the smallest, most useful portion of a data set.
Motivation: Chapters 13 and 14 showed that it is possible to share data between software projects. Potentially, those results could motivate massive data collection ...