In this part of the book Data Science for Software Engineering: Sharing Data and Models, we show that sharing all data is less useful that sharing just the relevant data. There are several useful methods for finding those relevant data regions including simple nearest neighbor, or kNN, algorithms; clustering (to optimize subsequent kNN); and pruning away “bad” regions. Also, we show that with clustering, it is possible to repair missing data in project records.
Also known as:
CLIFF is an instance selector that can be used as an instance-based selector (or case-based) reasoner. MORPH is an instance mutator and can also be categorized as a noise injector.
The combination ...
With Safari, you learn the way you learn best. Get unlimited access to videos, live online training,
learning paths, books, interactive tutorials, and more.