1 Explanatory Tools for Machine Learning in the Symbolic Data Analysis Framework

The main aim of this chapter is to provide explanatory tools for understanding standard, complex and big data. First, we recall some basic notions in Data Science: what are complex data? What are classes, and classes of complex data? Which kinds of internal class variability can be considered? Then we define “symbolic data” and “symbolic data tables”, which express the within-class variability, and we give some advantages of this kind of class description. In practice the classes are often given. When they are not, they can be built by clustering with the Dynamic Clustering Method (DCM), from which DCM regression, DCM canonical analysis, DCM mixture decomposition, and the like can be obtained. Aggregating the descriptions of these classes yields a symbolic data table. We say that the description of a class is much more explanatory when it is given by symbolic variables (closer to the natural language of the users) than by its usual analytical multidimensional description. The explanatory and characteristic power of classes can then be measured by criteria based on the symbolic data description of these classes, which induces a way of comparing clustering methods by their explanatory power. These criteria are defined in a Symbolic Data Analysis framework for categorical variables, based on three random variables defined on the ground population. Tools are then given for ...
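
To make the aggregation step concrete, the following is a minimal sketch of how individual records might be turned into a symbolic data table, assuming the classes are already given. It is an illustration only, not the chapter's own procedure: the function name `build_symbolic_table`, the toy records, and the choice of an interval for a numerical variable and a relative-frequency distribution for a categorical variable are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_symbolic_table(records, class_key, numeric_vars, categorical_vars):
    """Aggregate individual records into one symbolic description per class.

    Hypothetical helper: numerical variables become (min, max) intervals and
    categorical variables become relative-frequency distributions.
    """
    # Group the individuals by their (given) class label.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[class_key]].append(rec)

    table = {}
    for cls, members in groups.items():
        desc = {}
        for var in numeric_vars:
            values = [m[var] for m in members]
            # Interval-valued symbolic variable: range observed in the class.
            desc[var] = (min(values), max(values))
        for var in categorical_vars:
            counts = Counter(m[var] for m in members)
            # Distribution-valued symbolic variable: category frequencies.
            desc[var] = {cat: n / len(members) for cat, n in counts.items()}
        table[cls] = desc
    return table


# Made-up individuals, used only to exercise the sketch.
records = [
    {"team": "A", "age": 23, "role": "dev"},
    {"team": "A", "age": 31, "role": "dev"},
    {"team": "A", "age": 45, "role": "ops"},
    {"team": "A", "age": 28, "role": "dev"},
    {"team": "B", "age": 35, "role": "ops"},
    {"team": "B", "age": 52, "role": "ops"},
]

symbolic_table = build_symbolic_table(records, "team", ["age"], ["role"])
for cls, desc in symbolic_table.items():
    print(cls, desc)
```

Each row of the resulting table describes a class rather than an individual, so the variability inside the class (the age range, the mix of roles) is kept instead of being collapsed into a single mean or mode.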

