2 Likelihood in the Symbolic Context

2.1. Introduction

Analyzing classes (or groups) of raw data, with each class treated as a statistical unit, can be of interest, for example, when dealing with objects that have complex behavior or with a very large dataset split into groups. While a standard statistical analysis can work when each class is summarized by its mean, the problem becomes less obvious when the class summary is, for example, an estimator of the distribution of the data in that class. According to E. Diday, who introduced the paradigm of “Symbolic Data Analysis” [DID 87], a symbol of a statistical unit is any mathematical object summarizing the variability internal to that unit; see also [BIL 03, BIL 06, BOC 99, DID 16]. For example, a symbol of a class of real data can be a real number (e.g. the class mean or its variance), an interval (the class range), a function (e.g. the class empirical c.d.f. or a histogram built from that class), or a probability distribution (e.g. a theoretical distribution estimated from the class data).
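To make the notion of a symbol concrete, the following minimal Python sketch computes several of the summaries listed above for two small classes of real data. It is an illustration only, not the chapter's method: the function name `summarize_class`, the toy data, the use of the population variance and the choice of four equal-width bins are all assumptions introduced here.

```python
from statistics import mean, pvariance


def summarize_class(values, n_bins=4):
    """Return several candidate symbols for one class of real observations.

    Hypothetical helper: the name, the bin count and the use of the
    population variance are illustrative assumptions.
    """
    xs = sorted(values)
    n = len(xs)
    lo, hi = xs[0], xs[-1]

    def ecdf(t):
        # Empirical c.d.f.: proportion of observations less than or equal to t.
        return sum(1 for x in xs if x <= t) / n

    # Equal-width histogram on [lo, hi]; fall back to width 1.0 if all values coincide.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for x in xs:
        k = min(int((x - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[k] += 1

    return {
        "mean": mean(xs),                           # real-valued symbol
        "variance": pvariance(xs),                  # real-valued symbol
        "range": (lo, hi),                          # interval-valued symbol
        "ecdf": ecdf,                               # function-valued symbol
        "histogram": [c / n for c in counts],       # relative frequencies per bin
    }


# Toy data: each class becomes one statistical unit described by its symbols.
classes = {"A": [1.0, 2.0, 2.5, 4.0], "B": [10.0, 10.5, 13.0]}
symbols = {name: summarize_class(obs) for name, obs in classes.items()}
```

Each entry of `symbols` then describes one class, and a statistical analysis can be carried out on these descriptions rather than on the raw observations.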

In this chapter, our first aim is to propose a probabilistic framework for properly defining symbolic data as statistical units, slightly modifying the framework proposed in [EMI 15]. Our second aim is to consider the problem of defining distributions on symbols or, more simply, to propose some likelihood functions for finite-dimensional symbols. In fact, in the case where symbols are probability ...
