10 Clustering and Generalized ANOVA for Symbolic Data Constructed from Open Data
10.1. Introduction
Official statistics are an important source of open data, in which National Statistical Offices play a vital role. More and more societies favor the idea of freely available data, and many governmental institutions have therefore established open data websites. At the international level, such sources include the United Nations open data website [UN 17], The World Bank Open Data [WB 17], and The European Union Open Data Portal [EUR 17]. Aggregation is a commonly used technique for presenting such data in a transparent and compact way. Data aggregation has several important properties and advantages:
- it is usually the first step to make a large amount of data manageable;
- it extracts (first) information from big data;
- it protects the privacy of individuals (persons, companies etc.);
- it produces second-level units of data.
Aggregated data present the original individual units at a higher level, which enables a different view of the data. Symbolic Data Analysis (SDA) provides tools for the analysis of such second-level units. In SDA, second-level units are called concepts or classes, a terminology due to Diday and inspired by Aristotle’s collection of works on logic, The Organon [ARI], in which he distinguishes between first-level objects, called individuals, and second-level objects. They represent a natural extension of aggregated descriptions of individuals. ...
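To make the step from individuals to second-level units concrete, the following is a minimal sketch in Python. The records, the variable names and the `aggregate` function are hypothetical and are not taken from the chapter. The sketch assumes that each group is described by an interval for a numeric variable and by relative frequencies for a categorical variable, which are two common kinds of symbolic description; the chapter itself may use other descriptions.

```python
from collections import defaultdict

# Hypothetical first-level units (individuals): (region, age, employment status).
individuals = [
    ("North", 23, "employed"),
    ("North", 41, "unemployed"),
    ("North", 35, "employed"),
    ("South", 52, "employed"),
    ("South", 29, "employed"),
]


def aggregate(records):
    """Group individuals by region and describe each group symbolically."""
    groups = defaultdict(list)
    for region, age, status in records:
        groups[region].append((age, status))

    concepts = {}
    for region, members in groups.items():
        ages = [age for age, _ in members]
        counts = defaultdict(int)
        for _, status in members:
            counts[status] += 1
        concepts[region] = {
            # Number of individuals behind the second-level unit.
            "n": len(members),
            # Interval-valued description of the numeric variable.
            "age_interval": (min(ages), max(ages)),
            # Relative-frequency description of the categorical variable.
            "status_distribution": {
                status: count / len(members) for status, count in counts.items()
            },
        }
    return concepts


concepts = aggregate(individuals)
for region, description in concepts.items():
    print(region, description)
```

Each entry of `concepts` is one second-level unit: it no longer exposes any single individual, but it keeps more information about the variability within the group than a plain mean or total would.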