10 The Impact of Multicollinearity on Big Data Multivariate Analysis Modeling

The purpose of this work is to address, discuss and attempt to resolve some of the issues that appear in the presence of multicollinearity, such as the overfitting in regression analysis, the accuracy of the impact of the parameter estimation in the dependent variable and the inconsistent results during the analysis of variance, in order to properly model the public pension expenditures (PPE). For this purpose, we proceed to locate, collect and analyze the factors which may have an impact on the shaping of the PPE in the short term or long term. The analysis focuses on 20 European countries for which a large amount of data is available, including a set of 20 possible explanatory variables for the period 2001–2015.

10.1. Introduction

The tremendous increase in the development of technology, as well as the creation of new databases on a variety of topics, makes Big Data Analytics more efficient to work with. However, more is not always better. Large amounts of data may sometimes fail to perform properly in data analytic applications. Indeed, when it comes to modeling, a multitude of explanatory variables for extensive time periods can cause inconsistencies in the interpretation of statistical results. The most important obstacle that we have to overcome is the existence of multicollinearity between the covariates. To deal with this, among other issues, special techniques called dimension reduction techniques, ...

Get Applied Modeling Techniques and Data Analysis 1 now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.