CHAPTER 3

MASSIVE DATA SETS

Prefatory note. On July 7–8, 1995, a workshop on the statistical analysis and visualization of massive data sets, organized by Jon Kettenring, with more than 50 participants, was conducted at the National Research Council’s facilities in Washington, D.C. The proceedings of that workshop were published by Kettenring and Pregibon (1996).1 My own contribution to the workshop – an attempt to summarize and synthesize the issues – had been subtitled “The Morning After”; it had actually been drafted on the flight back home after the workshop (see Huber 1996a). Clearly, some of my statements are dated; for example, tasks that then required a super-workstation can now be handled on standard PCs. My main conclusions still stand, so I have left the immediacy of my responses intact, except that I have shortened or excised some clearly dated material and have added some italicized afterthoughts benefiting from hindsight.

3.1 INTRODUCTION

This paper collects some of my observations at, reactions to, and conclusions from the workshop on Massive Data Sets in Washington, D.C., July 7–8, 1995.2 We had not gotten as far as I had hoped. We had discussed long wish-lists, but had not winnowed them down to a list of challenges. While some position papers had discussed specific bottlenecks, or had recounted actual experiences with methods that worked, and things one would have liked to do but couldn’t, those examples had not been elaborated upon and inserted into a coherent ...
