Chapter 1

Power Grid Data Analysis with R and Hadoop

Ryan Hafen, Tara Gibson, Kerstin Kleese van Dam and Terence Critchlow, Pacific Northwest National Laboratory, Richland, Washington, USA

Abstract

In this chapter, we use the R and Hadoop Integrated Programming Environment (RHIPE) as a flexible, scalable environment for analyzing the multiterabyte data sets produced by a phasor measurement unit (PMU) sensor network on the electrical power grid. RHIPE enables exploratory data analysis on the entire data set, allowing us to develop data cleaning and event classification methods that reflect event characteristics as they appear in the actual data rather than relying on theoretical models. We describe several of the data cleaning filters that we ...
