Cleaning the dataset

Data cleaning, or tidying up the data, is the process of transforming raw data into a consistent form that is easy to analyze. The R programming language includes a comprehensive set of tools designed to clean data effectively. Here we clean the dataset by following these steps:

  1. Include the libraries that are required to clean and tidy up the dataset:
> library(dplyr) 
> library(tidyr)
  2. Analyze the summary of our dataset, which will help us to focus on the attributes we need to work on:
> summary(AirQualityUCI)
      Date                          Time                          CO(GT)   PT08.S1(CO)   NMHC(GT)
 Min.   :2004-03-10 00:00:00   Min.   :1899-12-31 00:00:00   ...
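
The two steps above can be tried on a small stand-in for the dataset. The following is only a sketch: `air_sample` and its values are made up for illustration (they are not taken from AirQualityUCI), and dropping incomplete rows is just one possible cleaning choice, not necessarily the one used for the real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for AirQualityUCI: a few invented rows that reuse
# some of the column names shown in the summary output.
air_sample <- tibble(
  Date = as.POSIXct(c("2004-03-10", "2004-03-10", "2004-03-11"), tz = "UTC"),
  Time = as.POSIXct(c("1899-12-31 18:00:00", "1899-12-31 19:00:00",
                      "1899-12-31 18:00:00"), tz = "UTC"),
  `CO(GT)` = c(2.6, NA, 2.2),
  `PT08.S1(CO)` = c(1360, 1292, NA)
)

# summary() shows the range of every column and how many NA values it holds
summary(air_sample)

# Count the missing values per column to see which attributes need work
na_counts <- air_sample %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# One possible cleaning step (an assumption, not the book's method):
# keep only the rows that have no missing values
air_complete <- air_sample %>% drop_na()
```

Printing `na_counts` gives one count per column, which makes it easy to spot the attributes that need attention before deciding how to treat them.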
