3 Data Inspection and Data Quality

To do: how to delete values; how to convert a pandas DataFrame to a NumPy array and back; data.table and dplyr in R; and Hmisc in R.

3.1 Data Formats

In R we can use the as.* family of functions (for example, as.numeric and as.character) to change from one data format to another.

In Python we can use str and int to convert values to strings and integers. We can use the split method to convert a string to a list.
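As a minimal sketch of these conversions (the variable names and sample values are illustrative assumptions, not taken from the text):

```python
# Illustrative values; every name here is an assumption made for this sketch.
count = 42
count_as_text = str(count)          # integer to string
text_as_count = int("42")           # string to integer
words = "alpha beta gamma".split()  # no argument: split on whitespace
fields = "a,b,c".split(",")         # explicit separator: split on commas

print(count_as_text, text_as_count, words, fields)
```

Note that split called with no arguments also discards leading and trailing whitespace, whereas an explicit separator keeps empty fields.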

Numeric: We use the int and float functions to convert data to the integer and float numeric types, respectively.

This is demonstrated in the following code. Note that indexing starts from 1 in R and from 0 in Python.

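import re
import numpy as np
import pandas as pd

# Mixed input: currency symbols, thousands separators, a plain int, trailing whitespace
numlist = ["$10000", "$20,000", "30,000", 40000, "50000 "]

# Strip dollar signs and commas; str() handles the element that is already an int
for i, value in enumerate(numlist):
    numlist[i] = re.sub(r"([$,])", "", str(value))

numlist
# ['10000', '20000', '30000', '40000', '50000 ']
int(numlist[1])  # index 1 is the second element, because Python counts from 0
# 20000

# int() ignores surrounding whitespace, so '50000 ' converts cleanly
for i, value in enumerate(numlist):
    numlist[i] = int(value)

numlist
# [10000, 20000, 30000, 40000, 50000]
np.mean(numlist)
# 30000.0

# str() turns the whole list into one string; with maxsplit=0, split leaves it in one piece
numlist2 = str(numlist)
numlist2.split(None, 0)
# ['[10000, 20000, 30000, 40000, 50000]']
numlist2.split(None, 0)[0]
# '[10000, 20000, 30000, 40000, 50000]'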
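The int and float conversions used above can be wrapped in a small helper. The sketch below is one possible approach, not the book's method; `to_number` is a hypothetical name introduced here for illustration.

```python
def to_number(text):
    """Convert a numeric string to int when possible, otherwise to float.

    Hypothetical helper for illustration only.
    """
    cleaned = text.strip()
    try:
        return int(cleaned)      # succeeds for whole numbers such as "50000"
    except ValueError:
        return float(cleaned)    # falls back for values such as "3.14"


whole = to_number("50000 ")
fraction = to_number("3.14")
truncated = int(float("3.99"))   # int() on a float truncates toward zero

print(whole, fraction, truncated)
```

Trying int first keeps whole numbers as integers, so values are promoted to float only when they actually contain a fractional part.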

3.1.1 Converting Strings to Date Time in Python

from datetime import datetime
datetime_object = datetime.strptime('Jun 7 2016 1:33PM', '%b %d %Y %I:%M%p')
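Once parsed, the resulting object exposes its components as attributes and can be written back out as a string. The following sketch repeats the call above and assumes an English (or C) locale, since `%b` matches locale-dependent month abbreviations; the variable names other than `datetime_object` are illustrative.

```python
from datetime import datetime

# Same parse as above: abbreviated month, day, year, 12-hour clock with AM/PM
datetime_object = datetime.strptime("Jun 7 2016 1:33PM", "%b %d %Y %I:%M%p")

year = datetime_object.year
hour = datetime_object.hour                       # stored on a 24-hour clock
iso_text = datetime_object.isoformat()            # ISO 8601 string
date_only = datetime_object.strftime("%Y-%m-%d")  # custom output format

print(year, hour, iso_text, date_only)
```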

R has the lubridate package (https://cran.r-project.org/web/packages/lubridate/lubridate.pdf) for easy conversion of strings to dates and times, while Python has the datetime module in its standard library. See examples of lubridate at http://rpubs.com/newajay/datesquality ...
