2 Data Input

2.1 Data Input in Pandas

The pandas library offers flexible functions for reading data in many formats.

The most commonly used is read_csv, which reads comma-separated values from a local file or from a URL. For example, to read from a URL:

import pandas as pd
anscombe = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/datasets/anscombe.csv")

The top few lines are shown at http://nbviewer.jupyter.org/gist/decisionstats/3737642751895f470d5c07194302f53e.

[Figure: the anscombe data frame displayed as a table of 9 columns (an unnamed index column, then x1, x2, x3, x4, y1, y2, y3, and y4) and 11 rows. Source: GitHub repository.]
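To inspect the first rows directly rather than through the notebook, head() can be called on the data frame that read_csv returns. The following is a minimal sketch: the three-row inline excerpt of the Anscombe data and the names sample_csv and anscombe_sample are assumptions, used so the example runs without network access. Passing the URL above to read_csv works the same way.

```python
import io
import pandas as pd

# Three-row excerpt laid out like the Rdatasets file; the inline text
# stands in for the URL so the example runs offline (assumption).
sample_csv = io.StringIO(
    '"","x1","x2","x3","x4","y1","y2","y3","y4"\n'
    '"1",10,10,10,8,8.04,9.14,7.46,6.58\n'
    '"2",8,8,8,8,6.95,8.14,6.77,5.76\n'
    '"3",13,13,13,8,7.58,8.74,12.74,7.71\n'
)

# read_csv accepts a file-like object as well as a path or URL.
anscombe_sample = pd.read_csv(sample_csv)

print(anscombe_sample.head())   # first rows (at most five)
print(anscombe_sample.shape)    # (number of rows, number of columns)
```

The first column has an empty header, which is why it appears as an unnamed column in the displayed table.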

Alternatively, read CSV data from a local file, as in the notebook at http://nbviewer.jupyter.org/gist/decisionstats/4142e98375445c5e4174:

 In [1]:

import pandas as pd #importing packages
import os

 In [2]:

#pd.describe_option() #describe options for customizing

 In [3]:

#pd.get_option("display.memory_usage") #read the current value of an option

 In [4]:

os.getcwd() #current working directory

 Out [4]:

'/home/ajay'

 In [5]:

os.chdir('/home/ajay/Desktop')

 In [6]:

os.getcwd()

 Out [6]:

'/home/ajay/Desktop'

 In [7]:

a=os.getcwd()
os.listdir(a)

 Out [7]:

['adult.data']
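Changing the working directory affects every later relative path in the session. As an alternative sketch, the full path can be built with os.path.join and passed straight to os.listdir (or to pd.read_csv). The temporary directory and the empty placeholder file below are assumptions that stand in for '/home/ajay/Desktop' and the real adult.data, so the example is self-contained.

```python
import os
import tempfile

# A temporary directory stands in for the Desktop folder (assumption).
with tempfile.TemporaryDirectory() as data_dir:
    # Create an empty placeholder file so there is something to list.
    file_path = os.path.join(data_dir, "adult.data")
    open(file_path, "w").close()

    listing = os.listdir(data_dir)      # directory contents, no chdir needed
    found = os.path.exists(file_path)   # True if the file is where we expect

print(listing)
print(found)
```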

 In [8]:

names2=["age","workclass","fnlwgt","education","education-num","marital-status","occupation","relationship","race","sex","capital-gain","capital-loss","hours-per-week","native-country","income"]

 In [9]:

len(names2)

 Out [9]:

15

 In [10]:

adult=pd.read_csv("adult.data",header=None)

 In [11]:

len(adult)

 Out [11]:

32562

 In [12]:

adult.columns

 Out [12]:

Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], ...
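The integer column labels arise because header=None tells read_csv that the file has no header row. One way to attach the labels in names2 is to pass them through the names parameter at read time. The sketch below assumes a two-row inline sample in the comma-plus-space layout of adult.data in place of the local file; adult_sample and sample are illustrative names, and skipinitialspace=True is an addition here that strips the space after each comma.

```python
import io
import pandas as pd

names2 = ["age", "workclass", "fnlwgt", "education", "education-num",
          "marital-status", "occupation", "relationship", "race", "sex",
          "capital-gain", "capital-loss", "hours-per-week",
          "native-country", "income"]

# Two illustrative rows standing in for the local adult.data file (assumption).
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K\n"
)

# header=None: no header row in the data; names: labels to use instead.
adult_sample = pd.read_csv(sample, header=None, names=names2,
                           skipinitialspace=True)

print(adult_sample.columns.tolist())
print(adult_sample.shape)
```

For a data frame that has already been read, assigning adult.columns = names2 relabels the columns in place, provided the list length matches the number of columns.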
