Separators and Decimal Points

The default field separator in read.table is sep="", which stands for white space: one or more spaces, tabs (\t), newlines (\n), or carriage returns. If the variables sharing a line are separated by something else (i.e. other than a tab within a .txt file), then there may well be a special read function for your case. Note that these all have the sensible default that header=TRUE (the first row contains the variable names): for comma-separated fields use read.csv("c:\\temp\\file.txt"); for semicolon-separated fields with a comma as the decimal point use read.csv2("c:\\temp\\file.txt"); and for tab-separated fields with a comma as the decimal point use read.delim2("c:\\temp\\file.txt"). You would use comma or semicolon separators if you had character variables that might contain one or more blanks (e.g. country names like ‘United Kingdom’ or ‘United States of America’).
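As a minimal sketch of these variants, the following reads small inline strings through the text argument rather than from a file, so it can be run anywhere. The variable names and the data values are made up for illustration; with real data you would pass a file path instead.

```r
# Hypothetical inline data standing in for files on disk
ws_text     <- "x y\n1   2\n3\t4"   # mixed spaces and a tab
csv_text    <- "country,gdp\nUnited Kingdom,2.5\nUnited States of America,3.1"
csv2_text   <- "country;gdp\nUnited Kingdom;2,5\nUnited States of America;3,1"
delim2_text <- "country\tgdp\nUnited Kingdom\t2,5\nUnited States of America\t3,1"

# Default sep = "" treats any run of white space as one separator
ws <- read.table(text = ws_text, header = TRUE)

# Comma-separated, decimal point "."
d1 <- read.csv(text = csv_text)

# Semicolon-separated, decimal comma
d2 <- read.csv2(text = csv2_text)

# Tab-separated, decimal comma
d3 <- read.delim2(text = delim2_text)

# The same semicolon data read by spelling out sep and dec in read.table
d4 <- read.table(text = csv2_text, header = TRUE, sep = ";", dec = ",")
```

In each of d1 to d4 the country names keep their embedded blanks because the separator is no longer white space, and the gdp column is read as numeric.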

If you want to specify row.names, then one of the columns of the dataframe must be a vector of unique row names. The row.names argument can be a single number giving the column of the table that contains the row names, or a character string giving the name of that column (see p. 123). If row.names is missing, the rows are numbered.
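The two ways of naming the row-name column can be sketched as follows. The data and the column names (plot, yield, acidity) are hypothetical and chosen only to keep the example self-contained.

```r
# Hypothetical white-space-separated data with a column of unique labels
txt <- "plot yield acidity\nA 4.2 6.1\nB 3.9 5.8\nC 5.0 6.4"

# Row names taken from the first column, given by position
by_number <- read.table(text = txt, header = TRUE, row.names = 1)

# Row names taken from the same column, given by name
by_name <- read.table(text = txt, header = TRUE, row.names = "plot")

# row.names not supplied: rows are simply numbered
numbered <- read.table(text = txt, header = TRUE)

rownames(by_number)
rownames(numbered)
```

When a column is used for row names it no longer appears as a variable, so by_number and by_name have two columns while numbered has three.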

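In versions of R before 4.0.0, the default behaviour of read.table is to convert character variables into factors (from R 4.0.0 onwards the default is stringsAsFactors=FALSE, so character variables stay as character vectors). If you do not want this conversion to happen (you want to keep a variable as a character vector) then use as.is to specify the columns that should ...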
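A minimal sketch of as.is, assuming the documented behaviour that as.is may be a vector of column names or numbers identifying the columns that should not be converted to factors. The data and column names are hypothetical.

```r
# Hypothetical data with two character columns
txt <- "name group score\nann a 1.5\nbob b 2.0\ncy a 3.5"

# Keep 'name' as a character vector; 'group' is not listed in as.is,
# so it is converted to a factor
d <- read.table(text = txt, header = TRUE, as.is = "name")

# The same column selected by position rather than by name
d_pos <- read.table(text = txt, header = TRUE, as.is = 1)

class(d$name)
class(d$group)
```

Because as.is is supplied explicitly here, the result does not depend on the version-specific default for stringsAsFactors.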
