Read CSV in R
Datasets are commonly distributed in CSV (comma-separated values) format, a lightweight storage solution for most use cases. In this tutorial you will learn how to read a CSV file in R so that you can work with its contents.
How to read a CSV file in R?
In this section you will learn how to import a CSV file in R with the read.csv and read.csv2 functions. You can see the basic syntax of the functions with the most common arguments in the following code block. For additional details remember to type ?read.csv or ?read.csv2.
# Comma as separator and dot as decimal point by default
read.csv(file,                 # File name or full path of the file
         header = TRUE,        # Whether to read the header or not
         sep = ",",            # Separator of the values
         quote = "\"",         # Quoting character
         dec = ".",            # Decimal point
         fill = TRUE,          # Whether to fill blanks or not
         comment.char = "",    # Character of the comments or empty string
         encoding = "unknown", # Encoding of the file
         ...)                  # Additional arguments

# Semicolon as separator and comma as decimal point by default
read.csv2(file, header = TRUE, sep = ";", quote = "\"", dec = ",",
          fill = TRUE, comment.char = "", encoding = "unknown", ...)
You may have noticed that the only differences between the two functions are the default separator of the values and the default decimal separator. The second function exists because some countries use a comma as the decimal separator.
In that case a comma cannot also separate the values, so CSV files that contain decimal numbers use a semicolon instead. Since you may find datasets in either convention, you can use the corresponding function instead of changing the arguments by hand.
The following table summarizes the three main default arguments:
Function | Header | Sep | Dec |
---|---|---|---|
read.csv | TRUE | "," | "." |
read.csv2 | TRUE | ";" | "," |
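As a minimal sketch of these defaults, the snippet below parses the same small table written in both conventions. It relies on the text argument (passed through to read.table) to supply the data as a string instead of a file; the column names id and price are made up for illustration.

```r
# Same data written in the two conventions (illustrative values only)
comma_text     <- "id,price\n1,2.5\n2,3.75"
semicolon_text <- "id;price\n1;2,5\n2;3,75"

# read.csv expects "," between values and "." inside decimals
df_comma <- read.csv(text = comma_text)

# read.csv2 expects ";" between values and "," inside decimals
df_semicolon <- read.csv2(text = semicolon_text)

# Both calls should yield the same numeric column
identical(df_comma$price, df_semicolon$price)
```

If the wrong function is used, for example read.csv on the semicolon version, each line is typically read as a single column, which is a quick way to spot a separator mismatch.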
To load a CSV file in R with the default arguments, pass the file name as a string to the corresponding function. The output will be of class data.frame.
read.csv("my_file.csv")
If you just execute the previous code, the data frame will be printed but not kept in memory, since it has not been assigned to any variable. If you save it in a variable called my_file, you will be able to access its variables and data afterwards.
my_file <- read.csv("my_file.csv")
The file must be in your working directory. If it is not, you will need to specify the full path of the file in the file argument.
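The following sketch illustrates reading a file that may not be in the working directory. It first writes a small example file to R's temporary folder (the data and the path are assumptions made for the example), then reads it back through its full path and accesses one of its variables.

```r
# Write a small example file to a temporary folder (illustrative data)
path <- file.path(tempdir(), "my_file.csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), path, row.names = FALSE)

# Relative file names are resolved against the working directory
getwd()

# Passing the full path works regardless of the working directory
my_file <- read.csv(path)

class(my_file)  # "data.frame"
my_file$x       # access a single variable by name
```

Building the path with file.path avoids hard-coding the path separator, which differs between operating systems.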
CSV file header
By default, the functions read the first line of the file as the header. If you want to read a CSV file without a header, set the header argument to FALSE.
read.csv("my_file.csv", header = FALSE)
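A short sketch of the effect, using the text argument to supply illustrative data without a header row. The col.names argument, which comes from read.table, is shown as one possible way to name the columns; the names id and label are hypothetical.

```r
# Illustrative data with no header row
no_header_text <- "1,a\n2,b\n3,c"

# With header = FALSE the first line is treated as data
df <- read.csv(text = no_header_text, header = FALSE)
names(df)  # automatic names: "V1" "V2"

# Optionally supply your own column names
df_named <- read.csv(text = no_header_text, header = FALSE,
                     col.names = c("id", "label"))
names(df_named)
```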
CSV encoding
A common issue arises from files with an unexpected encoding. If you are reading a file with special characters, you may need to specify the encoding. Setting the encoding to UTF-8 tends to solve most of these problems.
read.csv("my_file.csv", encoding = "UTF-8")
Note that this argument and the following ones are inherited from the read.table function.
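As a hedged illustration, the sketch below writes a tiny UTF-8 file byte by byte (so the example does not depend on your locale) and reads it back with encoding = "UTF-8". The file contents and the column name city are assumptions for the example. Note that encoding only marks the strings as UTF-8; if the file needs to be converted from another encoding, the related fileEncoding argument of read.table may be the better fit.

```r
# Create a UTF-8 encoded file containing the header "city" and the value "Málaga"
path <- tempfile(fileext = ".csv")
utf8_bytes <- c(charToRaw("city\nM"), as.raw(c(0xc3, 0xa1)), charToRaw("laga\n"))
writeBin(utf8_bytes, path)

# Declare that the strings in the file are UTF-8
cities <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

# Non-ASCII strings are now marked with their encoding
Encoding(cities$city)
```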
The na.strings argument
Sometimes files contain a character string that represents missing or omitted values. The documentation of the data set you are working with usually explains how missing values are coded. You can convert them to NA values with the na.strings argument, specifying the character string that represents the missing value.
Consider, for instance, that in your CSV file the -9999 values represent missing data. In this scenario you could type:
read.csv("my_file.csv", na.strings = "-9999")
Moreover, if the file uses several strings to represent missing values, you can specify all of them inside a vector.
read.csv("my_file.csv", na.strings = c("-9999", "Na"))
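To see what the argument changes, the sketch below reads the same illustrative data with and without na.strings. The station and temp columns are made up for the example.

```r
# Illustrative data where -9999 and Na both stand for missing values
raw_text <- "station,temp\nA,21.5\nB,-9999\nC,Na"

# Without na.strings, "Na" is kept as text, so temp is not numeric
without_na <- read.csv(text = raw_text)
is.numeric(without_na$temp)

# With na.strings, both codes become NA and temp is read as numeric
with_na <- read.csv(text = raw_text, na.strings = c("-9999", "Na"))
is.numeric(with_na$temp)
is.na(with_na$temp)
```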
However, if you need to remove the NA values (including those you converted with na.strings) after importing, you will need to use the appropriate function for your data. The most common function to remove missing values is na.omit.
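A minimal sketch of both options on illustrative data: na.omit drops every row that contains at least one NA, while many summary functions accept na.rm = TRUE to skip missing values without removing rows.

```r
# Illustrative data where -9999 marks a missing temperature
df <- read.csv(text = "station,temp\nA,21.5\nB,-9999\nC,18",
               na.strings = "-9999")

# Drop the rows that contain missing values
complete <- na.omit(df)
nrow(df)        # 3 rows before
nrow(complete)  # 2 rows after

# Alternatively, keep all rows and ignore NA in the computation
mean(df$temp, na.rm = TRUE)
```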
The stringsAsFactors argument
When set to TRUE, the stringsAsFactors argument transforms the string (character) columns of the dataset into factors. Since R 4.0.0 the default is FALSE, so you have to request this conversion explicitly.
read.csv("my_file.csv", stringsAsFactors = TRUE)
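The difference can be checked on a small illustrative table (the name and group columns are assumptions for the example) by reading it both ways and comparing the class of a text column.

```r
# Illustrative data with two text columns
people_text <- "name,group\nAna,a\nLuis,b\nEva,a"

as_char <- read.csv(text = people_text, stringsAsFactors = FALSE)
as_fact <- read.csv(text = people_text, stringsAsFactors = TRUE)

class(as_char$group)   # "character"
class(as_fact$group)   # "factor"
levels(as_fact$group)  # "a" "b"
```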
Read multiple CSV files in R
It is worth mentioning that you can import multiple CSV files at once instead of loading them into R one by one. For that purpose you can use the list.files function to look for all CSV files and then read them by applying the read.csv (or read.csv2) function with the sapply function.
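# The pattern is a regular expression: match file names ending in .csv
files <- list.files(pattern = "\\.csv$")
# simplify = FALSE keeps the result as a named list of data frames
multiple_csv <- sapply(files, read.csv, simplify = FALSE)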
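A self-contained sketch of the same idea follows. It creates two small files in a temporary folder (the folder name, file names and data are assumptions for the example), reads them into a list and, assuming all files share the same columns, stacks them into a single data frame. It uses lapply rather than sapply so that the result is always a plain list.

```r
# Create two illustrative CSV files in a temporary folder
folder <- file.path(tempdir(), "csv_demo")
dir.create(folder, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(folder, "part1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(folder, "part2.csv"), row.names = FALSE)

# full.names = TRUE returns paths that can be read from any working directory
files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list of data frames, named after the files
csv_list <- lapply(files, read.csv)
names(csv_list) <- basename(files)

# Stack the data frames by rows (requires identical column names)
combined <- do.call(rbind, csv_list)
nrow(combined)
```

Keeping the list named after the files makes it easy to trace each data frame back to its source, for instance with csv_list[["part1.csv"]].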