How to read a CSV file in R?
In this section you will learn how to import a CSV file in R with the read.csv and read.csv2 functions. You can see the basic syntax of the functions with the most common arguments in the following code block. For additional details, remember to type ?read.csv or ?read.csv2.
# Comma as separator and dot as decimal point by default
read.csv(file,                 # File name or full path of the file
         header = TRUE,        # Whether to read the header or not
         sep = ",",            # Separator of the values
         quote = "\"",         # Quoting character
         dec = ".",            # Decimal point
         fill = TRUE,          # Whether to fill blanks or not
         comment.char = "",    # Character of the comments or empty string
         encoding = "unknown", # Encoding of the file
         ...)                  # Additional arguments

# Semicolon as separator and comma as decimal point by default
read.csv2(file, header = TRUE, sep = ";", quote = "\"", dec = ",",
          fill = TRUE, comment.char = "", encoding = "unknown", ...)
You may have noticed that the only differences between the functions are the default separator of the values and the default decimal separator, because some countries use a comma as the decimal separator.
In those countries the comma cannot also separate the values when some numbers have decimals, so the semicolon is used instead. As you may find datasets with both conventions, you can use the corresponding function instead of changing the values of the arguments.
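As a small sketch of this difference, the same two rows can be written in both conventions and read with the matching function. The inline strings and the column names id and price are made up for illustration, and the text argument is used instead of a file so the example is self-contained.

```r
# Hypothetical data written in the two conventions
comma_data <- "id,price\n1,2.5\n2,3.75"
semicolon_data <- "id;price\n1;2,5\n2;3,75"

# Each function is paired with the convention it expects by default
df_comma <- read.csv(text = comma_data)
df_semicolon <- read.csv2(text = semicolon_data)

# Compare the numeric column obtained from both inputs
identical(df_comma$price, df_semicolon$price)
```

Reading the semicolon data with read.csv instead would leave everything in a single column, which is usually the first sign that the wrong function was chosen.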
The following table summarizes the three main default arguments:
Function    header   sep    dec
read.csv    TRUE     ","    "."
read.csv2   TRUE     ";"    ","
In order to load a CSV file in R with the default arguments, you can pass the file name as a string to the corresponding function. The output will be of class data.frame.
read.csv("my_file.csv")
If you just execute the previous code you will print the data frame, but it will not be stored in memory, since you have not assigned it to any variable. If you save it in a variable called my_file, you will be able to access the variables or the data you want.
my_file <- read.csv("my_file.csv")
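Once the result is assigned, the usual data frame tools apply. The following is a minimal sketch that first writes a small temporary file (its contents and the column names name and age are invented for the example) and then inspects the imported object.

```r
# Create a small hypothetical CSV file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Ana,31", "Luis,45"), csv_path)

# Store the imported data in a variable
my_file <- read.csv(csv_path)

class(my_file)   # Class of the imported object
str(my_file)     # Column names and types
my_file$age      # A single column as a vector
my_file[1, ]     # The first row
```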
CSV file header
By default, the functions read the header of the files. In case you want to read a CSV without a header, you will need to set the header argument to FALSE.
read.csv("my_file.csv", header = FALSE)
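When no header is read, R assigns default column names. The sketch below uses an invented two-column file to show those defaults and, as one possible alternative, the col.names argument of read.table to supply names at import time; the names name and age are only illustrative.

```r
# Hypothetical file without a header row
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Ana,31", "Luis,45"), csv_path)

# Without a header the columns get default names
no_header <- read.csv(csv_path, header = FALSE)
names(no_header)

# Column names can also be supplied while importing
with_names <- read.csv(csv_path, header = FALSE,
                       col.names = c("name", "age"))
names(with_names)
```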
A common issue arises with badly encoded files. In case you are reading a file with unusual characters, you may need to specify the encoding. Setting the encoding to UTF-8 tends to solve most of these problems.
read.csv("my_file.csv", encoding = "UTF-8")
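To make this concrete, the sketch below writes a small UTF-8 file with accented characters and reads it back with the encoding argument. The file contents and the column name city are assumptions made for the example, and the result can still depend on your platform and locale; when the file uses a different encoding than your session, the fileEncoding argument of read.table is another option to try.

```r
# Hypothetical UTF-8 file containing accented characters
csv_path <- tempfile(fileext = ".csv")
writeLines(c("city", "M\u00e1laga", "C\u00f3rdoba"), csv_path, useBytes = TRUE)

# Declare that the strings in the file are UTF-8
cities <- read.csv(csv_path, encoding = "UTF-8", stringsAsFactors = FALSE)

cities$city            # The imported strings
Encoding(cities$city)  # How R has marked their encoding
```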
Note that this argument and the following ones are inherited from the read.table function.
The na.strings argument
Sometimes the files contain a character string that represents missing or omitted values. The source of the data set you are working with usually documents how missing values are coded. You can convert those strings to NA values with the na.strings argument, specifying the character string that represents the missing value.
Consider, for instance, that in your CSV file the -9999 values represent missing data. In this scenario you could type:
read.csv("my_file.csv", na.strings = "-9999")
Moreover, in case the file contains several different strings for missing values, you can specify all of them inside a vector.
read.csv("my_file.csv", na.strings = c("-9999", "Na"))
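The effect is easier to see by importing the same data with and without the argument. This is a sketch on invented inline data (the columns station and temp are hypothetical), read through the text argument so no file is needed.

```r
# Hypothetical readings where -9999 and Na both mean "missing"
raw_data <- "station,temp\nA,21.5\nB,-9999\nC,Na\nD,18.2"

# Without na.strings the markers are read as ordinary values
without_na <- read.csv(text = raw_data)

# With na.strings both markers become NA
with_na <- read.csv(text = raw_data, na.strings = c("-9999", "Na"))

class(without_na$temp)    # Column type when the markers are kept
class(with_na$temp)       # Column type once the markers are NA
sum(is.na(with_na$temp))  # Number of missing values detected
```

Keep in mind that supplying na.strings replaces the default value "NA", so add "NA" to the vector if your file also uses it.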
However, if you need to remove the NA values (or the values you mapped to NA) after importing, you will need to use the corresponding function depending on your data. The most common function to remove missing values is na.omit.
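As a brief sketch of that follow-up step, the example below drops incomplete rows with na.omit and, as an alternative when only a summary is needed, skips missing values with na.rm. The data and the column names station and temp are invented for illustration.

```r
# Hypothetical readings where -9999 marks a missing value
raw_data <- "station,temp\nA,21.5\nB,-9999\nC,18.2"
readings <- read.csv(text = raw_data, na.strings = "-9999")

# Drop every row that contains at least one NA
complete_rows <- na.omit(readings)
nrow(readings)
nrow(complete_rows)

# Alternatively, keep the rows and ignore NA in a calculation
mean(readings$temp, na.rm = TRUE)
```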
The stringsAsFactors argument
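Setting the stringsAsFactors argument to TRUE transforms the string (character) columns of the dataset into factors. Since R 4.0.0 the default value is FALSE, so strings are kept as characters unless you ask otherwise.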
read.csv("my_file.csv", stringsAsFactors = TRUE)
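The following sketch compares both settings on the same invented data (the columns name and group are hypothetical), passing the argument explicitly so the result does not depend on the R version.

```r
# Hypothetical data with a text column that could be a factor
raw_data <- "name,group\nAna,control\nLuis,treated\nEva,control"

as_character <- read.csv(text = raw_data, stringsAsFactors = FALSE)
as_factor <- read.csv(text = raw_data, stringsAsFactors = TRUE)

class(as_character$group)  # Strings kept as character
class(as_factor$group)     # Strings converted to factor
levels(as_factor$group)    # Distinct categories found in the column
```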
Read multiple CSV files in R
It is worth mentioning that it is possible to import multiple CSV files at the same time instead of loading them into R one by one. For that purpose you can use the list.files function to look for all CSV files and then read them by applying the read.csv (or read.csv2) function with the sapply function.
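# The pattern is a regular expression, so match names ending in .csv
files <- list.files(pattern = "\\.csv$")
# simplify = FALSE keeps one data frame per file in a named list
multiple_csv <- sapply(files, read.csv, simplify = FALSE)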
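A self-contained sketch of the whole workflow follows. It assumes, for illustration only, two small files named a.csv and b.csv with the same columns in a temporary folder; lapply is used here as an alternative that always returns a list, and do.call with rbind is one possible way to stack the results into a single data frame.

```r
# Create a temporary folder with two hypothetical CSV files
csv_dir <- tempfile("csv_example")
dir.create(csv_dir)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(csv_dir, "b.csv"), row.names = FALSE)

# Find the CSV files, keeping the full path so they can be read from anywhere
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into a list of data frames named after the files
csv_list <- lapply(files, read.csv)
names(csv_list) <- basename(files)

# Stack the data frames; this assumes all files share the same columns
combined <- do.call(rbind, csv_list)
combined
```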