Read CSV in R

Learn how to read CSV files in R

Datasets are commonly distributed in CSV (comma-separated values) format. This type of data storage is a lightweight solution for most use cases. In this tutorial you will learn how to read a CSV file in R so that you can work with its data.

How to read a CSV file in R?

In this section you will learn how to import a CSV file in R with the read.csv and read.csv2 functions. You can see the basic syntax of the functions with the most common arguments in the following code block. For additional details remember to type ?read.csv or ?read.csv2.

# Comma as separator and dot as decimal point by default
read.csv(file,                 # File name or full path of the file
         header = TRUE,        # Whether the first line holds the column names
         sep = ",",            # Separator of the values
         quote = "\"",         # Quoting character
         dec = ".",            # Decimal point
         fill = TRUE,          # Whether to fill short rows with blank fields
         comment.char = "",    # Comment character (empty string disables comments)
         encoding = "unknown", # Encoding of the file (passed on to read.table)
         ...)                  # Additional arguments passed to read.table

# Semicolon as separator and comma as decimal point by default
read.csv2(file, header = TRUE, sep = ";", quote = "\"", dec = ",",
          fill = TRUE, comment.char = "", encoding = "unknown", ...)

As you may have noticed, the only differences between the two functions are the default separator of the values and the default decimal separator, because in some countries the comma is used as the decimal separator.

In those countries the comma cannot also separate the values when some numbers have decimals, so the CSV files use a semicolon instead. Since you may find datasets in either format, you can use the corresponding function instead of changing the sep and dec arguments yourself.

The following table summarizes the three main default arguments:

Function    header   sep   dec
read.csv    TRUE     ","   "."
read.csv2   TRUE     ";"   ","
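
As a minimal sketch of these defaults, the following code parses the same two numbers written in both conventions. The text argument (which takes the file content as a string instead of a file name) is used only to keep the example self-contained, and the column names x and y are made up for illustration.

```r
# Same data written in the two conventions (illustrative values)
comma_csv     <- "x,y\n1.5,2.5"   # comma separator, dot as decimal point
semicolon_csv <- "x;y\n1,5;2,5"   # semicolon separator, comma as decimal point

df1 <- read.csv(text = comma_csv)
df2 <- read.csv2(text = semicolon_csv)

# Both calls should produce the same numeric columns
all.equal(df1, df2)
```

Reading the semicolon-separated text with read.csv instead would need sep = ";" and dec = "," to be set by hand.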

In order to load a CSV file in R with the default arguments, you can pass the file name as a string to the corresponding function. The output will be of class data.frame.

read.csv("my_file.csv")

If you just execute the previous code, the data frame will be printed but not stored in memory, since you have not assigned it to any variable. If you save it in a variable called my_file, you will be able to access its columns or any other data you want.

my_file <- read.csv("my_file.csv")
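
Once the data frame is stored, it can be inspected and used like any other R object. The sketch below assumes a tiny made-up dataset (the columns name and height are hypothetical) and reads it from a string with the text argument so that it runs without a file on disk.

```r
# Illustrative data; in practice this would be read.csv("my_file.csv")
my_file <- read.csv(text = "name,height\nAna,1.65\nLuis,1.80")

class(my_file)        # "data.frame"
names(my_file)        # column names taken from the header
my_file$height        # access a single column
mean(my_file$height)  # use the stored data in further computations
```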

The file must be in your working directory. If not, you will need to specify the full path of the file in the file argument.
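
A minimal sketch of reading through a full path follows. It first writes a small example file to the session's temporary directory, so the data and the csv_path variable are assumptions made only for illustration.

```r
# Write a small example file to a temporary directory (hypothetical data)
csv_path <- file.path(tempdir(), "my_file.csv")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
          csv_path, row.names = FALSE)

getwd()                 # current working directory
file.exists(csv_path)   # check that the file exists before reading it

# Read the file through its full path
my_file <- read.csv(csv_path)
```

Building the path with file.path avoids hard-coding the path separator of a particular operating system.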

CSV file header

By default, the functions treat the first line of the file as the header. If you want to read a CSV file without a header, you will need to set the header argument to FALSE.

read.csv("my_file.csv", header = FALSE)
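
When no header is read, R names the columns automatically. The following sketch shows this with made-up content passed through the text argument, and also uses col.names (an argument of read.table) to supply names by hand; the names id, label and flag are hypothetical.

```r
csv_text <- "1,a,TRUE\n2,b,FALSE"   # illustrative content with no header line

no_header <- read.csv(text = csv_text, header = FALSE)
names(no_header)   # automatic names: "V1" "V2" "V3"

# Optionally supply the column names yourself
named <- read.csv(text = csv_text, header = FALSE,
                  col.names = c("id", "label", "flag"))
names(named)
```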

CSV encoding

A common issue arises from the encoding of the files. If you are reading a file with special characters (accented letters, for instance), you may need to specify the encoding. Setting the encoding to UTF-8 tends to solve most of these problems.

read.csv("my_file.csv", encoding = "UTF-8")

Note that this argument and the following are inherited from the read.table function.
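
The encoding argument can be tried out with a small file created on the fly. This is only a sketch: the file content is made up, and the exact behavior with other encodings depends on your platform and locale.

```r
# Create a small UTF-8 file containing an accented character (illustrative data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("city,country", "M\u00e1laga,Spain"), tmp, useBytes = TRUE)

# Declare that the strings in the file are UTF-8
cities <- read.csv(tmp, encoding = "UTF-8")
cities$city
```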

The na.strings argument

Sometimes the files use a character string to represent missing or omitted values. The documentation of the data set you are working with usually explains how missing values are encoded. You can convert them to NA values with the na.strings argument, specifying the character string that represents the missing value.

Consider, for instance, that in your CSV file the -9999 values represent missing data. In this scenario you could type:

read.csv("my_file.csv", na.strings = "-9999") 

Moreover, if the file contains several strings that represent missing values, you can specify all of them inside a vector.

read.csv("my_file.csv", na.strings = c("-9999", "Na"))
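
To see the effect of na.strings, the sketch below reads the same made-up content with and without the argument. The columns id and temp are hypothetical and the text argument replaces the file name to keep the example self-contained.

```r
csv_text <- "id,temp\n1,20.5\n2,-9999\n3,Na"   # illustrative content

raw   <- read.csv(text = csv_text)
clean <- read.csv(text = csv_text, na.strings = c("-9999", "Na"))

is.numeric(raw$temp)     # FALSE, because "Na" cannot be parsed as a number
is.numeric(clean$temp)   # TRUE, both codes were converted to NA
is.na(clean$temp)        # FALSE TRUE TRUE
```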

However, if you need to remove the NA values (including the values you declared as missing) after importing the data, you will need to use the corresponding function depending on your data. The most common function to remove missing values is na.omit.
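
As a short sketch with made-up data, na.omit drops every row of a data frame that contains at least one NA. The last line shows an alternative that keeps the rows and only skips the missing values inside one computation.

```r
df <- read.csv(text = "id,temp\n1,20.5\n2,-9999\n3,18.2",
               na.strings = "-9999")

complete <- na.omit(df)   # drops every row containing at least one NA
nrow(df)                  # 3
nrow(complete)            # 2

# Alternatively, keep the rows and ignore NA inside a computation
mean(df$temp, na.rm = TRUE)
```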

The stringsAsFactors argument

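When set to TRUE, the stringsAsFactors argument transforms the string (character) columns of the dataset into factors. Since R 4.0.0 its default value is FALSE, so character columns are kept as they are unless you request the conversion.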

read.csv("my_file.csv", stringsAsFactors = TRUE)
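
The difference can be checked on a small made-up dataset (the columns name and group are hypothetical), read here from a string through the text argument:

```r
csv_text <- "name,group\nAna,a\nLuis,b\nEva,a"   # illustrative content

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_char   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$group)    # "factor"
levels(as_factor$group)   # "a" "b"
class(as_char$group)      # "character"
```

Note that the conversion applies to every character column, so name becomes a factor as well in the first call.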

Read multiple CSV files in R

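It is worth mentioning that you can import multiple CSV files at once instead of loading them into R one by one. For that purpose you can use the list.files function to look for all the CSV files in the working directory and then read them by applying the read.csv (or read.csv2) function with the sapply function. Note that the pattern argument of list.files is a regular expression, and that setting simplify = FALSE makes sapply return a named list with one data frame per file instead of trying to simplify the result.

files <- list.files(pattern = "\\.csv$")

multiple_csv <- sapply(files, read.csv, simplify = FALSE)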
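
The following self-contained sketch applies the same idea to two small files created in a temporary folder. The folder name, file names and data are assumptions for illustration. It uses lapply, a close relative of sapply that always returns a list, and then stacks the data frames with rbind, which assumes that all the files share the same columns.

```r
# Set up two small CSV files in a temporary folder (hypothetical data)
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(csv_dir, "b.csv"), row.names = FALSE)

# full.names = TRUE returns paths that work from any working directory
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# One data frame per file, named after the file it came from
csv_list <- lapply(files, read.csv)
names(csv_list) <- basename(files)

# Stack the data frames into a single one
combined <- do.call(rbind, csv_list)
nrow(combined)
```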