Read Excel files in R

Data Manipulation in R Import and export data
Read Excel file sheets into R

Excel is a spreadsheet developed by Microsoft, which allows you to manage data in a very simple way. Until 2007, the XLS was the main file extension. However, in the 2007 release the XLSX (XML-based) extension was introduced to become the default workbook format. In this tutorial you will learn how to read Excel files into R and RStudio with several packages.

How to import Excel files into R?

If you need to read an Excel in R, you will need to use a specific package to achieve it. There are several options, but the best packages for reading Excel files could be openxlsx and readxl, as they don’t depend on JAVA (unlike xlsx and XLConnect packages) nor Perl (gdata package).

Note that external dependencies can cause errors when loading the packages, but for huge datasets they should be faster than the other alternatives.

Import Excel data into RStudio from the menu

If you are using RStudio you can go to FileImport DatasetFrom Excel…. Then, you can browse your Excel file and customize the output (the name of the variable, the sheet, cells range, …). You can also see a preview of the code that will be executed in the backend and of the data that will be loaded:

Import Excel file into RStudio

Read Excel file in RStudio with the menu

Note that, with this approach, you will need to have installed the readxl package.

Read XLSX without JAVA in R: readxl and openxlsx

readxl package

The readxl package is part of the tidyverse package, created by Hadley Wickham (chief scientist at RStudio) and his team. This package supports XLS via the libxls C library and XLSX files via the RapidXML C++ library without using external dependencies.

The package provides some Excel (XLS and XLSX) files stored in the installation folder of the package, so in order to create a reproducible example, in the following examples we are going to use the clippy.xlsx file, which first sheet is as follows:

Sample XLSX in R

In order to load the path of the sample Excel file you can make use of the readxl_example function. Once loaded, or once you have the path of your own Excel file, you can use the excel_sheets function to check the Excel file sheet names, if needed.

# install.packages("readxl")
library(readxl)

# Get the path of a sample XLSX dataset of the package
file_path <- readxl_example("clippy.xlsx")

# Check the Sheet names of the Excel file
excel_sheets(file_path) # "list-column" "two-row-header"

The generic function of the package to read Excel files into R is the read_excel function, which guesses the file type (XLS or XLSX) depending on the file extension and the file itself.

read_excel(file_path)
# A tibble: 4 x 2
  name                 value    
  <chr>                <chr>    
1 Name                 Clippy   
2 Species              paperclip
3 Approx date of death 39083    
4 Weight in grams      0.9 

The sheet argument allows you to specify the sheet you want to load, passing its name or the corresponding number of the tab. Note that, by default, the function loads the first Excel sheet.

# Selecting the other sheet of the Excel file
read_excel(file_path, sheet = "two-row-header")
read_excel(file_path, sheet = 2) # Equivalent
# A tibble: 2 x 4
  name       species              death                 weight    
  <chr>      <chr>                <chr>                 <chr>     
1 (at birth) (office supply type) (date is approximate) (in grams)
2 Clippy     paperclip            39083                 0.9 

You can also skip rows with the skip argument of the function:

# Skip first row
read_excel(file_path, skip = 1)
# A tibble: 3 x 2
  Name                 Clippy   
  <chr>                <chr>    
1 Species              paperclip
2 Approx date of death 39083    
3 Weight in grams      0.9  

Note that you could also specify a range of cells to be selected with the range argument. In this case, the skip argument won’t be taken into account if you specify it.

read_excel(file_path, range = "B1:B5")
# A tibble: 4 x 1
  value    
  <chr>    
1 Clippy   
2 paperclip
3 39083    
4 0.9 

In addition, if you want to avoid reading the column names, you can set the col_names argument to FALSE:

read_excel(file_path, col_names = FALSE)
New names:
* `` -> ...1
* `` -> ...2
                  ...1      ...2
1                 name     value
2                 Name    Clippy
3              Species paperclip
4 Approx date of death     39083
5      Weight in grams       0.9

However, you may have noticed that the output is of class tibble (a modern type of data frame). If you want the output to be of class data.frame you will need to use the as.data.frame function as follows:

data <- read_excel(file_path, skip = 1)
as.data.frame(data)
                  Name    Clippy
1              Species paperclip
2 Approx date of death     39083
3      Weight in grams       0.9

Recall that the read_excel function guesses the file extension. Nonetheless, if you know the file extension you are going to read you can use the corresponding function of the following to avoid guessing:

# If you know the extension of your Excel file
# use one of these functions instead

# For XLS files
read_xls()

# For XLSX files
read_xlsx()

openxlsx package

The openxlsx package uses Rcpp and, as it doesn’t depend on JAVA, it is an interesting alternative to to the readxl package to read an Excel file in R. The differences respect to the previous package are that the output is of class data.frame by default instead of tibble and that its main use is not just importing Excel files, as it also provides a wide variety of functions to write, style and edit Excel files.

The function to read XLSX files is named read.xlsx:

# install.packages("openxlsx")
library(openxlsx)

read.xlsx(file_path)
                  name     value
1                 Name    Clippy
2              Species paperclip
3 Approx date of death     39083
4      Weight in grams       0.9

As in the function of the previous package, there are several arguments you can customize, as sheet, skip or colNames. If you want to select specific cells you can make use of the rows and cols arguments. Recall to type ?read.xlsx or help(read.xlsx) for additional information.

read.xlsx(file_path, cols = 1:2, rows = 2:3)
     Name    Clippy
1 Species paperclip

The xlsx package

Although this package requires JAVA installed on your computer it is a very popular alternative. The main functions to import Excel files are the read.xlsx and read.xlsx2. The second has slightly differences in the default arguments and it does more work in JAVA, achieving better performance.

# install.packages("xlsx")
library(xlsx)

read.xlsx(file_path)
read.xlsx2(file_path)

You can customize several arguments as sheetIndex, sheetName, header, rowIndex, colIndex, among others. Run ?read.xlsx or help(read.xlsx) for additional details.

XLConnect package

An alternative to the xlsx package is XLConnect, which allows writing, reading and formatting Excel files. In order to load an Excel file into R you can use the readWorksheetFromFile function as follows. We recommend you to type ??XLConnect to look for additional information of the arguments of each function of the package.

# install.packages("XLConnect")
library(XLConnect)

data <- readWorksheetFromFile(file_path, sheet = "list-column",
                              startRow = 1, endRow = 5,
                              startCol = 1, endCol = 2)

In case you want to load multiple sheets, it is recommended to use the loadWorkbook function and then load each sheet with the readWorksheet function:

load <- loadWorkbook(file_path)

data <- readWorksheet(load, sheet = "list-column",
                      startRow = 1, endRow = 5,
                      startCol = 1, endCol = 2)

data2 <- readWorksheet(load, sheet = "two-row-header",
                       startRow = 1, endRow = 3,
                       startCol = 1, endCol = 4)

Moreover, this package provides a function to load Excel named regions. Analogous to the previous example, you can import just a region with the readNamedRegionFromFile, specifying the file name (if the file is in your working directory) or the file path and the region name.

data <- readNamedRegionFromFile(file, # File path
                                name, # Region name
                                ...)  # Arguments of readNamedRegion()

If you want to load multiple named regions you can load the workbook with the loadWorkbook function and then import each region with the readNamedRegion function.

load <- loadWorkbook(file_path)

data <- readNamedRegion(load, name_Region_1, ...)
data2 <- readNamedRegion(load, name_Region_2, ...)

It is worth to mention that if you are experiencing issues with the packages that require JAVA you can get and set the path of JAVA in R with the following codes:

# Prints the path of JAVA Home in R
Sys.getenv("JAVA_HOME")

# Sets the path of JAVA
Sys.setenv(JAVA_HOME = "path_to_jre_java_folder")

Note that you will need to specify the path to the jre folder inside the Java folder of your computer, which you should find inside Program Files.

Convert XLSX files to CSV in R

Finally, you could also convert your Excel files into a CSV format and read the CSV file in R. For this purpose, you can use the convert function of the rio package. An alternative would be saving directly the Excel file as CSV with the menu of Microsoft Excel.

# install.packages("rio")
library(rio)

convert(file_path, "file.csv")