How to import Excel files into R?
If you need to read an Excel in R, you will need to use a specific package to achieve it. There are several options, but the best packages for reading Excel files could be
readxl, as they don’t depend on JAVA (unlike
XLConnect packages) nor Perl (
If you are using RStudio you can go to File → Import Dataset → From Excel.... Then, you can browse your Excel file and customize the output (the name of the variable, the sheet, cells range, …). You can also see a preview of the code that will be executed in the backend and of the data that will be loaded:
Read XLSX without JAVA in R: readxl and openxlsx
readxl package is part of the
tidyverse package, created by Hadley Wickham (chief scientist at RStudio) and his team. This package supports XLS via the
libxls C library and XLSX files via the
RapidXML C++ library without using external dependencies.
The package provides some Excel (XLS and XLSX) files stored in the installation folder of the package, so in order to create a reproducible example, in the following examples we are going to use the
clippy.xlsx file, which first sheet is as follows:
In order to load the path of the sample Excel file you can make use of the
readxl_example function. Once loaded, or once you have the path of your own Excel file, you can use the
excel_sheets function to check the Excel file sheet names, if needed.
# install.packages("readxl") library(readxl) # Get the path of a sample XLSX dataset of the package file_path <- readxl_example("clippy.xlsx") # Check the Sheet names of the Excel file excel_sheets(file_path) # "list-column" "two-row-header"
The generic function of the package to read Excel files into R is the
read_excel function, which guesses the file type (XLS or XLSX) depending on the file extension and the file itself.
# A tibble: 4 x 2 name value <chr> <chr> 1 Name Clippy 2 Species paperclip 3 Approx date of death 39083 4 Weight in grams 0.9
sheet argument allows you to specify the sheet you want to load, passing its name or the corresponding number of the tab. Note that, by default, the function loads the first Excel sheet.
# Selecting the other sheet of the Excel file read_excel(file_path, sheet = "two-row-header") read_excel(file_path, sheet = 2) # Equivalent
# A tibble: 2 x 4 name species death weight <chr> <chr> <chr> <chr> 1 (at birth) (office supply type) (date is approximate) (in grams) 2 Clippy paperclip 39083 0.9
You can also skip rows with the
skip argument of the function:
# Skip first row read_excel(file_path, skip = 1)
# A tibble: 3 x 2 Name Clippy <chr> <chr> 1 Species paperclip 2 Approx date of death 39083 3 Weight in grams 0.9
Note that you could also specify a range of cells to be selected with the
range argument. In this case, the
skip argument won’t be taken into account if you specify it.
read_excel(file_path, range = "B1:B5")
# A tibble: 4 x 1 value <chr> 1 Clippy 2 paperclip 3 39083 4 0.9
In addition, if you want to avoid reading the column names, you can set the
col_names argument to
read_excel(file_path, col_names = FALSE)
New names: * `` -> ...1 * `` -> ...2 ...1 ...2 1 name value 2 Name Clippy 3 Species paperclip 4 Approx date of death 39083 5 Weight in grams 0.9
However, you may have noticed that the output is of class
tibble (a modern type of data frame). If you want the output to be of class
data.frame you will need to use the
as.data.frame function as follows:
data <- read_excel(file_path, skip = 1) as.data.frame(data)
Name Clippy 1 Species paperclip 2 Approx date of death 39083 3 Weight in grams 0.9
Recall that the
read_excel function guesses the file extension. Nonetheless, if you know the file extension you are going to read you can use the corresponding function of the following to avoid guessing:
# If you know the extension of your Excel file # use one of these functions instead # For XLS files read_xls() # For XLSX files read_xlsx()
openxlsx package uses
Rcpp and, as it doesn’t depend on JAVA, it is an interesting alternative to to the
readxl package to read an Excel file in R. The differences respect to the previous package are that the output is of class
data.frame by default instead of
tibble and that its main use is not just importing Excel files, as it also provides a wide variety of functions to write, style and edit Excel files.
The function to read XLSX files is named
# install.packages("openxlsx") library(openxlsx) read.xlsx(file_path)
name value 1 Name Clippy 2 Species paperclip 3 Approx date of death 39083 4 Weight in grams 0.9
As in the function of the previous package, there are several arguments you can customize, as
colNames. If you want to select specific cells you can make use of the
cols arguments. Recall to type
help(read.xlsx) for additional information.
read.xlsx(file_path, cols = 1:2, rows = 2:3)
Name Clippy 1 Species paperclip
The xlsx package
Although this package requires JAVA installed on your computer it is very popular. The main functions to import Excel files are the
read.xlsx2. The second has slightly differences in the default arguments and it does more work in JAVA, achieving better performance.
# install.packages("xlsx") library(xlsx) read.xlsx(file_path) read.xlsx2(file_path)
You can customize several arguments as
colIndex, among others. Run
help(read.xlsx) for addition details.
An alternative to the
xlsx package is
XLConnect, which allows writing, reading and formatting Excel files. In order to load an Excel file into R you can use the
readWorksheetFromFile function as follows. We recommend you to type
??XLConnect to look for additional information of the arguments of each function of the package.
# install.packages("XLConnect") library(XLConnect) data <- readWorksheetFromFile(file_path, sheet = "list-column", startRow = 1, endRow = 5, startCol = 1, endCol = 2)
In case you want to load multiple sheets, it is recommended to use the
loadWorkbook function and then load each sheet with the
load <- loadWorkbook(file_path) data <- readWorksheet(load, sheet = "list-column", startRow = 1, endRow = 5, startCol = 1, endCol = 2) data2 <- readWorksheet(load, sheet = "two-row-header", startRow = 1, endRow = 3, startCol = 1, endCol = 4)
Moreover, this package provides a function to load Excel named regions. Analogous to the previous example, you can import just a region with the
readNamedRegionFromFile, specifying the file name (if the file is in your working directory) or the file path and the region name.
data <- readNamedRegionFromFile(file, # File path name, # Region name ...) # Arguments of readNamedRegion()
If you want to load multiple named regions you can load the workbook with the
loadWorkbook function and then import each region with the
load <- loadWorkbook(file_path) data <- readNamedRegion(load, name_Region_1, ...) data2 <- readNamedRegion(load, name_Region_2, ...)
It is worth to mention that if you are experiencing issues with the packages that require JAVA you can get and set the path of JAVA in R with the following codes:
Note that you will need to specify the path to the
# Prints the path of JAVA Home in R Sys.getenv("JAVA_HOME") # Sets the path of JAVA Sys.setenv(JAVA_HOME = "path_to_jre_java_folder")
jrefolder inside the
Javafolder of your computer, which you should find inside
Convert XLSX files to CSV in R
Finally, you could also convert your Excel files into a CSV format and read the CSV file in R. For this purpose, you can use the
convert function of the
rio package. An alternative would be saving directly the Excel file as CSV with the menu of Microsoft Excel.
# install.packages("rio") library(rio) convert(file_path, "file.csv")