Read Excel files in R
Excel is a spreadsheet application developed by Microsoft that lets you manage data in a very simple way. Until 2007, XLS was the main file format; with the 2007 release, the XML-based XLSX format was introduced and became the default workbook format. In this tutorial you will learn how to read Excel files into R and RStudio with several packages.
How to import Excel files into R?
To read an Excel file in R you need a specific package. There are several options, but arguably the best packages for reading Excel files are openxlsx and readxl, as they depend neither on Java (unlike the xlsx and XLConnect packages) nor on Perl (the gdata package).
Note that external dependencies can cause errors when loading the packages, although for huge datasets the packages that rely on them should be faster than the other alternatives.
Import Excel data into RStudio from the menu
If you are using RStudio, you can go to File → Import Dataset → From Excel… and then browse to your Excel file and customize the import (the name of the variable, the sheet, the cell range, …). RStudio also shows a preview of the code that will be executed in the background and of the data that will be loaded.
Note that this approach requires the readxl package to be installed.
Read XLSX without Java in R: readxl and openxlsx
readxl package
The readxl package is part of the tidyverse, created by Hadley Wickham (chief scientist at RStudio) and his team. It reads XLS files via the libxls C library and XLSX files via the RapidXML C++ library, without any external dependencies.
The package ships with several sample Excel (XLS and XLSX) files stored in its installation folder, so, to keep the following examples reproducible, we are going to use the clippy.xlsx file.
To get the path of a sample Excel file you can use the readxl_example function. Once you have the path, either of a sample file or of your own Excel file, you can use the excel_sheets function to list the sheet names, if needed.
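A minimal sketch of these two steps, assuming readxl is installed. Calling readxl_example() with no arguments lists all the bundled sample files:

```r
library(readxl)

# List the sample files bundled with readxl
readxl_example()

# Full path to the clippy.xlsx sample file
path <- readxl_example("clippy.xlsx")

# Character vector with the sheet names of the workbook
sheets <- excel_sheets(path)
sheets
```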
The generic function of the package for reading Excel files into R is read_excel, which guesses the file type (XLS or XLSX) from the file extension and from the contents of the file itself.
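With the path of the sample file, a basic call only needs the path; the sketch below relies on the defaults (first sheet, first row used as column names):

```r
library(readxl)

path <- readxl_example("clippy.xlsx")

# read_excel() works out whether the file is XLS or XLSX and
# reads the first sheet, using its first row as column names
df <- read_excel(path)
df
```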
The sheet argument lets you specify the sheet to load, either by its name or by the position of its tab. By default, the function loads the first sheet.
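As an illustration, the following sketch reads the same sheet twice, once by position and once by name. It picks the last sheet of the workbook so that it works whatever the number of sheets in clippy.xlsx:

```r
library(readxl)

path <- readxl_example("clippy.xlsx")
sheets <- excel_sheets(path)
last <- length(sheets)

# Select the sheet by its position (tab number)
df_by_index <- read_excel(path, sheet = last)

# Select the same sheet by its name
df_by_name <- read_excel(path, sheet = sheets[last])

identical(df_by_index, df_by_name)
```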
You can also skip rows at the top of the sheet with the skip argument of the function.
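For example, skipping one row makes the second row of the sheet act as the header, so the result has one data row fewer than the default read:

```r
library(readxl)

path <- readxl_example("clippy.xlsx")

# Skip the first row: the second row of the sheet becomes the header
df_skip <- read_excel(path, skip = 1)
df_skip
```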
You can also select a range of cells with the range argument. If you specify range, the skip argument is ignored.
In addition, if you don't want the first row to be read as column names, set the col_names argument to FALSE.
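In the sketch below the first row is kept as data, so the result has one more row than the default read. The generated names (...1, ...2, and so on) are those produced by recent readxl versions:

```r
library(readxl)

path <- readxl_example("clippy.xlsx")

# The first row is read as data; readxl generates the column names
df_nohead <- read_excel(path, col_names = FALSE)
names(df_nohead)
```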
You may have noticed that the output is of class tibble (a modern type of data frame). If you want the output to be of class data.frame, use the as.data.frame function.
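The conversion is a single call; printing the classes before and after shows the difference:

```r
library(readxl)

path <- readxl_example("clippy.xlsx")
df <- read_excel(path)
class(df)   # "tbl_df" "tbl" "data.frame"

# Convert the tibble to a plain data frame
df2 <- as.data.frame(df)
class(df2)  # "data.frame"
```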
Recall that the read_excel function guesses the file type. If you already know the format of the file you are going to read, you can call read_xls or read_xlsx directly to avoid the guessing.
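A minimal sketch with the format-specific functions, assuming (as in current readxl releases) that an XLS version of the clippy sample, clippy.xls, is bundled alongside clippy.xlsx:

```r
library(readxl)

# Format known in advance: no guessing is performed
df_xlsx <- read_xlsx(readxl_example("clippy.xlsx"))
df_xls  <- read_xls(readxl_example("clippy.xls"))
```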
openxlsx package
The openxlsx package uses Rcpp and, as it doesn't depend on Java, it is an interesting alternative to the readxl package for reading Excel files in R. The differences with respect to the previous package are that the output is of class data.frame by default instead of tibble, and that its purpose goes beyond importing Excel files: it also provides a wide variety of functions to write, style and edit them.
The function to read XLSX files is named read.xlsx.
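A minimal sketch, assuming openxlsx is installed. To stay self-contained it first writes head(iris) to a temporary workbook with openxlsx's own write.xlsx function and then reads it back; the temporary file is only for illustration:

```r
library(openxlsx)

# Create a small workbook to read (header row + 6 data rows)
tmp <- tempfile(fileext = ".xlsx")
write.xlsx(head(iris), tmp)

# Read the first sheet; the result is a plain data.frame
df <- read.xlsx(tmp, sheet = 1)
class(df)
```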
As with the function of the previous package, there are several arguments you can customize, such as sheet, startRow or colNames. To select specific cells, use the rows and cols arguments. Type ?read.xlsx or help(read.xlsx) for additional information.
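The sketch below illustrates startRow, colNames, rows and cols on a temporary workbook built from the first 10 rows of mtcars (row names are not written, so mpg and cyl are the first two columns):

```r
library(openxlsx)

# Sample workbook: header in row 1, data in rows 2 to 11
tmp <- tempfile(fileext = ".xlsx")
write.xlsx(head(mtcars, 10), tmp)

# Start reading at row 3 of the sheet and don't treat it as a header
df_start <- read.xlsx(tmp, sheet = 1, startRow = 3, colNames = FALSE)

# Only rows 1 to 4 (header + 3 data rows) of columns 1 and 2
df_cells <- read.xlsx(tmp, sheet = 1, rows = 1:4, cols = 1:2)
```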
xlsx package
Although this package requires Java to be installed on your computer, it is a very popular alternative. The main functions to import Excel files are read.xlsx and read.xlsx2. The latter has slightly different default arguments and does more work in Java, which makes it faster, especially on large files.
You can customize several arguments, such as sheetIndex, sheetName, header, rowIndex and colIndex, among others. Run ?read.xlsx or help(read.xlsx) for additional details.
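A sketch assuming a working Java installation for rJava. The calls use the xlsx:: prefix because openxlsx also exports a read.xlsx function, and the sample workbook is written with xlsx's own write.xlsx, which names the sheet "Sheet1" by default:

```r
# The xlsx package needs a working Java installation (through rJava)
library(xlsx)

# Sample workbook written with xlsx itself (sheet "Sheet1")
tmp <- tempfile(fileext = ".xlsx")
xlsx::write.xlsx(head(iris), tmp, row.names = FALSE)

# read.xlsx(): first sheet, all rows and columns
df1 <- xlsx::read.xlsx(tmp, sheetIndex = 1)

# Sheet selected by name; sheet rows 1 to 3 (header + 2 rows), columns 1 to 2
df_sub <- xlsx::read.xlsx(tmp, sheetName = "Sheet1",
                          rowIndex = 1:3, colIndex = 1:2)

# read.xlsx2(): faster on big files; columns read as character by default
df2 <- xlsx::read.xlsx2(tmp, sheetIndex = 1, header = TRUE)
```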
XLConnect package
An alternative to the xlsx package is XLConnect, which allows writing, reading and formatting Excel files. To load an Excel file into R you can use the readWorksheetFromFile function. We recommend typing ??XLConnect to find additional information on the arguments of each function of the package.
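A minimal sketch, assuming XLConnect (which also runs on Java) is installed. It first creates a one-sheet workbook with writeWorksheetToFile so the example is self-contained; the sheet name "flowers" is arbitrary:

```r
# XLConnect also runs on Java
library(XLConnect)

# Sample workbook with one sheet named "flowers"
tmp <- tempfile(fileext = ".xlsx")
writeWorksheetToFile(tmp, data = head(iris), sheet = "flowers")

# Read the whole sheet
df <- readWorksheetFromFile(tmp, sheet = "flowers")

# Read a block: sheet rows 1 to 4 (header + 3 rows) of columns 1 to 2
df_block <- readWorksheetFromFile(tmp, sheet = 1,
                                  startRow = 1, endRow = 4,
                                  startCol = 1, endCol = 2)
```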
If you want to load multiple sheets, it is recommended to load the workbook once with the loadWorkbook function and then read each sheet with the readWorksheet function.
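The sketch below builds a two-sheet workbook and then reads every sheet from a single loaded workbook. It uses the XLConnect:: prefix because openxlsx also exports a loadWorkbook function; the sheet names are only illustrative:

```r
library(XLConnect)

# Sample workbook with two sheets
tmp <- tempfile(fileext = ".xlsx")
wb <- XLConnect::loadWorkbook(tmp, create = TRUE)
createSheet(wb, name = "iris")
createSheet(wb, name = "cars")
writeWorksheet(wb, head(iris), sheet = "iris")
writeWorksheet(wb, head(mtcars), sheet = "cars")
saveWorkbook(wb)

# Load the workbook once and read every sheet from it
wb <- XLConnect::loadWorkbook(tmp)
sheets <- getSheets(wb)
all_sheets <- lapply(sheets, function(s) readWorksheet(wb, sheet = s))
names(all_sheets) <- sheets
```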
Moreover, this package can load Excel named regions. Analogous to the previous example, you can import a single region with the readNamedRegionFromFile function, specifying the file name (if the file is in your working directory) or the file path, and the region name.
If you want to load multiple named regions, load the workbook with the loadWorkbook function and then import each region with the readNamedRegion function.
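Both approaches can be sketched on a temporary workbook. The region names "cars" and "cars_mpg" are hypothetical and are defined here with createName; writing three rows of mtcars with a header fills cells A1:C4:

```r
library(XLConnect)

# Sample workbook with two named regions on the sheet "data"
tmp <- tempfile(fileext = ".xlsx")
wb <- XLConnect::loadWorkbook(tmp, create = TRUE)
createSheet(wb, name = "data")
writeWorksheet(wb, head(mtcars[, 1:3], 3), sheet = "data")  # fills A1:C4
createName(wb, name = "cars", formula = "data!$A$1:$C$4")
createName(wb, name = "cars_mpg", formula = "data!$A$1:$A$4")
saveWorkbook(wb)

# A single region read straight from the file
cars <- readNamedRegionFromFile(tmp, name = "cars")

# Several regions read from one loaded workbook
wb <- XLConnect::loadWorkbook(tmp)
regions <- lapply(c("cars", "cars_mpg"),
                  function(n) readNamedRegion(wb, name = n))
```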
It is worth mentioning that, if you are experiencing issues with the packages that require Java, you can get and set the path to Java in R through the JAVA_HOME environment variable. Note that you will need to specify the path to the jre folder inside the Java folder of your computer, which on Windows you should find inside Program Files.
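A hedged sketch using base R's Sys.getenv and Sys.setenv. The path below is a hypothetical placeholder (the jre version folder differs from machine to machine), so the variable is only set if that folder exists. It typically needs to be set before loading rJava-based packages such as xlsx or XLConnect in the session:

```r
# Current value of JAVA_HOME (empty string if it is not set)
Sys.getenv("JAVA_HOME")

# Hypothetical path: replace it with the jre folder on your computer
java_path <- "C:/Program Files/Java/jre_VERSION"

# Only set the variable if the folder actually exists
if (dir.exists(java_path)) {
  Sys.setenv(JAVA_HOME = java_path)
}
```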
Convert XLSX files to CSV in R
Finally, you could also convert your Excel files to CSV format and read the CSV file in R. For this purpose, you can use the convert function of the rio package. Alternatively, you can save the Excel file as CSV directly from the Microsoft Excel menu.
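A minimal sketch, assuming rio and readxl are installed. It converts the readxl sample file to a temporary CSV (rio reads the first sheet by default) and reads the result back with base R:

```r
library(rio)

# Excel file to convert (the readxl sample used earlier)
xlsx_file <- readxl::readxl_example("clippy.xlsx")

# Output CSV in a temporary location
csv_file <- tempfile(fileext = ".csv")
convert(xlsx_file, csv_file)

# Read the converted file with base R
df <- read.csv(csv_file)
```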