Read SQL databases in R
The DBI
package is an interface to communicate R and relational database management systems. This package can connect to MySQL, PostgreSQL, SQLite, MariaDB and other databases through their own drivers. In this tutorial you will learn how to connect to different data bases, how to perform queries and how to write and remove tables.
SQL database connections
The dbConnect
function from DBI
creates a connection to a data base through a driver and the necessary authentication parameters, such as the user name, password, host, port and database name or database file name. In the following subsections we will review how to connect to the most common databases.
MySQL
To connect to a MySQL database you have to specify the MySQL driver from RMySQL
and specify your authentication parameters.
library(DBI)
db <- dbConnect(RMySQL::MySQL(),
user = "your_user_name",
host = "your_host",
password = "your_secret_password",
dbname = "your_database_name",
port = "your port")
PostgreSQL
In order to connect to a PostgreSQL data base you will need to use RPostgreSQL::PostgreSQL()
and your authentication parameters.
library(DBI)
db <- dbConnect(RPostgreSQL::PostgreSQL(),
user = "your_user_name",
host = "your_host",
password = "your_secret_password",
dbname = "your_database_name",
port = "your port")
MariaDB
A connection to MariaDB can be done through the RMariaDB::MariaDB()
driver. You can connect to a remote database with your credentials or setting a group in your .my.cnf
, mariadb.cnf
or my.ini
file (the default MariaDB option file).
library(DBI)
db <- dbConnect(RMariaDB::MariaDB(), group = "my-db")
# Or:
db <- dbConnect(RMariaDB::MariaDB(),
host = "your-db.your-company.com",
user = "your_user_name",
password = "your_secret_password")
SQLite
The last connection we will review is to SQLite through RSQLite::SQLite()
. In this scenario you just need to specify the name of your database.
library(DBI)
db <- dbConnect(RSQLite::SQLite(), "your-db.sqlite")
# In-memory database
# db <- dbConnect(RSQLite::SQLite(), ":memory:")
You can check the connections with dbListConnections(dvr)
. Unfortunately this is a deprecated function and not all drivers implement this method and you will have to keep track of your connections by yourself.
Read tables
In this section we are going to use the following SQLite sample database named db
that contains two tables. You can use the following code to create it under your working directory:
# Sample data
df <- data.frame(x = 1:26, LETTERS)
db <- dbConnect(RSQLite::SQLite(), dbname = 'sample_db.sqlite')
# Sample tables
dbWriteTable(conn = db, name = "first_table", value = df)
dbWriteTable(conn = db, name = "second_table", value = mtcars)
List available tables
You can list the available tables with the dbListTables
function.
dbListTables(db)
"first_table" "second_table"
As stated before, this database has two tables named "first_table"
and "second_table"
.
List fields on a table
You can also list the fields of each of the tables with dbListFields
.
dbListFields(db, "first_table")
"x" "LETTERS"
For instance, the first table contains two fields named "x"
and "LETTERS"
.
Read specific tables with dbReadTable
Once you know the names of the table of your databases you can read them with the dbReadTable
function. In the following example we are reading the table named "first_table"
.
df <- dbReadTable(db, "first_table")
df
x LETTERS
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E
...
Send queries
It is possible to execute SQL queries to the database with the dbSendQuery
function. Then, you will need to use the dbFetch
function to fetch the desired records of the query. Finally, you will need to free the resources with dbClearResult
.
In the following example we are extracting 5 rows of the field named x
from the table named first_table
:
# Select the variable "x" from "first_table"
query <- dbSendQuery(db, "SELECT x FROM first_table")
# First 5 rows
n5 <- dbFetch(query, n = 5)
n5
# Clear
dbClearResult(query)
You can learn more about SQL and SQL queries in the following site: https://sqlearning.com/
Write and remove tables
The dbWriteTable
and dbRemoveTable
functions allow to add or remove tables from the database. If you want to add a new table named third_table
to the database you can do the following:
df <- iris[1:10, ]
dbWriteTable(db, "third_table", df)
Now you can check that the table was added to the database with dbListTables
.
dbListTables(db)
"first_table" "second_table" "third_table"
Lastly, consider that you want to remove the second table. For that purpose you can type the following:
dbRemoveTable(db, "second_table")
Close a connection
When you finish your work you can close your connection to the database with the dbDisconnect
function. If you were working with a temporary database it will be deleted.
dbDisconnect(db)
You can also close all connections listing all connections with dbListConnections
and closing all at once following the example below. This is only available for some drivers as dbListConnections
is a deprecated function.
# Specify the driver
dvr <- RMySQL::MySQL()
lapply(dbListConnections(dvr), dbDisconnect)