Violin plot in R

Learn how to create violin plots in R with the vioplot package

Violin plots are an alternative to box plots that address one of their main limitations: a box plot hides the shape of the underlying distribution, whereas a violin plot shows a kernel density estimate of the data. In this tutorial, we will show you how to create a violin plot in base R from a vector and from data frames, how to add mean points, and how to split R violin plots by group.

Vioplot from vector

In order to create a violin plot in R from a vector, you need to pass the vector to the vioplot function of the package of the same name. Consider, for instance, the following vector:

x <- c(6, 9, 0, 19, -1, 8, 12, 5, 3, 7,
       2, 4, 3, -8, -9, 8, 4, 12, 5, 14)

You can create a simple violin plot in R by typing:

# install.packages("vioplot")
library(vioplot)

vioplot(x)

Example of the vioplot package
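
To see what the violin shape encodes, the following sketch draws one by hand with base R graphics by mirroring a kernel density estimate around a central axis. It is only an illustration of the idea: it uses density() with its default settings and an arbitrary half-width of 0.4, which is not necessarily how vioplot computes or scales its own estimate.

```r
# Hand-drawn violin: mirror a kernel density estimate around x = 1
x <- c(6, 9, 0, 19, -1, 8, 12, 5, 3, 7,
       2, 4, 3, -8, -9, 8, 4, 12, 5, 14)

d <- density(x)                      # Gaussian kernel, default bandwidth
half_width <- 0.4 * d$y / max(d$y)   # rescale so the widest point is 0.4

# Empty plot with the violin centred at x = 1
plot(NA, xlim = c(0.5, 1.5), ylim = range(d$x),
     xaxt = "n", xlab = "", ylab = "x")

# Left outline going up, right outline coming back down
polygon(c(1 - half_width, rev(1 + half_width)),
        c(d$x, rev(d$x)), col = "grey80", border = "black")

# Mark the median, as vioplot does by default
points(1, median(x), pch = 19)
```

The wider the polygon at a given height, the more observations lie around that value.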

By default, the vioplot function will create a vertical violin plot in R, but if you set the argument horizontal to TRUE, you can create a horizontal violin plot.

vioplot(x, horizontal = TRUE)

Horizontal violin plot in R

If you want to customize the violin plot, there are several arguments to control the graphical representation:

vioplot(x,
        col = 2,               # Color of the area
        rectCol = "red",       # Color of the rectangle
        lineCol = "white",     # Color of the line
        colMed = "green",      # Color of the median symbol
        border = "black",      # Color of the border of the violin
        pchMed = 16,           # Pch symbol for the median
        plotCentre = "points") # If "line", plots a median line

Customizing the graphical representation created with the vioplot function

In addition, you can add jittered data points to a violin plot with the stripchart function as follows:

stripchart(x, method = "jitter", col = "blue",
           vertical = TRUE, pch = 19, add = TRUE)

stripchart over R violin plot

Note that if you have a horizontal violin plot, you will need to set vertical = FALSE in the previous function.
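
As a minimal sketch of the horizontal case, the block below combines the two calls shown earlier and switches vertical to FALSE. The requireNamespace guard is an addition for this example, so that the code does nothing instead of failing when vioplot is not installed.

```r
x <- c(6, 9, 0, 19, -1, 8, 12, 5, 3, 7,
       2, 4, 3, -8, -9, 8, 4, 12, 5, 14)

if (requireNamespace("vioplot", quietly = TRUE)) {
  # Horizontal violin: the data values run along the X-axis
  vioplot::vioplot(x, horizontal = TRUE, col = "grey90")

  # Jittered points drawn along the same axis
  stripchart(x, method = "jitter", col = "blue",
             vertical = FALSE, pch = 19, add = TRUE)
}
```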

Moreover, you can draw a violin plot in R without taking into account the outliers of the data. For that purpose, you can assign to a variable the output of the boxplot function and then return the values of the original vector that are not outliers.

box <- boxplot(x) # Also draws a box plot; box$out holds the outliers

x <- x[!(x %in% box$out)]

vioplot(x)

vioplot function in R without outliers
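
If you only need the outliers and not the box plot itself, a possible alternative is the boxplot.stats function from grDevices, which applies the same rule that boxplot uses by default (points beyond 1.5 times the interquartile range from the hinges) without drawing anything. The name x_clean is an illustrative choice that avoids overwriting the original vector.

```r
x <- c(6, 9, 0, 19, -1, 8, 12, 5, 3, 7,
       2, 4, 3, -8, -9, 8, 4, 12, 5, 14)

# Values flagged as outliers, computed without opening a plot
out <- boxplot.stats(x)$out

# Keep only the observations that are not outliers
x_clean <- x[!(x %in% out)]

out
length(x_clean)
```

You can then pass x_clean to vioplot exactly as before.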

You can also set the argument ylog to TRUE if you want the Y-axis to be on a logarithmic scale. Note that this will only work for strictly positive data.

par(mfrow = c(1, 2))

vioplot(1:10)              # Linear scale (default)
vioplot(1:10, ylog = TRUE) # Logarithmic Y-axis

par(mfrow = c(1, 1))

Y axis scale of the vioplot function in R

Histogram and violin plot

Finally, note that you can draw a violin plot over a histogram. Consider, for instance, that the underlying distribution of your data is multimodal. A box plot cannot show this, because it only summarizes quartiles, but a violin plot can. The following graphical representation will help you understand why a violin plot is useful:


# Multimodal data
n <- 10000
ii <- rbinom(n, 1, 0.5)
data <- rnorm(n, mean = 130, sd = 10) * ii +
        rnorm(n, mean = 80, sd = 5) * (1 - ii)

# Histogram
hist(data, probability = TRUE, col = "grey", axes = FALSE,
     main = "", xlab = "",  ylab = "")

# X-axis
axis(1)

# Density
lines(density(data), lwd = 2, col = "red")

# Add violin plot
par(new = TRUE)
vioplot(data, horizontal = TRUE, yaxt = "n", axes = FALSE,
        col = rgb(0, 1, 1, alpha = 0.15))

Histogram and violin plot in R
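
The same point can be checked numerically. The sketch below regenerates a sample like the one above (the seed and the name sim are arbitrary choices for this example) and compares the five-number summary that a box plot draws with the estimated density at three locations: the two component means and the valley between them.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n <- 10000
ii <- rbinom(n, 1, 0.5)
sim <- rnorm(n, mean = 130, sd = 10) * ii +
       rnorm(n, mean = 80, sd = 5) * (1 - ii)

# What a box plot summarizes: min, lower hinge, median, upper hinge, max
fivenum(sim)

# What a violin plot draws: the density, evaluated here at three points
dens <- approxfun(density(sim))
dens(c(80, 105, 130))
```

The five numbers give no hint of two modes, while the density is clearly lower at 105 than at 80 or 130.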

Violin plot by group

On the one hand, if you have a data frame with a variable containing groups, you can draw a violin plot from a formula, specifying the numerical variable against the factor. We will show you an example using the chickwts dataset of base R.

tail(chickwts) # Last rows
    weight  feed
66    352  casein
67    359  casein
68    216  casein
69    222  casein
70    283  casein
71    332  casein

Now, you can specify the formula as the first argument, followed by the colors and any other graphical parameters:

data <- chickwts

# One color per level of feed (colors 2 to 7 of the palette)
vioplot(data$weight ~ data$feed, col = 2:(length(levels(data$feed)) + 1),
        xlab = "Feed", ylab = "Weight")

Create a violin plot by group in R with a formula

You can also add jittered data points to the previous violin plot with the stripchart function as follows:

stripchart(data$weight ~ data$feed, vertical = TRUE, method = "jitter",
           pch = 19, add = TRUE, col = 3:8)

Add jittered data points to the R violin chart
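
A slightly shorter way to write the same calls is to name the columns in the formula and pass the data frame separately. This sketch assumes that your installed version of vioplot accepts a data argument in its formula method, as stripchart does; the requireNamespace guard is only there so the example does nothing when the package is missing.

```r
# Number of groups, used to pick one color per violin
k <- nlevels(chickwts$feed)

if (requireNamespace("vioplot", quietly = TRUE)) {
  # Column names are looked up in the data frame given in `data`
  vioplot::vioplot(weight ~ feed, data = chickwts, col = 2:(k + 1),
                   xlab = "Feed", ylab = "Weight")

  stripchart(weight ~ feed, data = chickwts, vertical = TRUE,
             method = "jitter", pch = 19, add = TRUE)
}
```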

On the other hand, if your data set contains several numeric columns, each representing a variable, you can create the violin plot directly from the data frame. We will use, for instance, the trees dataset of base R.

tail(trees) # Last rows
   Girth  Height  Volume
26  17.3    81     55.4
27  17.5    82     55.7
28  17.9    80     58.3
29  18.0    80     51.5
30  18.0    80     51.0
31  20.6    87     77.0

If you pass the data frame to the vioplot function, it draws one violin per column. Note that if you stack this data frame with the stack function, you can specify a formula as in the previous example.

data <- trees

vioplot(data, col = 2:4, border = 2:4)

# Equivalent to:
stacked_data <- stack(trees)
vioplot(stacked_data$values ~ stacked_data$ind, col = 2:4,
        border = 2:4)

Violin plot from data frame in R
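
To make the equivalence easier to follow, this short sketch inspects what stack returns for the trees dataset: a long data frame with a values column holding the numbers and an ind factor holding the name of the column each value came from.

```r
stacked_data <- stack(trees)

# Two columns: the stacked numbers and the name of the source column
names(stacked_data)

# One factor level per original column, in the original column order
levels(stacked_data$ind)

# 31 rows x 3 columns of trees become 93 rows
nrow(stacked_data)
```

The ind factor plays the role of the grouping variable on the right-hand side of the formula.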

Reorder violin plot

By default, the violins are drawn in the order of the levels of the categorical variable. Recall the violin plot we created before with the chickwts dataset and check that the levels of the feed variable are in the following order:

 "casein"  "horsebean"  "linseed"  "meatmeal"  "soybean"  "sunflower"

However, you can override this behavior by reordering the categorical variable according to any characteristic of the data with the reorder function. In the following example we are going to use the median, but you could choose any function you want.

par(mfrow = c(1, 2))

data <- chickwts

# Lower to higher

medians <- reorder(data$feed, data$weight, median)
# medians <- with(data, reorder(feed, weight, median)) # Equivalent

vioplot(data$weight ~ medians, col = 2:(length(levels(data$feed)) + 1),
        xlab = "", ylab = "Weight", las = 2)

# Higher to lower

medians <- reorder(data$feed, -data$weight, median)
# medians <- with(data, reorder(feed, -weight, median)) # Equivalent

vioplot(data$weight ~ medians, col = 2:(length(levels(data$feed)) + 1),
        xlab = "", ylab = "Weight", las = 2)

par(mfrow = c(1, 1))

Reorder violin plot in R
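
To verify what reorder has done before plotting, you can compare the levels of the original and reordered factors. The sketch below uses only base R; the name by_median is an illustrative choice.

```r
# Original (alphabetical) level order
levels(chickwts$feed)

# Levels sorted by the median weight of each feed, lowest first
by_median <- reorder(chickwts$feed, chickwts$weight, median)
levels(by_median)

# Group medians listed in the new level order: they should be increasing
tapply(chickwts$weight, by_median, median)
```

The observations themselves are unchanged; only the order of the factor levels, and therefore of the violins, is different.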

Add mean to base R violin plot

The vioplot function displays the median of the data, but if the distribution is not symmetric, the mean and the median can be far apart. Hence, you can add the mean point, or any other characteristic of the data, to a violin plot in base R with the points function. Note that the steps differ depending on whether the violin plot is horizontal or vertical and whether you are drawing a single plot or several.

On the one hand, to display the mean point of a single violin plot you can type:

par(mfrow = c(1, 2))

# Exponential data
x <- rexp(20)

# Vertical vioplot

vioplot(x, col = 4)

# Add mean point
points(mean(x), pch = 19, col = "green", cex = 1.5)

# Horizontal vioplot

vioplot(x, col = 4, horizontal = TRUE)

# Add mean point
points(mean(x), 1, pch = 19, col = "green", cex = 1.5)

legend("topright", pch = c(21, 19), col = c("black", "green"),
       bg = "white", legend = c("Median", "Mean"), cex = 1.25)

par(mfrow = c(1, 1))

Add mean points to single vioplot
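
The gap between mean and median is easier to see with a larger skewed sample. The following sketch uses 1000 exponential draws instead of 20 and an arbitrary seed, both assumptions made only for this illustration, and passes the coordinates to points explicitly: the violin sits at position 1 on the X-axis and the mean goes on the Y-axis.

```r
set.seed(123)     # arbitrary seed, only for reproducibility
x <- rexp(1000)   # right-skewed sample, larger than in the text

# For right-skewed data the mean lies above the median
c(mean = mean(x), median = median(x))

if (requireNamespace("vioplot", quietly = TRUE)) {
  vioplot::vioplot(x, col = 4)

  # Explicit (x, y) coordinates: violin position 1, height mean(x)
  points(1, mean(x), pch = 19, col = "green", cex = 1.5)
}
```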

On the other hand, you can add mean points to a violin plot by group typing the following:

par(mfrow = c(1, 2))

df <- data.frame(x = rexp(20), y = rexp(20), z = rexp(20))

# Vertical vioplot by group

vioplot(df, col = 2:4)

# Add mean points
means <- colMeans(df)
# means <- apply(df, 2, mean) # Equivalent (less efficient)

points(means, pch = 19, col = "green", cex = 1.25)

legend("top", pch = c(21, 19), col = c("black", "green"),
       bg = "white", legend = c("Median", "Mean"), cex = 1.25)

# Horizontal vioplot by group

vioplot(df, col = 2:4,
        horizontal = TRUE)

# Add mean points
means <- colMeans(df)
# means <- apply(df, 2, mean) # Equivalent (less efficient)

points(means, 1:ncol(df), pch = 19, col = "green", cex = 1.25)

par(mfrow = c(1, 1))

Add mean points to vioplot by group

You can add points for any other characteristic of the data by replacing the mean function with another one.
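
As one possible example, the sketch below marks the 90th percentile of each column instead of the mean. The choice of statistic, the seed and the name q90 are all illustrative.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(x = rexp(20), y = rexp(20), z = rexp(20))

# 90th percentile of each column, returned as a named numeric vector
q90 <- sapply(df, function(col) quantile(col, probs = 0.9, names = FALSE))
q90

if (requireNamespace("vioplot", quietly = TRUE)) {
  vioplot::vioplot(df, col = 2:4)

  # Violins are drawn at positions 1, 2, 3 on the X-axis
  points(seq_along(q90), q90, pch = 17, col = "blue", cex = 1.25)
}
```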

Split R vioplots

It is worth mentioning that you can split a violin plot in R. Consider, for instance, that you have divided the trees dataset into two groups, representing tall and small trees, depending on their height. Then, you can make use of the side and add arguments as follows:

data <- trees

tall <- trees[trees$Height >= 76, ]
small <- trees[trees$Height < 76, ]

vioplot(tall, side = "left", plotCentre = "line", col = 2)
vioplot(small, side = "right", plotCentre = "line", col = 3, add = TRUE)

legend("topleft", legend = c("Tall", "Small"), fill = c(2, 3), cex = 1.25)

split violin plot in R

Although median points are also possible, median lines are recommended for split violin plots.
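
The same technique can be applied to a single variable. This sketch splits only the Volume column, using the same cutoff of 76 as above. Setting ylim to the range of the whole column is an addition for this example, so that neither half is clipped when the second one is drawn over the first.

```r
tall  <- trees[trees$Height >= 76, ]
small <- trees[trees$Height < 76, ]

# Every tree falls into exactly one of the two groups
c(tall = nrow(tall), small = nrow(small))

if (requireNamespace("vioplot", quietly = TRUE)) {
  # Left half: tall trees. ylim covers both groups.
  vioplot::vioplot(tall$Volume, side = "left", plotCentre = "line",
                   col = 2, names = "Volume", ylim = range(trees$Volume))

  # Right half: small trees, added to the same position
  vioplot::vioplot(small$Volume, side = "right", plotCentre = "line",
                   col = 3, add = TRUE)

  legend("topleft", legend = c("Tall", "Small"), fill = c(2, 3))
}
```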