Stripchart in R
Stripcharts or stripplots are one-dimensional scatter charts. When dealing with small sample sizes (few data points), stripcharts in R are an alternative to boxplots or scatter plots for representing the observations. The R stripchart is also useful for overlaying the data on other plots, in order to show the distribution of the data. In this tutorial we will explain with examples how to make and customize a stripchart in R programming.
The R stripchart() function
The stripchart function in R allows you to create one-dimensional scatter plots. To create a default stripchart, pass a numeric variable to the function:
set.seed(1)
x <- rnorm(20)
stripchart(x)
You can also customize the symbol used to create the plot, the line width and the color with the pch, lwd and col arguments, respectively. Note that symbols 21 to 25 also allow you to set the background (fill) color of the symbol with the bg argument.
stripchart(x, pch = 21, col = 1, bg = 2, lwd = 2)
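To compare the fillable symbols 21 to 25 at a glance, one option is to pass a list with one copy of the data per symbol. stripchart() uses one pch value per group, in the same way as col. The list names used below are illustrative labels only. This is a small sketch; bg is applied to every group.

```r
set.seed(1)
x <- rnorm(20)

# Five copies of the same data, one group per fillable symbol (21 to 25)
symbols_list <- rep(list(x), 5)
names(symbols_list) <- paste("pch", 21:25)

# One pch value per group; bg sets the fill color of symbols 21 to 25
stripchart(symbols_list, pch = 21:25, col = 1, bg = 2, lwd = 2, las = 1)
```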
By default, the function draws a box around the plot and tick labels on the X-axis. However, you can remove both the box and the axes by setting the axes argument to FALSE. Then, you can use the axis function to add the axes you prefer.
stripchart(x, axes = FALSE)
axis(1)
axis(2)
An alternative is to remove only the box by setting the frame.plot argument (which can be abbreviated to frame) to FALSE.
stripchart(x, frame.plot = FALSE)
Flip stripchart axis in R
The R stripchart is drawn horizontally by default. Nonetheless, you can flip the axes by setting the argument vertical to TRUE, which creates a plot like the following:
stripchart(x, vertical = TRUE)
In addition, if you specify the argument las = 2, the tick labels are drawn perpendicular to each axis. As a result, the labels of the vertical axis become horizontal and easier to read.
stripchart(x, vertical = TRUE, las = 2)
Chart methods
The stripchart function has three different methods to draw the data. By default, the function uses the method 'overplot', which draws tied observations on top of each other. The other methods are 'stack' and 'jitter'. The 'stack' method stacks tied observations, creating a plot similar to a histogram. The 'jitter' method adds random noise so that tied observations can be told apart. Consider, for instance, the following data with ties:
set.seed(1)
x <- round(runif(100, 0, 10))
Overplot
As we pointed out before, the overplot method is used by default by the R stripchart function. Note that this data set contains ties. round() maps every value to an integer from 0 to 10, so although there are 100 data points, only 11 distinct points are visible in the plot.
stripchart(x, pch = 19, col = 4, main = "method = 'overplot'")
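To confirm where the 11 visible points come from, the ties can be counted directly. This small check is an addition to the tutorial. table() returns how many observations share each value, and length(unique()) gives the number of distinct positions that 'overplot' is able to draw.

```r
set.seed(1)
x <- round(runif(100, 0, 10))

# How many times each value appears: every count above 1 is a tie
counts <- table(x)
print(counts)

# Number of distinct positions that the 'overplot' method can show
n_distinct <- length(unique(x))
print(n_distinct)
```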
Stack
To display all the data, you can set the argument method to 'stack'. This configuration stacks the repeated data points, creating a plot that represents the distribution of the data.
stripchart(x, pch = 19, method = "stack",
col = 4, main = "method = 'stack'")
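The height of each stack is the number of tied observations at that value, that is, the frequencies returned by table(x). As an illustration, the stacked stripchart can be compared with a bar chart of those frequencies. The side-by-side layout is just one option.

```r
set.seed(1)
x <- round(runif(100, 0, 10))

# Frequencies of each value: these are the heights of the stacks
heights <- table(x)

par(mfrow = c(1, 2))
stripchart(x, pch = 19, method = "stack", col = 4, main = "method = 'stack'")
barplot(heights, col = 4, main = "barplot(table(x))")
par(mfrow = c(1, 1))
```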
Jitter
The last alternative is the method 'jitter', which adds random noise along the group axis to try to show all the data points. The group axis is the Y-axis if the plot is horizontal and the X-axis if the plot is vertical.
stripchart(x, pch = 19, method = "jitter",
col = 4, main = "method = 'jitter'")
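To get an intuition for what the 'jitter' method does, the same effect can be approximated by hand. Each observation keeps its value on the X-axis and receives a vertical position drawn uniformly in an interval around 1. The sketch below assumes a noise interval of ±0.1, matching the default jitter amount. It is an illustration, not a copy of stripchart's internals.

```r
set.seed(1)
x <- round(runif(100, 0, 10))

amount <- 0.1  # half-width of the noise interval (assumed, matches the default jitter)

# Data values stay on the X-axis; the vertical position is 1 plus uniform noise
y_noise <- 1 + runif(length(x), -amount, amount)

plot(x, y_noise, pch = 19, col = 4, ylim = c(0.5, 1.5),
     yaxt = "n", ylab = "", main = "Manual jitter with runif()")
```

The variable is named amount rather than jitter so that it does not mask the base R function jitter().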
With this method, you can also customize the jitter argument, which controls the amount of random noise. Its default value is 0.1. Larger (smaller) values increase (decrease) the noise applied to the points.
par(mfrow = c(1, 2))
set.seed(2)
stripchart(x, pch = 19, method = "jitter",
col = 4, main = "method = 'jitter', jitter = 0.1")
axis(2)
set.seed(2)
stripchart(x, pch = 19, method = "jitter", jitter = 0.5,
col = 4, main = "method = 'jitter', jitter = 0.5")
axis(2)
par(mfrow = c(1, 1))
In the previous figure you can observe that the two plots look alike and only the scale of the vertical axis differs. This is because, with a single group, the axis limits are widened together with the amount of jitter. Nonetheless, if there are many data points, some of them will still overlap. If you want to avoid overlapping points, we recommend looking into beeswarm charts.
R stripchart by group
With the stripchart function you can also use a formula of the form y ~ x, where y is a numeric variable and x is a categorical variable or factor representing groups. To create an R stripchart by group, you can type:
set.seed(1)
x <- rnorm(100)
groups <- sample(c("A", "B", "C"), 100, replace = TRUE)
stripchart(x ~ groups, group.names = c("A", "B", "C"), pch = 19, method = "jitter",
jitter = 0.2, vertical = TRUE, col = rainbow(length(unique(groups))))
Note that we passed a vector of colors (as many as the number of groups) to the col argument in order to color each group differently.
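The formula is not the only way to obtain one strip per group. stripchart() also accepts a list of numeric vectors, one per group, and takes the group labels from the list names. The following sketch uses split() to build that list from the same data:

```r
set.seed(1)
x <- rnorm(100)
groups <- sample(c("A", "B", "C"), 100, replace = TRUE)

# split() returns a named list with one numeric vector per group ("A", "B", "C")
by_group <- split(x, groups)

stripchart(by_group, pch = 19, method = "jitter", jitter = 0.2,
           vertical = TRUE, col = rainbow(length(by_group)))
```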
Add mean to R stripchart
Sometimes it can be useful to display the mean of the data points plotted with a strip chart. For a single stripchart you have two options: adding a mean point with the points function or a mean line with the abline function.
set.seed(3)
y <- rexp(50)
par(mfrow = c(1, 2))
stripchart(y, pch = 16, col = 5, method = "jitter")
# The mean goes on the X-axis; 1 is the vertical position of the strip
points(mean(y), 1, col = 1, pch = 7, cex = 2, lwd = 2)
stripchart(y, pch = 16, col = 5, method = "jitter")
abline(v = mean(y), col = 1, lwd = 2, lty = 2)
par(mfrow = c(1, 1))
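If the stripchart is vertical, the coordinates swap roles. The strip sits at position 1 on the X-axis and the data run along the Y-axis. Under that layout, the same two options place the point at (1, mean) and draw a horizontal line with abline(h = ...). The following is a sketch:

```r
set.seed(3)
y <- rexp(50)
m <- mean(y)

par(mfrow = c(1, 2))
# Vertical stripchart: strip position on the X-axis (1), data on the Y-axis
stripchart(y, pch = 16, col = 5, method = "jitter", vertical = TRUE)
points(1, m, col = 1, pch = 7, cex = 2, lwd = 2)

# Horizontal mean line across the vertical strip
stripchart(y, pch = 16, col = 5, method = "jitter", vertical = TRUE)
abline(h = m, col = 1, lwd = 2, lty = 2)
par(mfrow = c(1, 1))
```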
For a stripchart by group, you need to calculate the mean of each group and add one mean point per group, as follows:
set.seed(3)
y <- rexp(100)
# Generating groups
groups <- sample(c("A", "B", "C"), 100, replace = TRUE)
par(mfrow = c(1, 2))
# Calculating the means
means <- sapply(levels(factor(groups)), function(i) mean(y[groups == i]))
# Horizontal stripchart
stripchart(y ~ groups, pch = 16, col = grey.colors(3),
method = "jitter", las = 1)
points(means, 1:3, col = "red", pch = 7, cex = 1.5, bg = 2, lwd = 2)
# Vertical stripchart
stripchart(y ~ groups, pch = 16, col = grey.colors(3),
method = "jitter", vertical = TRUE)
points(means, col = "red", pch = 7, cex = 1.5, bg = 2, lwd = 2)
par(mfrow = c(1, 1))
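As an alternative to sapply(), tapply() computes one mean per group in a single call. It orders the results by factor level, just like levels(factor(groups)). The sketch below checks that both approaches agree. As one possible variation on the mean points, it then draws each group mean as a short horizontal segment. The segment half-width of 0.2 is an arbitrary choice.

```r
set.seed(3)
y <- rexp(100)
groups <- sample(c("A", "B", "C"), 100, replace = TRUE)

# tapply() returns one mean per group, ordered by factor level
means_tapply <- tapply(y, groups, mean)

# Same computation as the sapply() approach used above
means_sapply <- sapply(levels(factor(groups)), function(i) mean(y[groups == i]))

# Draw each group mean as a short horizontal segment on a vertical stripchart
stripchart(y ~ groups, pch = 16, col = grey.colors(3),
           method = "jitter", vertical = TRUE)
segments(x0 = 1:3 - 0.2, y0 = means_tapply,
         x1 = 1:3 + 0.2, y1 = means_tapply, col = "red", lwd = 2)
```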
Add a stripchart to a boxplot
Finally, it is worth mentioning that stripcharts are commonly displayed over boxplots when there are few observations, because boxplots hide the underlying distribution of the data. To add a stripchart over an existing plot, first create the plot and then set add = TRUE in the stripchart function.
set.seed(4)
z <- rnorm(75)
par(mfrow = c(1, 2))
# Single boxplot
boxplot(z)
stripchart(z, add = TRUE, vertical = TRUE,
method = "jitter", col = 2, pch = 19)
# Boxplot by group
g <- sample(c("A", "B"), 75, replace = TRUE)
boxplot(z ~ g)
stripchart(z ~ g, add = TRUE, vertical = TRUE,
method = "jitter", col = 3:4, pch = 19)
par(mfrow = c(1, 1))
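One detail to keep in mind: boxplot() draws outliers as individual points, so when the raw data are overlaid, those observations appear twice. The sketch below avoids this by setting outline = FALSE, so only the stripchart shows them. Without the outliers, the boxplot would compute its axis limits from the whiskers alone. For that reason, ylim = range(z) is passed to keep every point inside the plot.

```r
set.seed(4)
z <- rnorm(75)
g <- sample(c("A", "B"), 75, replace = TRUE)

# outline = FALSE hides the outlier points; ylim keeps the full data range visible
bp <- boxplot(z ~ g, outline = FALSE, ylim = range(z))

# The stripchart now draws every observation exactly once
stripchart(z ~ g, add = TRUE, vertical = TRUE,
           method = "jitter", col = 3:4, pch = 19)
```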
Note that the argument at allows you to change the position where the stripchart is drawn, so it does not need to overlap the main plot.
set.seed(4)
z <- rnorm(75)
boxplot(z, horizontal = TRUE, xlim = c(0.5, 2))
stripchart(z, add = TRUE, method = "jitter", col = 2, pch = 19, at = 1.75)
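With several groups, at takes one position per group. The following sketch draws the points of each group next to their box instead of on top of it. The positions, box width and jitter amount are chosen arbitrarily.

```r
set.seed(4)
z <- rnorm(75)
g <- sample(c("A", "B"), 75, replace = TRUE)

# Narrow boxes (boxwex = 0.4) at positions 1 and 2, with extra room on the right
bp <- boxplot(z ~ g, at = 1:2, boxwex = 0.4, xlim = c(0.5, 2.7))

# One strip per group, shifted 0.35 units to the right of its box
stripchart(z ~ g, add = TRUE, vertical = TRUE, at = 1:2 + 0.35,
           method = "jitter", jitter = 0.08, col = 3:4, pch = 19)
```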