Chapter 2: Customizing graphs in ggplot2

Learn how to customize the aesthetics, labels and axes of a graph in ggplot2.

Jewel Johnson
12-04-2021

After Chapter 1, you should be familiar with the different types of graphs that you can plot using ggplot2. In this tutorial, we will learn how to customize those graphs to our liking: how to tweak the aesthetics, how to change labels and how to modify the axes of a graph.

So let us plot a graph from scratch and learn how to use the different aesthetics available.

1. Setting up the prerequisites

First, we need to install the ggplot2 package, as it does not come with the standard distribution of R. For the dataset, we will install the Stat2Data package, which houses a lot of cool datasets. For this tutorial let us use the Hawks dataset, which contains body measurements from three different species of hawks. This data was collected by students and faculty at Cornell College in Mount Vernon, and the dataset was made available by the late Prof. Bob Black at Cornell College.

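# Install the packages (only needed once)
install.packages("ggplot2")
install.packages("Stat2Data")
# Load the packages and the Hawks dataset
library(ggplot2)
library(Stat2Data)
data("Hawks")
# Inspect the structure of the dataset
str(Hawks)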

There is a lot of information in the dataset that we can use for plotting, so let us try plotting some of it.

2. Building a plot

One thing to remember is that ggplot2 builds a graph by adding layers. Let us start with the base layer, where the x-axis shows the weight of the hawks and the y-axis shows their wing length.

ggplot(data = Hawks, aes(x = Weight, y = Wing))

Wait a sec! Where are my data points? If we look at the code, we can see that we have not told ggplot2 which geometry we want. Do we want a scatter plot, a histogram or some other type of graph? Let us plot a scatter plot first, using the geom_point() command. Adding geom_point() to the ggplot() command is equivalent to adding an extra layer on top of the layer we got previously. Let us also use theme_bw() for a nice-looking theme.

ggplot(data = Hawks, aes(x = Weight, y = Wing)) + geom_point() + 
  theme_bw()
Warning: Removed 11 rows containing missing values (geom_point).

We got the graph, but we also got a warning message. The warning tells us that 11 rows of the dataset had missing (NA) values in the plotted variables and so could not be drawn. Real-life datasets often contain NA values for various reasons, so this is fine.

Although this graph shows us the data points, we cannot tell which point belongs to which species, and this dataset contains data for three species of hawks. So let us give the points different colours according to species so that we can tell them apart.
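To see where the warning comes from, we can count the incomplete rows ourselves. The sketch below uses base R's complete.cases() for the count and the na.rm argument of geom_point() to drop those rows without a warning; the names n_missing and p_quiet are only illustrative.

```r
library(ggplot2)
library(Stat2Data)
data("Hawks")

# Rows where Weight or Wing is missing cannot be drawn as points
n_missing <- sum(!complete.cases(Hawks[, c("Weight", "Wing")]))
n_missing  # should agree with the number reported in the warning

# na.rm = TRUE drops the same rows silently instead of warning about them
p_quiet <- ggplot(data = Hawks, aes(x = Weight, y = Wing)) +
  geom_point(na.rm = TRUE) + theme_bw()
p_quiet
```

Silencing the warning is a matter of taste: it keeps the output tidy, but the warning is also a useful reminder that some rows were left out.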

1. Changing colour

ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
  geom_point() + theme_bw()

The species abbreviations are the following: CH=Cooper’s, RT=Red-tailed, SS=Sharp-Shinned.

Now, this graph is way better than the previous one.

2. Changing point shape

ggplot(data = Hawks, aes(x = Weight, y = Wing, shape = Species)) + #instead of colour use shape.
  geom_point() + theme_bw() 

We did change the shape of the points, but it is still hard to make out the difference. Let us try specifying colour along with shape.

ggplot(data = Hawks, aes(x = Weight, y = Wing, 
  colour = Species, shape = Species)) + geom_point() + theme_bw()

This plot is much better than the previous one.

Now let us try specifying colour within the aes() of the geom().

ggplot(data = Hawks, aes(x = Weight, y = Wing)) + 
  geom_point(aes(colour = Species)) + theme_bw()

We got the same graph as before! So what is the difference between specifying colour within the aes() of ggplot() and specifying it within the aes() of geom()? Let us look at another example.

ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = "red")) + 
  geom_point() + theme_bw()

Here I gave colour the constant value "red" inside the aes() of ggplot(), so all the points share a single colour. Now let us also specify colour in the aes() within the geom().

ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = "red")) + 
  geom_point(aes(colour = Species)) + theme_bw()

You can see that the single colour is overridden by the species colours. So an aes() mapping (in this case colour) within geom() overrides the same aes() mapping within ggplot(). And whatever aes() mapping we give within ggplot() is inherited by every geom() layer that is added to that ggplot() call.
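The inheritance rule becomes easier to see once a plot has two layers. The following is a minimal sketch, assuming we add geom_smooth(method = "lm") as a second layer; that geom is not used elsewhere in this chapter and is here only to illustrate the point. The names p_global and p_local are illustrative too.

```r
library(ggplot2)
library(Stat2Data)
data("Hawks")

# colour mapped in ggplot(): both layers inherit it,
# so we get coloured points and one trend line per species
p_global <- ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(method = "lm", se = FALSE, na.rm = TRUE) + theme_bw()

# colour mapped only in geom_point(): the smooth layer does not see it,
# so we get coloured points but a single trend line for all hawks
p_local <- ggplot(data = Hawks, aes(x = Weight, y = Wing)) +
  geom_point(aes(colour = Species), na.rm = TRUE) +
  geom_smooth(method = "lm", se = FALSE, na.rm = TRUE) + theme_bw()

p_global
p_local
```

So a mapping goes in ggplot() when every layer should use it, and in a geom() when only that layer should.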

Let us see another case.

ggplot(data = Hawks, aes(x = Weight, y = Wing)) + 
  geom_point(colour = "darkred") + theme_bw()

ggplot(data = Hawks, aes(x = Weight, y = Wing)) + 
  geom_point(aes(colour = "darkred")) + theme_bw()

If you compare the two pieces of code, the only difference is that colour = "darkred" was outside aes() in the first and inside aes() in the second. So why didn’t the second graph have the same dark-red points as the first one? In the first code we explicitly tell ggplot2 to draw every point in dark red. In the second code, because we specified it inside aes(), ggplot2 treats “darkred” as data to be mapped rather than as a colour name: it behaves like a variable that has the single value “darkred” for every row. This is why the legend in the second graph lists “darkred” as a category. The colour scale then assigns that one category the first colour of the default palette, which happens to be a shade of red, so the points look red but not dark red. The bottom line is that R has a pre-determined way of reading code, so we should understand what each line is expected to do and not expect R to guess what we meant.

Now let us try a few other examples:
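We can check which colour each version really uses. The sketch below relies on layer_data(), a ggplot2 helper that returns the data computed for a layer, and on scale_colour_identity(), which tells ggplot2 to use mapped values as literal colours. Neither is used elsewhere in this chapter, and the object names are illustrative.

```r
library(ggplot2)
library(Stat2Data)
data("Hawks")

# colour set outside aes(): a fixed property of the layer
p_set <- ggplot(data = Hawks, aes(x = Weight, y = Wing)) +
  geom_point(colour = "darkred", na.rm = TRUE) + theme_bw()

# colour mapped inside aes(): "darkred" is treated as a data value
p_map <- ggplot(data = Hawks, aes(x = Weight, y = Wing)) +
  geom_point(aes(colour = "darkred"), na.rm = TRUE) + theme_bw()

# same mapping, but the identity scale uses the value as a colour name
p_identity <- p_map + scale_colour_identity()

unique(layer_data(p_set)$colour)       # the colour we asked for
unique(layer_data(p_map)$colour)       # a colour picked by the default scale
unique(layer_data(p_identity)$colour)  # the mapped value used as-is
```

In everyday use, setting a constant colour outside aes() is the simpler choice; the identity scale is mostly useful when a column of the dataset already contains colour names.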

3. Changing size

ggplot(data = Hawks, aes(x = Species, y = Hallux, size = Culmen)) + 
  geom_point() + theme_bw()

4. Changing colour, shape and size manually

ggplot(data = Hawks, aes(x = Weight, y = Hallux, colour = Species,
                         shape = Species, size = Species)) + 
  geom_point() +
  scale_shape_manual(values=c(1, 2, 3)) +
  scale_color_manual(values=c('red','blue', 'green')) +
  scale_size_manual(values=c(1,5,10)) + theme_bw()
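In the code above the manual values are matched to the species by position (CH, RT, SS). As a small variation, the values vector can be named so that each species is tied to its colour explicitly. This is a sketch with illustrative names (species_colours, p_named) and only the colour scale shown.

```r
library(ggplot2)
library(Stat2Data)
data("Hawks")

# Names are the levels of Species, values are the colours to use for them
species_colours <- c(CH = "red", RT = "blue", SS = "green")

p_named <- ggplot(data = Hawks, aes(x = Weight, y = Hallux, colour = Species)) +
  geom_point(na.rm = TRUE) +
  scale_colour_manual(values = species_colours) + theme_bw()
p_named
```

With a named vector the colours stay attached to the same species even if the order of the factor levels changes later.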

5. Changing the opacity of the elements

ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
  geom_point(alpha = 1/5) + theme_bw()

The same commands also work for most of the other types of geom(). Now let us see a few other aesthetics in other types of geoms.

6. Changing fill colour

ggplot(data = Hawks, aes(x = Weight, fill = Species)) +
  geom_histogram(bins = 25) + theme_bw()

ggplot(data = Hawks, aes(x = Weight, fill = Species)) +
  geom_histogram(bins = 25) + theme_bw() + 
  scale_fill_manual(values = c("darkred", "darkblue", "darkgreen"))
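By default geom_histogram() stacks the bars of the different species on top of each other. If we would rather compare the shapes of the distributions, one option is to overlay them instead. The sketch below assumes position = "identity" together with some transparency; the object names are illustrative.

```r
library(ggplot2)
library(Stat2Data)
data("Hawks")

# Default behaviour: bars of different species are stacked within each bin
p_stacked <- ggplot(data = Hawks, aes(x = Weight, fill = Species)) +
  geom_histogram(bins = 25, na.rm = TRUE) + theme_bw()

# position = "identity": every species starts from zero, bars overlap,
# and alpha makes the overlapping bars visible
p_overlay <- ggplot(data = Hawks, aes(x = Weight, fill = Species)) +
  geom_histogram(bins = 25, position = "identity", alpha = 0.5, na.rm = TRUE) +
  theme_bw()

p_stacked
p_overlay
```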

7. Changing line type

ggplot(data = Hawks, aes(x = Weight, y = Wing, linetype = Species)) + 
  geom_line() + theme_bw()

ggplot(data = Hawks, aes(x = Weight, y = Wing, linetype = Species)) + 
  geom_line() + 
  scale_linetype_manual(values= c("twodash", "longdash", "dotdash")) +
  theme_bw()

Now let us also see how to change the labels in a graph.

8. Changing labels in the axes

ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
  geom_point() + theme_bw() + xlab("Weight (gm)") + ylab("Wing (mm)") +
  ggtitle(label = "Weight vs wing length in three different species of hawks", 
          subtitle = "CH=Cooper's, RT=Red-tailed, SS=Sharp-Shinned")

ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
  geom_point() + theme_bw() + labs(x = "Weight (gm)", y = "Wing (mm)", 
  title = "Weight vs wing length in three different species of hawks", 
  subtitle = "CH=Cooper's, RT=Red-tailed, SS=Sharp-Shinned",
  caption = "Source: Hawks dataset from the Stat2Data R package", # caption for the graph
  colour = "Hawk Species", # rename legend title
  tag = "A") # figure tag
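Since ggplot2 builds a graph by adding layers, we do not have to retype the whole plot every time we change a label. A plot can be stored in a variable and extended later. The following is a small sketch of that idea; base_plot and labelled_plot are illustrative names.

```r
library(ggplot2)
library(Stat2Data)
data("Hawks")

# Store the parts of the plot that stay the same
base_plot <- ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
  geom_point(na.rm = TRUE) + theme_bw()

# Add labels to the stored plot; base_plot itself is left unchanged
labelled_plot <- base_plot +
  labs(x = "Weight (gm)", y = "Wing (mm)", colour = "Hawk Species")

labelled_plot
```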

9. Tweaking the axes

ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
  geom_point() + theme_bw() + xlim(c(0,1000)) + ylim(c(200,350))

ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
  geom_point() + theme_bw() + coord_cartesian(xlim = c(0,1000),
                                                   ylim = c(200,350))
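The two plots above look alike, but they treat the data differently. xlim() and ylim() set the limits of the scales, so values outside the limits are turned into NA before anything is drawn, whereas coord_cartesian() only zooms into the plotting area and keeps all the data. The sketch below counts the NA values on the x-axis with layer_data(), a ggplot2 helper that is not used elsewhere in this chapter; the object names are illustrative.

```r
library(ggplot2)
library(Stat2Data)
data("Hawks")

# Limits set on the scales: out-of-range values are replaced by NA
p_scale <- ggplot(data = Hawks, aes(x = Weight, y = Wing)) +
  geom_point(na.rm = TRUE) + xlim(c(0, 1000)) + ylim(c(200, 350))

# Limits set on the coordinate system: the data are untouched, we only zoom
p_zoom <- ggplot(data = Hawks, aes(x = Weight, y = Wing)) +
  geom_point(na.rm = TRUE) +
  coord_cartesian(xlim = c(0, 1000), ylim = c(200, 350))

na_scale <- sum(is.na(layer_data(p_scale)$x))  # missing weights plus weights above 1000
na_zoom  <- sum(is.na(layer_data(p_zoom)$x))   # only the weights that were missing already
c(scale_limits = na_scale, coord_zoom = na_zoom)
```

For a scatter plot the result looks the same either way. The difference matters for layers that compute something from the data, such as histograms or trend lines, because with scale limits the dropped values no longer take part in the calculation.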

ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
  geom_point() + theme_bw() + coord_flip()

ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
  geom_point() + theme_bw() + scale_x_continuous(breaks = c(0,1000,2000))

ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
  geom_point() + theme_bw() + 
  scale_x_continuous(breaks = c(0, 1000, 2000), labels = c("low", "medium", "high"))

ggplot(data = Hawks, aes(x = Weight, y = Wing, colour = Species)) +
  geom_point() + theme_bw() + scale_y_reverse()

Summary

In this tutorial, we learned how to modify the aesthetics available for different geoms in ggplot2. Then we learned how to modify the labels in a graph, and finally how to modify the axes. This tutorial is in no way exhaustive, as there are many more methods that are not discussed here. Instead of trying to include everything, it tries to be a stepping stone that helps students of R learn the basics of tweaking a graph. Try practising what is covered here using the other datasets available in the Stat2Data R package. Have a good day, my friend!



References

  1. H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

  2. Ann Cannon, George Cobb, Bradley Hartlaub, Julie Legler, Robin Lock, Thomas Moore, Allan Rossman and Jeffrey Witmer (2019). Stat2Data: Datasets for Stat2. R package version 2.0.0. https://CRAN.R-project.org/package=Stat2Data

Last updated on 2022-01-06 17:11:54 IST
