Basic plotting in ggplot
ggplot2 is a package that has substantially raised the quality of graphics produced in R. The "gg" in ggplot stands for the grammar of graphics. A lot of theory has been developed on what makes a good plot, and I encourage you to read more on the subject.
From the ggplot2 website:
ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics.
The components of a plot are:
- data and aesthetic mappings,
- geometric objects,
- scales,
- facet specification,
- statistical transformations, and
- the coordinate system.
Plots using ggplot() are made in a series of layers. Each layer is composed of:
- data and aesthetic mappings,
- a statistical transformation (stat),
- a geometric object (geom), and
- a position adjustment.
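To make the pieces of a layer concrete, here is a minimal sketch that spells out one layer in full with layer() and then writes the same thing with the usual geom_bar() shorthand. The object names p_layers and p_short are just illustrative, and the choice of cut and clarity from diamonds is my own.

```r
library(ggplot2)

# One layer written out in full: data, aesthetic mappings, geom, stat, position.
p_layers <- ggplot() +
  layer(
    data = diamonds,
    mapping = aes(x = cut, fill = clarity),
    geom = "bar",
    stat = "count",
    position = "stack"
  )

# The usual shorthand: geom_bar() defaults to stat = "count"
# and position = "stack", so this builds the same layer.
p_short <- ggplot() +
  geom_bar(data = diamonds, mapping = aes(x = cut, fill = clarity))
```

In everyday use you will almost always reach for the geom_*() shorthand; layer() is mostly useful for seeing what the shorthand fills in for you.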
There are a TON of options for plots in ggplot, and I cannot cover them all here: everything from plotting shapefiles to violin plots. I will give you the basics, but for most plot types you will need to look at the website and test out the ones you are interested in. I strongly encourage you to explore and test out the different types of plots.
To begin exploring ggplot, we will use the diamonds data set, which ships with ggplot2.
A couple of points to consider and keep in mind:
- Data needs to be in a data.frame.
- Layers are separated by +.
- Plots can be saved as objects.
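As a small sketch of these three points, the example below stores a base plot in an object and then adds layers to it with +. The object names (p, p_points, p_smooth) and the choice of carat and price are illustrative.

```r
library(ggplot2)

# diamonds is a tibble, which is also a data.frame.
is.data.frame(diamonds)

# Store the base plot in an object; nothing is drawn yet.
p <- ggplot(data = diamonds, mapping = aes(x = carat, y = price))

# Add layers with `+`; each addition returns a new plot object.
p_points <- p + geom_point(alpha = 0.2)
p_smooth <- p_points + geom_smooth(method = "lm")

# Printing the object draws it.
# print(p_smooth)
```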
There are several ways to specify data in ggplot. If you specify it at the top of the hierarchy (i.e., in ggplot()), all subsequent layers will use that data set. My personal preference is to specify it in the layers so that it is clear which data you are using. I feel the same way about the aesthetics, but sometimes they have to go at the top (e.g., with position_dodge() and error bars).
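The two styles look like this in practice. This is only a sketch; p_top and p_layer are names I made up, and both versions should produce the same scatter plot.

```r
library(ggplot2)

# Data and aesthetics set once in ggplot(); every layer inherits them.
p_top <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
  geom_point()

# Data and aesthetics set in the layer itself.
p_layer <- ggplot() +
  geom_point(data = diamonds, mapping = aes(x = carat, y = price))
```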
We have lots of options for the aesthetics when we are building plots. The required aesthetics depend on the geometry chosen, and there are numerous geometries available.
Common aesthetics:
- x: the x-coordinates of the data that you wish to plot. Can be numeric or categorical.
- y: the y-coordinates of the data that you wish to plot.
- color or colour: the color of points, lines, or edges. Colors can be specified using any of the R colors.
- fill: similar to color, but this specifies the fill of polygons, bars, or other shapes.
- size: the size of the points or the thickness of the line (in ggplot2 3.4.0 and later, line thickness is set with linewidth).
- shape: used in geom_point to specify the point symbol.
- linetype: the type of line to be plotted (e.g., solid, dashed, dotted).
- alpha: the transparency level of the layer.
When we put these outside the aesthetic statement aes(), all points are treated the same.
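For example, a sketch with the color, size, and alpha set outside aes(), so every point gets the same fixed values (the particular values and the name p_fixed are arbitrary choices of mine):

```r
library(ggplot2)

# color, size, and alpha sit outside aes(): they are constants,
# applied identically to every point, and no legend is drawn.
p_fixed <- ggplot() +
  geom_point(
    data = diamonds,
    mapping = aes(x = carat, y = price),
    color = "steelblue", size = 1, alpha = 0.3
  )
```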
When we put these inside the aesthetic statement aes(), points are treated differently based on the level of the variable, and the mapping is shown in a legend. Numeric values are given a continuous scale, and characters or factors are given a discrete scale.
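A quick sketch of both cases, mapping color to a factor (cut) and to a numeric variable (depth) from diamonds. The object names are illustrative.

```r
library(ggplot2)

# cut is a factor, so color gets a discrete scale with one legend key per level.
p_discrete <- ggplot() +
  geom_point(data = diamonds, mapping = aes(x = carat, y = price, color = cut))

# depth is numeric, so color gets a continuous scale with a color bar.
p_continuous <- ggplot() +
  geom_point(data = diamonds, mapping = aes(x = carat, y = price, color = depth))
```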
Bar plots
Start by making some data
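A minimal sketch, assuming a small made-up data frame. The names df, group, treatment, and value, and the numbers themselves, are purely illustrative stand-ins for whatever data you want to plot.

```r
library(ggplot2)

# Hypothetical data: one value per group and treatment.
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 2),
  treatment = rep(c("control", "treated"), times = 3),
  value = c(4, 7, 6, 9, 3, 12)
)

# geom_col() takes bar heights straight from `value`;
# position_dodge() places the treatments side by side.
p_bar <- ggplot() +
  geom_col(
    data = df,
    mapping = aes(x = group, y = value, fill = treatment),
    position = position_dodge()
  )
```

If you want ggplot to count rows for you instead of supplying the heights, use geom_bar(), which defaults to stat = "count".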
We can control how these values are presented by using the scale commands
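As one possible illustration, the sketch below uses scale_fill_manual() to set the legend title and the fill color for each level of cut in a bar plot of diamonds. The grey palette and the legend title are my own arbitrary choices.

```r
library(ggplot2)

# Map fill to cut, then control how the levels are presented:
# `name` sets the legend title, `values` sets one color per level.
p_scaled <- ggplot() +
  geom_bar(data = diamonds, mapping = aes(x = cut, fill = cut)) +
  scale_fill_manual(
    name = "Cut quality",
    values = c(
      "Fair" = "grey80",
      "Good" = "grey65",
      "Very Good" = "grey50",
      "Premium" = "grey35",
      "Ideal" = "grey20"
    )
  )
```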
Controlling axes
Notice the difference between these plots. scale_y_continuous drops the bars that exceed the limits set, whereas coord_cartesian keeps the bars and only zooms the display to the limits. Keep that in mind when using these. I tend to always use coord_cartesian to set limits and only use scale_y_continuous to set my breaks.
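Here is a sketch of that difference using a bar plot of cut from diamonds. The limit of 15000 is an arbitrary value I picked because at least one bar is taller than it; the object names are illustrative.

```r
library(ggplot2)

p_base <- ggplot() +
  geom_bar(data = diamonds, mapping = aes(x = cut))

# Limits on the scale: any bar whose height exceeds the limit is dropped
# (ggplot2 warns about removed rows when the plot is drawn).
p_scale <- p_base + scale_y_continuous(limits = c(0, 15000))

# Limits on the coordinate system: all bars are kept and the view is
# zoomed, so a tall bar simply runs off the top of the panel.
p_coord <- p_base + coord_cartesian(ylim = c(0, 15000))

# Zoom with coord_cartesian() and use scale_y_continuous() only for breaks.
p_breaks <- p_coord + scale_y_continuous(breaks = seq(0, 15000, by = 5000))
```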
One of the things that I really dislike about the default ggplot is the extra space that is put around the data in the plots. You can get rid of it using expand = FALSE in coord_cartesian.
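A short sketch of this, again with a bar plot of cut from diamonds (the upper limit of 25000 is an arbitrary choice that leaves room above the tallest bar):

```r
library(ggplot2)

# expand = FALSE removes the default padding on both axes,
# so the bars start exactly at the bottom edge of the panel.
p_tight <- ggplot() +
  geom_bar(data = diamonds, mapping = aes(x = cut)) +
  coord_cartesian(ylim = c(0, 25000), expand = FALSE)
```

Note that expand = FALSE applies to the x axis as well, so the outermost bars will touch the sides of the panel.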