Plotting smooth surfaces on NMDS plots with ggplot

The RMarkdown source to this file can be found here

A little while back I showed how to produce NMDS plots using the vegan and ggplot2 packages. In this post, I will extend the production of the NMDS plots to reproducing the smooth surface plots produced by the function ordisurf in the vegan package.

From the documentation, ordisurf, which requires package mgcv) fits smooth surfaces for continuous variables onto ordination using thinplate splines with cross-validatory selection of smoothness.

I have found these type of plots particularly insightful in both understanding relationships in bird communities to changes in vegetation structure and to fish communities relationship with angling effort

Similar to reasons I mentioned when making NMDs plots in my previous posts, these plots look a whole lot better and easier to make publication ready using ggplots themes

ordisurf plots using base plots

Load the required libraries. For this demonstration I will use the dune dataset within the vegan package. The analysis I am showing here is based on information presented in the Vegan: an introduction to ordination vignette by Jari Oksanen

library(ggplot2)
library(vegan)
library(grid)
set.seed(123456)

data(dune)
head(dune)
##    Belper Empnig Junbuf Junart Airpra Elepal Rumace Viclat Brarut Ranfla
## 2       3      0      0      0      0      0      0      0      0      0
## 13      0      0      3      0      0      0      0      0      0      2
## 4       2      0      0      0      0      0      0      0      2      0
## 16      0      0      0      3      0      8      0      0      4      2
## 6       0      0      0      0      0      0      6      0      6      0
## 1       0      0      0      0      0      0      0      0      0      0
##    Cirarv Hyprad Leoaut Potpal Poapra Calcus Tripra Trirep Antodo Salrep
## 2       0      0      5      0      4      0      0      5      0      0
## 13      0      0      2      0      2      0      0      2      0      0
## 4       2      0      2      0      4      0      0      1      0      0
## 16      0      0      0      0      0      3      0      0      0      0
## 6       0      0      3      0      3      0      5      5      3      0
## 1       0      0      0      0      4      0      0      0      0      0
##    Achmil Poatri Chealb Elyrep Sagpro Plalan Agrsto Lolper Alogen Brohor
## 2       3      7      0      4      0      0      0      5      2      4
## 13      0      9      1      0      2      0      5      0      5      0
## 4       0      5      0      4      5      0      8      5      2      3
## 16      0      2      0      0      0      0      7      0      4      0
## 6       2      4      0      0      0      5      0      6      0      0
## 1       1      2      0      4      0      0      0      7      0      0

Run the ordination on the dune dataset and plot the result with base graphics

ord <- metaMDS(dune)
## Run 0 stress 0.1193 
## Run 1 stress 0.1183 
## ... New best solution
## ... procrustes: rmse 0.02027  max resid 0.06494 
## Run 2 stress 0.1183 
## ... procrustes: rmse 2.269e-05  max resid 7.575e-05 
## *** Solution reached
ord
## 
## Call:
## metaMDS(comm = dune) 
## 
## global Multidimensional Scaling using monoMDS
## 
## Data:     dune 
## Distance: bray 
## 
## Dimensions: 2 
## Stress:     0.1183 
## Stress type 1, weak ties
## Two convergent solutions found after 2 tries
## Scaling: centring, PC rotation, halfchange scaling 
## Species: expanded scores based on 'dune'
plot(ord, type = "n")
points(ord, display = "sites", cex = 1, pch = 16, col = "red")
text(ord, display = "species", cex = 1, col = "blue")

center

Now fit continuous environmental factors from the dune.env dataset using ordisurf.

data(dune.env)
head(dune.env)
##     A1 Moisture Management      Use Manure
## 2  3.5        1         BF Haypastu      2
## 13 6.0        5         SF Haypastu      3
## 4  4.2        2         SF Haypastu      4
## 16 5.7        5         SF  Pasture      3
## 6  4.3        1         HF Haypastu      2
## 1  2.8        1         SF Haypastu      4
plot(ord, type = "n")
points(ord, display = "sites", cex = 1, pch = 16, col = "red")
text(ord, display = "species", cex = 1, col = "blue")
ordisurf(ord, dune.env$A1, add = TRUE)

center

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ s(x1, x2, k = 10, bs = "tp", fx = FALSE)
## <environment: 0x0000000015c7dac0>
## 
## Estimated degrees of freedom:
## 1.59  total = 2.59 
## 
## REML score: 41.59

ordisurf plots using ggplot

The first step is to extract the data from the ord object and put it into a dataframe. For the purpose of this demonstration, I am going to just focus on the species scores.

Notice that I add a ‘z’ column filled with NAs, which allows the score data sets to be combined with the contour dataset.

species.scores <- as.data.frame(scores(ord, "species"))
species.scores$species <- rownames(species.scores)
names(species.scores)[c(1, 2)] <- c("x", "y")
species.scores$z <- NA

head(species.scores)
##               x        y species  z
## Belper -0.47834 -0.24453  Belper NA
## Empnig -0.08858  1.69637  Empnig NA
## Junbuf  0.26483 -0.60750  Junbuf NA
## Junart  0.91148 -0.08299  Junart NA
## Airpra -0.52855  1.67980  Airpra NA
## Elepal  1.24505  0.16158  Elepal NA

Next step is to run the ordisurf function as above.

dune.sf <- ordisurf(ord ~ dune.env$A1, plot = FALSE, scaling = 3)
head(dune.sf)
## $coefficients
## (Intercept)  s(x1,x2).1  s(x1,x2).2  s(x1,x2).3  s(x1,x2).4  s(x1,x2).5 
##   4.850e+00  -4.962e-06   1.084e-07   4.976e-06   1.178e-05  -1.745e-07 
##  s(x1,x2).6  s(x1,x2).7  s(x1,x2).8  s(x1,x2).9 
##   2.924e-06  -2.379e-05   9.978e-01   2.136e-01 
## 
## $residuals
##       1       2       3       4       5       6       7       8       9 
## -0.3474  0.7280 -0.2302 -0.8285  0.2823 -0.3654 -1.0333  2.5075  0.1111 
##      10      11      12      13      14      15      16      17      18 
##  5.0940 -0.7213 -0.9555 -1.1335 -0.1942 -0.2218 -3.1576  2.7045 -1.4852 
##      19      20 
##  0.3919 -1.1455 
## 
## $fitted.values
##     1     2     3     4     5     6     7     8     9    10    11    12 
## 3.847 5.272 4.430 6.528 4.018 3.165 5.233 3.793 3.889 6.406 4.021 4.456 
##    13    14    15    16    17    18    19    20 
## 4.833 4.794 4.522 6.658 6.596 5.185 5.408 3.945 
## 
## $family
## 
## Family: gaussian 
## Link function: identity 
## 
## 
## $linear.predictors
##     1     2     3     4     5     6     7     8     9    10    11    12 
## 3.847 5.272 4.430 6.528 4.018 3.165 5.233 3.793 3.889 6.406 4.021 4.456 
##    13    14    15    16    17    18    19    20 
## 4.833 4.794 4.522 6.658 6.596 5.185 5.408 3.945 
## 
## $deviance
## [1] 58.68

We need to extract the contour information from dune.sf object. This is in the $grid. I wrote a function that will pull out this information and put it into a dataframe with the column headers ‘x’, ‘y’, and ‘z’.

extract.xyz <- function(obj) {
    xy <- expand.grid(x = obj$grid$x, y = obj$grid$y)
    xyz <- cbind(xy, c(obj$grid$z))
    names(xyz) <- c("x", "y", "z")
    return(xyz)
}

contour.vals <- extract.xyz(obj = dune.sf)
head(contour.vals)
##         x      y     z
## 1 -0.8603 -0.716 3.133
## 2 -0.7956 -0.716 3.238
## 3 -0.7309 -0.716 3.343
## 4 -0.6662 -0.716 3.449
## 5 -0.6015 -0.716 3.554
## 6 -0.5368 -0.716 3.659

We now have the required information to reproduce the plot from ordisurf in ggplot2. The first step is the lay down the contours in the plot.

p <- ggplot(data = contour.vals, aes(x, y, z = z)) + stat_contour(aes(colour = ..level..)) + 
    coord_cartesian(xlim = c(-2, 2), ylim = c(-1, 1.5)) + theme_mine()

print(p)
## Warning: Removed 156 rows containing non-finite values (stat_contour).

center

Then add the species scores. Note that I am using theme_mine which can be sourced from my themes.r file

p <- p + geom_text(data = species.scores, aes(x = x, y = y, label = species), 
    colour = "red") + coord_equal() + theme_mine() + labs(x = "NMDS1", y = "NMDS2") + 
    theme(panel.border = element_rect(fill = NA), axis.text.x = element_blank(), 
        axis.text.y = element_blank(), legend.position = "none")

print(p)
## Warning: Removed 156 rows containing non-finite values (stat_contour).

center

Adding contour labels

The trickiest part of reproducing the contours is getting the contour labels on the lines. I usually manually create a data.frame and manually set the points that I want the labels.

labelz <- data.frame(x = c(-0.85, -0.8, -0.45, -0.15, 0.15, 0.5, 0.85), y = c(0.05, 
    1.1, 1.1, 1.1, 1, 0.75, 0.65), z = NA, labels = c("3.5", "4.0", "4.5", "5.0", 
    "5.5", "6.0", "6.5"))

p + geom_text(data = labelz, aes(x = x, y = y, label = labels), angle = -80, 
    size = 6)
## Warning: Removed 156 rows containing non-finite values (stat_contour).

center

The other option is to take advantage of the directlabels package and utilize the function direct.labels().

library(directlabels)
direct.label(p)
## Loading required package: proto
## Warning: Removed 156 rows containing non-finite values (stat_contour).
## Warning: Removed 156 rows containing non-finite values (stat_contour).

center

As I have demonstrated that the key to using ggplot2 to produce plots is to get the relevant data in a data.frame. Once you get it into that format, generally, it is then easy as pie.

Leave a Comment