Ecological Detective - Probability and probability models. Part 2
Posted on March 17, 2017
The notes for this lecture draw on Aho (2013), Chapters 2 and 3, and The Ecological Detective, Chapters 3 and 4.
Common distributions
Discrete
Negative binomial distribution
The negative binomial gives the probability that x independent Bernoulli failures will occur before the rth success is obtained
- two parameters: r is the number of successes and \( \pi \) is the probability of an individual Bernoulli success
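First form:
\[ P(u) = \binom{s + u - 1}{u} p^{s} (1 - p)^{u} \]
where p is the probability of success on each trial, s is the number of successes required, and u is the number of failures before the sth success (s and p here play the roles of r and \( \pi \) above),
with possible values \( u = 0, 1, 2, \ldots \) and 0 < p < 1.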
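Second form:
Assumes that the rate parameter of a Poisson process has its own (gamma) probability distribution.
\[ P(u) = \frac{\Gamma(n + u)}{\Gamma(n)\, u!} p^{n} (1 - p)^{u} \]
where n can be any positive real value and not just an integer. n is often called the “overdispersion” parameter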
The mean of the negative binomial is \( n(1 - p)/p \) (with s in place of n for the first form)
The variance of the negative binomial is \( n(1 - p)/p^2 \)
Suppose species A has a 0.10 probability of occurring in any given plot. What is the probability of having to explore each of 0 to 150 plots to find 5 organisms, if we know the organism distribution follows a negative binomial distribution?
Plot it
What is the probability of finding 5 of species A in 35 plots?
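The two questions above can be sketched in R with `dnbinom()`. This is a minimal sketch, not necessarily the code used in the original lecture. It assumes that "plots" on the x-axis means plots without species A, because `dnbinom()` counts failures before the `size`-th success. The variable names are illustrative, and both readings of "35 plots" are computed.

```r
# Values taken from the text
p_occur  <- 0.10    # probability that species A occurs in a plot
n_found  <- 5       # number of organisms (successes) we want
failures <- 0:150   # plots without species A before the 5th find

# Negative binomial probability for each number of unsuccessful plots
prob <- dnbinom(failures, size = n_found, prob = p_occur)

# Plot it
plot(failures, prob, type = "h",
     xlab = "Plots without species A before the 5th find",
     ylab = "Probability")

# Reading 1: exactly 35 unsuccessful plots before the 5th find
p_35_failures <- dnbinom(35, size = n_found, prob = p_occur)

# Reading 2: the 5th find happens on the 35th plot overall (30 empty plots)
p_35_total <- dnbinom(35 - n_found, size = n_found, prob = p_occur)

p_35_failures
p_35_total
```

Which reading applies depends on whether the 5 occupied plots are counted among the 35.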
An alternative specification uses the mean mu rather than the probability p.
From the help for dnbinom():
An alternative parametrization (often used in ecology) is by the mean mu, and size, the dispersion parameter, where prob = size/(size+mu). The variance is mu + mu^2/size in this parametrization.
Suppose it has been shown that approximately 25 plots must be searched before 5 organisms of species A are found. What is the probability of having to explore each of 0 to 150 plots to find 5 organisms, if we know the organism distribution follows a negative binomial distribution with a mean of 25?
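A minimal sketch of the mean-based specification, assuming the mean of 25 is passed as `mu` and the 5 organisms as `size`. It also checks that this matches the probability-based form, using the relation prob = size/(size + mu) from the help page. Variable names are illustrative.

```r
mu_plots <- 25      # mean number of unsuccessful plots (from the text)
n_found  <- 5       # number of organisms we want (from the text)
failures <- 0:150

# Mean-based parametrization
prob_mu <- dnbinom(failures, size = n_found, mu = mu_plots)

# Equivalent probability-based parametrization
implied_p <- n_found / (n_found + mu_plots)
prob_p    <- dnbinom(failures, size = n_found, prob = implied_p)

# Both parametrizations should give the same probabilities
all.equal(prob_mu, prob_p)

plot(failures, prob_mu, type = "h",
     xlab = "Plots without species A before the 5th find",
     ylab = "Probability")
```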
Continuous
Normal or Gaussian
The most commonly used continuous pdf in statistics is the normal, or Gaussian, distribution
It is used to represent processes where the most likely outcome is the average and outcomes are symmetric around the average
The two parameters are the mean \( \mu \) and the standard deviation \( \sigma \)
Defined for \( x \in {\rm I\!R} \), \( \mu \in {\rm I\!R} \), and \( \sigma > 0 \)
Standard normal or Z-distribution
\( \mu = 0\) and \( \sigma = 1 \)
Suppose the mean tarsus length of an adult pheasant is 72.5 mm and the standard deviation is 2.36 mm. Assuming tarsus length follows a normal distribution, construct a pdf from 60 mm to 85 mm and plot it.
What is the probability that the tarsus length is at least 68 mm?
What is the probability that the tarsus length is between 68 mm and 70 mm?
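The pheasant questions can be worked through with `dnorm()` and `pnorm()`. This is a minimal sketch; the grid spacing of 0.1 mm and the variable names are arbitrary choices.

```r
mu_tarsus <- 72.5   # mean tarsus length in mm (from the text)
sd_tarsus <- 2.36   # standard deviation in mm (from the text)

# Density over 60 mm to 85 mm
len  <- seq(60, 85, by = 0.1)
dens <- dnorm(len, mean = mu_tarsus, sd = sd_tarsus)

plot(len, dens, type = "l",
     xlab = "Tarsus length (mm)", ylab = "Density")

# P(length >= 68): upper tail of the cumulative distribution
p_at_least_68 <- pnorm(68, mean = mu_tarsus, sd = sd_tarsus,
                       lower.tail = FALSE)

# P(68 <= length <= 70): difference of two cumulative probabilities
p_68_to_70 <- pnorm(70, mean = mu_tarsus, sd = sd_tarsus) -
  pnorm(68, mean = mu_tarsus, sd = sd_tarsus)

p_at_least_68
p_68_to_70
```

Because the distribution is continuous, "at least 68 mm" and "more than 68 mm" have the same probability.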
Lognormal distribution
If a random variable X has a lognormal distribution, and Y = log(X), then Y will have a normal distribution.
Correspondingly, if Y is normally distributed then \( e^Y \) will be lognormally distributed.
The two parameters are the location parameter \( \mu \) and the scale parameter \( \sigma \), which are the mean and standard deviation of log(X)
Defined for \( x > 0 \), \( \mu \in {\rm I\!R} \), and \( \sigma > 0 \)
Many variables in biology have lognormal distributions (cannot be less than zero, are right-skewed) and are normally distributed after log-transformation
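The link between the lognormal and the normal can be checked by simulation. This is a minimal sketch; the parameter values (`meanlog = 1`, `sdlog = 0.5`), the sample size, and the seed are arbitrary choices for illustration.

```r
set.seed(42)

# Draw from a lognormal; meanlog and sdlog are on the log scale
x <- rlnorm(10000, meanlog = 1, sdlog = 0.5)

# Log-transforming should recover a normal with mean 1 and sd 0.5
y <- log(x)
c(mean_log = mean(y), sd_log = sd(y))

# Right-skewed on the original scale, symmetric after the log transform
par(mfrow = c(1, 2))
hist(x, breaks = 50, main = "X (lognormal)", xlab = "x")
hist(y, breaks = 50, main = "log(X) (normal)", xlab = "log(x)")
par(mfrow = c(1, 1))
```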
Chi-square distribution
The \( \chi^2 \) is defined by the degrees of freedom (the number of independent pieces of information that exist concerning an estimable parameter)
The \( \chi^2 \) distribution is frequently used for testing the null hypothesis that observed and expected frequencies are equal
The \( \chi^2 \) distribution results from summing independent, squared, standard normal random variables
Has the following form:
\[ f(x) = \frac{1}{2^{v/2}\, \Gamma(v/2)} x^{v/2 - 1} e^{-x/2} \]
where outcomes are continuous and independent, \(x \geq 0 \), \(v > 0 \) is the degrees of freedom, and \( \Gamma(.) \) is the gamma function.
\( \Gamma(x)= (x-1)!\) for positive integer x
Let’s explore the relationship between the standard normal and the \( \chi^2 \) distribution by generating 10000 standard normal random values. We will then square those values and compare the distributions.
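A minimal sketch of that simulation, assuming a single squared standard normal is compared with a \( \chi^2 \) distribution with 1 degree of freedom. The seed and plotting choices are arbitrary.

```r
set.seed(1)

# 10000 standard normal values, then square them
z    <- rnorm(10000, mean = 0, sd = 1)
z_sq <- z^2

# Histogram of the squared values with the chi-square(1) density on top
hist(z_sq, breaks = 100, freq = FALSE, xlim = c(0, 10),
     main = "Squared standard normal values", xlab = "z^2")
curve(dchisq(x, df = 1), from = 0.01, to = 10, add = TRUE, lwd = 2)

# Sample mean and variance of the squared values
c(mean = mean(z_sq), variance = var(z_sq))
```

Summing the squares of k independent standard normal values in the same way would give a \( \chi^2 \) distribution with k degrees of freedom.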
The mean is equal to the degrees of freedom (df), the variance is equal to 2*df, and the mode is df - 2 (for df of at least 2)
Let’s look at the \( \chi^2 \) distribution with varying degrees of freedom.
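One way to draw these curves is with `dchisq()` over a grid. This is a minimal sketch; the degrees of freedom shown (2, 4, 8, 16) are hypothetical choices, not necessarily those used in the original lecture.

```r
x   <- seq(0, 30, by = 0.1)
dfs <- c(2, 4, 8, 16)   # hypothetical degrees of freedom to compare

# One column of densities per degrees-of-freedom value
dens <- sapply(dfs, function(v) dchisq(x, df = v))

matplot(x, dens, type = "l", lty = 1, col = seq_along(dfs),
        xlab = "x", ylab = "Density")
legend("topright", legend = paste("df =", dfs),
       lty = 1, col = seq_along(dfs))

# Location of each curve's peak on the grid; compare with df - 2
modes <- x[apply(dens, 2, which.max)]
rbind(df = dfs, mode = modes)
```

As the degrees of freedom increase, the curve shifts right and becomes more symmetric.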
Gamma distribution
The gamma distribution is named after the gamma function described above.
\( \theta\) is the scale parameter and \( \kappa\) is the shape parameter
Outcomes are continuous and independent, x > 0 and \( \theta >0\), \( \kappa >0\)
The mean is \( \theta * \kappa \)
The gamma distribution is most frequently used for representing phenomena with highly right-skewed probability distributions
However it is extremely flexible and can be used to mimic other pdfs
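The flexibility of the gamma can be illustrated with `dgamma()`, using the shape and scale parametrization given above. This is a minimal sketch with arbitrary parameter values. It checks two well-known special cases: shape 1 gives an exponential distribution, and shape df/2 with scale 2 gives a \( \chi^2 \) distribution.

```r
x <- seq(0.01, 20, by = 0.01)

# A right-skewed gamma (shape = 2, scale = 2), so the mean is 2 * 2 = 4
skewed <- dgamma(x, shape = 2, scale = 2)
plot(x, skewed, type = "l", xlab = "x", ylab = "Density")

# Special case 1: shape = 1 is an exponential with rate 1 / scale
expo_equal <- all.equal(dgamma(x, shape = 1, scale = 2),
                        dexp(x, rate = 1 / 2))

# Special case 2: shape = df / 2 and scale = 2 is a chi-square with df
chisq_equal <- all.equal(dgamma(x, shape = 3, scale = 2),
                         dchisq(x, df = 6))

# Check the mean (scale * shape) by simulation
set.seed(7)
draws <- rgamma(10000, shape = 2, scale = 2)

c(expo_equal = expo_equal, chisq_equal = chisq_equal)
mean(draws)
```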
Monte Carlo Methods
In order to confront models with data, we need to estimate parameters and choose one description over another
In most cases, we do not know the underlying mechanisms and processes
One way to test our confidence in our models is to test models on sets of simulated data that we construct with known mechanisms and processes (Monte Carlo or stochastic simulation)
Monte Carlo methods use random-number generators to construct data
One type of random number generator is the random uniform (runif() in R), which generates continuous values between a minimum (usually 0) and a maximum (usually 1), where the probability of each value being drawn is equal
Other distributions include: binomial (rbinom() in R), Poisson (rpois() in R), negative binomial (rnbinom() in R), normal (rnorm() in R), gamma (rgamma() in R), and many others
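A minimal sketch of these generators in use. The sample size, seed, and all parameter values are arbitrary choices for illustration.

```r
set.seed(2017)   # fix the seed so the draws are reproducible
n <- 1000

u  <- runif(n, min = 0, max = 1)        # uniform on [0, 1]
b  <- rbinom(n, size = 10, prob = 0.3)  # successes in 10 trials
k  <- rpois(n, lambda = 4)              # counts with mean 4
nb <- rnbinom(n, size = 5, mu = 25)     # overdispersed counts, mean 25
z  <- rnorm(n, mean = 0, sd = 1)        # standard normal
g  <- rgamma(n, shape = 2, scale = 2)   # right-skewed, mean 4

# Sample means should sit near the theoretical means
round(c(unif = mean(u), binom = mean(b), pois = mean(k),
        nbinom = mean(nb), norm = mean(z), gamma = mean(g)), 2)
```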
Ecological scenarios: Simple population model with process and observation uncertainty
**What are the differences between process and observation uncertainty?**
We can use Monte Carlo simulations to explore process and observation uncertainty in a simple population model
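A minimal sketch of such a simulation follows. It assumes the simple model in which next year's population is survival times the current population, plus births, plus normally distributed process noise, and in which each observation is the true population plus normally distributed observation noise. The survival rate (0.8) and birth rate (20) come from the text. The starting population, the number of time steps, both standard deviations, and the function names `simulate_pop()` and `fit_lag1()` are assumptions made for illustration.

```r
# Simulate true and observed population sizes
simulate_pop <- function(n_steps = 50, n0 = 50, s = 0.8, b = 20,
                         sigma_w = 10, sigma_v = 10) {
  n_true <- numeric(n_steps)
  n_true[1] <- n0
  for (t in seq_len(n_steps - 1)) {
    # Process uncertainty: noise enters the dynamics and carries forward
    n_true[t + 1] <- s * n_true[t] + b + rnorm(1, mean = 0, sd = sigma_w)
  }
  # Observation uncertainty: noise affects only what we record
  n_obs <- n_true + rnorm(n_steps, mean = 0, sd = sigma_v)
  data.frame(time = seq_len(n_steps), n_true = n_true, n_obs = n_obs)
}

# Regress N(t + 1) on N(t): intercept estimates births, slope estimates survival
fit_lag1 <- function(n) {
  n_now  <- n[-length(n)]
  n_next <- n[-1]
  fit <- lm(n_next ~ n_now)
  c(birth     = unname(coef(fit)[1]),
    survival  = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared)
}

set.seed(3)
pop <- simulate_pop()

fit_lag1(pop$n_true)   # process uncertainty only
fit_lag1(pop$n_obs)    # process and observation uncertainty

plot(pop$n_true[-nrow(pop)], pop$n_true[-1],
     xlab = "Population at time t", ylab = "Population at time t + 1")
```

Observation noise on the predictor tends to pull the fitted slope toward zero and inflate the intercept, so the estimates from `n_obs` would usually be further from the specified values than those from `n_true`.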
Let’s look at a plot of population at time t and population at t + 1 (Figure 3.6)
We can see in this figure that the process uncertainty (remember, we have not yet looked at observation uncertainty) scatters the points around the mean (regression line). However, there is still a strong relationship between the two values (\(R^2\) = 0.6242727). The estimated birth rate (15.5538219) is reasonably close to our specified birth rate (20), and the estimated survival rate (0.8175352) is close to our specified survival rate (0.8).
Now let’s look at the influence of observation uncertainty with a plot of observed population at time t and observed population at t + 1 (Figure 3.7)
We can see in this figure that with both process uncertainty and observation uncertainty we get a much weaker relationship (\(R^2\) = 0.3138051), with the points scattered even more around the mean. The estimated birth rate (35.7563572) is now well off our specified birth rate (20), as is the estimated survival rate (0.5862988) compared with our specified survival rate (0.8).