library(psych)
mat<-matrix(c(3,2,1,2), byrow = TRUE, ncol = 2)
eigen(mat)
eigen() decomposition
$values
[1] 4 1
$vectors
[,1] [,2]
[1,] 0.8944272 -0.7071068
[2,] 0.4472136 0.7071068
Any set of points can be decomposed into eigenvectors and eigenvalues. Eigenvectors and eigenvalues exist in pairs: an eigenvector describes a direction along which a linear transformation acts simply by "stretching/compressing" and/or "flipping", and the eigenvalue describes the degree of that transformation in that direction. The number of eigenvectors is equal to the number of dimensions of your data.
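As a quick check of the pairing described above, the following sketch verifies that multiplying the matrix by each eigenvector gives the same result as scaling that eigenvector by its eigenvalue. It reuses the 2 x 2 matrix from the eigen() example; the object names e, v and lambda are only illustrative.

```r
# Same 2 x 2 matrix used in the eigen() example
mat <- matrix(c(3, 2, 1, 2), byrow = TRUE, ncol = 2)
e <- eigen(mat)

# For each eigenvalue/eigenvector pair, mat %*% v should equal lambda * v
for (i in seq_along(e$values)) {
  v <- e$vectors[, i]      # i-th eigenvector (a column of $vectors)
  lambda <- e$values[i]    # matching eigenvalue
  print(all.equal(as.vector(mat %*% v), lambda * v))
}
```

The eigenvectors are returned as columns and are scaled to unit length, which is why the first column is (2, 1) divided by the square root of 5.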
Many of the notes in this lesson (and a lot more detail on the subject) come from Finch and French (2015) and Beaujean (2014).
In statistics, latent variables are variables that are not directly observable and are instead inferred from a mathematical model. One advantage of latent variables is that they help reduce the dimensionality of data (a major theme of multivariate statistics), and they have been used in many scientific disciplines.
One type of latent variable analysis is factor analysis, which is used extensively in the social and behavioral sciences. Factor analysis allows the researcher to create models of non-observable factors (e.g., motivations, constraints, identity) from multivariate data.
There are two broad types of factor analysis: 1) exploratory factor analysis (EFA) and 2) confirmatory factor analysis (CFA). The difference between the two is in the degree of a priori structure that is assumed in the model. With EFA, the researcher does not impose a specific latent structure on the observable data, but allows the analysis to suggest the optimal number of factors. In contrast, with CFA the researcher explicitly links the indicators with the factors to which they theoretically belong.
EFA consists of two steps: (1) factor extraction and (2) factor rotation. Factor extraction involves estimating the initial model parameters, i.e., the factor loadings. Loadings reflect the relationships between the factors and the indicators, with larger values indicating a closer association between a latent and an observed variable. There can be as many factors as there are indicator variables (i.e., the columns used to define the latent variables).
Several extraction methods exist, with the most popular being maximum likelihood (ML) and principal axis factoring (PAF). The ML method provides a direct assessment of model fit but relies on multivariate normality. PAF makes no distributional assumption but does not provide a statistical test of fit.
Factor rotation is the transformation of the initial set of factor loadings to simplify the interpretation of the results by finding a simple solution. Methods fall into two broad categories: orthogonal and oblique. Orthogonal rotations constrain the correlations among factors to be zero, whereas oblique rotations allow the factors to be correlated. The most popular orthogonal rotation method is VARIMAX, while among the oblique rotations PROMAX and OBLIMIN are popular. The decision on which method to use should be based on both theory and empirical grounds.
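To make the orthogonal versus oblique distinction concrete, here is a minimal sketch that fits the same ML factor model twice with factanal(), once with each type of rotation. It is not part of the goal orientation analysis: it assumes the ability.cov covariance data that ships with base R, and the choice of two factors is only for illustration.

```r
# ability.cov is a built-in covariance matrix of six ability tests
fa_varimax <- factanal(factors = 2, covmat = ability.cov, rotation = "varimax")
fa_promax  <- factanal(factors = 2, covmat = ability.cov, rotation = "promax")

# Orthogonal rotation: factors are constrained to be uncorrelated
print(loadings(fa_varimax), cutoff = 0.1)

# Oblique rotation: printing the fit also reports the factor correlations
print(fa_promax, cutoff = 0.1)

# Rotation changes the loadings, but not the uniquenesses
all.equal(fa_varimax$uniquenesses, fa_promax$uniquenesses)
```

Because the uniquenesses are estimated during extraction, both fits explain the same amount of variance in each indicator; rotation only redistributes that variance across factors to ease interpretation.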
For this example, we will use the data set provided by Finch and French (2015). The data represent information collected on achievement goal orientation. Achievement goal orientation refers to how an individual interprets and reacts to tasks, resulting in different patterns of cognition, affect and behavior. There are 12 questions, each answered on a 7-point Likert-type scale, from 430 college students.
The columns refer to:
The types of questions refer to 4 distinct latent traits: mastery approach (MAP), mastery avoidant (MAV), performance approach (PAP), and performance avoidant (PAV).
The data is in SPSS format; for convenience, I have converted it to a csv file, which is in our GitHub data repository as goal_scale.csv.
library(readr)
goal_scale <- read_csv("https://raw.githubusercontent.com/chrischizinski/SNR_R_Group/master/data/goal_scale.csv")
Parsed with column specification:
cols(
ags1 = col_integer(),
ags2 = col_integer(),
ags3 = col_integer(),
ags4 = col_integer(),
ags5 = col_integer(),
ags6 = col_integer(),
ags7 = col_integer(),
ags8 = col_integer(),
ags9 = col_integer(),
ags10 = col_integer(),
ags11 = col_integer(),
ags12 = col_integer()
)
head(goal_scale)
factanal
agoal.efa<-factanal(~ags1+ags2+ags3+ags4+ags5+ags6+ags7+ags8+ags9+ags10+ags11+ ags12, factors=4, rotation="promax", data = goal_scale )
agoal.efa
Call:
factanal(x = ~ags1 + ags2 + ags3 + ags4 + ags5 + ags6 + ags7 + ags8 + ags9 + ags10 + ags11 + ags12, factors = 4, data = goal_scale, rotation = "promax")
Uniquenesses:
ags1 ags2 ags3 ags4 ags5 ags6 ags7 ags8 ags9 ags10 ags11 ags12
0.487 0.335 0.279 0.342 0.557 0.388 0.104 0.005 0.231 0.201 0.300 0.306
Loadings:
Factor1 Factor2 Factor3 Factor4
ags1 0.667
ags2 0.844
ags3 0.864
ags4 0.793 0.104 -0.116
ags5 0.565 0.123
ags6 0.764
ags7 1.023 -0.120
ags8 0.756 0.583
ags9 0.884
ags10 0.866 0.143
ags11 0.799 0.195
ags12 0.833
Factor1 Factor2 Factor3 Factor4
SS loadings 4.122 2.404 1.461 0.426
Proportion Var 0.344 0.200 0.122 0.036
Cumulative Var 0.344 0.544 0.666 0.701
Factor Correlations:
Factor1 Factor2 Factor3 Factor4
Factor1 1.0000 0.0919 -0.08477 0.20174
Factor2 0.0919 1.0000 0.18936 0.66077
Factor3 -0.0848 0.1894 1.00000 -0.00277
Factor4 0.2017 0.6608 -0.00277 1.00000
Test of the hypothesis that 4 factors are sufficient.
The chi square statistic is 77.4 on 24 degrees of freedom.
The p-value is 0.000000157
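The test at the end of the output can be reproduced by hand, which helps show where the 24 degrees of freedom come from. This is a sketch that assumes the usual degrees-of-freedom formula for an ML factor model with p indicators and k factors, and it plugs in the rounded chi-square statistic printed above, so the p-value will only approximately match.

```r
p <- 12   # number of indicators (ags1 to ags12)
k <- 4    # number of factors requested

# Degrees of freedom of the ML factor model
dof <- ((p - k)^2 - (p + k)) / 2
dof

# Upper-tail probability for the reported (rounded) chi-square statistic
chisq <- 77.4
pchisq(chisq, df = dof, lower.tail = FALSE)
```

A small p-value suggests that four factors do not fully reproduce the observed correlations, although with 430 respondents this test can be sensitive to fairly small amounts of misfit.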
Uniqueness reflects the proportion of variance in each indicator that is not explained by the factors. For example, 48.7% of the variation in ags1 is not explained by the four factors.
agoal.efa$uniquenesses
ags1 ags2 ags3 ags4 ags5 ags6 ags7 ags8 ags9 ags10 ags11 ags12
0.4865429 0.3350171 0.2785319 0.3416042 0.5574857 0.3876667 0.1043941 0.0050000 0.2306985 0.2007897 0.2998883 0.3061390
The complement of uniqueness is communality: the proportion of variance in each indicator that is explained by the factors.
1-agoal.efa$uniquenesses
ags1 ags2 ags3 ags4 ags5 ags6 ags7 ags8 ags9 ags10 ags11 ags12
0.5134571 0.6649829 0.7214681 0.6583958 0.4425143 0.6123333 0.8956059 0.9950000 0.7693015 0.7992103 0.7001117 0.6938610
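Communality can also be recovered from the loadings themselves. With an oblique rotation such as promax the factors are correlated, so the calculation has to include the factor correlation matrix. The sketch below assumes the built-in ability.cov data rather than the goal orientation data, and it rebuilds the factor correlations from the rotmat component of the fit in the same way the printed output does; lambda and phi are illustrative names.

```r
fa <- factanal(factors = 2, covmat = ability.cov, rotation = "promax")

lambda <- unclass(loadings(fa))   # rotated (pattern) loadings as a plain matrix
tmat <- solve(fa$rotmat)
phi <- tmat %*% t(tmat)           # factor correlation matrix

# Communality two ways: from the loadings and from the uniquenesses
communality_from_loadings <- diag(lambda %*% phi %*% t(lambda))
communality_from_uniqueness <- 1 - fa$uniquenesses

round(cbind(communality_from_loadings, communality_from_uniqueness), 3)
```

The two columns should agree up to the tolerance of the optimizer. For an orthogonal rotation the factor correlation matrix is the identity, so the communality reduces to the row sums of the squared loadings.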
ld<-loadings(agoal.efa)
ld
Loadings:
Factor1 Factor2 Factor3 Factor4
ags1 0.667
ags2 0.844
ags3 0.864
ags4 0.793 0.104 -0.116
ags5 0.565 0.123
ags6 0.764
ags7 1.023 -0.120
ags8 0.756 0.583
ags9 0.884
ags10 0.866 0.143
ags11 0.799 0.195
ags12 0.833
Factor1 Factor2 Factor3 Factor4
SS loadings 4.122 2.404 1.461 0.426
Proportion Var 0.344 0.200 0.122 0.036
Cumulative Var 0.344 0.544 0.666 0.701
To help interpret our loadings, let's create a visualization of them.
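library(tidyverse)  # provides %>%, rownames_to_column(), left_join(), gather(), mutate() and ggplot()
# Convert the loadings matrix to a data frame (indicators stay as row names)
loadings <- as.data.frame(ld[,])
# Lookup table matching each indicator to its theoretical latent trait
lt <- data.frame(indicator = paste("ags", 1:12, sep = ""),
                 latent_traits = c("MAP", "MAV", "PAP", "PAV", "MAP", "MAV", "MAP", "PAV", "PAP", "PAV", "PAP", "MAV"))
# Reshape to long format; loadings below 0.1 (including negative ones) become NA and plot as white
loadings %>%
  rownames_to_column("indicator") %>%
  left_join(lt) %>%
  mutate(indicator = factor(indicator, levels = paste("ags", 12:1, sep = ""))) %>%
  gather(factor, value, -indicator, -latent_traits) %>%
  mutate(value2 = ifelse(value < 0.1, NA, value)) -> loadings.long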
Joining, by = "indicator"
ggplot(data = loadings.long) +
geom_point(aes(x = factor, y = indicator, color = value2, shape = latent_traits), size = 8) +
scale_colour_gradient(na.value = "white", low = "blue", high = "red") +
scale_shape_manual(values = c("MAP" = 15, "MAV" = 16, "PAP" = 17, "PAV" = 18)) +
labs(color = "Loading", shape = "Latent\ntrait") +
theme_classic()
Beaujean, A. A. 2014. Latent variable modeling using R: A step-by-step guide. Routledge.
Finch, W. H., and B. F. French. 2015. Latent variable modeling with R. Routledge.
Horn, J. L. 1965. A rationale and test for the number of factors in factor analysis. Psychometrika 30:179–185. Springer.