library(psych)
mat <- matrix(c(3,2,1,2), byrow = TRUE, ncol = 2)
eigen(mat)
eigen() decomposition
$values
[1] 4 1
$vectors
          [,1]       [,2]
[1,] 0.8944272 -0.7071068
[2,] 0.4472136  0.7071068
Any set of points can be decomposed into eigenvectors and eigenvalues. Eigenvectors and eigenvalues exist in pairs: an eigenvector describes a direction along which a linear transformation acts simply by “stretching/compressing” and/or “flipping”, and the eigenvalue describes the degree of that transformation in that direction. The number of eigenvectors is equal to the number of dimensions of your data.
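As a quick check of this pairing, the sketch below confirms that multiplying the matrix by one of its eigenvectors gives the same result as scaling that eigenvector by its eigenvalue. The names `e`, `v1`, `lambda1`, `transformed`, and `stretched` are illustrative and not part of the original lesson.

```r
# Same 2 x 2 matrix as in the eigen() example
mat <- matrix(c(3, 2, 1, 2), byrow = TRUE, ncol = 2)
e <- eigen(mat)

# First eigenvector/eigenvalue pair (illustrative names)
v1 <- e$vectors[, 1]
lambda1 <- e$values[1]

# Applying the transformation to v1 only stretches it by lambda1
transformed <- as.vector(mat %*% v1)
stretched <- lambda1 * v1
all.equal(transformed, stretched)

# One eigenvector per dimension of the matrix
ncol(e$vectors) == ncol(mat)
```

The same check works for the second pair by replacing the index `1` with `2`.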
The source for many of the notes in this lesson (and a lot more detail on the subject) can be found in Finch and French (2015) and Beaujean (2014).
Latent variables in statistics are variables that are not directly observable and are inferred from a mathematical model. One advantage of latent variables is that they help reduce the dimensionality of data (a major theme of multivariate statistics), and they are used in many scientific disciplines.
One type of latent variable analysis is factor analysis, which is used extensively in the social and behavioral sciences. Factor analysis allows the researcher to create models of non-observable factors (e.g., motivations, constraints, identity) from multivariate data.
There are two broad types of factor analysis: 1) exploratory factor analysis (EFA) and 2) confirmatory factor analysis (CFA). The difference between the two is the degree of a priori structure that is assumed in the model. In EFA the researcher does not impose a specific latent structure on the observable data, but allows the analysis to suggest the optimal number of factors. In contrast, with CFA the researcher explicitly links the indicators with the factors to which they theoretically belong.
EFA consists of two steps: (1) factor extraction and (2) factor rotation. Factor extraction involves estimating the initial model parameters, i.e., the factor loadings. Loadings reflect the relationships between the factors and the indicators, with larger values indicating a closer association between a latent and an observed variable. There are as many factors as there are indicator variables (i.e., the columns used to define the latent variables).
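To make the idea of loadings concrete, here is a minimal sketch of the extraction step on simulated data (not the lesson's data). Two latent factors, `f1` and `f2`, each drive three indicators; all names and coefficients are assumptions chosen for illustration. `factanal()` from base R is used with `rotation = "none"` so that only the extraction step is shown.

```r
set.seed(42)
n <- 500

# Two uncorrelated latent factors (simulated for illustration only)
f1 <- rnorm(n)
f2 <- rnorm(n)

# Six indicators: x1-x3 are driven by f1, x4-x6 by f2, plus random noise
sim <- data.frame(
  x1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  x2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  x3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  x4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  x5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  x6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# Factor extraction by maximum likelihood, without any rotation
sim.efa <- factanal(sim, factors = 2, rotation = "none")

# Show every loading, including the small ones that print() hides by default
print(loadings(sim.efa), cutoff = 0)
```

Unrotated loadings are often hard to read because each indicator tends to load on more than one factor, which is the motivation for the rotation step.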
Several extraction methods exist, with the most popular being maximum likelihood (ML) and principal axis factoring (PAF). The ML method provides a direct assessment of model fit but relies on multivariate normality. PAF makes no distributional assumption but does not offer a statistical test of fit.
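For the ML method, the fit assessment is a chi-square test of whether the chosen number of factors is sufficient. As a rough sketch, the degrees of freedom for p indicators and k factors are ((p - k)^2 - (p + k)) / 2, and the p-value is the upper tail of the chi-square distribution. The numbers below are those reported for the 12-indicator, 4-factor model fitted later in this lesson; the variable names are illustrative.

```r
p <- 12   # number of indicators
k <- 4    # number of factors

# Degrees of freedom of the ML test that k factors are sufficient
dof <- ((p - k)^2 - (p + k)) / 2
dof

# Upper-tail p-value for the reported chi-square statistic of 77.4
chisq_stat <- 77.4
p_value <- pchisq(chisq_stat, df = dof, lower.tail = FALSE)
p_value
```

A very small p-value suggests that, in the strict statistical sense, the chosen number of factors does not fully reproduce the observed correlations.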
Factor rotation is the transformation of the initial set of factor loadings to simplify the interpretation of the results by finding a simple solution. Methods fall into two broad categories: orthogonal and oblique. Orthogonal rotations constrain the correlations among factors to be zero, whereas oblique rotations allow the factors to be correlated. The most popular orthogonal rotation method is VARIMAX, while PROMAX and OBLIMIN are popular among the oblique rotations. The decision on which method to use should be based on both theoretical and empirical grounds.
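The difference between the two categories can be sketched with base R's `varimax()` and `promax()`. The example assumes `ability.cov`, a covariance data set shipped with R, as a stand-in for the lesson's data; the object names are illustrative. An orthogonal rotation matrix leaves the factors uncorrelated, while the oblique rotation implies non-zero factor correlations.

```r
# Unrotated two-factor ML solution on a built-in data set (stand-in data)
fit <- factanal(factors = 2, covmat = ability.cov, rotation = "none")
unrotated <- loadings(fit)

vm <- varimax(unrotated)  # orthogonal rotation
pm <- promax(unrotated)   # oblique rotation

# Orthogonal: the rotation matrix times its transpose is the identity
round(crossprod(vm$rotmat), 3)

# Oblique: factor correlations implied by the promax rotation matrix,
# computed the same way factanal() reports "Factor Correlations"
tmat <- solve(pm$rotmat)
factor_cor <- tmat %*% t(tmat)
round(factor_cor, 3)
```

If the off-diagonal entries of `factor_cor` are close to zero, an orthogonal rotation loses little; if they are sizeable, an oblique rotation is probably the more faithful description.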
For this lesson, we will use the data set provided by Finch and French (2015) here. The data represent information collected on achievement goal orientation. Achievement goal orientation refers to how an individual interprets and reacts to tasks, resulting in different patterns of cognition, affect, and behavior. There are 12 questions, each answered on a 7-point Likert-type scale, from 430 college students.
The columns refer to:
The types of questions refer to 4 distinct latent traits: mastery approach (MAP), mastery avoidant (MAV), performance approach (PAP), and performance avoidant (PAV).
The data are in SPSS format; I have converted them to a csv file for convenience, which is available in our GitHub data repository as goal_scale.csv
library(readr)
goal_scale <- read_csv("https://raw.githubusercontent.com/chrischizinski/SNR_R_Group/master/data/goal_scale.csv")
Parsed with column specification:
cols(
  ags1 = col_integer(),
  ags2 = col_integer(),
  ags3 = col_integer(),
  ags4 = col_integer(),
  ags5 = col_integer(),
  ags6 = col_integer(),
  ags7 = col_integer(),
  ags8 = col_integer(),
  ags9 = col_integer(),
  ags10 = col_integer(),
  ags11 = col_integer(),
  ags12 = col_integer()
)
head(goal_scale)
factanal
agoal.efa <- factanal(~ags1+ags2+ags3+ags4+ags5+ags6+ags7+ags8+ags9+ags10+ags11+ags12, factors = 4, rotation = "promax", data = goal_scale)
agoal.efa
Call:
factanal(x = ~ags1 + ags2 + ags3 + ags4 + ags5 + ags6 + ags7 +
    ags8 + ags9 + ags10 + ags11 + ags12, factors = 4, data = goal_scale,
    rotation = "promax")
Uniquenesses:
 ags1  ags2  ags3  ags4  ags5  ags6  ags7  ags8  ags9 ags10 ags11 ags12 
0.487 0.335 0.279 0.342 0.557 0.388 0.104 0.005 0.231 0.201 0.300 0.306 
Loadings:
      Factor1 Factor2 Factor3 Factor4
ags1           0.667                 
ags2                   0.844         
ags3   0.864                         
ags4   0.793           0.104  -0.116 
ags5           0.565   0.123         
ags6           0.764                 
ags7           1.023  -0.120         
ags8   0.756                   0.583 
ags9   0.884                         
ags10  0.866                   0.143 
ags11  0.799                   0.195 
ags12                  0.833         
               Factor1 Factor2 Factor3 Factor4
SS loadings      4.122   2.404   1.461   0.426
Proportion Var   0.344   0.200   0.122   0.036
Cumulative Var   0.344   0.544   0.666   0.701
Factor Correlations:
        Factor1 Factor2  Factor3  Factor4
Factor1  1.0000  0.0919 -0.08477  0.20174
Factor2  0.0919  1.0000  0.18936  0.66077
Factor3 -0.0848  0.1894  1.00000 -0.00277
Factor4  0.2017  0.6608 -0.00277  1.00000
Test of the hypothesis that 4 factors are sufficient.
The chi square statistic is 77.4 on 24 degrees of freedom.
The p-value is 0.000000157 
Uniqueness reflects the proportion of variance in each indicator that is not explained by the factors. For example, 48.7% of the variation in ags1 is not explained by the four factors.
agoal.efa$uniquenesses
     ags1      ags2      ags3      ags4      ags5      ags6      ags7      ags8      ags9     ags10     ags11     ags12 
0.4865429 0.3350171 0.2785319 0.3416042 0.5574857 0.3876667 0.1043941 0.0050000 0.2306985 0.2007897 0.2998883 0.3061390 
The opposite of uniqueness is communality: the proportion of variance in each indicator that is explained by the factors.
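The link between loadings, communality, and uniqueness can be checked directly. The sketch below assumes `ability.cov`, a data set shipped with R, as stand-in data and uses an orthogonal (varimax) rotation, for which the communality of an indicator is simply the sum of its squared loadings. With an oblique rotation such as promax, the factor correlations also enter the calculation, so this shortcut does not apply as written. Object names are illustrative.

```r
# Two-factor ML solution with an orthogonal rotation (stand-in data)
fit <- factanal(factors = 2, covmat = ability.cov, rotation = "varimax")

# Plain numeric matrix of loadings
L <- unclass(loadings(fit))

# Communality: sum of squared loadings per indicator (orthogonal case only)
communality <- rowSums(L^2)

# Communality and uniqueness should add up to approximately 1
round(cbind(communality,
            uniqueness = fit$uniquenesses,
            total = communality + fit$uniquenesses), 3)
```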
1-agoal.efa$uniquenesses
     ags1      ags2      ags3      ags4      ags5      ags6      ags7      ags8      ags9     ags10     ags11     ags12 
0.5134571 0.6649829 0.7214681 0.6583958 0.4425143 0.6123333 0.8956059 0.9950000 0.7693015 0.7992103 0.7001117 0.6938610 
ld <- loadings(agoal.efa)
ld
Loadings:
      Factor1 Factor2 Factor3 Factor4
ags1           0.667                 
ags2                   0.844         
ags3   0.864                         
ags4   0.793           0.104  -0.116 
ags5           0.565   0.123         
ags6           0.764                 
ags7           1.023  -0.120         
ags8   0.756                   0.583 
ags9   0.884                         
ags10  0.866                   0.143 
ags11  0.799                   0.195 
ags12                  0.833         
               Factor1 Factor2 Factor3 Factor4
SS loadings      4.122   2.404   1.461   0.426
Proportion Var   0.344   0.200   0.122   0.036
Cumulative Var   0.344   0.544   0.666   0.701
To help interpret our loadings, let's create a visualization of them.
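library(tidyverse)  # provides %>%, rownames_to_column(), left_join(), mutate(), gather(), ggplot()
# Convert the loadings matrix to a data frame
loadings <- as.data.frame(ld[,])
# Lookup table linking each indicator to its latent trait
lt <- data.frame(indicator = paste("ags", 1:12, sep = ""),
                 latent_traits = c("MAP", "MAV", "PAP", "PAV", "MAP", "MAV", "MAP", "PAV", "PAP", "PAV", "PAP", "MAV"),
                 stringsAsFactors = FALSE)
# Reshape to long format: one row per indicator-factor combination
loadings.long <- loadings %>% 
  rownames_to_column("indicator") %>% 
  left_join(lt) %>% 
  mutate(indicator = factor(indicator, levels = paste("ags", 12:1, sep = ""))) %>% 
  gather(factor, value, -indicator, -latent_traits) %>% 
  # Hide loadings smaller than 0.1 in absolute value, matching the printed output
  mutate(value2 = ifelse(abs(value) < 0.1, NA, value))
Joining, by = "indicator"
# Plot loadings by factor and indicator; point shape marks the latent trait
ggplot(data = loadings.long) +
  geom_point(aes(x = factor, y = indicator, color = value2, shape = latent_traits), size = 8) +
  scale_colour_gradient(na.value = "white", low = "blue", high = "red") +
  scale_shape_manual(values = c("MAP" = 15, "MAV" = 16, "PAP" = 17, "PAV" = 18)) +
  labs(color = "Loading", shape = "Latent\ntrait") +
  theme_classic()
Beaujean, A. A. 2014. Latent variable modeling using R: A step-by-step guide. Routledge.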
Finch, W. H., and B. F. French. 2015. Latent variable modeling with R. Routledge.
Horn, J. L. 1965. A rationale and test for the number of factors in factor analysis. Psychometrika 30:179–185. Springer.