Modeling contextual effects using individual-level data and without aggregation: an illustration of multilevel factor analysis (MLFA) with collective efficacy

Population health scientists increasingly study how contextual-level attributes affect individual health. A major challenge in this domain relates to measurement, i.e., how best to measure and create variables that capture characteristics of individuals and their embedded contexts. This paper presents an illustration of multilevel factor analysis (MLFA), an analytic method that enables researchers to model contextual effects using individual-level data without using derived variables. MLFA uses the shared variance in sets of observed items among individuals within the same context to estimate a measurement model for latent constructs; it does this by decomposing the total sample variance-covariance matrix into within-group (e.g., individual-level) and between-group (e.g., contextual-level) matrices and simultaneously modeling distinct latent factor structures at each level. We illustrate the MLFA method using items capturing collective efficacy, which were self-reported by 2,599 adults in 65 census tracts from the Los Angeles Family and Neighborhood Survey (LAFANS). MLFA identified two latent factors at the individual level and one factor at the neighborhood level. Indicators of collective efficacy performed differently at each level. The ability of MLFA to identify different latent factor structures at each level underscores the utility of this analytic tool to model and identify attributes of contexts relevant to health.

Electronic supplementary material: The online version of this article (doi:10.1186/s12963-015-0045-1) contains supplementary material, which is available to authorized users.

Population health scientists are increasingly interested in studying multilevel phenomena, or how features of the social and physical contexts in which individuals live, learn, work, and play (e.g., neighborhoods, schools, or workplaces) are associated with individual health, disease, and behavior [1,2]. A major challenge faced by multilevel researchers relates to measurement: how best to measure features of contexts and create variables that capture both the characteristics of individuals and the contexts in which they are embedded. Identifying novel measures that capture the features of contexts that may be relevant to health is an area where multilevel researchers have called for more progress [3][4][5][6][7][8].
Research on collective efficacy offers one of the clearest examples of the challenges of measuring multilevel phenomena and of the limitations of existing measurement approaches. Collective efficacy was first articulated in a paper by Sampson and colleagues as a feature of neighborhoods that consists of two dimensions: social cohesion among neighbors (social cohesion) and neighbors' willingness to intervene on behalf of the common good (informal social control) [9]. Since its introduction, collective efficacy has been one of the most heavily studied constructs in epidemiological and population-based research, particularly neighborhood studies; more than 5,000 articles cite the paper introducing the concept. Numerous empirical studies have found collective efficacy to be positively associated with many health and developmental outcomes [9][10][11][12][13][14].
As shown in Table 1, several approaches have been used to create variables that capture collective efficacy or related contextual-level social phenomena, such as income inequality or social capital. The most popular approach has been to create a derived variable, which entails summarizing the characteristics of individuals within a group using means, medians, proportions, measures of dispersion (e.g., variances), or other aggregation approaches [15]. Means have been the most popular type of derived variable in research on collective efficacy as well as in other areas of multilevel research. To construct these group- or contextual-level means, the usual strategy has been first to average each individual's responses to the items on a given scale and then to average these person-level means across individuals living in the same context (e.g., neighborhood) to arrive at a contextual-level measure [10,14,16-19].
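The two-step aggregation behind a mean-based derived variable can be sketched in a few lines of Python. This is a minimal illustration only, assuming complete item responses stored as (neighborhood, item-scores) pairs; the function name and data layout are hypothetical and do not come from L.A. FANS or any particular software.

```python
from collections import defaultdict
from statistics import mean


def derived_neighborhood_means(responses):
    """Mean-based derived variable (hypothetical helper, illustration only).

    responses: iterable of (neighborhood_id, item_scores) pairs, one per person,
    where item_scores lists that person's answers to the scale items.
    Assumes no missing item responses and no sampling weights.
    """
    person_scores = defaultdict(list)
    for neighborhood, item_scores in responses:
        # Step 1: average each person's answers across the scale items.
        person_scores[neighborhood].append(mean(item_scores))
    # Step 2: average the person-level scale scores within each neighborhood.
    return {hood: mean(scores) for hood, scores in person_scores.items()}


data = [("A", [4, 5, 3]), ("A", [2, 2, 2]), ("B", [5, 5, 5])]
print(derived_neighborhood_means(data))
```

Note that every item contributes equally to each person's score and every person contributes equally to the neighborhood score; this is exactly the kind of assumption about how items function at each level that the approaches discussed below can relax.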
A second approach has been to use factor analytic or latent variable models to determine whether multiple items should be grouped together in a common construct. Although factor analytic methods can be conducted at one or more levels of analysis (e.g., individual level, contextual level, or both), the majority of studies have focused on single-level factor analytic approaches [18]. Few studies have used latent variable approaches to study collective efficacy, even though the authors introducing the concept used a hierarchical linear latent variable modeling approach to study collective efficacy and estimate its relationship to violent crime [9].
While both derived variables and single-level factor analytic approaches are widely used and easy to construct, their use in multilevel research may be problematic in some cases. For example, there may be instances when more than one variable best represents the contextual-level phenomenon. There may also be instances when it is misleading to assume that the items function, and relate to one another, in the same way at all levels of analysis. New approaches are therefore needed that allow researchers to model contextual effects using individual-level data when existing measurement strategies (e.g., derived variables, single-level factor analyses) are not ideal.
In an effort to expand the population health scientist's toolkit, this paper provides an applied example of one analytic technique, multilevel factor analysis (MLFA), that is a good alternative to existing approaches for creating group- or contextual-level measures. MLFA is not a new method; it was first articulated more than 25 years ago [20][21][22][23]. However, the method has not yet been widely used, especially in population health and epidemiology. MLFA allows researchers both to model contextual effects using individual-level data without derived variables and to create variables that capture individual- as well as group-level variability, using one or more measures at each level of analysis (see, for example, [24][25][26][27][28]).
Table 1 Approaches used to construct variables to model the effects of collective efficacy or related social-environmental variables, such as income inequality or social capital

Variable approach | Description | Examples
Derived variable | Created by summarizing the characteristics of individuals within a group, using means, medians, proportions, measures of dispersion (e.g., variances), or other aggregation approaches |
Derived variable: based on group-level mean | Individual responses to items on a given scale are averaged; these means are then averaged across individuals living in the same context (e.g., neighborhood) to arrive at a contextual-level measure | [10,14,16,17]
Derived variable: based on group-level variance | Individual responses to items on a given scale are averaged; the variance (or standard deviation) of these means among individuals living in the same context (e.g., neighborhood) is then examined to arrive at a contextual-level measure | [19]
Factor analysis | Captures the shared variance among an observed set of variables in terms of a potentially smaller number of unobserved constructs or latent factors |
Factor analysis: single-level factor analysis | Latent factors are estimated at only one level (i.e., the individual or contextual level) | [18]
Factor analysis: multilevel factor analysis (MLFA) | Latent factors are estimated at two levels of analysis; latent factor structures can differ at each level of analysis | [24-28]
Factor analysis: hierarchical latent variable model (HLVM) | A special case of the two-level MLFA that imposes stricter parameter constraints than the most general MLFA: latent factors are estimated only at the individual level, with the factor variances decomposed into within- and between-group components | [9,51]

MLFA is part of a family of factor analytic models that seek to capture the shared variance among an observed set of variables in terms of a potentially smaller number of unobserved constructs or latent factors. Conceptually and analytically, MLFA is distinct from the other measurement approaches, including derived variables, single-level factor analyses, and hierarchical latent variable models (HLVM), which all assume the constructs of interest are the same at each level of analysis. Single-level exploratory (EFA) or confirmatory factor analysis (CFA) estimates latent factors at only one level (i.e., the individual or contextual level). HLVM also estimates latent factors at only one level but captures both within- and between-level variability in those factors. In contrast, MLFA allows for different latent factor structures at each level of analysis. This is possible because MLFA decomposes the total sample variance-covariance matrix into within-group (i.e., individual-level, within a context) and between-group (i.e., contextual-level) matrices and simultaneously models distinct latent factor structures at each of these levels [22,29,30]. As we detail below, HLVM is a special case of MLFA. Thus, MLFA can be viewed as an analytic approach that allows the user to relax some of the potentially untenable assumptions and constraints imposed by the HLVM specification.
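For continuous items, the within/between decomposition described above is commonly illustrated with two moment estimators: a pooled within-group covariance matrix S_PW and a scaled between-group covariance matrix S_B, whose expectations are Sigma_W and Sigma_W + c * Sigma_B, respectively. The pure-Python sketch below assumes complete continuous data; it is not how Mplus analyzes ordinal items with WLSMV, and the function names are hypothetical.

```python
def pooled_within_and_between(groups):
    """Moment-based within/between decomposition for continuous items.

    groups: list of clusters; each cluster is a list of persons' item vectors.
    Returns (s_pw, s_b, c), where E[s_pw] = Sigma_W and
    E[s_b] = Sigma_W + c * Sigma_B under the two-level model.
    """
    m = len(groups[0][0])                      # number of items
    G = len(groups)                            # number of clusters
    N = sum(len(g) for g in groups)            # number of persons
    grand = [sum(row[k] for g in groups for row in g) / N for k in range(m)]
    s_pw = [[0.0] * m for _ in range(m)]
    s_b = [[0.0] * m for _ in range(m)]
    for g in groups:
        n_j = len(g)
        g_mean = [sum(row[k] for row in g) / n_j for k in range(m)]
        for row in g:
            d = [row[k] - g_mean[k] for k in range(m)]  # deviation from cluster mean
            for a in range(m):
                for b in range(m):
                    s_pw[a][b] += d[a] * d[b]
        e = [g_mean[k] - grand[k] for k in range(m)]    # cluster mean minus grand mean
        for a in range(m):
            for b in range(m):
                s_b[a][b] += n_j * e[a] * e[b]
    s_pw = [[v / (N - G) for v in r] for r in s_pw]
    s_b = [[v / (G - 1) for v in r] for r in s_b]
    # Scaling constant; equals the common cluster size when clusters are balanced.
    c = (N ** 2 - sum(len(g) ** 2 for g in groups)) / (N * (G - 1))
    return s_pw, s_b, c


def estimate_sigma_b(s_pw, s_b, c):
    """Method-of-moments estimate of the between-group covariance (S_B - S_PW) / c."""
    return [[(b - w) / c for w, b in zip(rw, rb)] for rw, rb in zip(s_pw, s_b)]


groups = [[[1, 2], [3, 4]], [[5, 5], [7, 9]]]
s_pw, s_b, c = pooled_within_and_between(groups)
print(s_pw, s_b, c, estimate_sigma_b(s_pw, s_b, c))
```

With few clusters, the moment estimate of Sigma_B need not be positive definite, which is one reason model-based estimation of both levels at once is generally preferred.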
In this methodological demonstration, we apply MLFA to examine the underlying factor structure of items measuring collective efficacy and compare the results to the closest analytic alternative, the HLVM. Although our focus is on collective efficacy for demonstration purposes, the MLFA technique can be applied to numerous other possible contextual-level social constructs. The MLFA technique could also be extended to evaluate the measurement quality (e.g., reliability and validity) of contextual or ecological measures, including those that are directly assessed (rather than ascertained through data collected on individuals), as has been advocated by researchers concerned with "ecometrics" [6,31].
A web-based Technical Guide (see Additional file 1) explains how to implement MLFA in Mplus. It walks readers through fitting and interpreting two multilevel factor analytic models: (1) a multilevel exploratory factor analysis (ML-EFA) and (2) a multilevel confirmatory factor analysis (ML-CFA).

Sample and study design
Data came from the Los Angeles Family and Neighborhood Survey (L.A. FANS), a longitudinal study examining the impact of neighborhoods on children's development and well-being [32]. The study followed a stratified random sample of 3,090 households from 65 census tracts in Los Angeles County. Within each household that contained both adults and school-aged children, a randomly selected adult (RSA) was chosen, who completed surveys at Wave I (Spring 2000 to Fall 2001). For the current study, we used data on perceptions of the neighborhood collected from the RSA. Our analytic sample consisted of 2,594 RSA respondents living in 65 census tracts. Respondents were primarily female (69.1%), Latino(a) (59.5%), and non-homeowners (59.4%), with a mean age of 38.8 years (SD = 13.6).

Collective efficacy
Based on previous work [9], collective efficacy was measured using 10 items that captured both perceived neighborhood informal social control and social cohesion [10].
Social cohesion was measured using seven items (refer to items 1-7 in Table 2) rated on a five-point scale (1 = strongly agree to 5 = strongly disagree). Informal social control was measured using three items (refer to items 8-10 in Table 2) rated on a five-point scale (1 = very unlikely to 5 = very likely) indicating how likely the respondent would be to intervene if they witnessed these three events.

Statistical analysis
We used multilevel factor analysis (MLFA), a method that models the responses for person i in cluster j (e.g., neighborhood) to a set of M items (or indicator variables), denoted $\mathbf{y}_{ij} = (y_{1ij}, \ldots, y_{Mij})'$, as a function of both individual-level (i.e., within-group or "Level 1") and neighborhood-level (i.e., between-group or "Level 2") factors, represented by $\boldsymbol{\eta}_W$ and $\boldsymbol{\eta}_B$, respectively.
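The within-group model is given by

$$\mathbf{y}_{ij} = \boldsymbol{\nu}_j + \boldsymbol{\Lambda}_W \boldsymbol{\eta}_{Wij} + \boldsymbol{\varepsilon}_{ij} \tag{1}$$

where $\boldsymbol{\nu}_j$ is a vector of neighborhood j's mean responses for each of the M items for the population of individuals embedded in neighborhood j; $\boldsymbol{\eta}_{Wij}$ is a vector of individual i's values for the individual-level factors, with $\mathrm{E}(\boldsymbol{\eta}_W) = \mathbf{0}$ and $\mathrm{Var}(\boldsymbol{\eta}_W) = \boldsymbol{\psi}_W$; $\boldsymbol{\Lambda}_W$ is a matrix of factor loadings describing the relationships between the individual-level factors, $\boldsymbol{\eta}_W$, and the indicator variables, $\mathbf{y}_{ij}$; and $\boldsymbol{\varepsilon}_{ij}$ is the residual for individual i in neighborhood j, with $\mathrm{E}(\boldsymbol{\varepsilon}) = \mathbf{0}$ and $\mathrm{Var}(\boldsymbol{\varepsilon}) = \boldsymbol{\theta}$. Typically, with continuous ys, the residuals and factors are specified to be normally distributed, with all residuals uncorrelated with each other and with the factors.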
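The between-group model is given by

$$\boldsymbol{\nu}_j = \boldsymbol{\gamma} + \boldsymbol{\Lambda}_B \boldsymbol{\eta}_{Bj} + \boldsymbol{\zeta}_j \tag{2}$$

where $\boldsymbol{\gamma}$ is a vector of overall means for the M items; $\boldsymbol{\eta}_{Bj}$ is a vector of neighborhood j's values for the group-level factors, with $\mathrm{E}(\boldsymbol{\eta}_B) = \mathbf{0}$ and $\mathrm{Var}(\boldsymbol{\eta}_B) = \boldsymbol{\psi}_B$; $\boldsymbol{\Lambda}_B$ is a matrix of factor loadings describing the relationships between the group-level factors, $\boldsymbol{\eta}_B$, and the group-level random intercept indicators, $\boldsymbol{\nu}_j$; and $\boldsymbol{\zeta}_j$ is the residual for neighborhood j, with $\mathrm{E}(\boldsymbol{\zeta}) = \mathbf{0}$ and $\mathrm{Var}(\boldsymbol{\zeta}) = \boldsymbol{\sigma}$. Like the within-group model, the residuals and factors are specified to be normally distributed, with all residuals uncorrelated with each other and with the factors.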
Substituting Equation 2 into Equation 1 yields a single combined model:

$$\mathbf{y}_{ij} = \boldsymbol{\gamma} + \boldsymbol{\Lambda}_B \boldsymbol{\eta}_{Bj} + \boldsymbol{\zeta}_j + \boldsymbol{\Lambda}_W \boldsymbol{\eta}_{Wij} + \boldsymbol{\varepsilon}_{ij} \tag{3}$$

showing that the observed responses at the individual level are specified as distinct effects of both individual- and group-level factors. These effects are depicted in Figure 1 by a path diagram for a hypothetical six-item MLFA with two within-group factors and one between-group factor. The variables (observed in squares and latent in circles) within the "Individual i" box vary across individuals embedded in neighborhood j. The variables outside the "Individual i" box and within the "Neighborhood j" box vary across neighborhoods but are constant for all individuals within a given neighborhood. The individual-level and neighborhood-level residuals are represented by the small arrows pointing to the observed ys and the neighborhood-level random intercept, respectively.
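Because the factors and residuals at the two levels are assumed to be mutually uncorrelated, Equation 3 implies a separate covariance structure for each level. For continuous indicators, a short derivation (a standard consequence of the assumptions above, written with the symbols already defined) is:

$$
\begin{aligned}
\boldsymbol{\Sigma}_W &= \mathrm{Var}(\boldsymbol{\Lambda}_W \boldsymbol{\eta}_{Wij} + \boldsymbol{\varepsilon}_{ij}) = \boldsymbol{\Lambda}_W \boldsymbol{\psi}_W \boldsymbol{\Lambda}_W' + \boldsymbol{\theta},\\
\boldsymbol{\Sigma}_B &= \mathrm{Var}(\boldsymbol{\Lambda}_B \boldsymbol{\eta}_{Bj} + \boldsymbol{\zeta}_j) = \boldsymbol{\Lambda}_B \boldsymbol{\psi}_B \boldsymbol{\Lambda}_B' + \boldsymbol{\sigma},\\
\mathrm{Var}(\mathbf{y}_{ij}) &= \boldsymbol{\Sigma}_B + \boldsymbol{\Sigma}_W.
\end{aligned}
$$

Each level therefore has its own loadings, factor covariance matrix, and residual covariance matrix, which is what allows the number of factors and the loading pattern to differ between the individual and neighborhood levels.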
The model described in Equations 1 and 2 can be extended to non-continuous (e.g., binary, ordinal, count) indicator variables using a generalized linear model formulation. Briefly (and as outlined in greater detail in [33,34]), any vector of indicator variables, $\mathbf{y}_{ij}$, can be expressed as the sum of the individual expected values, $\boldsymbol{\mu}_{ij}$, and the individual residuals, $\boldsymbol{\varepsilon}_{ij}$; that is,

$$\mathbf{y}_{ij} = \boldsymbol{\mu}_{ij} + \boldsymbol{\varepsilon}_{ij} \tag{4}$$

The distribution of the residuals is chosen to correspond to the measurement scale of the observed indicators, e.g., a Bernoulli distribution for binary indicators. A link function, g, then relates the individual expected values to a linear combination of the latent factors; that is,

$$g(\boldsymbol{\mu}_{ij}) = \boldsymbol{\nu}_j + \boldsymbol{\Lambda}_W \boldsymbol{\eta}_{Wij} \tag{5}$$

The between-group model remains the same. In the case of continuous, approximately normally distributed observed outcomes, the usual specification is the identity link function, resulting in straightforward linear regressions relating the observed variables to the latent factors. In the case of binary indicators, one might choose a logit link function, resulting in logistic regressions relating the observed categorical indicators to the latent factors. In the case of an observed ordinal response scale, as with our indicators of collective efficacy, we used the ordinal probit link function [35]. All models were estimated via weighted least squares using a diagonal weight matrix, with standard errors and mean- and variance-adjusted chi-square test statistics that used a full weight matrix (WLSMV).
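To make the ordinal probit link concrete: in the latent-response formulation commonly used for ordinal factor models, an item's linear predictor (for example, that item's element of the linear combination in Equation 5) is compared against a set of increasing thresholds that cut the latent response into the observed categories. The sketch below assumes a unit residual variance for the latent response, which is only one possible scaling (Mplus, for instance, offers more than one parameterization); the function names and example thresholds are hypothetical.

```python
from math import erf, inf, sqrt


def std_normal_cdf(x):
    """Standard normal CDF, with the infinite cut points handled explicitly."""
    if x == inf:
        return 1.0
    if x == -inf:
        return 0.0
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def ordinal_probit_probs(linear_predictor, thresholds):
    """Category probabilities for one ordinal item under a probit link.

    Assumes a latent response y* = linear_predictor + e with e ~ N(0, 1); the
    observed category is k when thresholds[k-1] < y* <= thresholds[k].
    thresholds must be strictly increasing (K - 1 values for K categories).
    """
    cuts = [-inf] + list(thresholds) + [inf]
    cdf = [std_normal_cdf(c - linear_predictor) for c in cuts]
    # Probability of each category = difference of adjacent cumulative probabilities.
    return [cdf[k + 1] - cdf[k] for k in range(len(cuts) - 1)]


# A five-point item with hypothetical thresholds and linear predictor.
print([round(p, 3) for p in ordinal_probit_probs(0.8, [-1.5, -0.5, 0.5, 1.5])])
```

Raising the linear predictor (e.g., a higher factor score with a positive loading) shifts probability mass toward the higher response categories, which is how the factors enter the model for ordinal items.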
To showcase the MLFA approach, we conducted our analyses in four steps. First, we calculated intraclass correlation coefficients (ICCs) for each item. These ICCs provide information about the proportion of variance in each item that is due to differences between neighborhoods. Second, we used polychoric correlations (each a measure of the pairwise association between two ordinal variables that assumes an underlying joint continuous distribution) to examine the strength and direction of the associations among the items. We examined these associations in two correlation matrices: (1) the within-level (individual) matrix and (2) the between-level (neighborhood) matrix. Third, we randomly split the sample into two equally sized subsamples and conducted a multilevel exploratory factor analysis (ML-EFA) with one subsample and a multilevel confirmatory factor analysis (ML-CFA) with the other. An EFA is ideal when researchers lack hypotheses about the number of latent factors underlying an item set or about the relationships between each factor and the items; a CFA is more appropriate when researchers have hypotheses regarding the number of factors and the factor-item relationships or are seeking to test the validity of a theoretical model [36,37]. Both techniques are shown here for illustration purposes. Finally, we fit the hierarchical latent variable model (HLVM) outlined by Sampson et al. [9] as a comparison. The HLVM is a special case of the MLFA in which the factor measurement model is the same (i.e., same number of factors, same loading patterns, and same loading values) in the within- and between-group models and there is no between-group item-specific residual. The HLVM can also be seen as an extension of a single-level factor analysis in which the overall factor variance-covariance structure is composed of within- and between-group variance-covariance components.
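The ICC in the first step is the share of an item's variance that lies between neighborhoods. For a continuous score, a common moment-based estimate is the one-way ANOVA ICC(1), sketched below with a hypothetical function name. This is an illustrative stand-in only: for ordinal items like these, ICCs are usually computed from the fitted model on the latent-response scale, so this estimator would not necessarily reproduce the values in Table 2.

```python
def icc1_anova(groups):
    """One-way ANOVA estimate of ICC(1) for a single continuous item.

    groups: list of clusters; each cluster is a list of that item's scores.
    Handles unequal cluster sizes via the adjusted average cluster size n0.
    """
    G = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (G - 1)    # between-cluster mean square
    msw = ssw / (N - G)    # within-cluster mean square
    n0 = (N - sum(len(g) ** 2 for g in groups) / N) / (G - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)


print(round(icc1_anova([[1, 2, 3], [4, 5, 6]]), 3))
```

Like any moment estimator, ICC(1) can fall below zero in small samples when between-cluster differences are smaller than expected by chance.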
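The important distinction between the MLFA and HLVM is that the factors in the HLVM are defined only at the within level, while in the MLFA distinct factors are defined in both the within- and between-level models. For the HLVM, the within-group model is the same as for the MLFA, as given in Equation (1). The between-group model is given by

$$\boldsymbol{\nu}_j = \boldsymbol{\gamma} + \boldsymbol{\Lambda}_W \boldsymbol{\eta}_{Bj} \tag{6}$$

Substituting Equation (6) into Equation (1) yields a single combined model for the HLVM:

$$\mathbf{y}_{ij} = \boldsymbol{\gamma} + \boldsymbol{\Lambda}_W \boldsymbol{\eta}_{Bj} + \boldsymbol{\Lambda}_W \boldsymbol{\eta}_{Wij} + \boldsymbol{\varepsilon}_{ij} \tag{7}$$

where $\boldsymbol{\gamma}$ is a vector of overall means for the M items; $\boldsymbol{\eta}_{Wij}$ and $\boldsymbol{\eta}_{Bj}$ capture within-group (across-person) variability and between-group variability, respectively, in a set of latent factors, $\boldsymbol{\eta}$, with $\mathrm{E}(\boldsymbol{\eta}) = \mathbf{0}$ and $\mathrm{Var}(\boldsymbol{\eta}) = \boldsymbol{\psi}_W + \boldsymbol{\psi}_B$; $\boldsymbol{\Lambda}_W$ is a matrix of factor loadings describing the relationships between the factors, $\boldsymbol{\eta}$, and the indicator variables, $\mathbf{y}_{ij}$; and $\boldsymbol{\varepsilon}_{ij}$ is the residual for individual i in neighborhood j, with $\mathrm{E}(\boldsymbol{\varepsilon}) = \mathbf{0}$ and $\mathrm{Var}(\boldsymbol{\varepsilon}) = \boldsymbol{\theta}$. The HLVM can be more simply written as

$$\mathbf{y}_{ij} = \boldsymbol{\gamma} + \boldsymbol{\Lambda}_W \boldsymbol{\eta}_{ij} + \boldsymbol{\varepsilon}_{ij}, \qquad \boldsymbol{\eta}_{ij} = \boldsymbol{\eta}_{Wij} + \boldsymbol{\eta}_{Bj} \tag{8}$$

showing that the observed indicators are a function of only individual-level factors, with the variance-covariance of those factors explicitly decomposed by the model into within-group and between-group variance components. As with the MLFA, the HLVM can use a generalized linear model approach to specify the relationships between the items and the factor in the case of non-continuous item responses. The specific HLVM used by Sampson et al. [9], expressed as a three-level model with items nested within persons nested within clusters, imposes the additional constraints that all factor loadings are fixed at one and all item residual variances are equal.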
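The nesting of the HLVM within the MLFA is easiest to see in the model-implied between-level covariance matrix for continuous indicators: the MLFA implies Lambda_B psi_B Lambda_B' + sigma, whereas the HLVM reuses Lambda_W and sets the between-level item residuals to zero. The pure-Python sketch below uses hypothetical helper names and, as an added simplifying assumption, a diagonal sigma.

```python
def implied_cov(loadings, factor_cov, residual_var):
    """Model-implied covariance Lambda * Psi * Lambda' + diag(residual_var).

    loadings: M x F list of lists; factor_cov: F x F; residual_var: length-M list.
    """
    M, F = len(loadings), len(factor_cov)
    cov = [[sum(loadings[a][p] * factor_cov[p][q] * loadings[b][q]
                for p in range(F) for q in range(F))
            for b in range(M)]
           for a in range(M)]
    for a in range(M):
        cov[a][a] += residual_var[a]
    return cov


def mlfa_between_cov(between_loadings, psi_b, sigma_diag):
    """MLFA between level: its own loadings plus item-specific residual variances."""
    return implied_cov(between_loadings, psi_b, sigma_diag)


def hlvm_between_cov(within_loadings, psi_b):
    """HLVM between level: reuse the within-level loadings, zero item residuals."""
    return implied_cov(within_loadings, psi_b, [0.0] * len(within_loadings))


# Hypothetical Sampson-style case: one factor, all loadings fixed at 1.
print(hlvm_between_cov([[1.0], [1.0], [1.0]], [[0.2]]))
print(mlfa_between_cov([[0.9], [0.5]], [[1.0]], [0.1, 0.2]))
```

With a single factor and no between-level residuals, the HLVM-implied between-level covariance matrix has rank one, so every implied between-level item correlation is plus or minus one; with unit loadings, every entry equals psi_B. The MLFA relaxes this by allowing separate between-level loadings and residual variances.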
We conducted all analyses using Mplus software version 7. With the WLSMV estimator, Mplus handles missing data under a missing at random (MAR) assumption that allows missingness to be a function of observed covariates but not of observed outcomes; this is more restrictive than full information maximum likelihood (FIML), under which missingness may also depend on observed outcomes. When there are no covariates in the model, as is the case here, this is analogous to pairwise present analysis [38,39]. Analyses also included sampling weights to adjust for non-response and the unequal probability of selection of neighborhoods and households into the sample. Across all models, we evaluated goodness-of-fit using the model chi-square test, the normed comparative fit index (CFI; [40]), the root mean square error of approximation (RMSEA; [41]), and the standardized root mean square residual (SRMR; [38]). These statistics indicate how well the model-estimated population correlations reproduce the sample correlations. Acceptable model fit was determined by a non-significant chi-square test, CFI values greater than 0.95, and RMSEA and SRMR values below 0.10 [42]. The CFI, RMSEA, and SRMR values were given more emphasis than the chi-square test, because the chi-square test statistic is often significant (implying significant misfit of the model to the data) when the sample size is large. In the MLFA, an SRMR is provided at both the within and between levels. As there are no established guidelines for interpreting the SRMR at the between level, we applied the guideline typically used for single-level analyses (≤0.10). We also examined the residuals for the between-level correlation matrix as an additional indicator of model fit.
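For readers checking reported indices, the textbook single-group formulas for RMSEA and CFI are sketched below, together with a helper that encodes the cutoffs stated above. These formulas are illustrative assumptions: with WLSMV, Mplus computes the indices from mean- and variance-adjusted chi-square values, and programs may differ in details such as the sample-size term, so the sketch is not expected to reproduce Mplus output exactly. Function names are hypothetical.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """Textbook single-group RMSEA point estimate."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def acceptable_fit(cfi_value, rmsea_value, srmr_value):
    """Cutoffs as stated in the text: CFI > 0.95, RMSEA and SRMR below 0.10."""
    return cfi_value > 0.95 and rmsea_value < 0.10 and srmr_value < 0.10


print(rmsea(200.0, 100, 101), cfi(150.0, 100, 1100.0, 100))
```

The helper deliberately omits the chi-square test, which, as noted above, was given less emphasis because it is sensitive to large sample sizes.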
Of note, alternative statistical software packages, such as MLwiN (used directly or via Stata), can also be used to estimate MLFA models. Readers interested in fitting the MLFA using MLwiN are referred to the MLwiN website: http://www.bristol.ac.uk/cmm/software/mlwin/. In addition, MLFA models can be fit using Markov chain Monte Carlo (MCMC) methods. Such Bayesian estimation procedures may provide a particularly good alternative to maximum likelihood methods when maximum likelihood is too computationally intensive, when some clusters contain only a small number of individuals, or when the overall number of clusters is small [21].

Intraclass correlation coefficients (ICC)
ICC estimates ranged from small to large in magnitude and were generally similar across our split samples (Table 2). In the total sample, the largest estimated ICC (0.262) was for the item "children were spray-painting graffiti on a local building," and the smallest (0.062) was for "children were showing disrespect to an adult." Thus, most of the variability in these items was due to differences among individuals within neighborhoods rather than differences between neighborhoods. However, the proportion of variation lying between neighborhoods differed considerably across indicators. This suggests that neighborhood-level variation is not uniform across indicators and that, for some indicators, neighborhood-level influences may be more important.
As shown in Tables 3 and 4, the within level (individual) and between level (neighborhood) had different correlation structures. While the average absolute correlation at the within level was 0.304 (range r = 0.093 to r = 0.557), the average absolute correlation at the between level was higher (average = 0.685; range r = 0.205 to r = 0.934). Some items also had markedly different correlations at each level. For example, the items "people here do not get along with each other" and "people would intervene if children were spray painting graffiti" had a very strong correlation at the between level (r = 0.858) but a weak correlation at the within level (r = 0.239). These findings suggest that the item-to-item relationships differ across the two levels of analysis (within and between).

Multilevel exploratory factor analysis (ML-EFA)
The final ML-EFA model, which was selected based on good model-data consistency, parsimony, and interpretability, had two within-level factors and one between-level factor (Table 5). In this factor solution, the largest factor loadings for each item at the within level (0.418 to 0.773) and between level (0.462 to 0.972) ranged from moderate to high. In addition to good overall model fit, as evidenced by the CFI of 0.947 and RMSEA of 0.059, this solution also had excellent fit specifically at the within and between levels, as shown by SRMR values of 0.039 and 0.068, respectively. The next best-fitting model, with two within-level and two between-level factors, also had good overall fit (SRMR within = 0.039; SRMR between = 0.045); however, its second between-level factor had only one significantly loading item (refer to page 21 of the online Technical Guide). Beyond its empirical fit, the selected ML-EFA solution was also aligned with prior theory. At the within level, the first factor mapped onto the construct social cohesion and the second onto the construct informal social control, as described by others [9,10]. At the between level, the indicator variables supported only one overarching factor, which has previously been labeled collective efficacy [9,10]. Interestingly, the sixth item (people in this neighborhood do not share the same values) did not load significantly on either factor at the within level but had a significant factor loading at the between level. This finding illustrates that indicator variables can perform differently at each level of analysis; items should therefore be removed from an MLFA only if they fail to function at both levels of analysis.
The first and second within-level factors were moderately correlated (r = 0.521). The communalities, or item-specific R² values, which refer to the proportion of an indicator's variance at a given level that is accounted for by the factor solution, ranged at the within level from a low of 8.4% (for respondents' rating of people in the neighborhood sharing the same values) to a high of 57.1% (for respondents' rating of people's willingness to help neighbors). At the between level, the communalities were higher across the items, ranging from a low of 21.4% (for neighborhoods' collective tendency to intervene if children show disrespect to an adult) to a high of 94.4% (for neighborhoods' collective tendency to watch out that kids are safe).
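With standardized loadings λ for an item and a factor correlation matrix Φ (the standardized ψ at that level), the item's communality is λ'Φλ; for an item loading on a single factor this reduces to the squared loading. A minimal sketch follows; the function name and loadings are hypothetical, and only the factor correlation of 0.521 is taken from the text. For ordinal items, this R² refers to the variance of the latent response rather than of the observed categories.

```python
def communality(std_loadings, factor_corr):
    """Share of an item's variance at one level explained by correlated factors.

    Computes h^2 = lambda' * Phi * lambda from standardized loadings (one per
    factor) and the factor correlation matrix Phi.
    """
    F = len(std_loadings)
    return sum(std_loadings[p] * factor_corr[p][q] * std_loadings[q]
               for p in range(F) for q in range(F))


phi = [[1.0, 0.521], [0.521, 1.0]]   # within-level factor correlation from the text
print(communality([0.7, 0.0], phi))  # item loading on one factor only (hypothetical)
print(communality([0.5, 0.3], phi))  # hypothetical cross-loading item
```

The second case shows why communalities cannot simply be read off the largest loading when an item cross-loads: the factor correlation adds the term 2 x 0.5 x 0.3 x 0.521.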

Multilevel confirmatory factor analysis (ML-CFA)
The ML-EFA results from the first subsample were cross-validated using ML-CFA in the second subsample. As shown in Table 6, the fit of the ML-CFA model was good (CFI = 0.903; RMSEA = 0.079; SRMR within = 0.054; SRMR between = 0.073). Overall, factor loadings in the ML-CFA were similar to those in the ML-EFA. We also ran an alternative ML-CFA specification with the constraints imposed by the Sampson et al. version of the HLVM described earlier. The overall fit of this model was markedly worse than that of the ML-CFA without these restrictions (χ² = 1445.265; df = 86; p < 0.001; RMSEA = 0.110; CFI = 0.766; SRMR within = 0.095; SRMR between = 0.325), suggesting that the more restricted model lacked the model-data consistency observed with the less restrictive ML-CFA. Of note, a single-level factor analysis, which is equivalent to adding to the HLVM the further constraint of zero between-level factor variance, would have a poorer fit than the HLVM. Although this was not the case here, in another dataset the HLVM specification could fit as well as the MLFA. Such a finding would suggest that the data do not support different factor structures at the within- and between-group levels, and the HLVM could be favored as the more parsimonious model. A researcher, however, could not make this determination without comparing the HLVM to the MLFA.

Discussion
This methodological demonstration, which applies MLFA to collective efficacy, shows that simple aggregation methods (in the form of derived variables) and single-level factor analyses may not be the best way to construct contextual-level variables from individual-level data. We arrived at this conclusion based on three sets of results. First, we found that ICC values differed across items; some items showed quite high neighborhood-level variation and others showed very little. This lack of uniformity in between-neighborhood variation suggests that neighborhood context may have differing levels of salience across this set of items and that not all items should be treated as equally important for understanding neighborhoods. Second, the correlation structure of the items differed between the individual (within) and neighborhood (between) levels. Specifically, the correlations among items were much higher at the between level than at the within level. Moreover, how the items related to each other also differed across levels; some items had high correlations at one level and modest correlations at the other. These findings provided an initial sign that there may be different factor structures at the two levels of analysis.
Third, when we ran the MLFA, we found that the best-fitting model was one that modeled collective efficacy as a two-dimensional construct at the within level, consisting of the two latent constructs informal social control and social cohesion, and as a one-dimensional construct at the between level, consisting of collective efficacy. This two-factor within, one-factor between model was confirmed in the ML-CFA. Imposing an identical factor structure at both levels resulted in a worse-fitting model, particularly when we imposed the stricter constraints described in the original paper introducing collective efficacy [9]. While these stricter constraints may be reasonable and could be supported by the data in some cases, there may be instances, such as the present one, in which the items are not all equally good indicators of collective efficacy, so that imposing equal factor loadings and equal residual variances is not consistent with the observed data. We also found that the items performed differently in terms of their factor loadings at the within level compared with the between level. For example, the item "people in this neighborhood do not share the same values" did not load at the within level but did load at the between level. Taken together, the results of the current study suggest that collective efficacy, and perhaps other social constructs, can have very different meanings at each level of analysis and may be most appropriately studied at the neighborhood level as one overarching construct rather than divided into its two dimensions, informal social control and social cohesion, as has been done in some prior studies (see, for example, [13,43]).
Our study has several limitations. The measure of collective efficacy was not identical to the original measure [9], and our results might have differed had we used a different measure of collective efficacy. The number of neighborhoods in this study (n = 65) was also small relative to other studies. Moreover, our definition of neighborhoods was based on an administrative unit (i.e., the Census tract), which may not adequately reflect meaningful geographic boundaries that represent distinct social experiences or cultures [44,45]. Although Census tracts are an imperfect way to define neighborhoods, they are the most commonly used unit in multilevel research in the United States [8].
Finally, the MLFA technique is, of course, not without its limitations. For example, it can be computationally intensive, and most software allows only two-level structures. In spite of these challenges, the results of our analysis underscore the potential utility of MLFA and suggest that more easily implemented approaches, such as single-level factor analyses, may not be ideal. As we showed, the MLFA method revealed different latent factor structures at each level of analysis. Our results also demonstrated that imposing a simpler factor structure, with identical factor structures at each level, was not consistent with the data and resulted in a poorer-fitting model.