Principal components analysis (PCA, for short) is a variable-reduction technique that shares many similarities with exploratory factor analysis. Its aim is to reduce a larger set of variables into a smaller set of ‘artificial’ variables (called principal components) that account for most of the variance in the original variables. Although principal components analysis is conceptually different from factor analysis, the two are often used interchangeably in practice, and PCA is included within the Factor procedure in SPSS Statistics.
In order to run a principal components analysis, the following four assumptions must be met. The first assumption relates to your choice of study design, whilst the remaining three assumptions reflect the nature of your data:
- Assumption #1: You have multiple variables that are measured at the continuous level (although ordinal data is very frequently used). Examples of continuous variables include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth. Examples of ordinal variables include Likert items (e.g., a 7-point scale from “strongly agree” through to “strongly disagree”), amongst other ways of ranking categories (e.g., a 5-point scale explaining how much a customer liked a product, ranging from “Not very much” to “Yes, a lot”).
Note: Principal components analysis is a variable reduction technique and does not make a distinction between independent and dependent variables.
- Assumption #2: There should be a linear relationship between all variables. This assumption needs to be tested before you run a principal components analysis. Although linearity can be assessed using a matrix scatterplot, this is often considered overkill because, with many variables, the matrix can contain over 500 pairwise relationships to inspect. As such, it is suggested that you randomly select a few pairs of variables and test those. Variables that are not linearly related can be transformed to achieve linearity. The reason for this assumption is that a principal components analysis is based on Pearson correlation coefficients, which require a linear relationship between the variables. In practice, this assumption is somewhat relaxed (even if it shouldn’t be) when ordinal data are used.
- Assumption #3: There should be no outliers. This assumption is important because outliers can have a disproportionate influence on the results. SPSS Statistics flags cases as outliers when their component scores are greater than 3 standard deviations away from the mean. Because component scores are among the last values to be calculated in a principal components analysis, outliers are assessed last.
- Assumption #4: There should be a large sample size for a principal components analysis to produce a reliable result. Many different rules of thumb have been proposed; they differ mainly in whether they specify an absolute sample size or a multiple of the number of variables in your sample. Generally speaking, a minimum of 150 cases, or 5 to 10 cases per variable, has been recommended as a minimum sample size.
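The three data assumptions above can be screened numerically before (and, for outliers, after) running the analysis. The following is a minimal sketch in Python using NumPy; the data set, variable counts, and thresholds are hypothetical, and the outlier check mimics the 3-standard-deviation rule on component scores described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data set: 200 cases measured on 6 continuous variables,
# with one deliberately extreme case injected for illustration.
n_cases, n_vars = 200, 6
latent = rng.normal(size=(n_cases, 1))
data = latent + rng.normal(scale=0.8, size=(n_cases, n_vars))
data[0] = 8.0  # an extreme case on every variable

# Assumption #2 (linearity): PCA is based on Pearson correlations, so
# inspect the correlation matrix; checking a random sample of variable
# pairs is a practical substitute for a full matrix scatterplot.
corr = np.corrcoef(data, rowvar=False)
pairs = [(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)]
for k in rng.choice(len(pairs), size=3, replace=False):
    i, j = pairs[k]
    print(f"variables {i} and {j}: r = {corr[i, j]:.2f}")

# Assumption #3 (no outliers): standardise the data, compute component
# scores from the eigenvectors of the correlation matrix, and flag cases
# whose scores lie more than 3 standard deviations from the mean.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
order = np.argsort(eigvals)[::-1]
scores = z @ eigvecs[:, order]
score_z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
outliers = np.flatnonzero(np.any(np.abs(score_z) > 3, axis=1))
print("potential outliers (case indices):", outliers)

# Assumption #4 (sample size): take the larger of the absolute-minimum
# rule (150 cases) and the cases-per-variable rule (here, 10 per variable).
minimum_n = max(150, 10 * n_vars)
print(f"minimum recommended n = {minimum_n}; available n = {n_cases}")
```

Note that the injected extreme case shows up among the flagged indices, which is the behaviour the 3-standard-deviation rule is designed to produce.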
SPSS Statistics will have generated a number of tables and graphs that contain most of the information you need to report the results of a principal components analysis.
The output generated by SPSS Statistics is quite extensive and can provide a lot of information about your analysis. However, you will often find that the analysis is not yet complete and that further runs of the analysis are needed before you arrive at your final solution. We will focus on: (a) communalities; (b) extracting and retaining components; and (c) forced factor extraction.
- Communalities: The communality is the proportion of each variable’s variance that is accounted for by the principal components analysis and can also be expressed as a percentage.
- Extracting and retaining components: A principal components analysis will produce as many components as there are variables. However, the purpose of principal components analysis is to explain as much of the variance in your variables as possible using as few components as possible. After you have extracted your components, there are four major criteria that can help you decide on the number of components to retain: (a) the eigenvalue-one criterion, (b) the proportion of total variance accounted for, (c) the scree plot test, and (d) the interpretability criterion. All except for the first criterion will require some degree of subjective analysis.
- Forced factor extraction: When extracting components as part of your principal components analysis, SPSS Statistics does so by default based on the eigenvalue-one criterion. However, you can instruct SPSS Statistics to retain a specific number of components instead.
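The quantities described in these three bullets can be illustrated with a short NumPy sketch. The data below are hypothetical (8 variables driven by 2 latent factors), and the 70% variance target is just an example cut-off; the forced extraction at the end mirrors fixing the number of components manually rather than relying on the eigenvalue-one criterion:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 8 observed variables driven by 2 latent factors.
n = 500
factors = rng.normal(size=(n, 2))
pattern = np.zeros((8, 2))
pattern[:4, 0] = 0.8   # variables 1-4 load on factor 1
pattern[4:, 1] = 0.8   # variables 5-8 load on factor 2
data = factors @ pattern.T + rng.normal(scale=0.5, size=(n, 8))

corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# (a) Eigenvalue-one (Kaiser) criterion: retain components whose
#     eigenvalue exceeds 1.
kaiser = int(np.sum(eigvals > 1))

# (b) Proportion of total variance: retain enough components to account
#     for some target share of the total variance (say, 70%).
cum_var = np.cumsum(eigvals) / eigvals.sum()
var_70 = int(np.searchsorted(cum_var, 0.70) + 1)

# Forced extraction: override the default criterion with a fixed number
# of components (here, k = 2).
k = 2
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])

# Communalities: the proportion of each variable's variance accounted
# for by the retained components (sum of squared loadings per variable).
communalities = (loadings ** 2).sum(axis=1)

print("Kaiser retains:", kaiser, "| components for 70% variance:", var_70)
print("communalities:", np.round(communalities, 2))
```

With this two-factor structure, both retention criteria agree on two components, and each variable's communality falls well below 1 because the six discarded components still account for some residual variance.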