The Pearson product-moment correlation is used to determine the strength and direction of a linear relationship between two continuous variables. More specifically, the test generates a coefficient called the Pearson correlation coefficient, denoted as r (i.e., the italic lowercase letter r). Its value can range from -1 for a perfect negative linear relationship to +1 for a perfect positive linear relationship. A value of 0 (zero) indicates no linear relationship between the two variables, although a non-linear relationship may still exist. This test is also known by its shorter titles, the Pearson correlation or Pearson’s correlation, which are often used interchangeably.
For example, you could use a Pearson’s correlation to determine the strength and direction of a linear relationship between salaries, measured in US dollars, and length of employment in a firm, measured in days (i.e., your two continuous variables would be “salary” and “length of employment”). You could also use a Pearson’s correlation to determine the strength and direction of a linear relationship between reaction time, measured in milliseconds, and hand grip strength, measured in kilograms (i.e., your two continuous variables would be “reaction time” and “hand grip strength”).
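To make the coefficient concrete, the sketch below computes r from its standard definition: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the salary and employment figures are made up for illustration only; they are not data from this guide.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of paired values."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of paired values")
    n = len(x)
    if n < 2:
        raise ValueError("at least two pairs are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the two means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: length of employment (days) and salary (US dollars)
days = [120, 400, 760, 1100, 1500, 2200]
salary = [31000, 33500, 36000, 41000, 43500, 52000]
r = pearson_r(days, salary)
print(round(r, 3))
```

Because r only captures linear association, a perfectly curved pattern can still give a value of zero: for example, `pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])` evaluates to 0 even though the second list is exactly the square of the first.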
In order to run a Pearson’s correlation, there are five assumptions that need to be considered. The first two relate to your choice of study design and the measurements you chose to make, whilst the other three relate to how your data fits the Pearson correlation model. These assumptions are:
- Assumption #1: Your two variables should be measured on a continuous scale (i.e., they are measured at the interval or ratio level). Examples of continuous variables include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth. See our Types of Variable guide for more information, if needed.
- Assumption #2: Your two continuous variables should be paired, which means that each case (e.g., each participant) has two values: one for each variable. For example, imagine that you had collected the revision times (measured in hours) and exam results (measured from 0 to 100) from 100 randomly sampled students at a university (i.e., you have two continuous variables: “revision time” and “exam performance”). Each of the 100 students would have a value for revision time (e.g., “student #1” studied for “23 hours”) and exam performance (e.g., “student #1” scored “81 out of 100”). Therefore, you would have 100 paired values.
- Assumption #3: There needs to be a linear relationship between the two variables. The best way of checking this assumption is to plot a scatterplot and visually inspect the graph.
- Assumption #4: There should be no significant outliers. Outliers are data points within your sample that do not follow a similar pattern to the other data points. Pearson’s correlation coefficient, r, is sensitive to outliers, meaning that outliers can have an exaggerated influence on the value of r. This can lead to Pearson’s correlation coefficient not having a value that best represents the data as a whole. Therefore, it is best if there are no outliers, or if they are kept to a minimum.
- Assumption #5: If you wish to run inferential statistics (null hypothesis significance testing), you also need to satisfy the assumption of bivariate normality. You will find that this is particularly difficult to test for and so a simpler method is more commonly used, which will be demonstrated in this guide.
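As a rough complement to inspecting a scatterplot for Assumption #4, the sketch below screens each variable for extreme values using the boxplot rule (values more than 1.5 times the interquartile range beyond the quartiles). This rule is a common convention and an assumption here, not the procedure this guide prescribes. The helper name `tukey_outliers` and the data are hypothetical. It only checks each variable on its own, so it can miss points that are unusual only in combination.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return (index, value) pairs lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical paired data for eight students (illustrative values only)
revision_hours = [23, 18, 30, 25, 21, 27, 19, 95]  # last value looks suspicious
exam_score = [81, 70, 88, 79, 74, 85, 72, 83]

print(tukey_outliers(revision_hours))
print(tukey_outliers(exam_score))
```

With these made-up values, only the 95-hour entry is flagged. Whether a flagged point should be kept, corrected, or removed remains a judgment call that depends on why the value is extreme.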
SPSS Statistics will have generated just one table that contains all the information you need to report the results of a Pearson’s correlation. In this section, we explain how to interpret and write up the results from this table, including the correlation coefficient and the statistical significance of the correlation coefficient.
- The first step in interpreting your results is to understand the Pearson’s correlation coefficient value (r), which is a measure of the strength and direction of the linear association between your two variables. The correlation coefficient can take values from -1 to +1, where +1 indicates a perfect positive association and -1 indicates a perfect negative association. A correlation coefficient of zero (0) indicates no linear association. Whilst there are no hard-and-fast rules for assigning strength of association to particular values, some general guidelines are provided by Cohen (1988). Broadly speaking, the closer the correlation coefficient is to zero, the weaker the association, and the closer the correlation coefficient is to +1 or -1, the stronger the association.
- The second step in interpreting your results is to determine whether the Pearson’s correlation coefficient value is statistically significant. This will allow you to determine whether you can reject or fail to reject the null hypothesis. If you set α = 0.05, achieving a statistically significant Pearson’s correlation (i.e., p < .05) means that, if the null hypothesis were true, there would be less than a 5% chance of obtaining a correlation coefficient at least as strong as the one you found. We’ll explain how to interpret the two-tailed significance value (p-value) of the correlation coefficient to determine whether we can reject or fail to reject the null hypothesis that there is no association between our two variables (time_tv and cholesterol) in the population.
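The two interpretation steps can be sketched in code. For the first step, the thresholds of .1, .3 and .5 for small, medium and large effects are Cohen's (1988) conventional benchmarks for r. For the second step, the usual test of the null hypothesis converts r into a t statistic with n - 2 degrees of freedom. The sketch below uses only the standard library and approximates the two-tailed p-value by numerically integrating the t density; in practice a statistics package would normally do this for you. All function names, the "negligible" label, and the values r = 0.37 and n = 100 are hypothetical, not results from this guide.

```python
import math

def cohen_label(r):
    """Rough strength label using Cohen's (1988) conventional thresholds."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # label chosen for this sketch

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def two_tailed_p(t, df, steps=20000):
    """Two-tailed p-value via Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_area)

def pearson_significance(r, n):
    """t statistic and two-tailed p-value for H0: no linear association."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("requires n >= 3 and -1 < r < 1")
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, two_tailed_p(t, df)

# Hypothetical result: r = 0.37 from n = 100 paired observations
t, p = pearson_significance(0.37, 100)
print(cohen_label(0.37), round(t, 2), p < 0.05)
```

A p-value below the chosen α (here 0.05) would lead you to reject the null hypothesis of no association in the population. Otherwise you fail to reject it, which is not the same as showing that no association exists.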