The chi-square test can be applied to contingency tables of various sizes, and can test more than one type of null and alternative hypothesis. This guide focuses on contingency tables that are larger than 2 x 2, often referred to as r x c contingency tables, and tests whether two variables measured at the nominal level are independent (i.e., whether there is an association between the two variables). This test is most commonly called the chi-square test of independence, but it is also known as the chi-square test for association. Whilst it is also possible to perform the chi-square test of independence on ordinal variables, you will lose the ordered nature of the data by doing so, and there will most likely be more suitable tests to run (see our Statistical Test Selector). In order to make the correct inferences from a chi-square test of independence, you will need to have undertaken a naturalistic study design.
Note: If you are interested in understanding (and modelling) associations between three or more categorical variables you should consider loglinear analysis instead of the chi-square test of independence.
For example, you could use a chi-square test of independence to determine whether there is an association between the political party a person votes for in the United Kingdom and their housing tenure (i.e., your two nominal variables would be “political affiliation”, which has five categories – “Conservatives”, “Labour”, “UKIP”, the “Liberal Democrats” and “Green Party” – and “housing tenure”, which has four categories: “Own home”, “Mortgaged home”, “Private renter” and “Social housing renter”). If there is an association (positive or negative), you can also determine the strength/magnitude of this association. Alternately, you could use a chi-square test of independence to determine whether there is an association between the preferred brand of luxury car and the buyer’s country (i.e., your two nominal variables would be “luxury car brand preference”, which has five categories – Audi, BMW, Land Rover, Mercedes and Porsche – and “buyer country”, which has five categories: “United Kingdom”, “France”, “Germany”, “Italy” and “Spain”). Again, if there is an association (positive or negative), you can also determine the strength/magnitude of this association.
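To illustrate the structure of an r x c design like the voting/housing example above, the following is a minimal sketch in Python using SciPy. The counts are entirely hypothetical (this guide's own analysis is carried out in SPSS Statistics), but the layout – five rows for political affiliation, four columns for housing tenure – matches the example:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts (invented for illustration only).
# Rows: political affiliation (5 categories); columns: housing tenure (4 categories).
observed = np.array([
    [60, 85, 30, 25],   # Conservatives
    [40, 55, 45, 60],   # Labour
    [20, 25, 15, 20],   # UKIP
    [25, 30, 35, 20],   # Liberal Democrats
    [10, 15, 25, 15],   # Green Party
])

# Chi-square test of independence on the r x c table.
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```

Note that the degrees of freedom are (r − 1)(c − 1), so a 5 x 4 table has 4 × 3 = 12 degrees of freedom, regardless of the sample size.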
In order to run a chi-square test of independence, there are four assumptions that need to be considered. The first three assumptions relate to how you measured your variables, whilst the fourth assumption relates to how the data fits the chi-square test of independence model. These assumptions are:
- Assumption #1: You have two nominal variables. Examples of nominal variables include ethnicity (e.g., three groups: Caucasian, African American and Hispanic), seasons (e.g., four groups: “spring”, “summer”, “autumn” and “winter”), profession (e.g., five groups: surgeon, doctor, nurse, dentist, therapist), and so forth. If you need more information about variables and their different types of measurement, see our Types of Variables guide.
Explanation: The “groups” of a categorical variable are also referred to as “categories” or “levels”, although the term “levels” is usually reserved for groups that have an order (e.g., fitness level, with three levels: “low”, “moderate” and “high”). In practice, these three terms – “groups”, “categories” and “levels” – can be used interchangeably. We will mostly refer to them as categories, but in some cases, we will refer to them as groups or levels. The only reason we do this is for clarity (i.e., it sometimes sounds more appropriate in a sentence to use groups or levels instead of categories, and vice versa).
Important: Whilst a chi-square test of independence can be used with ordinal variables, it is strictly a test for nominal variables. Therefore, even though you can use a chi-square test of independence with ordinal variables, the chi-square test of independence will treat them as nominal variables, and you will lose their ordered nature. If you have ordinal variables and want to keep their ordered nature, there are alternative statistical tests you can use, such as Kendall’s tau, Spearman’s correlation, and linear-by-linear association, amongst others. See the Associations route of our Statistical Test Selector to help you choose the appropriate test.
Note: If you have three or more categorical variables rather than just two categorical variables, a loglinear analysis can be used instead of a chi-square test of independence.
If your study fails this assumption, you will need to use another statistical test instead of a chi-square test of independence (you can use our Statistical Test Selector to find the appropriate statistical test).
- Assumption #2: You should have independence of observations, which means that there is no relationship between the observations in each group of each variable or between the groups themselves. Indeed, an important distinction is made in statistics when comparing values from either different individuals or from the same individuals. Independent groups (in a chi-square test of independence) are groups where there is no relationship between the participants in either of the groups. Most often, this occurs simply by having different participants in each group.
For example, if you split a group of individuals into two groups based on their gender (i.e., a male group and a female group), no one in the female group can be in the male group and vice versa. As another example, you might randomly assign participants to either a control trial or an intervention trial. Again, no participant can be in both the control group and the intervention group. This will be true of any two independent groups you form (i.e., a participant cannot be a member of both groups). In actual fact, the ‘no relationship’ part extends a little further and requires that participants in both groups are considered unrelated, not just different people; for example, participants might be considered related if they are husband and wife, or twins. Furthermore, participants in Group A cannot influence any of the participants in Group B, and vice versa.
- Assumption #3: The null hypothesis being tested using the chi-square test of independence in this guide cannot be used with all types of sampling (i.e., study design). This is explained in more detail in the section, Sampling and the chi-square test of independence, further down this page.
- Assumption #4: As will be discussed further in our Assumptions section, the chi-square test of independence must also meet one assumption that relates to the nature of your data in order to provide a valid result: all cells should have expected counts greater than or equal to five.
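Assumption #4 can be checked directly from the table of expected counts. As a minimal sketch (using SciPy and a small hypothetical 3 x 3 table, rather than SPSS Statistics, which reports expected counts in its Crosstabs output), you could flag any cells whose expected count falls below five:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 3 x 3 table of observed counts (for illustration only).
observed = np.array([
    [12,  7,  9],
    [ 8, 15,  6],
    [10,  5, 11],
])

# chi2_contingency also returns the table of expected counts
# under the null hypothesis of independence.
chi2, p, dof, expected = chi2_contingency(observed)

# Count how many cells violate the expected-count assumption.
n_small = int(np.sum(expected < 5))
print(f"Cells with expected count < 5: {n_small} of {expected.size}")
if n_small > 0:
    print("Assumption violated: consider combining categories or using an exact test.")
```

Each expected count is (row total × column total) / grand total, so small row or column totals are what push expected counts below five.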
SPSS Statistics will have generated all the information you need to report the results of the chi-square test of independence, Cramer’s V to determine the strength/magnitude of any association, and where appropriate, adjusted standardized residuals to determine which cells deviate from independence. In this section, we explain how to interpret these results. We also show how to write up your results as you work through the section.
On the following three pages we interpret the results as follows:
- Sample characteristics and crosstabulation: We will (a) check the characteristics of the sample you have just tested, and (b) discuss how to interpret the crosstabulation and the observed and expected frequencies for each cell of the design. This includes interpreting the observed counts, how the observed counts can be viewed as percentages and proportions, and the usefulness of comparing the observed and expected counts before interpreting the chi-square test of independence result.
- Chi-square test of independence and strength of association: If you have an adequate sample size to run and interpret the chi-square test of independence, we show you how to do this. We will explain: (a) whether you have a statistically significant chi-square test of independence result; (b) on this basis, whether you should reject the null hypothesis and accept the alternative hypothesis, or fail to reject the null hypothesis and reject the alternative hypothesis; and (c) the strength/magnitude of any association using Cramer’s V, a measure of the strength of the association between your two variables.
- Post hoc testing using adjusted standardized residuals: If the chi-square test of independence was statistically significant, we will show you how to determine which cells of your design deviate from independence using adjusted standardized residuals.
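The two follow-up statistics above can also be computed by hand, which makes their definitions concrete. The sketch below (hypothetical 3 x 3 data; SPSS Statistics reports both quantities directly in its Crosstabs output) computes Cramer’s V as √(χ² / (n × min(r − 1, c − 1))), and the adjusted standardized residual for each cell as (O − E) / √(E (1 − row total/n)(1 − column total/n)):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts (for illustration only).
observed = np.array([
    [30, 10, 20],
    [10, 35, 15],
    [20, 15, 45],
])

chi2, p, dof, expected = chi2_contingency(observed)
n = observed.sum()
r, c = observed.shape

# Cramer's V: strength of association, from 0 (no association) to 1 (perfect).
cramers_v = np.sqrt(chi2 / (n * min(r - 1, c - 1)))

# Adjusted standardized residuals: cells beyond roughly +/-1.96 deviate
# from independence at the 5% level.
row_p = observed.sum(axis=1, keepdims=True) / n   # row proportions, shape (r, 1)
col_p = observed.sum(axis=0, keepdims=True) / n   # column proportions, shape (1, c)
adj_resid = (observed - expected) / np.sqrt(expected * (1 - row_p) * (1 - col_p))

print(f"Cramer's V = {cramers_v:.3f}")
print(np.round(adj_resid, 2))
```

Because the residuals are examined only after a significant overall test, they play the role of a post hoc analysis: they locate which cells drive the association rather than re-testing whether one exists.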