Statistics · CED Unit 8: Inference for Categorical Data · 14 min read · Updated 2026-05-11
Chi-Square Test for Independence — AP Statistics
1. What Is Chi-Square Test for Independence?★★☆☆☆⏱ 3 min
The chi-square test for independence is a hypothesis test of whether two categorical variables, measured on the same sample of individuals, are associated (dependent) or independent in the larger population. The AP Statistics Course and Exam Description weights Unit 8 at 2-5% of the multiple-choice section, and the test often appears as a full free-response question.
Unlike the chi-square goodness-of-fit test, which tests a hypothesized distribution for a single categorical variable, this test addresses questions of association between two variables, making it widely used in social and life sciences research.
2. Hypotheses, Conditions, and Setup★★☆☆☆⏱ 4 min
Before any calculations, you must correctly formulate hypotheses, set up your contingency table of observed counts, and verify all conditions. This step earns as many points on the AP exam as your final calculation, so it is critical to get right.
**Null Hypothesis ($H_0$)**: The two categorical variables are independent (no association) in the population of interest.
**Alternative Hypothesis ($H_a$)**: The two categorical variables are dependent (there is an association) in the population of interest.
Conditions for inference:
**Random**: The data come from a random sample or a randomized experiment.
**Independence**: Individual observations are independent. When sampling without replacement, check the 10% condition: the sample size must be less than 10% of the population size.
**Large Counts**: All expected cell counts (not observed counts) are at least 5, per AP exam guidelines.
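The Independence and Large Counts checks are simple comparisons, so they can be scripted. The following is a minimal Python sketch, not an official procedure: the function name `check_conditions`, the table of expected counts, the sample size of 150, and the population size of 2000 are all hypothetical values chosen for illustration. The Random condition depends on how the data were collected, so it cannot be verified from the counts.

```python
def check_conditions(sample_size, population_size, expected, min_expected=5):
    """Return (ten_percent_ok, large_counts_ok) for a two-way table.

    `expected` is a list of rows of expected counts (hypothetical input).
    """
    # 10% condition: the sample must be less than 10% of the population.
    ten_percent_ok = sample_size < 0.10 * population_size
    # Large counts: every expected cell count must be at least `min_expected`.
    large_counts_ok = all(e >= min_expected for row in expected for e in row)
    return ten_percent_ok, large_counts_ok


# Hypothetical expected counts for a 2 x 3 table built from 150 sampled students.
expected = [[20.0, 20.0, 20.0],
            [30.0, 30.0, 30.0]]

ten_percent_ok, large_counts_ok = check_conditions(150, 2000, expected)
print("10% condition met:", ten_percent_ok)
print("Large counts met:", large_counts_ok)
```

On the exam, you would still state each condition in words and show the expected counts that justify the Large Counts check.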
3. Expected Counts and Test Statistic Calculation★★★☆☆⏱ 4 min
Once hypotheses and conditions are confirmed, the next step is to calculate expected cell counts (the count you would expect in each cell if the null hypothesis of independence were true) and the chi-square test statistic.
The formula for expected count is:
$$E = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}}$$
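To see the formula applied to every cell at once, here is a short Python sketch using only the standard library. The observed counts and the function name `expected_counts` are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 3 table of observed counts.
# Row totals: 60 and 90. Column totals: 50, 50, 50. Grand total: 150.
observed = [[30, 20, 10],
            [20, 30, 40]]

for row in expected_counts(observed):
    print(row)
```

Each first-row cell works out to $60 \times 50 / 150 = 20$ and each second-row cell to $90 \times 50 / 150 = 30$. Expected counts do not need to be whole numbers, so do not round them.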
The chi-square test statistic measures how far observed counts are from expected counts. Any deviation from expected (whether observed is higher or lower than expected) increases the test statistic because we square the difference. The formula is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
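The sum runs over every cell in the table. As a minimal sketch, the Python below adds up the cell contributions for a hypothetical 2 x 3 table. The counts and the function name `chi_square_statistic` are assumptions for illustration, and the expected counts shown were obtained from the row total times column total over grand total formula.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all cells of a two-way table."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


# Hypothetical observed counts and their expected counts under independence.
observed = [[30, 20, 10],
            [20, 30, 40]]
expected = [[20.0, 20.0, 20.0],
            [30.0, 30.0, 30.0]]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 3))  # 16.667
```

The cell contributions here are 5, 0, 5 in the first row and about 3.333, 0, 3.333 in the second. Cells where the observed count falls below the expected count add just as much as cells where it is above.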
Degrees of freedom for the test are calculated as:
$$df = (r - 1)(c - 1)$$
where $r$ is the number of rows and $c$ is the number of columns in the contingency table.
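As a quick worked example, take a hypothetical table with 2 rows and 3 columns. Count only the category rows and columns, not any "Total" row or column:

$$df = (2 - 1)(3 - 1) = 1 \times 2 = 2$$

The sample size does not enter this calculation: a table with 2 rows and 3 columns has 2 degrees of freedom whether it summarizes 150 individuals or 15,000.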
4. P-Value and Contextual Conclusion★★★☆☆⏱ 3 min
The final step of the test is calculating the p-value and writing a valid conclusion in context. Because any deviation from expected counts increases the $\chi^2$ test statistic, all chi-square tests for independence are right-tailed. The p-value is the probability of observing a $\chi^2$ statistic as large as or larger than the one you calculated, assuming the null hypothesis is true.
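On the exam, the p-value comes from a calculator or a table, but the right-tail area can also be computed directly. The sketch below uses only Python's standard library and a closed-form expression that holds when the degrees of freedom are a positive whole number, which is always the case for this test. The function name `chi2_sf` and the example statistic are hypothetical. In practice, a statistics library such as SciPy (`scipy.stats.chi2.sf`) gives the same tail probability.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability P(X >= x) for a chi-square variable with
    positive integer degrees of freedom `df`."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-t) * (1 + t + t^2/2! + ... + t^(df/2 - 1)/(df/2 - 1)!)
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= t / i
            total += term
        return math.exp(-t) * total
    # Odd df: start from the df = 1 case, erfc(sqrt(t)), then add one
    # correction term for each additional pair of degrees of freedom.
    total = math.erfc(math.sqrt(t))
    term = math.sqrt(t) * math.exp(-t) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= t / (i + 0.5)
    return total


# Hypothetical result: chi-square statistic of 16.667 with df = 2.
p_value = chi2_sf(16.667, 2)
print(round(p_value, 5))
```

For this hypothetical statistic the p-value is about 0.00024, far below 0.05. Only the area to the right of the statistic is used, so the result is never doubled.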
Compare your p-value to the significance level $\alpha$ (almost always 0.05, unless stated otherwise):
If $p < \alpha$: Reject $H_0$, there is sufficient evidence of an association.
If $p \geq \alpha$: Fail to reject $H_0$, there is not sufficient evidence of an association.
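The decision rule can be written as a small helper that also produces a conclusion sentence. This is only an illustrative sketch: the function name `conclude`, its parameters, and the variable names passed in are hypothetical, and the wording is one acceptable phrasing rather than a required template.

```python
def conclude(p_value, alpha=0.05, var_a="variable A", var_b="variable B"):
    """Apply the decision rule and return a conclusion in context."""
    if p_value < alpha:
        return (
            f"Because p = {p_value:.4f} < {alpha}, reject H0: there is "
            f"sufficient evidence of an association between {var_a} and "
            f"{var_b} in the population."
        )
    return (
        f"Because p = {p_value:.4f} >= {alpha}, fail to reject H0: there is "
        f"not sufficient evidence of an association between {var_a} and "
        f"{var_b} in the population."
    )


# Hypothetical p-values and variable names.
print(conclude(0.0002, var_a="grade level", var_b="study method"))
print(conclude(0.31, var_a="grade level", var_b="study method"))
```

Neither branch says that $H_0$ is "accepted" or "proven". A large p-value means only that the data do not provide convincing evidence of an association.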
5. Concept Check★★★☆☆⏱ 4 min
Common Pitfalls
Stating hypotheses or conclusions about the sample instead of the population. Why: Inference generalizes results from the sample to the larger population; you already know the counts in your sample.
Using the wrong degrees of freedom instead of $(r - 1)(c - 1)$. Why: Students confuse the chi-square independence degrees of freedom formula with that for t-tests or chi-square goodness-of-fit.
Checking the Large Counts condition but skipping the 10% condition. Why: Students focus on the unique large counts condition for chi-square tests and miss the general 10% condition required for all inference based on sampling without replacement.
Treating the test as two-tailed. Why: Students generalize that all hypothesis tests can be two-tailed, forgetting the structure of the chi-square test statistic.
Concluding that one variable causes the other. Why: Students confuse association and causation, and most chi-square tests for independence use observational data.