Statistics · Unit 7: Inference for Quantitative Data: Means · 14 min read · Updated 2026-05-11

Inference for a Mean Difference with Paired Data — AP Statistics

1. Paired Data and Study Design ★★☆☆☆ ⏱ 3 min

Paired data occur when each observation in one group has a natural one-to-one match with an observation in the other group. Because inference is carried out on the within-pair differences, pairing removes between-pair variability from the comparison, which typically lowers the standard error and makes inference more powerful than treating the same data as two independent samples. Two common scenarios produce paired data:

**Repeated measures**: The same experimental unit is measured twice under two different conditions (e.g., heart rate before and after exercise)
**Matched pairs design**: Units are matched into pairs on shared characteristics that could confound the comparison, then the two treatments are randomly assigned within each pair so that one unit per pair gets each treatment
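To see why pairing can shrink the standard error, the short sketch below compares the paired standard error with the one obtained by (incorrectly) treating the same numbers as two independent samples. The heart-rate values are made up for illustration and are not from any real study.

```python
import math
import statistics

# Hypothetical resting heart rates (bpm) for 6 people, measured twice.
before = [72, 85, 64, 90, 78, 69]
after = [70, 81, 63, 85, 75, 66]

# Paired analysis: reduce each pair to a single difference (before - after).
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
se_paired = statistics.stdev(diffs) / math.sqrt(n)

# Same numbers treated as two independent samples (not appropriate here):
# person-to-person variability stays in the standard error.
se_two_sample = math.sqrt(
    statistics.variance(before) / n + statistics.variance(after) / n
)

print(f"mean difference: {statistics.mean(diffs):.2f}")
print(f"paired SE:       {se_paired:.2f}")
print(f"two-sample SE:   {se_two_sample:.2f}")
```

In this made-up data set people differ a lot from one another but change by similar amounts, so the paired standard error comes out much smaller than the two-sample one.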

Exam tip: When in doubt, ask: 'Does every observation in the first group have exactly one unique connected observation in the second group?' If yes, it is paired.

2. Conditions for Paired t-Inference ★★☆☆☆ ⏱ 3 min

All paired t-procedures require three conditions, all checked on the sample of differences (not the original two sets of measurements), analogous to one-sample t-procedures.

**Random**: Pairs are randomly selected from the population, or treatments are randomly assigned within pairs
**Normal/Large Sample**: The sampling distribution of $\bar{d}$ is approximately normal if $n \geq 30$ (Central Limit Theorem) or, when $n < 30$, if a plot of the sample differences shows no strong skewness or outliers
**Independent**: Differences are independent; when sampling without replacement, the 10% condition ($N \geq 10n$) applies
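The two conditions that can be checked numerically are sketched below. The helper name `check_paired_conditions`, its `population_pairs` parameter, and the use of the 1.5 × IQR rule as an outlier screen are assumptions made for this illustration; the Random condition depends on how the data were collected and cannot be verified from the numbers alone.

```python
import statistics

def check_paired_conditions(diffs, population_pairs=None):
    """Rough numeric screen for the Normal/Large Sample and 10% conditions.

    diffs: list of within-pair differences.
    population_pairs: population size N, if sampling without replacement.
    """
    n = len(diffs)
    # 1.5 * IQR fences as a simple outlier screen on the differences.
    q1, _, q3 = statistics.quantiles(diffs, n=4)
    iqr = q3 - q1
    outliers = [d for d in diffs if d < q1 - 1.5 * iqr or d > q3 + 1.5 * iqr]
    return {
        "n": n,
        "large_sample": n >= 30,
        "outliers": outliers,
        # None means the 10% condition was not checked (no N supplied).
        "ten_percent_ok": None if population_pairs is None
        else population_pairs >= 10 * n,
    }

# Hypothetical differences with one suspicious value.
result = check_paired_conditions([2, 4, 1, 5, 3, 3, 25], population_pairs=500)
print(result)
```

A flagged outlier with a small sample is a signal to look at a dotplot or boxplot of the differences before trusting a t-procedure; the screen is a starting point, not a substitute for the graph.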

Exam tip: Always check conditions on the differences, not the original two groups. The AP exam deducts points for checking conditions on unpaired original data.

3. Paired t-Test for a Population Mean Difference ★★★☆☆ ⏱ 4 min

A paired t-test is used to test a claim about the true population mean difference $\mu_d$. Almost always, the null hypothesis is $H_0: \mu_d = 0$, as we test whether there is any difference between paired measurements. The test statistic is:

$t = \frac{\bar{d} - \mu_{d0}}{s_d / \sqrt{n}}$

Where $\mu_{d0}$ is the null hypothesized difference (almost always 0), and degrees of freedom are $df = n - 1$.

📐 Worked Example

A coffee roaster tests whether a new roasting method increases caffeine content. He takes 10 batches of beans, splits each batch into two portions, and roasts one portion with the old method and the other with the new. He calculates $d_i = \text{new caffeine} - \text{old caffeine}$ for each batch, getting $\bar{d} = 12 \text{ mg}/100\text{g}$ and $s_d = 20 \text{ mg}/100\text{g}$. Conduct a paired t-test at $\alpha = 0.05$.

1. State hypotheses: Define $\mu_d =$ true mean difference in caffeine content (new minus old). $H_0: \mu_d = 0$ (no difference), $H_a: \mu_d > 0$ (new method has higher caffeine).
2. With conditions confirmed, calculate the test statistic:
$t = \frac{12 - 0}{20 / \sqrt{10}} \approx 1.90$
3. Degrees of freedom $df = 9$, so the one-sided p-value is between 0.04 and 0.05.
4. Conclusion: Since $p < 0.05$, we reject the null hypothesis. There is convincing evidence at the $\alpha=0.05$ level that the new roasting method increases mean caffeine content.
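The arithmetic in the worked example can be reproduced with a short script. This is a minimal sketch using only the Python standard library: the helper functions `t_pdf` and `t_upper_tail` are written for this illustration, and the p-value is approximated by numerically integrating the t density with Simpson's rule rather than by a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t > 0 using Simpson's rule on [0, t].

    steps must be even.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from the caffeine example (new minus old).
d_bar, s_d, n, mu_d0 = 12, 20, 10, 0

se = s_d / math.sqrt(n)                 # standard error of the mean difference
t_stat = (d_bar - mu_d0) / se           # paired t statistic
p_value = t_upper_tail(t_stat, n - 1)   # one-sided, because H_a: mu_d > 0

print(f"t = {t_stat:.2f}, df = {n - 1}, one-sided p = {p_value:.3f}")
```

The p-value printed should land in the 0.04 to 0.05 range quoted in step 3. Because it sits close to $\alpha = 0.05$, the one-sided alternative matters here: doubling it for a two-sided test would no longer give a rejection.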

Exam tip: Always define $\mu_d$ in context, specifying which measurement is subtracted from which, before writing hypotheses. This is required for full points.

4. Paired t-Confidence Interval for a Population Mean Difference ★★★☆☆ ⏱ 3 min

A paired t-confidence interval estimates the true value of $\mu_d$, quantifying the size of the mean difference rather than just testing if it is non-zero. If the interval does not contain 0, we reject $H_0: \mu_d = 0$ at significance level $\alpha = 1-C$ for a two-sided test, where $C$ is the confidence level. The formula is:

$\bar{d} \pm t^* \times \frac{s_d}{\sqrt{n}}$

Where $t^*$ is the critical t-value for confidence level $C$ and $df = n-1$, and the second term is the margin of error.
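As an illustration of the interval formula, the sketch below applies it to the caffeine summary statistics from the worked example in Section 3 ($\bar{d} = 12$, $s_d = 20$, $n = 10$). The function name `paired_t_interval` is hypothetical, and the critical value $t^* = 2.262$ (95% confidence, $df = 9$) is read from a standard t-table rather than computed.

```python
import math

def paired_t_interval(d_bar, s_d, n, t_star):
    """Return (lower, upper) for d_bar +/- t_star * s_d / sqrt(n)."""
    margin = t_star * s_d / math.sqrt(n)  # margin of error
    return d_bar - margin, d_bar + margin

# 95% confidence with df = 9: t* = 2.262 from a t-table.
low, high = paired_t_interval(d_bar=12, s_d=20, n=10, t_star=2.262)
print(f"95% CI for mu_d (new - old): ({low:.2f}, {high:.2f}) mg/100g")
```

This interval includes 0, so a two-sided test at $\alpha = 0.05$ would fail to reject $H_0: \mu_d = 0$ for these data, even though the one-sided test in the worked example rejects. The interval and the test agree only when the test is two-sided with $\alpha = 1 - C$.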

Exam tip: When interpreting the interval, always specify the direction of the difference (which group minus which) in context. Ambiguous interpretations lose points.

Common Pitfalls

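**Using a two-sample t-procedure on paired data.** Why: Students see two groups of measurements and automatically select the two-sample procedure without checking for pairing.

**Checking conditions on the original two sets of measurements.** Why: Students forget that inference is conducted on the differences, not the original measurements.

**Computing the standard error from the two original standard deviations.** Why: While $\bar{x}_1 - \bar{x}_2 = \bar{d}$ is true, the standard error must be $s_d / \sqrt{n}$; using the original standard deviations to calculate it is incorrect.

**Switching to z-procedures when $n$ is large.** Why: Students confuse the t/z distinction, incorrectly assuming that a large $n$ justifies z-procedures even though the population standard deviation of the differences is still unknown.

**Interpreting the confidence interval as a range for individual differences.** Why: Students confuse the distribution of individual differences with the confidence interval for the mean difference $\mu_d$.

**Ending the conclusion at "reject" or "fail to reject" $H_0$.** Why: Students stop after the decision and do not answer the original research question in context.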
