Inference for Quantitative Data: Slopes — AP Statistics
1. What is Inference for Regression Slopes? ★★☆☆☆ ⏱ 3 min
Inference for regression slopes is the set of statistical methods used to draw conclusions about the true linear relationship between two quantitative variables in a population, using data from a random sample. The slope of a sample least squares regression line ($b$, our point estimate) is used to make claims about the unknown true slope of the population regression line ($\beta$). This unit makes up 2-5% of the multiple-choice section of the AP Statistics exam, and it also appears in the free-response section, often as part of the final investigative task (FRQ 6).
Exam tip: Always include context of your variables when interpreting slopes and intervals; exam graders heavily penalize generic, out-of-context answers.
2. Sampling Distribution of the Sample Slope $b$ ★★☆☆☆ ⏱ 4 min
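The sampling distribution of the sample slope $b$ describes how $b$ varies across all possible random samples of size $n$ from the population. If inference conditions are met, this distribution has three key properties: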
- **Unbiased center**: The mean $\mu_b = \beta$, so $b$ is an unbiased estimator of the true population slope
- **True standard deviation**: $\sigma_b = \frac{\sigma}{\sigma_x \sqrt{n}}$, where $\sigma$ = population residual standard deviation, $\sigma_x$ = population standard deviation of $x$ (computed with divisor $n$), $n$ = sample size
- **Shape**: Approximately normally distributed when conditions are satisfied
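The center and spread properties above can be checked with a small simulation. The sketch below is illustrative only: the population values `alpha`, `beta`, `sigma` and the fixed `x` values are made-up assumptions, not data from this guide. It compares the simulated slopes with the theoretical spread written in the equivalent form $\sigma / \sqrt{\sum (x_i - \bar{x})^2}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population model (assumed values): y = alpha + beta*x + error,
# with errors drawn from a normal distribution with SD sigma.
alpha, beta, sigma = 2.0, 0.5, 1.5
x = np.linspace(0, 10, 25)  # fixed explanatory values, n = 25
n = x.size


def sample_slope(x, y):
    """Least-squares slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    dx = x - x.mean()
    return np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)


# Draw many samples from the model and record the slope of each fitted line.
slopes = np.array([
    sample_slope(x, alpha + beta * x + rng.normal(0, sigma, n))
    for _ in range(5000)
])

# Theoretical SD of b, expressed through the sum of squared deviations of x.
theoretical_sd = sigma / np.sqrt(np.sum((x - x.mean()) ** 2))

print(f"mean of simulated slopes: {slopes.mean():.4f} (beta = {beta})")
print(f"SD of simulated slopes:   {slopes.std():.4f} (theory = {theoretical_sd:.4f})")
```

If the conditions hold, the simulated mean should sit close to `beta` and the simulated SD close to the theoretical value; a histogram of `slopes` would look roughly normal.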
In practice, we almost never know population parameters, so we calculate the *standard error of the slope* (estimated spread of the sampling distribution) using sample statistics:
$$SE_b = \frac{s}{s_x \sqrt{n-1}}$$
Where $s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}}$ is the standard deviation of the residuals, and $s_x$ is the sample standard deviation of the explanatory variable. All slope inference uses $df = n-2$ degrees of freedom, because we estimate two parameters (intercept and slope) when fitting the regression line.
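As a rough illustration of the standard error calculation, the sketch below computes $s$, $s_x$, and $SE_b$ by hand and cross-checks the result against `scipy.stats.linregress`. The data (hours studied and exam scores) are hypothetical and chosen only for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (made up for illustration): hours studied (x), exam score (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([52.0, 55.0, 61.0, 60.0, 68.0, 71.0, 70.0, 78.0])
n = x.size

# Least-squares slope b and intercept a
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# Standard deviation of the residuals, dividing by n - 2
residuals = y - (a + b * x)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))

# Sample SD of x (divisor n - 1), then the standard error of the slope
s_x = x.std(ddof=1)
se_b = s / (s_x * np.sqrt(n - 1))

# Cross-check against SciPy's built-in simple linear regression
fit = stats.linregress(x, y)
print(f"b = {b:.4f}, s = {s:.4f}, SE_b = {se_b:.4f}")
print(f"linregress slope = {fit.slope:.4f}, stderr = {fit.stderr:.4f}")
```

On the exam this value is usually read from computer output (the "SE Coef" column in the predictor's row) rather than computed by hand.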
3. Confidence Intervals for the True Slope $\beta$ ★★★☆☆ ⏱ 5 min
A confidence interval for the true population slope $\beta$ gives a range of plausible values for the average change in the response variable $y$ for every 1-unit increase in the explanatory variable $x$. We use a $t$-distribution for this interval, because we use the estimated standard error $SE_b$ instead of the unknown true standard deviation $\sigma_b$.
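The interval has the usual form of point estimate plus or minus margin of error, $b \pm t^* \cdot SE_b$, where $t^*$ is the critical value from the $t$-distribution with $df = n-2$. A minimal sketch, assuming a hypothetical data set and a 95% confidence level:

```python
import numpy as np
from scipy import stats

# Hypothetical sample (made up for illustration): hours studied (x), exam score (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([52.0, 55.0, 61.0, 60.0, 68.0, 71.0, 70.0, 78.0])
n = x.size

fit = stats.linregress(x, y)  # gives the slope b and its standard error
df = n - 2
conf_level = 0.95

# Critical value t* that leaves (1 - C)/2 in each tail of the t-distribution
t_star = stats.t.ppf(1 - (1 - conf_level) / 2, df)

margin = t_star * fit.stderr
lower, upper = fit.slope - margin, fit.slope + margin

print(f"b = {fit.slope:.3f}, SE_b = {fit.stderr:.3f}, t* = {t_star:.3f}")
print(f"{conf_level:.0%} CI for beta: ({lower:.3f}, {upper:.3f})")
```

If the resulting interval does not contain 0, the data give evidence of a linear relationship at the corresponding significance level.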
For full credit on AP exams, you must interpret the interval in context. The standard correct interpretation is: *We are $C\%$ confident that the true average change in [response variable] for each 1-unit increase in [explanatory variable] is between [lower bound] and [upper bound] units.*
Exam tip: Examiners will deduct marks if you fail to reference the context of your variables, or misinterpret the interval as applying to individual observations.
4. Hypothesis Tests for Slope Significance ★★★☆☆ ⏱ 4 min
Hypothesis tests for slope are almost always used to test if there is a statistically significant linear relationship between two variables. The most common hypotheses are:
- Null hypothesis: $H_0: \beta = 0$ (no linear relationship between $x$ and $y$ in the population)
- Alternative hypothesis: $H_a: \beta \neq 0$ (two-tailed, default; there is a linear relationship between $x$ and $y$)
One-tailed alternatives ($H_a: \beta > 0$ or $H_a: \beta < 0$) are only used if the problem explicitly states an expected direction for the relationship. The test statistic is a $t$-score:
$$t = \frac{b - \beta_0}{SE_b}$$
Where $\beta_0$ is the hypothesized slope (almost always 0), and the $p$-value is calculated with $df = n-2$. If $p < \alpha$, we reject the null hypothesis and conclude there is convincing evidence of a linear relationship.
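The test can be sketched in a few lines. The example below assumes a hypothetical data set, $\beta_0 = 0$, and $\alpha = 0.05$; it computes the $t$ statistic and two-sided $p$-value directly and compares the result with the $p$-value reported by `scipy.stats.linregress`.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (made up for illustration): hours studied (x), exam score (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([52.0, 55.0, 61.0, 60.0, 68.0, 71.0, 70.0, 78.0])
n = x.size

fit = stats.linregress(x, y)
df = n - 2
beta_0 = 0.0   # hypothesized slope under H0
alpha = 0.05   # assumed significance level

# Test statistic: how many standard errors b lies from the hypothesized slope
t_stat = (fit.slope - beta_0) / fit.stderr

# Two-sided p-value for Ha: beta != 0
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

# One-sided p-value for Ha: beta > 0 (only if a direction was stated in advance)
p_greater = stats.t.sf(t_stat, df)

print(f"t = {t_stat:.3f}, df = {df}, two-sided p = {p_two_sided:.5f}")
print(f"linregress p-value (two-sided, H0: slope = 0) = {fit.pvalue:.5f}")
print("Reject H0" if p_two_sided < alpha else "Fail to reject H0")
```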
5. LINE Conditions for Valid Inference ★★★☆☆ ⏱ 5 min
All slope inference is only valid if the four LINE conditions are satisfied. On AP free-response questions, you must explicitly check each condition with supporting evidence (usually from plots) to earn full marks. The four conditions are:
- **Linearity**: The true relationship between $x$ and $y$ is linear. Check: residual plot shows random scatter around 0, no curved pattern.
- **Independence**: Individual observations are independent. Check: random sampling/assignment, 10% condition if sampling without replacement.
- **Normality**: Residuals are normally distributed around 0. Check: normal probability plot of residuals is roughly linear, no extreme outliers.
- **Equal Variance (homoscedasticity)**: Spread of residuals is constant across all $x$. Check: residual plot has consistent vertical spread, no fanning/funnel patterns.
Exam tip: Always link each condition to the evidence you use to check it. Just memorizing and listing the conditions will not earn full credit.
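On the exam these checks are made by reading plots, but the quantities behind those plots can be computed. The sketch below uses hypothetical data and two rough numeric summaries that are assumptions of this example, not AP requirements: the correlation from a normal probability plot of the residuals, and the ratio of residual spread between the lower and upper halves of $x$.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (made up for illustration): hours studied (x), exam score (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([52.0, 55.0, 61.0, 60.0, 68.0, 71.0, 70.0, 78.0])

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

# Linearity / equal variance: these (x, residual) pairs are what a residual
# plot displays; look for curvature or a fan shape.
for xi, ri in zip(x, residuals):
    print(f"x = {xi:4.1f}  residual = {ri:6.2f}")

# Normality: probplot returns the plotted points and a least-squares fit to
# them; pp_r near 1 means the normal probability plot is roughly linear.
(osm, osr), (pp_slope, pp_intercept, pp_r) = stats.probplot(residuals, dist="norm")
print(f"normal probability plot correlation: {pp_r:.3f}")

# Equal variance (rough proxy): compare residual spread for small vs. large x.
low = residuals[x <= np.median(x)]
high = residuals[x > np.median(x)]
spread_ratio = high.std(ddof=1) / low.std(ddof=1)
print(f"residual SD ratio (upper half / lower half of x): {spread_ratio:.2f}")
```

These numbers supplement the plots rather than replace them, and with a sample this small they are only suggestive. Independence cannot be checked from the data at all; it depends on how the data were collected.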
Common Pitfalls
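- Using a $z$ critical value or $z$-test instead of $t$ for slope inference. Why: Students confuse slope inference with proportion inference, which uses $z$-scores.
- Interpreting a confidence interval for the slope as a range for individual observations. Why: Students mix up intervals for the true average change (the slope) with prediction intervals for individual values.
- Listing the LINE conditions without checking them. Why: Students memorize the conditions but forget that the exam requires showing how each one is verified.
- Concluding that $x$ causes $y$ because the slope is statistically significant. Why: Students confuse a statistically significant association with causation.
- Using $df = n-1$ instead of $df = n-2$. Why: Students carry over the degrees of freedom rule from one-sample mean inference.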