AP Statistics · AP Statistics 2019 CED · 45 min read
1. Population vs. Sample Slope ★★☆☆☆ ⏱ 10 min
We almost never have data for an entire population of $(x,y)$ pairs. We collect a random sample, calculate the sample slope $b_1$, and use it to make inferences about the unknown population slope $\beta_1$. We account for sampling variability in $b_1$ to draw conclusions about the population.
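To make sampling variability concrete, here is a minimal simulation sketch. It assumes a made-up population model in which the true slope is known ($\beta_1 = 2$), which is never the case in practice; the intercept, noise level, sample size, and the helper names `sample_slope` and `draw_sample` are all illustrative assumptions, not part of the course material.

```python
import random

def sample_slope(xs, ys):
    """Least-squares slope: b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def draw_sample(n, beta0, beta1, sigma, rng):
    """Draw n (x, y) pairs from a hypothetical linear population model."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys

rng = random.Random(1)  # fixed seed so the run is repeatable
BETA1 = 2.0             # assumed "true" population slope (unknown in real studies)

# Take 500 independent samples of size 20 and record each sample slope.
slopes = [sample_slope(*draw_sample(20, 5.0, BETA1, 3.0, rng)) for _ in range(500)]

mean_b1 = sum(slopes) / len(slopes)
print(f"smallest b1: {min(slopes):.2f}, largest b1: {max(slopes):.2f}")
print(f"average of 500 sample slopes: {mean_b1:.2f}")
```

Each sample gives a different $b_1$, but the slopes cluster around the population slope. That spread is the sampling variability that inference has to account for.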
2. Conditions for Inference for Slope ★★★☆☆ ⏱ 15 min
Inference for slope relies on four core conditions that must be checked before any analysis. When these conditions hold, the standardized sample slope follows the t-distribution (with $n - 2$ degrees of freedom) that we use for inference.
**Linear**: The true relationship between $x$ and $y$ is linear (check for no curvature in a residual plot).
**Independent**: Observations are independent of each other (when sampling without replacement, check that the sample size $n$ is less than 10% of the population size).
**Normal**: Residuals are approximately normally distributed around the regression line (check a normal probability plot of residuals or no strong skew/outliers).
**Equal Variance**: The spread of residuals is constant across all values of $x$ (check for no fanning in a residual plot).
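The Linear, Normal, and Equal Variance checks are all made on the residuals, so it helps to see how those are computed. The sketch below is a minimal illustration with made-up data; the helper names `fit_line`, `residuals`, and `ten_percent_ok`, and the population size of 500, are assumptions for the example. In practice you would plot the residuals rather than read them as numbers.

```python
def fit_line(xs, ys):
    """Return the least-squares intercept b0 and slope b1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(xs, ys):
    """Residual = observed y minus predicted y, one per observation."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def ten_percent_ok(n, population_size):
    """10% condition for sampling without replacement."""
    return n < 0.10 * population_size

# Hypothetical sample, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

res = residuals(xs, ys)
for x, e in zip(xs, res):
    # Look for curvature (Linear) or fanning (Equal Variance) as x increases.
    print(f"x = {x}: residual = {e:+.2f}")

print("10% condition met:", ten_percent_ok(len(xs), 500))
```

Plotting `res` against `xs` gives the residual plot used for the Linear and Equal Variance checks; a normal probability plot of `res` addresses the Normal check.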
3. Hypotheses for Slope Inference ★★★☆☆ ⏱ 12 min
We almost always test the null hypothesis that there is *no linear relationship* between $x$ and $y$ in the population. This corresponds to a population slope of zero. Alternative hypotheses can be two-sided (no direction specified) or one-sided (directional claim).
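Written in symbols, with $\beta_1$ denoting the population slope as above, the usual hypotheses take this form:

$$H_0: \beta_1 = 0$$

$$H_a: \beta_1 \neq 0 \quad \text{(two-sided)} \qquad \text{or} \qquad H_a: \beta_1 > 0 \ \text{ or } \ H_a: \beta_1 < 0 \quad \text{(one-sided)}$$

The direction of a one-sided alternative should come from the research question, not from the sign of the sample slope.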
4. Sampling Distribution of the Sample Slope ★★★★☆ ⏱ 15 min
When all four LINE conditions are met, the sampling distribution of the sample slope $b_1$ is approximately normal, centered at the true population slope $\beta_1$. The standard deviation of this distribution is estimated by the standard error of the slope $SE_{b_1}$, which is almost always calculated by statistical software.
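Although software normally reports $SE_{b_1}$, a hand computation shows what it measures. The sketch below uses the standard formulas $s = \sqrt{\sum e_i^2 / (n-2)}$ and $SE_{b_1} = s / (s_x \sqrt{n-1})$, which are not spelled out in the text above, and then forms the test statistic $t = (b_1 - \beta_1)/SE_{b_1}$ with $n - 2$ degrees of freedom. The data and the function name `slope_inference` are hypothetical.

```python
import math

def slope_inference(xs, ys, beta1_null=0.0):
    """Return b1, its standard error, the t statistic, and degrees of freedom."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # s: standard deviation of the residuals, using n - 2 degrees of freedom.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    # sqrt(sxx) equals s_x * sqrt(n - 1), so this matches SE = s / (s_x * sqrt(n - 1)).
    se_b1 = s / math.sqrt(sxx)

    t = (b1 - beta1_null) / se_b1
    return {"b1": b1, "se_b1": se_b1, "t": t, "df": n - 2}

# Hypothetical sample, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

result = slope_inference(xs, ys)
print(f"b1 = {result['b1']:.3f}, SE = {result['se_b1']:.3f}, "
      f"t = {result['t']:.3f}, df = {result['df']}")
```

The t statistic counts how many standard errors the sample slope lies from the hypothesized slope. A p-value would then come from a t-distribution with the reported degrees of freedom, which this sketch leaves to a table or calculator.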
Common Pitfalls
Why: The regression line models the mean value of $y$ for a given $x$, not individual values of $y$.
Why: $b_1$ is a known value calculated from the sample; we do not test hypotheses about known values. Hypotheses are always about unknown population parameters.
Why: A zero slope only indicates there is no *linear* relationship between $x$ and $y$. A strong curved relationship can still exist.
Why: If your sample is more than 10% of the population and you sample without replacement, the observations cannot be treated as independent, and standard error estimates will be inaccurate.