Statistics · Exploring Two-Variable Data · 14 min read · Updated 2026-05-11

Transforming to Achieve Linearity — AP Statistics

1. What Is Transforming to Achieve Linearity? ★★☆☆☆ ⏱ 3 min

Many real-world two-variable relationships (such as bacterial growth, radioactive decay, and car stopping distance) are non-linear. Fitting a least-squares regression line directly to such data produces systematically biased predictions and a curved pattern in the residual plot.

Transforming to achieve linearity is the process of applying a mathematical re-expression (usually a logarithmic or power transformation) to one or both variables to convert a curved relationship into a linear one, allowing us to use existing simple linear regression tools to fit and analyze the model.

2. Log Transformation for Exponential Models ★★★☆☆ ⏱ 4 min

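An exponential model has the form $y = ab^x$, where $a > 0$ is the value at $x = 0$ and $b > 0$ (with $b \neq 1$) is the constant growth or decay factor. This relationship is always curved on the original $x$-$y$ scale, so we linearize it by taking the natural logarithm of both sides: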

\ln y = \ln\left(ab^x\right) = \ln a + x \ln b

If we let $y' = \ln y$, $A = \ln a$, and $B = \ln b$, we get the standard linear form:

y' = A + Bx

We fit a least-squares regression line to the transformed $(x, \ln y)$ data to get estimates of $A$ and $B$, then back-transform to recover $a = e^A$ and $b = e^B$ for the original exponential model. Residual plots on the transformed data confirm if linearization worked: random scatter around zero means the model is appropriate.
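
The fit-then-back-transform procedure can be sketched in a few lines of Python. This is a minimal illustration, not an AP-required method: the data are hypothetical and generated without noise from $y = 3 \cdot 2^x$, and `fit_line` and `predict` are helper names introduced here for the example.

```python
import math

def fit_line(xs, ys):
    """Return the least-squares (intercept, slope) for y = A + B*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical, noise-free data generated from y = 3 * 2^x
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 2 ** x for x in xs]

# Transform only the response variable: y' = ln y
ln_ys = [math.log(y) for y in ys]
A, B = fit_line(xs, ln_ys)

# Back-transform the coefficients: a = e^A, b = e^B
a = math.exp(A)
b = math.exp(B)

def predict(x):
    """Prediction on the original scale from the model y-hat = a * b^x."""
    return a * b ** x

print(f"a = {a:.4f}, b = {b:.4f}, prediction at x = 6: {predict(6):.2f}")
```

Because the data here lie exactly on an exponential curve, the recovered $a$ and $b$ match the generating values up to rounding; with real data they would be estimates.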

Exam tip: If the problem uses base-10 (common) logs instead of natural logs, back-transform with base 10, not $e$. If $\log_{10} y = A + Bx$, then $\widehat{y} = 10^A (10^B)^x$ — always match the log base when exponentiating.
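
To see why the base matters, here is a small sketch with a hypothetical fitted line $\log_{10} y = 0.5 + 0.2x$ (the coefficients are made up for illustration). It compares the correct base-10 back-transformation with the mistaken use of $e$.

```python
import math

# Hypothetical fitted line on the base-10 log scale: log10(y) = 0.5 + 0.2x
A, B = 0.5, 0.2
x = 3

log10_y_hat = A + B * x           # prediction on the transformed scale
y_hat = 10 ** log10_y_hat         # correct: 10^(...) undoes log10
y_wrong = math.exp(log10_y_hat)   # incorrect: e^(...) undoes ln, not log10

# Equivalent exponential form: y-hat = 10^A * (10^B)^x
y_hat_alt = (10 ** A) * (10 ** B) ** x

print(f"correct: {y_hat:.3f}, exponential form: {y_hat_alt:.3f}, wrong base: {y_wrong:.3f}")
```

The two correct expressions agree, while exponentiating with the wrong base gives a very different prediction.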

3. Power Transformations for Power Function Models ★★★☆☆ ⏱ 4 min

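A power model has the form $y = ax^p$, where $a > 0$ is a constant, $p$ is the power, and $x > 0$ so that the logarithms are defined. Like exponential models, power models are curved on the original scale, but linearization requires transforming both variables. Take the logarithm of both sides to get: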

\ln y = \ln\left(ax^p\right) = \ln a + p \ln x

Let $y' = \ln y$ and $x' = \ln x$, so this becomes the linear equation:

y' = \ln a + p x'

We fit least-squares regression to the transformed $(\ln x, \ln y)$ data, get an intercept $A = \ln a$ and slope equal to the power $p$, then back-transform $a = e^A$ to get the original power model $\widehat{y} = a x^p$.
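
The same steps for a power model can be sketched as follows. The data are hypothetical and generated without noise from $y = 2x^{1.5}$; `fit_line` and `predict` are helper names introduced for this example only.

```python
import math

def fit_line(xs, ys):
    """Return the least-squares (intercept, slope) for y = A + B*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical, noise-free data generated from y = 2 * x^1.5
xs = [1, 2, 4, 8, 16]
ys = [2 * x ** 1.5 for x in xs]

# Transform BOTH variables: x' = ln x, y' = ln y
ln_xs = [math.log(x) for x in xs]
ln_ys = [math.log(y) for y in ys]
A, p = fit_line(ln_xs, ln_ys)   # the slope is the power p directly

# Only the intercept needs back-transforming: a = e^A
a = math.exp(A)

def predict(x):
    """Prediction on the original scale from the model y-hat = a * x^p."""
    return a * x ** p

print(f"a = {a:.4f}, p = {p:.4f}, prediction at x = 9: {predict(9):.2f}")
```

Note that the slope is used as is, while the intercept is exponentiated. This asymmetry is the practical difference from the exponential case, where both coefficients are back-transformed.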

Exam tip: The most common MCQ error is mixing up transformations: exponential models only need $y$ transformed, power models need both $x$ and $y$ transformed. Memorize the derivation, not just the rule, to avoid this mistake.

4. Residual Analysis and Model Selection ★★★☆☆ ⏱ 3 min

When you start with a curved scatterplot of $y$ vs $x$, you will often test multiple transformations to find one that produces a linear relationship. The primary tool for selecting the appropriate transformation is residual analysis: after fitting a linear regression to the transformed data, you plot the residuals from the transformed regression against the explanatory variable $x$.

If the residual plot has no systematic curved pattern (residuals are randomly scattered around zero), the transformation successfully linearized the relationship, and the model is appropriate. If there is still a visible curve, you need to test a different transformation.
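
As a rough sketch of this comparison, the code below fits both candidate transformations to the same hypothetical data (generated without noise from the power model $y = 2x^{1.5}$) and lists the residuals from each fit. The helper names `fit_line` and `residuals` are introduced here for illustration.

```python
import math

def fit_line(xs, ys):
    """Return the least-squares (intercept, slope) for y = A + B*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residuals(xs, ys):
    """Observed minus predicted values for the least-squares line of ys on xs."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical, noise-free data generated from the power model y = 2 * x^1.5
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2 * x ** 1.5 for x in xs]
ln_xs = [math.log(x) for x in xs]
ln_ys = [math.log(y) for y in ys]

resid_exp = residuals(xs, ln_ys)      # candidate 1: exponential model (ln y vs x)
resid_pow = residuals(ln_xs, ln_ys)   # candidate 2: power model (ln y vs ln x)

for x, r_e, r_p in zip(xs, resid_exp, resid_pow):
    print(f"x = {x}: exponential-fit residual {r_e:+.3f}, power-fit residual {r_p:+.3f}")
```

For these data, the residuals from the exponential fit are negative at both ends and positive in the middle, which is a curved pattern, so that transformation did not linearize the relationship. The power-fit residuals are essentially zero because the data were generated without noise; with real data you would look for random scatter around zero instead.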

Exam tip: Always reference the residual pattern explicitly in your justification. A statement such as "the residual plot shows no curved pattern, so the linear model on the transformed data is appropriate" gives the evidence needed for credit, while "the model fits better" on its own does not.

5. Concept Check: AP-Style Practice ★★★★☆ ⏱ 3 min

Common Pitfalls

Why: Students memorize "log works for linearization" and forget which variable to transform based on model form.

Why: Students assume all log transformations use natural log and don't check the problem's given transformation.

Why: Students stop after calculating the prediction from the transformed regression and don't circle back to the question's request.

Why: Students are used to using $R^2$ for model comparison, but it is not comparable across different transformation scales.

Why: Students forget the response variable was transformed, so the slope is on the transformed scale.

Why: Students mix up which variable to use for residual plots.

Quick Reference Cheatsheet
