Statistics · Exploring Two-Variable Data · 14 min read · Updated 2026-05-11

Transforming to Achieve Linearity — AP Statistics


1. What Is Transforming to Achieve Linearity? ★★☆☆☆ ⏱ 3 min

Many real-world two-variable relationships (such as bacterial growth, radioactive decay, and car stopping distance) follow non-linear patterns. A least-squares regression line fitted to the raw data then predicts systematically too high over some ranges of $x$ and too low over others, which shows up as a curved pattern in the residual plot.

Transforming to achieve linearity is the process of applying a mathematical re-expression (usually a logarithmic or power transformation) to one or both variables to convert a curved relationship into a linear one, allowing us to use existing simple linear regression tools to fit and analyze the model.

2. Log Transformation for Exponential Models ★★★☆☆ ⏱ 4 min

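An exponential model has the form $y = ab^x$, where $a$ is the value of $y$ at $x = 0$ and $b$ is the constant growth or decay factor per unit increase in $x$. This relationship is curved on the original $x$-$y$ scale, so we linearize it by taking the natural logarithm of both sides: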

\ln y = \ln\left(ab^x\right) = \ln a + x \ln b

If we let $y' = \ln y$, $A = \ln a$, and $B = \ln b$, we get the standard linear form:

y' = A + Bx

We fit a least-squares regression line to the transformed $(x, \ln y)$ data to get estimates of $A$ and $B$, then back-transform to recover $a = e^A$ and $b = e^B$ for the original exponential model. A residual plot for the transformed regression shows whether linearization worked: random scatter around zero means the exponential model is appropriate.
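The fit-then-back-transform procedure can be sketched in Python with NumPy. The data below are hypothetical (generated from $y = 50 \cdot 1.8^x$ with small made-up measurement errors) and the variable names are illustrative, so treat this as a sketch of the steps rather than a prescribed method.

```python
import numpy as np

# Hypothetical data: counts close to y = 50 * 1.8^x, with small made-up errors
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
errors = np.array([1.03, 0.98, 1.01, 0.97, 1.02, 0.99])  # illustrative only
y = 50.0 * 1.8 ** x * errors

# Least-squares line on the transformed data: ln(y) = A + B*x
# np.polyfit returns the slope first, then the intercept
B, A = np.polyfit(x, np.log(y), 1)

# Back-transform to the exponential model y-hat = a * b^x
a = np.exp(A)
b = np.exp(B)

# Residuals on the transformed (ln y) scale, for the residual plot
residuals = np.log(y) - (A + B * x)

print(f"a = {a:.2f}, b = {b:.3f}")
print("residuals:", np.round(residuals, 4))
```

The recovered $a$ and $b$ should land close to 50 and 1.8, and the residuals should show no curve when plotted against $x$.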

Exam tip: If the problem uses base-10 (common) logs instead of natural logs, back-transform with base 10, not $e$. If $\log_{10} y = A + Bx$, then $\widehat{y} = 10^A (10^B)^x$ — always match the log base when exponentiating.
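A quick numerical check of the base-matching rule, using hypothetical data: fitting with $\log_{10}$ and back-transforming with 10 gives the same model as fitting with $\ln$ and back-transforming with $e$, while mixing the bases does not.

```python
import numpy as np

# Hypothetical data with roughly exponential growth
x = np.array([0, 1, 2, 3, 4], dtype=float)
y = np.array([12.0, 25.0, 47.0, 99.0, 196.0])

# Fit on the base-10 scale and on the natural-log scale
B10, A10 = np.polyfit(x, np.log10(y), 1)
Be, Ae = np.polyfit(x, np.log(y), 1)

# Matching bases: both routes recover the same a and b
a_from_log10, b_from_log10 = 10 ** A10, 10 ** B10
a_from_ln, b_from_ln = np.exp(Ae), np.exp(Be)

# Mismatched base: exponentiating a base-10 intercept with e gives the wrong a
a_mismatched = np.exp(A10)

print(a_from_log10, a_from_ln, a_mismatched)
```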

3. Power Transformations for Power Function Models ★★★☆☆ ⏱ 4 min

A power model has the form $y = ax^p$, where $a$ is a constant and $p$ is the power. Like exponential models, power models are curved on the original scale, but linearization requires transforming both variables. Take the logarithm of both sides to get:

\ln y = \ln\left(ax^p\right) = \ln a + p \ln x

Let $y' = \ln y$ and $x' = \ln x$, so this becomes the linear equation:

y' = \ln a + p x'

We fit least-squares regression to the transformed $(\ln x, \ln y)$ data, get an intercept $A = \ln a$ and slope equal to the power $p$, then back-transform $a = e^A$ to get the original power model $\widehat{y} = a x^p$.
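A minimal Python sketch of the same steps, assuming all $x$ and $y$ values are positive (logarithms are undefined otherwise). The helper name `fit_power_model` and the data are hypothetical.

```python
import numpy as np

def fit_power_model(x, y):
    """Fit y = a * x^p by least squares on (ln x, ln y); returns (a, p)."""
    # Slope of the transformed line is the power p; intercept is ln(a)
    p, A = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(A), p

# Hypothetical data generated exactly from y = 3 * x^0.5
x = np.array([1, 2, 4, 8, 16], dtype=float)
y = 3.0 * x ** 0.5

a, p = fit_power_model(x, y)
print(f"y-hat = {a:.3f} * x^{p:.3f}")
```

Because the example data follow the power model exactly, the fit recovers $a$ and $p$ up to floating-point error; real data would leave nonzero residuals.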

📐 Worked Example

A civil engineer measures the stopping distance $y$ (meters) of a car moving at speed $x$ (km/h) and fits a linear regression to transformed $(\ln x, \ln y)$ data, resulting in $\widehat{\ln y} = -1.1 + 2.05 \ln x$. Write the power model for $y$ and interpret the slope of the transformed model.

Recall the linearized form of a power model is $\ln y = \ln a + p \ln x$, where $p$ is the power in the original model.
Match coefficients: $\ln a = -1.1$, $p = 2.05$. Back-transform to get $a = e^{-1.1} \approx 0.3329$.
Write the original power model:
$\widehat{y} = 0.33 x^{2.05}$
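Interpret the slope: A 1-unit increase in $\ln x$ corresponds to a 2.05-unit increase in predicted $\ln y$. On the original scale, each 1% increase in speed $x$ corresponds to roughly a 2.05% increase in predicted stopping distance $y$. For a 10% increase in speed, the shortcut gives $2.05 \times 10\% = 20.5\%$; the exact factor is $1.10^{2.05} \approx 1.216$, an increase of about 21.6%.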
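The arithmetic in this worked example can be checked with a few lines of Python. The coefficients come from the example; the speed of 60 km/h is a hypothetical value chosen only to illustrate a prediction.

```python
import math

# Coefficients of the transformed regression from the worked example
A, p = -1.1, 2.05

# Back-transform the intercept
a = math.exp(A)

def predict_stopping_distance(speed_kmh):
    """Predicted stopping distance (m) from the power model y-hat = a * x^p."""
    return a * speed_kmh ** p

# Exact multiplicative effect of a 10% increase in speed
ratio = 1.10 ** p

speed = 60.0  # hypothetical speed, not given in the example
print(f"a = {a:.4f}")
print(f"prediction at {speed} km/h: {predict_stopping_distance(speed):.1f} m")
print(f"10% more speed multiplies the prediction by {ratio:.4f}")
```

Comparing `ratio` with the $2.05 \times 10\%$ shortcut shows how close the percent-change approximation is for a change of this size.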

Exam tip: The most common MCQ error is mixing up transformations: exponential models only need $y$ transformed, power models need both $x$ and $y$ transformed. Memorize the derivation, not just the rule, to avoid this mistake.

4. Residual Analysis and Model Selection ★★★☆☆ ⏱ 3 min

When you start with a curved scatterplot of $y$ vs $x$, you will often test multiple transformations to find one that produces a linear relationship. The primary tool for selecting the appropriate transformation is residual analysis: after fitting a linear regression to the transformed data, you plot the residuals from the transformed regression against the explanatory variable $x$.

If the residual plot has no systematic curved pattern (residuals are randomly scattered around zero), the transformation successfully linearized the relationship, and the model is appropriate. If there is still a visible curve, you need to test a different transformation.
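The comparison of candidate transformations can be sketched as follows. The data are hypothetical (close to $y = 2x^{1.5}$), and `curved_share` is an assumption of this sketch: it measures the fraction of residual variation that a quadratic in $x$ can explain, as a rough numeric stand-in for spotting a curve by eye. On the exam you would inspect the residual plot itself.

```python
import numpy as np

def line_residuals(u, v):
    """Residuals from the least-squares line of v on u."""
    slope, intercept = np.polyfit(u, v, 1)
    return v - (intercept + slope * u)

def curved_share(x, residuals):
    """Fraction of residual variation explained by a quadratic in x.
    Near 0 suggests random scatter; near 1 suggests a leftover curve."""
    fitted = np.polyval(np.polyfit(x, residuals, 2), x)
    return 1 - np.sum((residuals - fitted) ** 2) / np.sum(residuals ** 2)

# Hypothetical data near y = 2 * x^1.5, with small made-up errors
x = np.arange(1, 11, dtype=float)
errors = np.array([1.02, 0.97, 1.01, 0.99, 1.03, 0.98, 1.00, 1.02, 0.97, 1.01])
y = 2.0 * x ** 1.5 * errors

candidates = {
    "y vs x": line_residuals(x, y),
    "ln y vs x": line_residuals(x, np.log(y)),
    "ln y vs ln x": line_residuals(np.log(x), np.log(y)),
}

# Residuals are always checked against the explanatory variable x
share = {name: curved_share(x, res) for name, res in candidates.items()}
for name, value in share.items():
    print(f"{name}: curved share = {value:.2f}")
```

For these data, only the $\ln y$ vs $\ln x$ residuals should show a small curved share, which matches the power model used to generate them.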

📐 Worked Example

A materials scientist testing the relationship between object volume $x$ (cm³) and mass $y$ (g) produces three residual plots after testing different transformations: (1) Residuals for untransformed $y$ vs $x$: clear U-shaped curve; (2) Residuals for $\ln y$ vs $\ln x$: randomly scattered around zero with no pattern; (3) Residuals for $\ln y$ vs $x$: clear upward curved trend. Which transformation is appropriate, and what model does this correspond to?

The goal of transformation is to achieve linearity, which is confirmed by a residual plot with no systematic pattern.
Eliminate the untransformed model and the $\ln y$ vs $x$ model, because both residual plots have clear curved patterns, which show the relationship is still non-linear on those scales.
The $\ln y$ vs $\ln x$ transformation produces random residual scatter, so it is the appropriate choice.
A linear relationship between $\ln y$ and $\ln x$ corresponds to a power function model $y = ax^p$ for the original data.

Exam tip: Always explicitly reference the residual pattern in your justification: saying "no curved pattern so the model is appropriate" will get you full credit, while just saying "the model fits better" will not.

5. Concept Check: AP-Style Practice ★★★★☆ ⏱ 3 min

Common Pitfalls

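Pitfall: Transforming the wrong variable(s) for the model, such as transforming only $y$ for a power model.
Why: Students memorize "log works for linearization" and forget which variable to transform based on model form.

Pitfall: Back-transforming with $e$ when the problem used base-10 logs.
Why: Students assume all log transformations use natural log and don't check the problem's given transformation.

Pitfall: Reporting the predicted $\ln y$ when the question asks for the predicted $y$.
Why: Students stop after calculating the prediction from the transformed regression and don't circle back to the question's request.

Pitfall: Choosing between transformations by comparing $R^2$ values.
Why: Students are used to using $R^2$ for model comparison, but it is not comparable across different transformation scales.

Pitfall: Interpreting the slope of the transformed regression as a change in $y$ rather than in the transformed response.
Why: Students forget the response variable was transformed, so the slope is on the transformed scale.

Pitfall: Plotting the residuals against the wrong variable.
Why: Students mix up which variable to use for residual plots.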

Quick Reference Cheatsheet
