Statistics · Unit 1: Exploring One-Variable Data · 17 min read · Updated 2026-05-11

Exploring One-Variable Data — AP Statistics

1. Variable Classification ★☆☆☆☆ ⏱ 3 min

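All statistical analysis begins with correctly classifying your variable, because the variable type determines which graphs and summary statistics are appropriate. An **individual** is the person, object, or event being measured, while a **variable** is any characteristic that varies across individuals. A **categorical** variable places each individual into a group or category, while a **quantitative** variable takes numeric values for which arithmetic such as averaging is meaningful. Quantitative variables are either discrete or continuous: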

  • **Discrete**: Countable whole number values (e.g., number of AP courses taken, number of siblings)
  • **Continuous**: Any value within a range, measured rather than counted (e.g., height, time to run 5K, weighted GPA)

Exam tip: Examiners regularly trick students with numeric categorical variables like jersey numbers or student ID numbers. If averaging the values produces a meaningless result, the variable is categorical.

2. Data Visualizations for One-Variable Data ★★☆☆☆ ⏱ 4 min

Once you classify your variable, you visualize its distribution to identify shape, clusters, gaps, and extreme values. The four most common visualizations for one-variable data are outlined below.

  • **Frequency tables**: List each value (or grouped bin for quantitative data) with its absolute (count) and relative (proportion/percentage) frequency.
  • **Dotplots**: Plot each observation as a dot above a number line, ideal for small datasets (<50 observations) because all original values are retained.
  • **Stemplots (stem-and-leaf plots)**: Split each value into a leading stem and a trailing leaf. They retain all original values and work well for medium datasets (50-200 observations). A score of 87 is written as `8 | 7`.
  • **Histograms**: Use adjacent bars to show frequency of values in equal-width bins for quantitative data. Ideal for large datasets (>200 observations) to clearly show distribution shape.
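
The frequency table and stemplot described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the `scores` list is made-up data, the helper names `frequency_table` and `stemplot` are hypothetical, and the stemplot assumes non-negative integers split into a tens stem and a ones leaf.

```python
from collections import Counter, defaultdict

def frequency_table(values):
    """Map each value to (count, relative frequency), sorted by value."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, c / n) for v, c in sorted(counts.items())}

def stemplot(values):
    """Return stemplot lines: tens digit(s) as the stem, ones digit as the leaf.

    Assumes non-negative integers. Empty stems are kept so gaps stay visible.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# Hypothetical quiz scores, for illustration only
scores = [87, 72, 91, 87, 65, 78, 94, 87, 72, 80]

for value, (count, proportion) in frequency_table(scores).items():
    print(value, count, proportion)

print("\n".join(stemplot(scores)))
```

Because every leaf is an original digit, the stemplot keeps the raw values, whereas a histogram of the same data would only show how many scores fall in each bin.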

Exam tip: A common 1-point FRQ penalty is mixing up histograms and bar charts. Bar charts are for categorical variables with gaps between bars; histograms are for quantitative variables with no gaps between bins.

3. Measures of Center and Spread ★★☆☆☆ ⏱ 4 min

After visualizing the distribution, you summarize it with measures of center (typical value) and spread (variability around the center).

**Mean**: The arithmetic average, denoted $\bar{x}$ for samples and $\mu$ for populations. The sample mean formula is:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

**Median**: The middle value of a sorted dataset, equal to the 50th percentile. If $n$ is odd, it is the $\frac{n+1}{2}$th value; if even, it is the average of the two middle values.
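
As a quick check of the mean and median definitions, here is a small Python sketch. The two datasets are invented for illustration, and the helper names `sample_mean` and `median` are hypothetical. Python's built-in `statistics` module provides equivalent functions.

```python
import statistics

def sample_mean(xs):
    """Arithmetic average: sum of the values divided by n."""
    return sum(xs) / len(xs)

def median(xs):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]  # the (n+1)/2-th value, counting from 1
    return (s[mid - 1] + s[mid]) / 2

odd_data = [3, 9, 5, 1, 7]       # n = 5, hypothetical values
even_data = [3, 9, 5, 1, 7, 11]  # n = 6, hypothetical values

print(sample_mean(odd_data))  # 5.0
print(median(odd_data))       # 5, the 3rd sorted value
print(median(even_data))      # 6.0, the average of 5 and 7
print(statistics.median(even_data))  # same result from the standard library
```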

**Standard Deviation (SD)**: Measures the typical distance of observations from the mean, denoted $s$ for samples and $\sigma$ for populations. Sample SD uses $n-1$ in the denominator to correct for the bias that dividing by $n$ would introduce in the variance estimate:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
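
The sample SD formula can be traced step by step in code. The sketch below uses a made-up dataset and a hypothetical helper `sample_sd`. It compares the result with `statistics.stdev` (which divides by $n-1$) and `statistics.pstdev` (which divides by $n$).

```python
import math
import statistics

def sample_sd(xs):
    """Sample standard deviation with n - 1 in the denominator."""
    n = len(xs)
    x_bar = sum(xs) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in xs)
    return math.sqrt(squared_deviations / (n - 1))

# Hypothetical data: mean is 5, squared deviations sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_sd(data))          # sqrt(32 / 7), about 2.14
print(statistics.stdev(data))   # standard library sample SD, same value
print(statistics.pstdev(data))  # population SD: sqrt(32 / 8) = 2.0
```

On the same data, the $n-1$ version is always slightly larger than the population version, and the gap shrinks as $n$ grows.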

**Interquartile Range (IQR)**: Measures the spread of the middle 50% of sorted data, calculated as $IQR = Q3 - Q1$, where $Q1$ is the 25th percentile (median of the lower half) and $Q3$ is the 75th percentile (median of the upper half).
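
A minimal sketch of the "median of each half" method for quartiles follows. The data and the helper name `quartiles` are hypothetical. When $n$ is odd, this sketch leaves the overall median out of both halves. Statistical software often uses other quartile conventions, so its output may differ slightly from a hand calculation.

```python
import statistics

def quartiles(xs):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    s = sorted(xs)
    n = len(s)
    lower_half = s[: n // 2]        # values below the middle position
    upper_half = s[(n + 1) // 2 :]  # values above the middle position
    return (
        statistics.median(lower_half),
        statistics.median(s),
        statistics.median(upper_half),
    )

# Hypothetical data with n = 9, so the median (9) is excluded from both halves
data = [1, 3, 5, 7, 9, 11, 13, 15, 17]

q1, med, q3 = quartiles(data)
iqr = q3 - q1
print(q1, med, q3, iqr)  # 4.0 9 14.0 10.0
```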

Exam tip: For skewed distributions or distributions with outliers, use median and IQR, as they are not pulled by extreme values. For symmetric distributions with no outliers, use mean and SD, as they use all data points for greater precision.

4. Outliers and Resistance ★★☆☆☆ ⏱ 3 min

An **outlier** is a data point that falls far outside the overall pattern of a distribution. A statistic is **resistant** if its value is not heavily affected by extreme outliers. Median and IQR are resistant; mean and SD are not resistant.
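
Resistance is easy to see numerically. In this sketch the income values (in thousands) are invented for illustration. Adding one extreme value moves the mean a long way but barely changes the median.

```python
import statistics

# Hypothetical incomes in thousands of dollars
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [400]  # add one extreme value

print(statistics.mean(incomes), statistics.median(incomes))
# mean 35, median 35

print(statistics.mean(with_outlier), statistics.median(with_outlier))
# mean rises to about 95.8, median only moves to 36.5
```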

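The standard test for outliers is the $1.5 \times IQR$ rule, which flags any observation lying more than 1.5 IQRs below $Q1$ or above $Q3$. Steps to apply the rule: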

  1. Calculate lower fence: $Q1 - 1.5 \times IQR$
  2. Calculate upper fence: $Q3 + 1.5 \times IQR$
  3. Classify any observation less than lower fence or greater than upper fence as an outlier.
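
The three steps above can be sketched as a short Python function. The dataset and the names `iqr_fences` and `find_outliers` are hypothetical. Quartiles are computed here with the median-of-halves method, which is one of several conventions.

```python
import statistics

def iqr_fences(xs):
    """Return (lower fence, upper fence) from the 1.5 x IQR rule."""
    s = sorted(xs)
    n = len(s)
    q1 = statistics.median(s[: n // 2])        # median of the lower half
    q3 = statistics.median(s[(n + 1) // 2 :])  # median of the upper half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(xs):
    """List every observation strictly outside the fences."""
    lower, upper = iqr_fences(xs)
    return [x for x in xs if x < lower or x > upper]

# Hypothetical data: Q1 = 5, Q3 = 15, IQR = 10
data = [1, 3, 5, 7, 9, 11, 13, 15, 17, 40]

print(iqr_fences(data))     # (-10.0, 30.0)
print(find_outliers(data))  # [40]
```

On an FRQ, the written equivalent is to state both fences and then point out that 40 exceeds the upper fence of 30.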

Exam tip: You will lose points on FRQs if you identify outliers by eye alone. Always explicitly calculate the $1.5 \times IQR$ fences and show the outlier falls outside the range to get full credit.

5. Z-Scores and Percentiles ★★★☆☆ ⏱ 3 min

Z-scores and percentiles allow you to compare individual observations to the rest of a distribution, even across datasets with different units or scales.

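The $p$th percentile is the value with $p$% of observations at or below it. A z-score measures how many standard deviations an observation lies above or below the mean. The formula for a z-score is: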

$$z = \frac{x - \bar{x}}{s} \quad \text{(for samples)}, \qquad z = \frac{x - \mu}{\sigma} \quad \text{(for populations)}$$

A positive z-score means the observation is above the mean, a negative z-score means it is below the mean, and $z=0$ means the observation equals the mean.
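
To illustrate comparing across scales, the sketch below standardizes two scores from different tests. All the numbers (scores, means, and standard deviations) are invented for illustration, and `z_score` is a hypothetical helper.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam A: score 1300, mean 1050, SD 200
z_a = z_score(1300, 1050, 200)

# Hypothetical exam B: score 28, mean 21, SD 5
z_b = z_score(28, 21, 5)

print(z_a)  # 1.25 standard deviations above the mean
print(z_b)  # 1.4 standard deviations above the mean
print("Exam B is the stronger relative performance" if z_b > z_a
      else "Exam A is the stronger relative performance")
```

The raw scores cannot be compared directly because the scales differ, but the z-scores show that the exam B score sits further above its own mean.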

Exam tip: Standardizing with z-scores is the standard way to compare observations from distributions with different scales, which is a common multiple-choice question topic.

Common Pitfalls

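  • **Treating numeric labels as quantitative** (e.g., jersey numbers, student ID numbers). Why: Students automatically assume any numeric value is quantitative, regardless of context.
  • **Reporting the mean and SD for skewed distributions or distributions with outliers.** Why: Students default to the more familiar mean as a measure of center, regardless of distribution shape.
  • **Confusing histograms with bar charts.** Why: Students mix up the two visualization types and their use cases.
  • **Identifying outliers by eye alone.** Why: Students assume 'looking far away' is sufficient justification for FRQ answers.
  • **Dividing by $n$ instead of $n-1$ when calculating sample SD.** Why: Students forget the bias correction required for sample data, confusing sample and population standard deviation.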
