Statistics · Collecting Data (12-15% of AP exam) · 14 min read · Updated 2026-05-11

Random Sampling and Data Collection — AP Statistics


1. Core Concepts of Random Sampling ★★☆☆☆ ⏱ 3 min

Random sampling is the process of selecting a subset of observational units from a larger defined population of interest to collect data, with the core goal of producing a representative subset that allows valid statistical inferences about the whole population.

By contrast, a census collects data from every unit in the population, which is rarely feasible for large populations due to cost, time constraints, or destructive testing (e.g., testing battery life destroys the product). This topic makes up roughly half of the Collecting Data unit, which accounts for 12-15% of your total AP exam score.

$N = \text{total population size}, \quad n = \text{sample size}$

2. Common Probability Sampling Methods ★★★☆☆ ⏱ 4 min

The four most common probability sampling methods tested on AP Statistics are:

**Simple Random Sampling (SRS)**: Every possible sample of size $n$ has an equal chance of being selected, typically implemented with a random number generator or table.
**Stratified Random Sampling**: Population divided into non-overlapping *strata* (groups of units similar on a response-related variable); an SRS is taken from each stratum to reduce sampling error and guarantee subgroup representation.
**Cluster Sampling**: Population divided into non-overlapping *clusters* (each representative of the whole population); some clusters are randomly selected, and all units in selected clusters are sampled for logistical efficiency.
**Systematic Random Sampling**: Every $k$th unit is selected from a population list, after a random starting point between 1 and $k$; simpler than SRS when a sequential list exists.
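
The four methods above can be sketched in a few lines of Python. This is only an illustrative sketch: the population of 100 numbered units, the two strata, the ten clusters, and all function names are hypothetical choices made for the example, not part of the AP curriculum.

```python
import random

def simple_random_sample(units, n, rng):
    # SRS: every possible subset of size n is equally likely.
    return rng.sample(units, n)

def stratified_sample(strata, n_per_stratum, rng):
    # strata maps a stratum name to its list of units;
    # an SRS of the requested size is drawn from EVERY stratum.
    return {name: rng.sample(members, n_per_stratum[name])
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then keep every unit inside them.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def systematic_sample(units, k, rng):
    # Random start between 1 and k (inclusive), then every k-th unit.
    start = rng.randint(1, k)
    return units[start - 1::k]

rng = random.Random(42)            # seeded only so the sketch is repeatable
population = list(range(1, 101))   # hypothetical units numbered 1..100

srs = simple_random_sample(population, 10, rng)
strata = {"low": population[:50], "high": population[50:]}
strat = stratified_sample(strata, {"low": 5, "high": 5}, rng)
clusters = {c: population[c * 10:(c + 1) * 10] for c in range(10)}
clus = cluster_sample(clusters, 2, rng)
syst = systematic_sample(population, 10, rng)
```

Note the structural difference: `stratified_sample` touches every group, while `cluster_sample` touches only the randomly chosen groups but takes every unit inside them.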

📐 Worked Example

A high school principal wants to sample 100 students from the school's 1200 total students to assess student satisfaction with the cafeteria. The principal wants to ensure that freshmen, sophomores, juniors, and seniors are all proportionally represented in the sample. There are 300 freshmen, 320 sophomores, 290 juniors, and 290 seniors. What sampling method should the principal use, and how would they implement it?

Match the method to the goal. The principal needs guaranteed representation from each grade level, which is a variable that likely affects cafeteria satisfaction. This calls for stratified random sampling.
Define the strata. The four grade levels are the four non-overlapping strata: every student belongs to exactly one grade.
Calculate proportional sample sizes per stratum. The sampling fraction is:
$\frac{n}{N} = \frac{100}{1200} = \frac{1}{12}$
Per stratum: Freshmen = $300 \times \frac{1}{12} = 25$, Sophomores = $320 \times \frac{1}{12} \approx 27$, Juniors = $290 \times \frac{1}{12} \approx 24$, Seniors = $290 \times \frac{1}{12} \approx 24$. The rounded counts sum to $25 + 27 + 24 + 24 = 100$.
Implement sampling. Assign each student within each grade a unique number, then use a random number generator to select the calculated number of students from each grade to participate.
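
The allocation and selection steps above can be checked with a short Python sketch. The grade sizes come from the problem; the student ID numbering (1 to the grade size within each grade), the seed, and the variable names are assumptions made for illustration.

```python
import random

grade_sizes = {"Freshmen": 300, "Sophomores": 320, "Juniors": 290, "Seniors": 290}
n = 100
N = sum(grade_sizes.values())  # total population size, 1200

# Proportional allocation: stratum size times the sampling fraction n/N,
# rounded to the nearest whole student.
allocation = {grade: round(size * n / N) for grade, size in grade_sizes.items()}

# Take an SRS within each grade. Students are assumed to be numbered
# 1..size inside their grade (a hypothetical labeling).
rng = random.Random(0)
sample = {grade: rng.sample(range(1, size + 1), allocation[grade])
          for grade, size in grade_sizes.items()}

print(allocation, sum(allocation.values()))
```

Simple rounding happens to total exactly 100 here. With other stratum sizes the rounded counts can be off by one or two, in which case one stratum's count is adjusted so the total matches $n$.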

Exam tip: If an AP question asks which method is most appropriate, always match the method to the stated goal: if the goal is to ensure subgroup representation, it is stratified; if the goal is cost/logistical efficiency with representative groups, it is cluster. Don't confuse the two.

3. Non-Probability Sampling and Common Biases ★★★☆☆ ⏱ 3 min

Non-probability sampling methods do not assign known non-zero selection probabilities to all population units, so they almost always produce biased results. The most common non-probability method is convenience sampling, which selects easily accessible units.

**Selection (Undercoverage) Bias**: Some population groups are systematically excluded from the sampling frame (list of units available for selection), so they have no chance of being selected.
**Nonresponse Bias**: Selected units refuse to participate or cannot be contacted, and nonrespondents differ systematically from respondents on the variable of interest.
**Response Bias**: Participants give inaccurate responses, usually due to social desirability, leading question wording, or recall error.

📐 Worked Example

A campus radio station wants to survey students about whether they support a 10% increase in student activity fees to fund station upgrades. The station posts a link to the poll on its website, asking listeners to click to vote. Identify what type(s) of bias are most likely present, and explain their effect.

First, selection (undercoverage) bias is present: the poll is posted only on the station's website, so students who never listen to or visit the station are unlikely to see it and are effectively excluded. These students are probably less likely to support the fee increase, so the group most likely to oppose it is underrepresented.
Voluntary response bias (a form of selection/nonresponse bias) is also present: students who care strongly about the issue (most often supporters of the station who want upgrades) are much more likely to take the time to vote than neutral or opposing students.
The net effect is that the poll will systematically overestimate the level of support for the fee increase compared to the true value for the entire student population.
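
A small simulation can make the direction of this bias concrete. Every number below is a made-up assumption chosen only for illustration (5000 students, 20% listeners, the support rates, and the voting rates); none of it comes from the problem or from real data.

```python
import random

rng = random.Random(1)

# Hypothetical student body: each student either listens to the station or not,
# and listeners are assumed to be more likely to support the fee increase.
students = []
for _ in range(5000):
    listens = rng.random() < 0.20
    supports = rng.random() < (0.70 if listens else 0.20)
    students.append((listens, supports))

true_support = sum(supports for _, supports in students) / len(students)

# Only listeners see the poll (undercoverage), and supporters are assumed
# to be more likely to bother voting (voluntary response).
votes = [supports for listens, supports in students
         if listens and rng.random() < (0.50 if supports else 0.10)]
poll_support = sum(votes) / len(votes)

print(f"true support: {true_support:.2f}, poll support: {poll_support:.2f}")
```

Under these assumptions the poll's proportion lands well above the true proportion, which is the "overestimate" described above. Changing the assumed rates changes the size of the gap, not the reasoning.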

Exam tip: AP FRQs require you to explain bias in context, not just name it. Always add one sentence explaining whether the sample estimate will be too high or too low relative to the true population value to earn full credit.

4. Key Comparisons of Sampling Methods ★★★★☆ ⏱ 3 min

AP Statistics frequently asks students to distinguish between similar sampling methods, most commonly stratified vs. cluster sampling, which are often confused because both divide the population into non-overlapping groups.

METHODS COMPARED

The core difference between the two methods is summarized below:

Stratified Random Sampling

Groups (strata) are constructed so units *within a stratum are similar* on the variable of interest. You sample from every stratum.

Cluster Sampling

Groups (clusters) are constructed so each cluster is representative of the whole population (heterogeneous within clusters). You only sample from randomly selected clusters.

Another common comparison is SRS vs. systematic sampling: systematic sampling is simpler to implement, but will produce bias if there is a repeating periodic pattern in the population list that aligns with the sampling interval $k$.
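
The periodic-pattern problem can be illustrated with a hypothetical list of 700 daily values in which every sixth and seventh entry is high (think weekends). The data and the choice of $k$ are assumptions invented for this sketch.

```python
from statistics import mean

# Hypothetical list: a 7-day cycle with five "weekday" values of 50
# followed by two "weekend" values of 100, repeated for 100 weeks.
values = [100 if day % 7 in (5, 6) else 50 for day in range(700)]
true_mean = mean(values)

# k = 7 lines up with the cycle: every start position picks the same
# day of the week every time, so each sample is all 50s or all 100s.
k = 7
aligned_means = {start: mean(values[start - 1::k]) for start in range(1, k + 1)}

# k = 10 does not line up with the cycle, so the sample mixes all 7 days.
unaligned_mean = mean(values[0::10])

print(true_mean, aligned_means, unaligned_mean)
```

With $k = 7$ no starting point can produce an estimate near the true mean, while $k = 10$ cycles through all seven positions. Systematic sampling is fine when the interval does not align with a pattern in the list.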

📐 Worked Example

A researcher wants to estimate the average weight of apples produced in a 100-acre orchard. The orchard is divided into 100 1-acre plots, each of which has a mix of all apple varieties grown on the farm. The researcher does not have time to visit every plot, so they randomly select 10 plots and weigh all apples on those 10 plots. Is this stratified or cluster sampling? Justify your answer.

Check how groups are constructed: each 1-acre plot has a mix of all apple varieties, so each plot is representative of the whole orchard (diverse within plots, similar across plots).
Check how sampling is done: the researcher only selects 10 of the 100 plots, and does not sample from the other 90 plots. No sampling occurs in the majority of groups.
Match to definitions: this matches cluster sampling, where you sample all units from selected representative clusters. If this were stratified sampling, strata would be the apple varieties, and you would sample from every variety to ensure representation.
The method is chosen for logistical convenience, which aligns with the core purpose of cluster sampling.
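
The researcher's procedure might look like this in Python. The apple weights, the number of apples per plot, and the seed are invented purely so the sketch runs; only the structure (choose 10 of 100 plots at random, then measure every apple on them) comes from the problem.

```python
import random
from statistics import mean

rng = random.Random(7)

# Hypothetical orchard: 100 plots, each holding 40-60 apples with
# made-up weights in grams.
orchard = {plot: [rng.gauss(180, 25) for _ in range(rng.randint(40, 60))]
           for plot in range(1, 101)}

# Cluster sampling: randomly choose 10 whole plots...
chosen_plots = rng.sample(range(1, 101), 10)

# ...then weigh EVERY apple on the chosen plots and none on the other 90.
sampled_weights = [w for plot in chosen_plots for w in orchard[plot]]
estimate = mean(sampled_weights)

print(sorted(chosen_plots), round(estimate, 1))
```

A stratified design would instead group apples by variety and draw an SRS from every variety, so no group would be skipped.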

5. AP-Style Concept Check ★★★☆☆ ⏱ 3 min

Common Pitfalls

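Confusing stratified sampling with cluster sampling. Why: Both divide the population into groups, so students mix up the names and purposes.

Assuming a large sample size fixes bias. Why: Students assume "big sample = good," so any error goes away.

Naming a bias without explaining it in context. Why: Students memorize the name of the bias but forget that AP requires context for full credit.

Calling a voluntary response sample an SRS. Why: Students think any random selection of participants counts as SRS, but in voluntary response, participants select themselves, so not every sample has an equal chance of being selected.

Labeling ordinary sampling variability as bias. Why: Students think any error in sampling is bias.

Claiming systematic sampling is never valid. Why: Students remember the caveat about periodic patterns and assume it is never valid.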

Quick Reference Cheatsheet
