Statistics · Collecting Data (12-15% of AP exam) · 14 min read · Updated 2026-05-11
Random Sampling and Data Collection — AP Statistics
1. Core Concepts of Random Sampling
Random sampling is the process of selecting a subset of observational units from a larger, clearly defined population of interest and collecting data on that subset. The core goal is a representative sample that supports valid statistical inferences about the whole population.
By contrast, a census collects data from every unit in the population, which is rarely feasible for large populations due to cost, time constraints, or destructive testing (e.g., testing battery life destroys the product). This topic makes up roughly half of the Collecting Data unit, which accounts for 12-15% of your total AP exam score.
$$N = \text{total population size}, \quad n = \text{sample size}$$
2. Common Probability Sampling Methods
The four most common probability sampling methods tested on AP Statistics are:
**Simple Random Sampling (SRS)**: Every possible sample of size $n$ has an equal chance of being selected; an SRS is typically drawn with a random number generator or a random digit table.
**Stratified Random Sampling**: Population divided into non-overlapping *strata* (groups of units similar on a response-related variable); an SRS is taken from each stratum to reduce sampling error and guarantee subgroup representation.
**Cluster Sampling**: Population divided into non-overlapping *clusters* (each representative of the whole population); some clusters are randomly selected, and all units in selected clusters are sampled for logistical efficiency.
**Systematic Random Sampling**: Every $k$th unit is selected from a population list, after a random starting point between 1 and $k$; simpler than SRS when a sequential list exists.
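To make the four definitions concrete, here is a minimal sketch using only Python's standard `random` module. The population (1,000 students tagged with a grade level and a homeroom), the function names, and the choice of equal allocation (10 students per stratum) are hypothetical, chosen for illustration; they are not an official procedure.

```python
import random


def make_population(seed=0):
    """Hypothetical population: 1,000 students with a grade level and a homeroom."""
    rng = random.Random(seed)
    population = []
    for student_id in range(1, 1001):
        population.append({
            "id": student_id,
            "grade": rng.choice([9, 10, 11, 12]),    # stratum variable
            "homeroom": (student_id - 1) // 25 + 1,  # 40 homerooms of 25 (clusters)
        })
    return population


def simple_random_sample(units, n, rng):
    # SRS: every subset of size n is equally likely to be chosen.
    return rng.sample(units, n)


def stratified_sample(units, key, n_per_stratum, rng):
    # Split units into non-overlapping strata by `key`, then take an SRS in each.
    # Equal allocation (same n in every stratum) is an assumption for simplicity.
    strata = {}
    for u in units:
        strata.setdefault(u[key], []).append(u)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(units, key, n_clusters, rng):
    # Randomly pick whole clusters, then keep every unit in the chosen clusters.
    cluster_ids = sorted({u[key] for u in units})
    chosen = set(rng.sample(cluster_ids, n_clusters))
    return [u for u in units if u[key] in chosen]


def systematic_sample(units, k, rng):
    # Random start among positions 1..k, then every k-th unit down the list.
    start = rng.randint(1, k)
    return units[start - 1::k]


rng = random.Random(42)
pop = make_population()
srs = simple_random_sample(pop, 40, rng)
strat = stratified_sample(pop, "grade", 10, rng)
clus = cluster_sample(pop, "homeroom", 2, rng)
syst = systematic_sample(pop, 25, rng)
print(len(srs), len(strat), len(clus), len(syst))  # 40 40 50 40
```

Note how the cluster sample's size depends on the cluster sizes (2 homerooms of 25 gives 50 students), while the other three methods fix the sample size directly.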
Exam tip: If an AP question asks which method is most appropriate, match the method to the stated goal. If the goal is to guarantee that every subgroup is represented, choose stratified sampling; if the goal is cost or logistical efficiency and each group resembles the whole population, choose cluster sampling. Don't confuse the two.
3. Non-Probability Sampling and Common Biases
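Non-probability sampling methods do not give every population unit a known, non-zero chance of selection, so there is no guarantee that the sample represents the population, and the results are often biased. The most common non-probability methods are convenience sampling, which selects easily accessible units, and voluntary response sampling, in which people choose themselves to participate.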
**Selection (Undercoverage) Bias**: Some population groups are systematically excluded from the sampling frame (list of units available for selection), so they have no chance of being selected.
**Nonresponse Bias**: Selected units refuse to participate or cannot be contacted, and nonrespondents differ systematically from respondents on the variable of interest.
**Response Bias**: Participants give inaccurate responses, usually due to social desirability, leading question wording, or recall error.
Exam tip: AP FRQs require you to explain bias in context, not just name it. Always add one sentence explaining whether the sample estimate will be too high or too low relative to the true population value to earn full credit.
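The direction of a bias can be checked with a small simulation. The sketch below is hypothetical throughout: the population of weekly volunteer hours and the response model (people who volunteer more are assumed to be more likely to answer a survey about volunteering) are invented for illustration. The SRS itself is drawn correctly; only nonresponse distorts the estimate.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: weekly volunteer hours for 10,000 adults.
population = [rng.choice([0, 0, 0, 1, 2, 3, 5, 8]) for _ in range(10_000)]
true_mean = statistics.mean(population)


def responds(hours, rng):
    # Assumption: people who volunteer more are more willing to answer a
    # survey about volunteering (0 h -> 20% respond, 8 h -> 84% respond).
    return rng.random() < 0.2 + 0.08 * hours


sample = rng.sample(population, 500)                    # a proper SRS
respondents = [h for h in sample if responds(h, rng)]  # nonresponse filters it
estimate = statistics.mean(respondents)

print(f"true mean: {true_mean:.2f} h, survey estimate: {estimate:.2f} h")
```

Stated in context: because nonrespondents tend to volunteer fewer hours than respondents, the survey's estimate of mean volunteer hours will be too high.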
4. Key Comparisons of Sampling Methods
AP Statistics frequently asks students to distinguish between similar sampling methods, most commonly stratified vs. cluster sampling, which are often confused because both divide the population into non-overlapping groups.
The core difference between the two methods is summarized below:
**Stratified Random Sampling**: Groups (strata) are constructed so units *within a stratum are similar* on the variable of interest. You sample from every stratum.
**Cluster Sampling**: Groups (clusters) are constructed so each cluster is representative of the whole population (heterogeneous within clusters). You only sample from randomly selected clusters.
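Why does stratifying on a homogeneous variable help? A quick simulation can compare how much sample means vary under SRS and under stratified sampling. The population below (commute times for drivers near 30 minutes and walkers near 10 minutes, 1,000 of each) is hypothetical, and the stratified design uses proportional allocation, which here means 20 from each stratum. The standard deviation of the repeated sample means measures sampling variability.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: commute times (minutes) for 2,000 students.
# The strata are homogeneous: drivers cluster near 30 min, walkers near 10 min.
drivers = [rng.gauss(30, 3) for _ in range(1000)]
walkers = [rng.gauss(10, 3) for _ in range(1000)]
population = drivers + walkers


def srs_mean(n):
    return statistics.mean(rng.sample(population, n))


def stratified_mean(n):
    # Proportional allocation: each stratum is half the population, so take
    # n/2 from each and weight the two stratum means equally.
    half = n // 2
    return (statistics.mean(rng.sample(drivers, half))
            + statistics.mean(rng.sample(walkers, half))) / 2


srs_means = [srs_mean(40) for _ in range(2000)]
strat_means = [stratified_mean(40) for _ in range(2000)]

print("SD of SRS means:       ", round(statistics.stdev(srs_means), 2))
print("SD of stratified means:", round(statistics.stdev(strat_means), 2))
```

Because the stratified design always includes the same number of drivers and walkers, it removes the chance imbalance between the two groups that inflates the variability of SRS means. The benefit shrinks if the strata are not homogeneous on the response variable.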
Another common comparison is SRS vs. systematic sampling: systematic sampling is simpler to implement, but will produce bias if there is a repeating periodic pattern in the population list that aligns with the sampling interval $k$.
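The periodic-pattern problem can be seen by listing every possible systematic sample from a small, hypothetical list: ten weeks of daily customer counts in calendar order, with a busy weekend each week (a period of 7). The counts and the helper name `systematic_sample` are invented for illustration.

```python
import statistics

# Hypothetical list: daily customer counts for 10 weeks, in calendar order.
# Days 6 and 7 of each week (the weekend) are much busier, so the list has period 7.
weekly_pattern = [40, 42, 41, 43, 45, 90, 95]
daily_counts = weekly_pattern * 10
true_mean = statistics.mean(daily_counts)


def systematic_sample(values, k, start):
    # `start` is the randomly chosen starting position, 1..k.
    return values[start - 1::k]


print(f"true mean: {true_mean:.1f}")

# k = 7 matches the weekly cycle: each possible sample contains a single weekday.
for start in range(1, 8):
    est = statistics.mean(systematic_sample(daily_counts, 7, start))
    print(f"k=7, start={start}: estimate {est:.1f}")

# k = 5 does not line up with the cycle, so each sample covers every weekday.
for start in range(1, 6):
    est = statistics.mean(systematic_sample(daily_counts, 5, start))
    print(f"k=5, start={start}: estimate {est:.1f}")
```

With $k = 7$, every possible sample consists of a single weekday repeated, so none of them resembles the full list. With $k = 5$, each sample in this constructed example happens to include every weekday exactly twice, which is the kind of interval you want when the list is periodic.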
5. AP-Style Concept Check
Common Pitfalls
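**Mixing up stratified and cluster sampling.** Why: Both divide the population into groups, so students mix up the names and purposes.
**Believing a large sample eliminates bias.** Why: Students assume "big sample = good," so any error goes away. A larger sample reduces sampling variability, but it does not fix bias caused by a flawed sampling method.
**Naming a bias without explaining it in context.** Why: Students memorize the name of the bias but forget that AP requires context for full credit.
**Calling a voluntary response sample an SRS.** Why: Students assume any sample drawn from the population counts as an SRS, but in a voluntary response sample participants select themselves, so not every possible sample of size $n$ has an equal chance of being selected.
**Treating all sampling error as bias.** Why: Students think any error in sampling is bias. Chance variation from one random sample to the next is sampling variability; bias is a systematic tendency to overestimate or underestimate.
**Claiming systematic sampling is never valid.** Why: Students remember the caveat about periodic patterns and assume the method is never valid. It is a legitimate probability method when the list has no periodic pattern that aligns with $k$.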
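The "big sample fixes everything" pitfall can be tested with a simulation of undercoverage. In this hypothetical setup, 70% of adults are reachable by landline and tend to be older, 30% are cell-phone-only and tend to be younger, and the sampling frame is a landline directory. The ages, group sizes, and frame are invented for illustration. Repeated SRSs from the flawed frame are drawn at several sample sizes.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical population of 20,000 adults: 14,000 reachable by landline
# (ages centered near 55) and 6,000 cell-phone-only (ages centered near 35).
landline = [rng.gauss(55, 12) for _ in range(14_000)]
cell_only = [rng.gauss(35, 12) for _ in range(6_000)]
true_mean = statistics.fmean(landline + cell_only)

# Undercoverage: the sampling frame (a landline directory) omits cell-only adults.
frame = landline

results = {}
for n in (25, 100, 400, 1600):
    # 300 repeated SRSs from the flawed frame at each sample size.
    estimates = [statistics.fmean(rng.sample(frame, n)) for _ in range(300)]
    results[n] = (statistics.fmean(estimates), statistics.stdev(estimates))
    print(f"n={n:4d}  average estimate {results[n][0]:.1f}  "
          f"SD {results[n][1]:.2f}  (true mean {true_mean:.1f})")
```

As $n$ grows, the SD of the estimates shrinks, but their average stays well above the true mean age: more data only makes the sample more precisely wrong. The fix for bias is a better sampling frame or method, not a larger sample.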