When a news article reports that a poll of 1,000 people predicts how 250 million people will vote, the obvious reaction is skepticism. How can a sample 250,000 times smaller than the population possibly tell us anything reliable? The answer is one of the most important and least intuitive results in statistics: with proper random sampling, the accuracy of an estimate depends far more on the size of the sample than on the size of the population. A well-drawn sample of 1,000 can produce more accurate estimates than a poorly drawn sample of a million.
The Literary Digest Disaster
The principle was learned the hard way. In 1936, the Literary Digest, a respected American magazine, conducted what was then the largest political poll in history: 2.4 million mailed responses predicting the outcome of the Roosevelt-Landon presidential election. The Digest confidently projected a 57-43 victory for Landon. Instead, Roosevelt won by roughly 62-38 in the two-party vote, one of the largest landslides in American history.
The Digest's mistake was not sample size; with 2.4 million responses, no amount of additional data would have helped. The mistake was sample selection. The Digest had drawn its mailing list from telephone directories, automobile registrations, and its own subscriber base. In 1936, in the middle of the Great Depression, all three sources skewed sharply toward wealthier Americans, who were far more likely than the electorate as a whole to favor Landon. The poll did not measure the electorate; it measured a slice of the electorate that happened to favor Landon.
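The effect is easy to reproduce in a small simulation. The sketch below is illustrative only: the group shares and support rates are invented numbers chosen to resemble the situation described above, not historical data, and `run_poll` is a hypothetical helper. It compares a poll of one million respondents drawn from a frame that over-represents wealthier voters with a poll of 1,000 drawn from a frame that matches the electorate.

```python
import random

# Hypothetical electorate (illustrative numbers, not historical data).
WEALTHY_SHARE = 0.30      # share of wealthier voters in the electorate
SUPPORT_WEALTHY = 0.35    # support for the incumbent among wealthier voters
SUPPORT_OTHER = 0.73      # support for the incumbent among everyone else

# True support in the whole electorate: 0.30 * 0.35 + 0.70 * 0.73 = 0.616
TRUE_SUPPORT = WEALTHY_SHARE * SUPPORT_WEALTHY + (1 - WEALTHY_SHARE) * SUPPORT_OTHER


def run_poll(n, wealthy_share_in_frame, rng):
    """Simulate n respondents drawn from a sampling frame with the given
    share of wealthier voters; return the fraction supporting the incumbent."""
    supporters = 0
    for _ in range(n):
        is_wealthy = rng.random() < wealthy_share_in_frame
        p_support = SUPPORT_WEALTHY if is_wealthy else SUPPORT_OTHER
        if rng.random() < p_support:
            supporters += 1
    return supporters / n


rng = random.Random(1936)  # fixed seed so the run is repeatable

# Huge poll from a skewed frame: 80% of respondents are wealthier voters.
big_biased = run_poll(1_000_000, 0.80, rng)
# Small poll from a frame that matches the electorate.
small_random = run_poll(1_000, WEALTHY_SHARE, rng)

print(f"true support:        {TRUE_SUPPORT:.3f}")
print(f"biased, n=1,000,000: {big_biased:.3f}")
print(f"random, n=1,000:     {small_random:.3f}")
```

Under these assumptions the large poll settles very precisely on the wrong answer (about 0.43, the support level inside its skewed frame), while the small poll typically lands within a few points of the true 0.616. Adding more responses from the same frame only makes the biased estimate more precisely wrong.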
What George Gallup Got Right
That same election, George Gallup conducted a much smaller poll of about 50,000 carefully selected respondents and correctly predicted Roosevelt's victory. Gallup's innovation was not a bigger sample but a better-constructed one. He used quota sampling: deliberately selecting respondents so that the sample's demographic composition matched that of the actual electorate. If, say, 12% of voters were Black, then 12% of his sample would be Black. The technique was crude by modern standards, but it captured the essential insight: representativeness matters more than size.
The Theory of Random Sampling
Modern survey methodology is built on the foundation of simple random sampling: every individual in the population has an equal probability of being selected, and every possible group of respondents of a given size is equally likely to be drawn. Under those conditions, the law of large numbers guarantees that the sample mean converges to the population mean as the sample size grows, regardless of how large the population is.
This is the result that lets a 1,000-person poll work. The margin of error of a simple random sample depends almost entirely on the sample size, not on the population size, as long as the population is large relative to the sample. It shrinks in proportion to the square root of the sample size, so quadrupling the sample halves the margin. For a 1,000-person sample, the margin of error at the conventional 95% confidence level is about plus or minus 3 percentage points, and it is essentially the same whether the sample is drawn from a population of 250 million or a population of 50 million.
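The standard textbook formula makes this concrete. The sketch below assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst-case proportion p = 0.5; `margin_of_error` is a hypothetical helper name, and the optional finite population correction is the usual adjustment for sampling without replacement.

```python
import math


def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from a
    simple random sample of size n.

    population: total population size; if given, the finite population
                correction sqrt((N - n) / (N - 1)) is applied.
    p:          assumed true proportion (0.5 is the worst case).
    z:          critical value (1.96 for roughly 95% confidence).
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error


# Same sample size, very different population sizes.
for population in (10_000, 50_000_000, 250_000_000):
    moe = margin_of_error(1_000, population)
    print(f"population {population:>11,}: +/- {moe * 100:.2f} points")
```

The population size only enters through the correction factor, which is indistinguishable from 1 once the population is thousands of times larger than the sample. It starts to matter only when the sample is a noticeable fraction of the population, as in the 10,000-person case.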
Why It Is Hard in Practice
True simple random sampling assumes that you have a list of every member of the population and can contact any of them with equal probability. In practice, this is almost never true. Some people do not have phones. Some refuse to answer pollsters. Some respond differently in person than over the phone. Each of these creates a systematic deviation from randomness — and systematic deviations are exactly what destroyed the Literary Digest.
Modern survey research deals with these non-random factors through stratification, weighting, and post-stratification adjustment. If young voters are underrepresented in the raw sample, their responses are weighted more heavily. If urban respondents are overrepresented, their responses are weighted less. The resulting estimate no longer comes from a pure random sample, but it is often the closest available approximation given the realities of who actually answers the phone.
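A minimal sketch of post-stratification weighting, assuming a single grouping variable and known population shares for each group. The function name `poststratified_estimate`, the group labels, and all of the numbers are hypothetical; real surveys weight on several variables at once and use more elaborate methods.

```python
def poststratified_estimate(sample, population_shares):
    """Re-weight a sample so each group counts in proportion to its share
    of the population rather than its share of the respondents.

    sample:            list of (group, answer) pairs, answer is 0 or 1
    population_shares: dict mapping group -> share of the population
    """
    counts, totals = {}, {}
    for group, answer in sample:
        counts[group] = counts.get(group, 0) + 1
        totals[group] = totals.get(group, 0) + answer

    missing = set(population_shares) - set(counts)
    if missing:
        raise ValueError(f"no respondents in groups: {sorted(missing)}")

    # Weighted average of the per-group means, weighted by population share.
    return sum(share * totals[group] / counts[group]
               for group, share in population_shares.items())


# Hypothetical raw sample: young voters are 10% of respondents
# but (by assumption) 30% of the electorate.
sample = ([("young", 1)] * 70 + [("young", 0)] * 30 +
          [("older", 1)] * 360 + [("older", 0)] * 540)

raw = sum(answer for _, answer in sample) / len(sample)
adjusted = poststratified_estimate(sample, {"young": 0.30, "older": 0.70})

print(f"raw estimate:      {raw:.2f}")       # 430 / 1000 = 0.43
print(f"adjusted estimate: {adjusted:.2f}")  # 0.30 * 0.70 + 0.70 * 0.40 = 0.49
```

The adjustment corrects only for the variables that are weighted on. If the young people who answer differ from the young people who do not, no amount of re-weighting by age will fix it.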
The Rise of Non-Probability Internet Panels
In the last two decades, a new method has come to dominate polling: opt-in internet panels. Instead of randomly dialing phone numbers and hoping for a response, polling firms maintain large databases of pre-recruited respondents who have agreed to take occasional surveys. When a new poll is needed, the firm draws from the database using quota and weighting techniques to approximate a representative sample.
This approach is fast and cheap, but it sacrifices the theoretical guarantees of random sampling. No probability calculation shows that a self-selected panel will produce unbiased results. Proponents argue, with some justification, that careful weighting can compensate for self-selection. Critics argue, also with justification, that this is a leap of faith that has not always paid off. The 2016 US presidential election is often cited: many polls, internet-based ones among them, underestimated support for Donald Trump in key states.
Random Sampling in Smaller Settings
The same principles that govern political polling apply to smaller-scale random selections. A teacher who randomly selects students for a project is, in effect, drawing a small sample from the classroom. A manager who randomly assigns employees to a pilot program is drawing a sample from the workforce. The mathematics is identical: the selection mechanism must be genuinely random and independent for the results to be unbiased.
Practical tools like spinner wheels, randomized name lists, or shuffled card piles can each provide adequate random sampling for small groups, provided two conditions are met: the underlying randomness is high-quality, and the selection process is not influenced by the experimenter or participants. A wheel that tends to stop on the same few segments, or a teacher who unconsciously selects names from the visible front row, defeats the purpose.
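For a small group, both conditions can be met in a few lines of code. The sketch below is one possible approach, not a description of how any particular tool works: it uses Python's `secrets.SystemRandom`, which draws on the operating system's entropy source, and `pick_random` and the names in the list are hypothetical.

```python
import secrets


def pick_random(names, k):
    """Choose k distinct entries from names, with every possible group of
    k entries equally likely, using the operating system's randomness."""
    names = list(names)
    if not 0 <= k <= len(names):
        raise ValueError("k must be between 0 and the number of names")
    return secrets.SystemRandom().sample(names, k)


# Hypothetical class list; the order of the list has no effect on the draw.
classroom = ["Ana", "Ben", "Chloe", "Dev", "Elena", "Farid", "Grace", "Hugo"]
chosen = pick_random(classroom, 3)
print(chosen)
```

Because the draw is made by the program rather than by a person scanning the room, the front row has no better chance than the back.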
The Enduring Lesson
The history of survey research is a history of learning, over and over, that sample size is not the same as sample quality. A poll of two million people can be useless. A poll of a thousand can be excellent. The difference lies entirely in how the sample was drawn — whether the underlying mechanism gave every member of the population a real, equal chance of being heard. Every time a researcher, journalist, or decision-maker asks "what does this number actually represent?", they are repeating the lesson the Literary Digest learned in 1936.