Introduction (August 20): This week a new CNN/SSRS poll was released that showed the national race between Biden and Trump much closer than other polling (see today’s Weekly Polling Review post for more information). It also sampled a cross-section of 15 “battleground states.” Sound familiar? It’s what they did in late May too. This time they oversampled the battleground states, but the same problem from May remains: there is no jurisdiction of “battleground states.” To understand how any presidential candidate is doing in the Electoral College, states have to be surveyed individually, not collectively. My May 22 post explains it, so I am republishing it below.
Original Post (May 22): A CNN poll this week made a lot of nervous Democrats more nervous. While Biden led the national results, he was losing in battleground states. It looked like the beginning of a repeat of 2016 – the Democrat winning the national popular vote, but losing where it counts: in the Electoral College. This is what so many nervous Democrats are so nervous about. And it won’t help that today Individual-1 was saying good things about that poll. But, the poll is not particularly useful and if there are any real political pros on the Trump campaign staff they will see it. The problem is that CNN sampled battleground states as if they were one big jurisdiction. But in reality – and in a way that matters for a presidential election – they are not one jurisdiction.
Polling 15 states in the aggregate, as CNN has done, is a cost-saving measure. Polling is expensive. For any pollster, there is a rough fixed cost to doing one poll that gets a representative sample of at least 400 respondents (a lower number gives you too high an MOE for any real usefulness). If there are five battleground states that you want to understand, you can do five polls or one poll. One poll is cheaper than five – and much cheaper than fifteen. So, why not go the cost-effective route, especially if you do not have the budget to poll all the states separately? The answer is simple: in the presidential election the vote in each state is actually a separate election to pick that state’s Electors who will then vote in the Electoral College. The outcome in each state matters. The outcome across any 15 states does not.
What compounds the problem is the high margins of error we see as sub-samples are cut from larger samples, as was done in the CNN poll. Let’s change subjects for a minute and consider the unemployment rate for cities and towns. Most people are aware that the national unemployment rate is released at the beginning of each month. Less well known is that later in the month unemployment rates are released for each state and for cities and towns in the states. It’s important to understand that the unemployment rate is not a count of the number of people collecting unemployment insurance or an estimate of the rate of people collecting UI (that is something called the “insured unemployment rate” and it’s mostly used for technical purposes – there is no reason you should have ever heard of it). The unemployment rate is an estimate of the share of the labor force that is out of work and looking for work. (In fact, the BLS publishes six alternative measures, U-1 through U-6. The one that gets all the attention and the one we are talking about is the U-3 rate.)
Estimates of the U-3, just like estimates of a politician’s support in polls, are derived from surveys of a sample of the population. If done well with proper controls and weights to approximate the representativeness of the sample, an estimate can tell us what the true number is in the overall population most of the time (say 90-95% of the time) within some margin of error. The MOE is a function of the size of the sample. The larger the sample size, the smaller the MOE – although there are diminishing returns after a while. Generally speaking, 400 or more respondents is a good enough sample for estimating. The reverse, however, is what we are interested in here. The smaller the sample size, the larger the MOE. And once you get an MOE over six percent the estimate is all but useless. For example, if it was reported that the U-3 was 4% that would suggest that the economy was doing relatively well, at least for wage earners looking for work. But, if the MOE for that estimate is 10% then it’s possible the real rate is as high as 14%, which would suggest an economy seriously in recession. It’s not helpful to have an estimate with such a high MOE. And it would be the same for a poll. Polls should give us a rough idea of popular opinion at the time rather than be no better than “your guess is as good as mine.”
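To make the sample-size relationship concrete, here is a minimal sketch of the textbook margin-of-error formula for a proportion. It assumes a simple random sample, a 95% confidence level (z of about 1.96), and the worst-case proportion of 50%; the function name `margin_of_error` is just for illustration, and real polls that use weighting will usually have somewhat larger margins than this.

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Approximate margin of error (as a fraction) for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Diminishing returns: doubling the sample shrinks the MOE by only about 29%,
# because the MOE falls with the square root of the sample size.
for n in (50, 100, 200, 400, 800, 1600):
    print(f"n={n:>5}  MOE = {margin_of_error(n) * 100:.1f} points")
```

Going from 50 to 400 respondents cuts the margin dramatically; going from 400 to 1,600 costs four times as much and only halves it.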
The Current Population Survey (CPS) is conducted monthly by the Census Bureau for the BLS to calculate the U-3 and other unemployment rates. The state rates are derived from sub-samples and the city and town rates from further sub-samples. Each time, the MOE gets higher. That’s because there just aren’t enough respondents that live in smaller jurisdictions – and those respondents are unlikely to be representative of the smaller jurisdictions. In fact, the MOE is so high in even the state samples that the BLS will not apply seasonal adjustments to those rates. Some states have a large enough population that the data is better (e.g., California), but even then such states will only be able to provide better unemployment data if they have robust state labor market information offices doing research to supplement and complement the BLS.
Confused? It is confusing, which is why the media has such a hard time reporting on polls and even experienced politicos have a hard time separating the wheat from the chaff. So, let’s just say in any population if you want to get a reliable estimate you need a representative sample of at least 400 people. The nice thing is that it doesn’t matter how big the population is, 400 will work (MOE = 4.9%). Fewer than 400 starts getting you into trouble with your MOE, but probably not until you get lower than 385, the point at which the MOE passes 5% – then the estimate will be too unreliable. (IMO, you want at least 600 respondents because that gets you to 4% MOE.) But, you also need to have the correct population to derive that 400. If you want to know how Vermonters plan to vote in an election, you need to survey Vermonters. You cannot take a sample of 400 people from all six New England states and reliably conclude anything about what people in Vermont think. But, this is what CNN has done with its battleground state numbers.
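The two rules of thumb above (a minimum sample for a given MOE, and population size not mattering) can be checked with a short sketch. It assumes a simple random sample at 95% confidence with a worst-case 50/50 split; the function names and the three population figures are illustrative round numbers, not data from any poll.

```python
import math

def required_sample_size(target_moe, z=1.96, p=0.5):
    """Smallest simple random sample whose MOE is at or below target_moe (a fraction)."""
    return math.ceil(z**2 * p * (1 - p) / target_moe**2)

def moe_finite_population(n, population, z=1.96, p=0.5):
    """MOE with the finite population correction applied."""
    fpc = math.sqrt((population - n) / (population - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc

print(required_sample_size(0.05))  # respondents needed for a 5-point MOE
print(required_sample_size(0.04))  # respondents needed for a 4-point MOE

# Hypothetical populations: a small state, a large state, the whole country.
for population in (650_000, 40_000_000, 330_000_000):
    moe = moe_finite_population(400, population)
    print(f"population {population:>11,}: MOE = {moe * 100:.2f} points")
```

The correction factor is essentially 1 whenever the population dwarfs the sample, which is why 400 respondents work about as well for a small state as for the nation, provided they are drawn from the right population.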
The CNN poll was done by SSRS, which has a B/C rating from 538. That’s not great, but not far off from most of the pollsters out there. SSRS surveyed over 1,100 Americans for the national results. For the battleground states it used a sub-sample of 583 respondents. That gives the sub-sample estimate a very respectable MOE of 3.7% (I believe they are using a lower confidence level, because at 95% reliability the MOE should be a hair over 4% for 583 respondents – although perhaps it’s related to some adjustment protocols). Can you see the problem yet? We don’t know where in those 15 states the respondents came from, but if evenly distributed that would mean just under 39 respondents for each state. For a sample size that small, regardless of population, the MOE would be about 16% (in statistics, this is the same as saying the estimate is just plain no good).
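The arithmetic in that paragraph can be reproduced with a rough sketch. It assumes a simple random sample, a worst-case 50/50 split, and respondents spread evenly across the 15 states (which, as noted, we do not actually know); the last step backs out what confidence level would yield the reported 3.7% under those same assumptions, and is a guess at the explanation rather than anything SSRS has stated.

```python
import math
from statistics import NormalDist

def margin_of_error(n, z=1.96):
    """Worst-case (p = 0.5) margin of error for a simple random sample of size n."""
    return z * 0.5 / math.sqrt(n)

subsample = 583
states = 15
per_state = subsample / states  # assumes an even spread across states

print(f"pooled sub-sample MOE: {margin_of_error(subsample) * 100:.1f} points")
print(f"respondents per state: {per_state:.1f}")
print(f"per-state MOE:         {margin_of_error(per_state) * 100:.1f} points")

# Confidence level that would produce a 3.7-point MOE at n = 583,
# under the same simple-random-sample assumption.
implied_z = 0.037 * math.sqrt(subsample) / 0.5
implied_confidence = 2 * NormalDist().cdf(implied_z) - 1
print(f"implied confidence level: {implied_confidence:.1%}")
```

The pooled figure looks respectable, while the per-state figure is roughly four times larger, which is the whole problem with reading state-level meaning into the pooled number.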
To be fair, CNN is not saying that voters in North Carolina are planning to vote a certain way; they are clear that the number concerns voters across 15 battleground states. The problem here is that there is no “battleground states” jurisdiction in the presidential election. To think of it another way: let’s say there are five states that make up “battleground states” and all of them are having Senate elections. In one state, the Democratic candidate is way ahead of the Republican, in another the Democratic candidate is starting to break away (so not close, but not big either), and in the other three states the GOP candidates are winning close races. Individual state polling would show that Democrats look like they are going to win two states and the other three are leaning GOP. Surveying them collectively might show that the Democrats are winning by five points – suggesting to the average person that the Democrats are winning in all five states, when in fact they are only winning in two. You might say, but those are five separate elections; it's not the same thing. But, it is. Just like the results in each individual state matter for who wins the Senate race there, the results in each state matter for who wins the state's Electoral Votes for president. So, polling across a group of states doesn't tell us much, if anything.
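The five-state scenario can be put into numbers with a small sketch. Every figure here is invented for illustration: the state names, margins, and voter counts are hypothetical, and the pooled result assumes a sample drawn in proportion to each state's voters with no sampling error at all.

```python
# Hypothetical states: (Democratic margin in points, number of voters).
# A negative margin means the GOP candidate leads.
states = {
    "State A": (12, 4_000_000),   # Democrat way ahead
    "State B": (6, 2_000_000),    # Democrat starting to break away
    "State C": (-3, 2_000_000),   # close GOP lead
    "State D": (-2, 1_000_000),   # close GOP lead
    "State E": (-2, 1_000_000),   # close GOP lead
}

total_voters = sum(voters for _, voters in states.values())

# What a pooled "battleground" poll would report: the voter-weighted average margin.
pooled_margin = sum(margin * voters for margin, voters in states.values()) / total_voters

# What actually decides each race: who leads in each state.
dem_wins = [name for name, (margin, _) in states.items() if margin > 0]

print(f"pooled margin: D+{pooled_margin:.1f}")
print(f"states with a Democratic lead: {len(dem_wins)} of {len(states)}")
```

With these made-up numbers the pooled poll shows the Democrats up five while they lead in only two of five states, because one lopsided, populous state dominates the average.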
We do have state-level polling from other pollsters. And what it suggests, according to several analysts, is that Biden’s lead is probably two points larger than what the national polling shows. State-level polling in recent weeks has shown leads for Biden in Arizona, North Carolina, Florida, Colorado, Michigan, Pennsylvania, and even Wisconsin. It has also shown ties or close races in Texas and Georgia. Even Ohio has polled close once or twice. CNN is not the first pollster to present data from a “battleground states” or “swing states” sub-sample. Recently, a Democratic pollster did that and drew some conclusions that conflicted with CNN’s. Be a discerning reader and don’t let such results cause you any despair or excitement. It’s not that any pollster is fabricating anything by doing sub-samples that aggregate data across states; it’s that in the context of a US presidential election, this just doesn’t tell us anything useful. Pay attention to polls that use the correct unit of analysis (i.e., the state) and survey battleground states individually. Some of the pollsters doing that work are not great, but others are. Either way, you’ll get a better understanding of the race than if you worry about estimates derived from aggregate state samples.
https://www.cnn.com/2020/05/13/politics/cnn-poll-2020/index.html
https://www.cnn.com/2020/05/17/politics/state-polls-2020-analysis/index.html