On the morning of the 2016 general election, many news sources predicted a strong victory for Hillary Clinton. FiveThirtyEight gave the Democratic nominee a 70% chance of winning the election and confidently predicted wins in crucial swing states. Other outlets predicted an even larger margin of victory, with the New York Times giving Clinton a whopping 85% chance of assuming the presidency.

However, the polls overestimated Clinton's support, and Trump carried swing states thought to be safely in the Democratic column, like Wisconsin and Michigan, on his way to an upset win. Some analysts pointed to structural challenges like low response rates and rapidly changing demographics to explain why the predictions were off. Other analysts sought to improve the estimation models by incorporating more demographic data.

Currently, public opinion surveys are weighted to ensure that the respondents in a sample reflect the make-up of the general population. In essence, weighting involves taking easily observable demographic data and assigning a weight to each respondent, so that underrepresented groups count for more and overrepresented groups count for less. Each pollster's exact formula for creating weights is proprietary, but we set out to see how the formula and process could be improved.
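To make the basic mechanics concrete, a simple version of this idea is post-stratification: a respondent's weight is the ratio of their group's share of the population to that group's share of the sample. The sketch below illustrates the calculation with made-up age-group figures; it is not any pollster's actual formula.

```python
from collections import Counter

# Hypothetical sample of respondents, tagged by one easily observed trait (age group).
# Population shares here are illustrative, not real census figures.
respondents = ["18-29", "30-44", "30-44", "45-64", "45-64", "45-64", "65+", "65+"]
population_share = {"18-29": 0.21, "30-44": 0.25, "45-64": 0.34, "65+": 0.20}

n = len(respondents)
sample_share = {group: count / n for group, count in Counter(respondents).items()}

# Post-stratification weight: how much each respondent in a group should count
# so that the weighted sample matches the population distribution for this trait.
weights = {group: population_share[group] / sample_share[group] for group in sample_share}

for group, w in sorted(weights.items()):
    print(f"{group}: weight = {w:.2f}")
```

Groups that are scarce in the sample relative to the population end up with weights above 1, and oversampled groups end up with weights below 1.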

One of the biggest weaknesses of the weighting scheme was that it failed to incorporate respondents' education levels. The 2016 election featured significant political cleavages around education: voters with less formal education, especially those working in manufacturing, tended to vote for Trump. By omitting education from the weighting formula, pollsters overrepresented voters with higher education levels in their samples, which skewed the predictions. My research project asked whether including education data in the weighting models would improve the accuracy of the results, and by how much. Although the research is still underway, it appears that incorporating educational attainment into the weights improved the accuracy of our predictions by as much as 3 percentage points in some states.
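The sketch below shows how reweighting by education can shift an estimate. The respondents, preferences, and the assumed education breakdown are invented purely to demonstrate the mechanics; they are not the project's actual data or results.

```python
from collections import Counter

# Illustrative respondents: (education level, candidate preference).
# The sample over-represents college graduates (50% of sample vs. an
# assumed 35% of the population), mimicking the problem described above.
sample = [
    ("college", "Clinton"), ("college", "Clinton"), ("college", "Trump"),
    ("college", "Clinton"), ("no_college", "Trump"), ("no_college", "Trump"),
    ("no_college", "Clinton"), ("no_college", "Trump"),
]
population_share = {"college": 0.35, "no_college": 0.65}

n = len(sample)
sample_share = {g: c / n for g, c in Counter(edu for edu, _ in sample).items()}
weight = {g: population_share[g] / sample_share[g] for g in sample_share}

# Compare the raw estimate with one reweighted to the assumed education mix.
unweighted = sum(1 for _, choice in sample if choice == "Clinton") / n
weighted = (sum(weight[edu] for edu, choice in sample if choice == "Clinton")
            / sum(weight[edu] for edu, _ in sample))

print(f"Unweighted Clinton support:     {unweighted:.1%}")
print(f"Education-weighted support:     {weighted:.1%}")
```

In this toy example, downweighting the oversampled college graduates lowers the estimated Clinton support, which is the direction of correction the project investigates with real survey data.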
