Building an Election Forecasting Model for the NH Primary

Introduction — by the Teacher

This year's class initially chose to investigate the extent to which public opinion about the nation's problems had been reflected in newspaper coverage over time. Their findings, which revealed some correlation between polls and news, raised secondary research questions that shaped the final 2023 digital project: If news and opinion polls were sometimes closely correlated, could news coverage anticipate or predict polling? And if so, could news coverage play a role in political forecasting? After running multiple newspaper searches for the monthly periods preceding polls for the 2016 Republican nomination, the class concluded that news coverage showed some potential for divining a presidential candidate's future. Ultimately, other measures were examined, including polls, endorsements, Google Trends, and biographical characteristics, all of which were incorporated into a statistically robust election forecasting model. While the model seems to have excellent potential for predicting the outcome of general elections or full nomination races, students focused their forecasting specifically on the New Hampshire primary. Initial back-testing on the 2016 Republican race pointed to a Donald Trump victory as early as July 2015, but subsequent back-testing for 2020 revealed limitations in the design, at least for New Hampshire. As they tried to make the model also work for the 2020 Democratic race, students concluded that more data points for New Hampshire, extending later into the summer and fall, were necessary. Below, students explain the entire design process more fully.

How We Started: Finding a Newspaper-Opinion Poll Alignment

Initially, for the 2023 digital history project, we decided to investigate and analyze the correlation between newspaper coverage and public opinion in presidential election years, going back to 1948. This was inspired by questions about the growing influence of social media and online news sources, as opposed to traditional newspapers, on public opinion. More specifically, we sought to determine whether any correlation between polls and newspaper coverage had declined in the age of Twitter and Facebook. First, we used polling data compiled from multiple sources, such as Gallup, to determine the most important issues among the public during different election years. These issues, as can be seen on the spreadsheet (right), included problems such as inflation, foreign conflict, and terrorism. We used this data to determine when the public viewed these topics as important. See the full spreadsheet. This paragraph written by Sophie and Delia

To determine how frequently newspapers covered specific topics, we used Newspapers.com’s time-specific search feature to limit keyword searches to individual years. We then entered the number of results matching each keyword into a spreadsheet for the corresponding year, giving us a clear visualization of usage over time. From this dataset, we identified the years with the clearest crossover between newspaper coverage and public opinion, which we marked by highlighting the corresponding cells in our spreadsheet. We found that newspaper coverage correlated with the polling data. Several searches corroborate this claim, including Vietnam in 1968 and inflation in the 1970s, but it is clearest in the searches for COVID-19 and coronavirus. Newspaper coverage of these terms was highest during the pandemic in 2020 but virtually nonexistent in prior years. This paragraph written by Annaliese and David
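The class marked crossover years by highlighting spreadsheet cells rather than by computing a statistic. As a hedged illustration of how that visual judgment could be summarized in a single number, the Python sketch below computes a Pearson correlation coefficient between yearly newspaper hit counts and the share of poll respondents naming an issue. The counts and percentages are invented for illustration and are not the class's data; `pearson` is a hypothetical helper, not part of any tool the class used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


# Invented illustration only (one value per year, 2017 through 2021):
# newspaper search-result counts for a term, and the percentage of poll
# respondents naming the matching issue as most important.
newspaper_hits = [120, 95, 140, 9800, 6100]
poll_share = [0.0, 0.0, 0.0, 45.0, 25.0]

r = pearson(newspaper_hits, poll_share)
print(f"Pearson r = {r:.2f}")
```

A value near 1 means coverage and public concern rose and fell together. With only a handful of years, though, a single spike such as 2020 can dominate the result, so highlighting individual crossover years remains a useful check.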

Pivoting to Political Forecasting

From our newspaper investigations, questions arose about whether news coverage could predict political outcomes. We therefore examined newspaper coverage more closely, focusing on the candidates in the 2016 Republican NH primary and searching in monthly increments preceding polls. We observed a correlation between coverage and people’s views of a candidate, giving us some confidence that newspapers, like polls, could be predictive.

While newspaper coverage appeared useful for forecasting, we moved beyond simply counting the frequency of candidate names, choosing also to work with an alternative forecasting model created by a scholar at the University of Pennsylvania. Based on the biographical features of past presidents, the study helped us choose the individual qualities we believed were most important to Republican voters. From there we created a spreadsheet (below) and, drawing on class collaboration and our knowledge of the values of Republican voters and New Hampshire residents, assigned weightings to each category to reflect those values as accurately as we could. See the full spreadsheet. These paragraphs written by Annaliese and David
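Weighting biographical categories amounts to a weighted sum per candidate. The sketch below assumes each candidate is rated 0 or 1 in each category; the category names, weights, and candidates are hypothetical placeholders, since the real ones live in the class spreadsheet.

```python
# Hypothetical categories and weights for illustration only; the class's
# actual categories and weightings are in its spreadsheet.
WEIGHTS = {
    "governing_experience": 3.0,
    "military_service": 1.0,
    "business_background": 2.0,
}


def biographical_score(candidate, weights=WEIGHTS):
    """Weighted sum of a candidate's category ratings (missing categories count as 0)."""
    return sum(w * candidate.get(category, 0) for category, w in weights.items())


# Hypothetical candidates rated 1 (has the trait) or 0 (does not).
candidates = {
    "Candidate A": {"governing_experience": 1, "military_service": 0, "business_background": 1},
    "Candidate B": {"governing_experience": 1, "military_service": 1, "business_background": 0},
}

for name, traits in candidates.items():
    print(name, biographical_score(traits))
```

Raising a category's weight, for example after a class debate about what New Hampshire Republicans value, changes every candidate's score in one place.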

After assigning weightings to the biographical information, we turned to the newspaper frequencies, polls, and Google Trends. For these data points, we concluded that the most effective approach was to first equalize the data so that measures on very different scales could be compared. After this, we could easily assign different weightings to the poll numbers and to the news and Trends counts. We then came together once again to debate the weighting values. At first, we weighted the New Hampshire data more heavily, assuming local input would be more valuable, especially since New Hampshire tends to be independent-minded. We later shifted the emphasis to national data, however, once our back-testing showed that this approach was more accurate. This paragraph written by Sienna, Caroline, Emily, and Jenna
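One common way to "equalize" data on different scales is min-max scaling, which maps each source onto a 0 to 1 range before weights are applied. We are assuming that approach here; the class's exact method may have differed. The sources, candidates, values, and weights below are all hypothetical, with the weights tilted toward national data to echo the class's later choice.

```python
def min_max(values):
    """Rescale a dict of candidate -> raw value onto a 0-1 range."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Every candidate is tied in this source, so it cannot separate them.
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def combined_score(sources, weights):
    """Normalize each source, then take a weighted sum per candidate.

    Assumes every source lists the same candidates.
    """
    normalized = {name: min_max(vals) for name, vals in sources.items()}
    candidates = next(iter(sources.values())).keys()
    return {
        c: sum(weights[name] * normalized[name][c] for name in sources)
        for c in candidates
    }


# Hypothetical raw inputs on very different scales: poll percentage points,
# newspaper hit counts, and Google Trends index values.
sources = {
    "national_poll": {"A": 24.0, "B": 18.0, "C": 6.0},
    "nh_poll": {"A": 21.0, "B": 12.0, "C": 15.0},
    "news_hits": {"A": 5200, "B": 3100, "C": 900},
    "trends_index": {"A": 80, "B": 35, "C": 20},
}
# Hypothetical weights (summing to 1) that favor national polling.
weights = {"national_poll": 0.4, "nh_poll": 0.2, "news_hits": 0.2, "trends_index": 0.2}

scores = combined_score(sources, weights)
for candidate, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{candidate}: {score:.3f}")
```

Min-max scaling keeps each weight readable as that source's share of influence, but it always gives the lowest candidate in every source a 0, so one weak outlier can stretch the scale. Standardizing to z-scores is a common alternative.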

Because more recent data, such as July polling numbers, seemed more predictive than February or March numbers, we decided as a class to give greater weight to recent data and less weight to older data. To test our model, we first used data from the 2016 election and found that it predicted the winner. We then assigned multipliers to each category to make the model as accurate as possible, in an attempt to finalize it.
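Giving recent months more weight can be expressed as a weighted average over monthly scores. The sketch below assumes a simple linear scheme (the oldest month gets weight 1, the next 2, and so on); the class's actual recency multipliers are not stated here, and the monthly scores for candidates A and B are invented.

```python
def recency_weighted(monthly, weights=None):
    """Weighted average of monthly scores listed oldest first.

    Assumption: by default the i-th month (counting from 1) gets weight i,
    so the most recent month counts the most.
    """
    if weights is None:
        weights = list(range(1, len(monthly) + 1))
    if len(weights) != len(monthly):
        raise ValueError("need exactly one weight per month")
    return sum(w * s for w, s in zip(weights, monthly)) / sum(weights)


# Invented combined scores for February through July, oldest first.
monthly_scores = {
    "A": [0.30, 0.35, 0.40, 0.55, 0.70, 0.80],
    "B": [0.70, 0.65, 0.60, 0.50, 0.45, 0.40],
}

# Equal weights (a plain average) versus the default linear recency weights.
flat = {c: recency_weighted(s, [1] * len(s)) for c, s in monthly_scores.items()}
recent = {c: recency_weighted(s) for c, s in monthly_scores.items()}

print("plain average favors:", max(flat, key=flat.get))
print("recency-weighted favors:", max(recent, key=recent.get))
```

With these invented numbers, a plain average favors candidate B, while the recency-weighted average favors A, whose support was rising into July. Back-testing then means running the same calculation on a past race and checking whether the top-scoring candidate matches the actual winner.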

The Model's Performance and Our Conclusions

Our model ran into trouble when we moved from 2016 to 2020 in our back-testing. While we were able to adjust the model to work for the 2016 Republican primary, it produced unreliable conclusions with the 2020 Democratic data: it predicted that Joe Biden would win the New Hampshire primary when, in reality, Bernie Sanders did. Part of the model did, however, rank Sanders as the runner-up in a tight race. These paragraphs written by Sophie and Delia

We discovered the model's flaws very near the end of the project, in keeping with the continual time crunch we faced throughout our work. We determined that to represent each race accurately, more data points were necessary, extending beyond July and further into the election cycle. In the end, we were able to correctly predict the result of the 2016 Republican primary but could not successfully back-test our model on the 2020 Democratic primary. Interestingly, though, our model, with all of its current data, correctly predicted the eventual presidential nominee of each party in both 2016 and 2020. This paragraph written by Annaliese and David

About Us

"Building an Election Forecasting Model for the NH Primary" is the product of the 2022-2023 AP U.S. History class: Sophie Bourque, Jenna Caron, David Flater, Delia Leslie, Caroline McMahon, Sienna Reagan, Annaliese Rowell, and Emily Sevene.