The Best Movies for Teaching U.S. History

Introduction — by the Teacher

This year, students turned their attention to movie scripts and created a text-analysis scoring system to identify the movies with the greatest instructional value for teaching aspects of United States history. The scoring system is based on topic modeling: students used online analysis tools to determine how closely each script's vocabulary aligned with the movie's overall topic and how subject-specific that vocabulary was, meaning that the terms used and their frequency reinforced vocabulary commonly associated with the historical topic. Several movies scored high, among them Thirteen Days, about the Cuban Missile Crisis. Its script includes a rich vocabulary of recurring terms such as quarantine, Soviets, nuclear, cabinet, and EXCOMM, and it frequently names historically significant individuals such as John Kennedy, Bobby Kennedy, Robert McNamara, Adlai Stevenson, and Nikita Khrushchev.

Limitations of the Scorecard

While the tools we used and the formula we devised produced scores indicative of a movie's alignment with historical content, we recognize that some movies can have significant value even if they earned low scores. For example, when watching the movie Green Book, we noticed numerous scenes that concerned segregation and discrimination in the South. According to our formula, however, the movie failed to illustrate these concepts and earned a very low score. We know this is because the script lacks historical terms. Perhaps most illustrative is a poignant scene in which protagonist Dr. Don Shirley, a wealthy African-American musician touring the Civil-Rights-Era South, steps out of his chauffeured car and observes African-American sharecroppers working in the fields. The scene has no words, but there is significant historical value in Shirley observing the sharecroppers and contrasting his life with theirs. This scene, along with many others, evaded our scorecard and is not represented in the final score. — This section written by Ari, Jordan, Ryan, Malik, and Summer, with help from Jonah

Our Tools

We used free online tools that analyzed each movie script for word frequency, word variations, length, and overall similarity to a control document of U.S. history content and terminology. The first tool we used was JSTOR's text analyzer. When we uploaded the script of a movie, JSTOR analyzed the text and gave us the topics associated with the script. For the movie Thirteen Days, for example, JSTOR identified Cold War topics (pictured left) such as arms control agreements, nuclear warfare, and mutually assured destruction. JSTOR gave us an easy way to compare each movie's stated focus with what its script actually seemed to focus on. In other words, we could see whether the language in the script supported what each movie was supposed to be about. We found no public documentation on how the analyzer works internally, but in our testing it accurately identified the primary content of the documents we gave it. On its text analyzer's About page, JSTOR says this: "The tool analyzes the text within the document to find key topics and terms used, and then uses the ones it deems most important — the 'prioritized terms' — to find similar content in JSTOR."

The second tool we used was DataBasic.io's Same/Diff (pictured right and highlighted to show our word counting), which allowed us to do a word-count cross comparison of each movie script with the College Board's Advanced Placement U.S. History Key Concepts to see which terms matched. DataBasic.io produced a word cloud of the terms appearing in both the movie script and the Key Concepts. The more often a term matched, the larger it appeared in the word cloud.
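The kind of cross comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how DataBasic.io's tool is implemented. The function names, the tiny stop-word list, and the sample sentences are all made up for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real tools use much longer lists).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "we", "will"}

def word_counts(text):
    """Lowercase the text, split it into words, and count each non-stop word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def shared_terms(script_text, reference_text):
    """Return words found in both texts, with how often each appears in the script."""
    script = word_counts(script_text)
    reference = word_counts(reference_text)
    common = {w: script[w] for w in script if w in reference}
    # Most frequent shared words first, like the largest words in a word cloud.
    return sorted(common.items(), key=lambda item: (-item[1], item[0]))

# Made-up sample text, not taken from any real script or from the Key Concepts.
script = "The Soviets placed nuclear missiles in Cuba. We will quarantine Cuba. Nuclear war is possible."
reference = "The Cold War included a nuclear arms race and a crisis over missiles in Cuba."
print(shared_terms(script, reference))
```

Words that appear in the script but not in the reference text, such as "quarantine" in this sample, drop out of the result, which is one reason a script can be historically rich and still match few terms.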

Voyant, the third tool we used, displayed the most common words within each script along with the total number of times each word appeared. This feature made Voyant very useful for quantifying each term. According to the website's About page, "Voyant Tools is a web-based text reading and analysis environment ... designed to facilitate reading and interpretive practices for digital humanities students and scholars as well as for the general public." — This section written by Jordan, Malik, Ari, Ryan, and Summer, with help from Jonah, Grace, and Emma

About the Procedure

The spreadsheet above and the next section of this page show the weighting that we applied to each tool's individual scoring. The weighting was not meant to make any one tool count for more than the others, but to put the tools' scores on a comparable scale.

Here is the process we used for gathering the data on each movie:

  1. Run the full, uncleansed script through JSTOR.
  2. Count the number of items relevant to the topic. More specifically, for JSTOR, count all the topics, people, and locations with historical relevance to the movie's primary theme, which we determined using ChatGPT and IMDb. Count strictly: include only historical terms clearly relevant to the topic, and if there is any doubt, raise the question with the class. Record the count in the spreadsheet.
  3. Run a cleansed version of the script (stop words removed, non-historical names removed) through DataBasic.io's Same/Diff tool. Count only relevant historical terms, and weight each term by the line on which it appears to generate DataBasic's raw score: top line (x3), second line (x2.5), third line (x2), fourth line (x1.5), and all remaining lines (x1). Record the total in the spreadsheet.
  4. Run the cleansed script through Voyant Tools. Move the slide bar at the bottom left all the way to the right to reveal the maximum number of frequently used words. From the words that appear, pick out the relevant words and note each one's frequency in the script. Add these frequencies together, then divide the sum by the movie's running time in minutes. This is the raw score for Voyant; record it in the spreadsheet. — This section written by Jordan, Malik, and Ryan
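The arithmetic in steps 3 and 4 can be sketched in Python as follows. This is a rough illustration rather than the class's actual spreadsheet. It assumes that "line" means a row of terms in the Same/Diff output and that each relevant term on a row counts once. The function names and all the numbers are hypothetical.

```python
# Multipliers for the first four lines of the output; every later line counts x1.
LINE_WEIGHTS = [3.0, 2.5, 2.0, 1.5]

def databasic_raw_score(relevant_terms_per_line):
    """relevant_terms_per_line[i] is the number of relevant terms on line i (top line first)."""
    total = 0.0
    for index, count in enumerate(relevant_terms_per_line):
        weight = LINE_WEIGHTS[index] if index < len(LINE_WEIGHTS) else 1.0
        total += weight * count
    return total

def voyant_raw_score(relevant_word_frequencies, runtime_minutes):
    """Sum of the relevant words' frequencies divided by the movie's length in minutes."""
    if runtime_minutes <= 0:
        raise ValueError("runtime_minutes must be positive")
    return sum(relevant_word_frequencies.values()) / runtime_minutes

# Hypothetical counts for illustration only.
# 2*3 + 1*2.5 + 3*2 + 0*1.5 + 4*1 + 2*1 = 20.5
print(databasic_raw_score([2, 1, 3, 0, 4, 2]))
# (40 + 25 + 15) / 160 = 0.5
print(voyant_raw_score({"soviets": 40, "nuclear": 25, "quarantine": 15}, 160))
```

Dividing by running time keeps a long movie from outscoring a short one simply because it has more dialogue.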

About the Formula

Our formula was the following: 2.5J + D + 20V = P, where J is the raw score from JSTOR, D is the raw score from DataBasic, V is the raw score from Voyant, and P is the raw final score. The raw scores from JSTOR were scaled by a factor of 2.5, the raw scores from DataBasic.io were left unscaled, and the raw scores from Voyant were scaled by a factor of 20. The sum of these three values is the raw final score, which was then put into the equation S = 100P/(100+P), where S is the final score and P is the raw final score. See below for more about the adjustment leading to S. — This section written by Ryan
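As a quick check of the formula, here is a minimal Python sketch that computes P and S from the three raw scores. The function names and the sample raw scores are hypothetical and do not come from the class spreadsheet.

```python
def raw_final_score(j, d, v):
    """P = 2.5J + D + 20V, using the weights stated above."""
    return 2.5 * j + d + 20 * v

def final_score(p):
    """S = 100P / (100 + P), for a non-negative raw final score P."""
    if p < 0:
        raise ValueError("raw final score must not be negative")
    return 100 * p / (100 + p)

# Hypothetical raw scores for one movie.
j, d, v = 12, 20.5, 0.5
p = raw_final_score(j, d, v)   # 30 + 20.5 + 10 = 60.5
s = final_score(p)
print(f"P = {p}, S = {s:.1f}")
```

With these sample numbers each tool contributes a value in the tens to P, which is the kind of balance the weights of 2.5, 1, and 20 aim for.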

About the Adjustment 

Originally, scoring on a flat basis led to wildly uneven scores that ranged from near zero to over 200. We realized that this was impractical because scores spread over such a wide range are hard to compare and make the differences between movies look extreme. To compress the range, we implemented a rational function that keeps every score below 100 and flattens more and more as the raw score gets higher. This gives us scores in the same approximate range while still keeping the movies in the same relative order. The image above demonstrates this. The blue line represents the unadjusted score and the black line represents the adjusted score. To convert from the unadjusted score to the adjusted score, plug the unadjusted score, P, into the equation for the adjusted score, S:

S = 100P/(100 + P)

Because the denominator is always larger than P, the adjusted score is always less than 100. To access the graph that can calculate the adjusted score based off the raw JSTOR, DataBasic.io, and Voyant scores, click here. — This section written by Jordan
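For readers who want to see why the adjustment behaves this way, here is a short worked derivation. It uses the equation S = 100P/(100+P) given in the formula section and assumes only that P is never negative.

```latex
\[
S = \frac{100P}{100 + P} = 100 - \frac{10000}{100 + P}
\]
% The subtracted term is positive whenever P >= 0, so S stays below 100.
\[
\frac{dS}{dP} = \frac{10000}{(100 + P)^2} > 0
\]
% The slope is positive, so a higher raw score always gives a higher adjusted
% score (the ranking is preserved), and the slope shrinks as P grows,
% which is the flattening described above.
\[
P = 0 \Rightarrow S = 0, \qquad
P = 100 \Rightarrow S = 50, \qquad
P = 200 \Rightarrow S \approx 66.7
\]
```

A raw score of 200 is twice a raw score of 100, but after the adjustment the two land about 17 points apart.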

About Us

"The Best Movies for Teaching U.S. History" is the product of the 2023-2024 AP U.S. History class: Grace Cox, Ari Garceau, Summer Gosselin, Ryan Mauzy, Malik Nasir, Jonah O'Brian, Emma Reed, and Jordan Roosevelt.