Exploratory Analysis
The exploratory analysis ranks the top characters, locations and scenes by simply counting the number of utterances (quotes). The unigram analysis matches these results, with Riley and Joyce as the main characters. It also highlights memory as an important theme in the movie.
Because the character names are emotions (Joy, Sadness, Fear, Anger and Disgust), they were replaced by common names (Joyce, Felipe, Angelo and Diane) for the text analysis. This avoids problems with parsing and named entity recognition, and reduces bias in the sentiment analysis (e.g. Sadness mentions Joy most of the time).
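The name replacement described above can be sketched as a single-pass regular-expression substitution. This is a minimal illustration, not the original preprocessing code. The pairing of each emotion with a common name is an assumption inferred from matching initials, and because no replacement for Sadness is listed, that name is left unchanged here. The function name `replace_emotion_names` is hypothetical.

```python
import re

# Assumed pairing (inferred from matching initials); the text lists the
# replacement names but does not pair them explicitly. Sadness has no
# listed replacement, so it is intentionally absent from this map.
NAME_MAP = {
    "Joy": "Joyce",
    "Fear": "Felipe",
    "Anger": "Angelo",
    "Disgust": "Diane",
}

# Word boundaries keep "Joy" from matching inside "Joyce" or "Fear" inside
# "Fearless". Matching is case-sensitive, so the common noun "joy" is kept.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, NAME_MAP)) + r")\b")


def replace_emotion_names(text):
    """Replace emotion-named characters with common names in one pass."""
    return _PATTERN.sub(lambda m: NAME_MAP[m.group(1)], text)


sample = "Joy tells Fear and Anger to stay calm while Disgust watches Sadness."
replaced = replace_emotion_names(sample)
print(replaced)
```

Doing the substitution in one pass, rather than one `str.replace` call per name, avoids replacing text that an earlier replacement has just produced.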
The unigram word cloud built from the quotes and descriptions in the script conveys basic information about characters, locations and events. For example, Riley and Joyce are the two main characters, followed by Sadness and Bing Bong.
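The unigram frequencies behind such a word cloud can be approximated with a simple token count. The sketch below is a hedged example only. The tokenizer, the small stop-word set and the sample lines are all assumptions made for illustration, and the original analysis may have used a different toolkit.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real analysis would
# normally use a fuller list from an NLP library.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "is", "in", "at", "her"}


def unigram_counts(texts):
    """Count lowercase word tokens across texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


# Hypothetical script lines, not taken from the actual screenplay.
sample_lines = [
    "Riley looks at the memory.",
    "Joyce holds a core memory.",
    "Riley and Joyce smile.",
]
counts = unigram_counts(sample_lines)
print(counts.most_common(3))
```

The resulting counts could be passed to any word-cloud renderer, where larger counts map to larger words.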
The distribution of quotes by character and time index (quote index) is shown next. Quotes from the five emotions are spread across the whole movie. Joy, Sadness and Fear are the three characters with the most quotes, so their quote distributions are denser than the others. One highlight is that Bing Bong appears only within a certain time range, shown by the yellow region in the figure. Another is that Mom’s and Dad’s emotions first appear together during the conversation at the dining table in San Francisco.
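A distribution like this one can be derived from an ordered list of speakers, where the position of each quote serves as its time index. The following is a minimal sketch under that assumption. The helper names `quote_positions` and `binned_counts` and the sample speaker sequence are hypothetical and do not come from the original analysis.

```python
from collections import Counter, defaultdict


def quote_positions(speakers):
    """Map each speaker to the quote indices at which they speak."""
    positions = defaultdict(list)
    for index, speaker in enumerate(speakers):
        positions[speaker].append(index)
    return dict(positions)


def binned_counts(indices, bin_size):
    """Count quotes in consecutive windows of bin_size quote indices."""
    return Counter(i // bin_size for i in indices)


# Hypothetical speaker sequence, one entry per quote in script order.
speakers = ["Joy", "Sadness", "Joy", "Fear", "Joy",
            "Bing Bong", "Bing Bong", "Sadness"]

positions = quote_positions(speakers)
totals = {name: len(idx) for name, idx in positions.items()}
joy_bins = binned_counts(positions["Joy"], bin_size=4)

print(totals)
print(dict(joy_bins))
```

Plotting each speaker's indices along a shared axis gives the kind of density view described above. A character who appears only within a limited range, as Bing Bong does, shows up as a cluster of adjacent indices.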
The next figure shows the distribution of quotes by location and time index (quote index). Headquarters, Long-term Memory and San Francisco House are the three locations with the most quotes, so their quote distributions are denser than the others. Three circled areas in the figure mark interesting patterns.