Keeping It Reel

Data visualization web application to explore conversation themes and intra- and inter-sex ratios in movie dialogues across decades and genres


 Visual & Interaction Design + Data Analysis + Development

Adobe XD, Javascript, D3.js, Python, LIWC, TextBlob, Genderizer

September 2018 - December 2018 (CS 7450 Information Visualization, Georgia Tech)

Team composition:
4 person team



Keeping It Reel is a data visualization web app which allows users to explore and analyze inter/intra sex ratios and conversational themes in movie dialogues from 600+ movies spanning 9 decades and various genres. The goal of the visualization is to allow movie linguists/analysts and movie buffs alike to understand what characters in their favorite movies are talking about. We derived further attributes in the dataset through a range of data analysis methods, brainstormed and iterated different ways to visualize the information, designed the visuals and interactions for the final visualization and implemented it using D3.js.


This project was highly appreciated by the faculty and chosen by the professor as one of the best projects in the class for the semester. A lot of people appreciated the visualization and had a lot of fun playing around with it and exploring their favorite movies.

Demo Video


Understanding the Data

The Dataset

For our dataset we used the Cornell Movie Dialogs Corpus, released along with the paper Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs by Cristian Danescu-Niculescu-Mizil and Lillian Lee. It contains multiple files that provide a metadata-rich collection of movie conversations extracted from raw movie scripts. It covers 220,579 conversational exchanges between 10,292 pairs of movie characters, involving 9,035 characters from 617 movies, with 304,713 utterances in total.
Note: Raw publicly available scripts were matched with the IMDB database, and those with fewer than 5 IMDB votes were discarded. Pairs of characters that interacted and had more than 5 conversational exchanges were then identified to compile the final list of conversations.
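As a rough illustration of working with the corpus files, the sketch below splits one raw line into named fields. It assumes the " +++$+++ " field separator and the field order that the corpus README describes for movie_lines.txt. The field names (line_id, character_id and so on) and the parse_line helper are labels invented here for readability, not part of the corpus or of our project code.

```python
FIELD_SEPARATOR = " +++$+++ "  # delimiter assumed from the corpus README

# Hypothetical field names for movie_lines.txt, in the order the README lists them.
LINE_FIELDS = ["line_id", "character_id", "movie_id", "character_name", "text"]


def parse_line(raw, field_names):
    """Split one raw corpus line into a dict keyed by the given field names."""
    values = raw.rstrip("\n").split(FIELD_SEPARATOR)
    return dict(zip(field_names, values))


# Illustrative line in the movie_lines.txt layout.
sample = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!\n"
record = parse_line(sample, LINE_FIELDS)
print(record["character_name"], "says:", record["text"])
```

The same helper could be reused for the other files by passing a different list of field names.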

Structure of the Data
The different data files and their attributes (including the ones added by us)

Target Audience

Movie critics and linguists who are interested in understanding what movies are talking about and how different trends are changing over time and across genres, e.g. Netflix analysts.

Movie buffs or anyone who likes movies and is interested in knowing more about what the characters in their favorite movie typically talk about.

User Goals

Explore dialogue content
How do conversation themes and genres intersect? In a specific movie, how do themes intersect with characters? What are the patterns of conversations between characters?

Gain insight into intra and inter sex ratio & balances
How do inter & intra sex ratios in conversations vary over years and themes? Do certain genres have a better female-female balance in conversations? How do these ratios intersect with movies that pass or fail the Bechdel test?

Data Analysis
We did a lot of cleaning, data preprocessing, natural language processing (NLP) & analysis to help facilitate different ways to achieve our two main user goals.
Disclaimer - Unfortunately, the Genderizer API and our dataset in general did not include any information for non-binary genders.
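To make the idea of intra and inter sex ratios concrete, here is a minimal sketch of how the share of male-male, female-male and female-female conversations in a movie could be computed. It is an illustration under assumptions, not our actual pipeline: the function names, the 'm'/'f' labels and the choice to skip speakers with an unknown label are all hypothetical.

```python
from collections import Counter

KNOWN_LABELS = {"m", "f"}  # assumed labels; anything else counts as unknown


def pair_type(label_a, label_b):
    """Label a conversation by its two speakers, order-independent, e.g. 'f-m'."""
    return "-".join(sorted([label_a, label_b]))


def conversation_shares(conversations):
    """Return the share of m-m, f-m and f-f conversations.

    `conversations` is a list of (label_a, label_b) tuples. Pairs where either
    speaker has an unknown label are skipped before computing the shares.
    """
    counts = Counter(
        pair_type(a, b)
        for a, b in conversations
        if a in KNOWN_LABELS and b in KNOWN_LABELS
    )
    total = sum(counts.values())
    if total == 0:
        return {"m-m": 0.0, "f-m": 0.0, "f-f": 0.0}
    return {key: counts[key] / total for key in ("m-m", "f-m", "f-f")}


# Hypothetical movie with five conversations, one involving an unlabeled speaker.
example = [("m", "m"), ("m", "f"), ("f", "m"), ("f", "f"), ("m", "?")]
shares = conversation_shares(example)
print(shares)
```

Skipping unlabeled speakers keeps the three shares summing to one, at the cost of ignoring some conversations.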

Icons made by mynamepong, licensed under CC 3.0 BY
Icons made by Freepik, licensed under CC 3.0 BY


Brainstorming & Ideation

We discussed the different analytical questions that could be asked and the different approaches we could take to fulfill our user goals. Based on that, we came up with 2 multi-coordinate views (composite views consisting of different parts which interact with one another) and 2 innovative views (a single view with multiple elements).

Design Alternatives
Idea #1

This design idea uses a movie reel visual to highlight details about each specific movie. The reel itself would be a dense pixel display of every movie, colored by year or genre, or filtered to a decade by clicking on the top bar. If you select a movie in the film strip, you would see details on that movie such as the genre, year, whether it passed the Bechdel test, the overall sentiment of the movie and the ‘theme/categories’ of the movie.

Idea #2

This idea gives the users the option of drilling down by genre or by decade in a fun, whimsical way - like a calendar or movie board. Once they select any of these categories, we would show a dense pixel mapping of movie titles filtered to that genre/decade, with the overall sentiment and sex ratio balance trends for that specific genre/decade in the side panel.

In the detail view, we have the conversation mapping between different characters in the movie, in the form of an arc diagram, with each node representing a character in the film. Hovering over an arc or clicking on it would give us more information about the conversation in the form of a popup. Here, we also have sentiment and sex ratio balance trends for the specific movie to compare to the overall trends.

Idea #3

In this design, the user could select a movie and see details about it: sentiment, sex balance ratios and IMDB ratings (dials in the top left quadrant), bar charts showing words/themes by sex (bottom left quadrant), an arc diagram showing the conversation mapping with each character as a node (bottom right quadrant) and a dense pixel display of the top words used (top right quadrant).

Idea #4

This was a highly analytical visualization, which allowed users to see cross sex, word, sentiment and ratings trends through treemaps, bar charts and stacked area charts. The user could also drill down from high-level trends across genre and time, to details about individual movies. The rest of the viz mostly focused on the top words in different categories - positive/negative sentiment, same-sex/cross-sex conversations, context and so on.

Fig 1: Primary View: User starts with Treemap in top right corner, showing types of conversations and sentiment for different genres.
Clicking a segment of the treemap in Fig. 1 opens up a dense pixel plot focused on that genre, and the user can home in on a specific movie by clicking a pixel. The user could also select a subset of the years in Fig. 1 to focus on a specific era, which would update the treemap and bar charts.

User & Expert Feedback

We had a poster session where we collected feedback from our classmates as well as experts on various aspects such as what they are interested in seeing with the data (to evaluate which direction we should go in), which visualization appealed to them the most (in terms of visualization and functionality) and usability.

Icons made by dDara, licensed under CC 3.0 BY

Iteration #1

Based on overall feedback, we went with an overview and detail view as people were interested in discovering high level patterns and then drilling down to a specific movie.
The overview shows overall trends through a dense pixel mapping of all the movies across sentiment, bucketed by decade and shaded by whether they pass the Bechdel test or not. It also has a bar graph of top themes across decades.

Clickable prototype

The detail view, once a user chooses a movie, has a network diagram of conversations through a hybrid between a parallel coordinate and an arc diagram. This has two axes for male and female characters, with arcs between nodes on the same axes representing same sex conversations and lines between the two representing cross sex ones. The view also has sliders for sentiment and bars to indicate sex balance for that movie vs the decade it was released in, as well as top themes shown through bubbles.

Clickable prototype
Changes to Iteration #1

Based on further data analysis, we found no interesting pattern in the sentiment, while the sex balances in the conversations over the decades were more interesting. Further, themes did not change across decades, so we did not see the value in showing that; instead, we showed the conversational themes obtained from LIWC across genres. For the detail view, we made some design changes as we implemented it: the theme bubbles made it difficult to distinguish the value for each theme, so we opted for bars instead, and we placed the stacked bar charts next to one another for easy comparison.

Final Design

Visual Design & Interface Elements

We wanted a fun, modern, sans serif font as our visualization was built to be fun & engaging for the users. We went with Komu.

Color Palette
Interaction Design

Selection of decade - Selecting a decade highlights its movies in the scatter plot and updates the themes-genres heat-map to reflect the values for the movies in that decade.
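The aggregation behind the decade selection can be sketched roughly as follows: keep only the movies in the chosen decade, then average each theme score per genre to get the heat-map cells. This is a simplified Python illustration with hypothetical field names (year, genres, themes) and made-up scores; the actual visualization was implemented in D3.js.

```python
from collections import defaultdict


def decade_of(year):
    """Map a release year to the start of its decade, e.g. 1994 -> 1990."""
    return (year // 10) * 10


def theme_means_by_genre(movies, decade=None):
    """Average each theme score per genre, optionally restricted to one decade.

    A movie that lacks a theme contributes zero to that theme's average.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for movie in movies:
        if decade is not None and decade_of(movie["year"]) != decade:
            continue
        for genre in movie["genres"]:
            counts[genre] += 1
            for theme, score in movie["themes"].items():
                sums[genre][theme] += score
    return {
        genre: {theme: total / counts[genre] for theme, total in themes.items()}
        for genre, themes in sums.items()
    }


# Made-up records purely for illustration.
movies = [
    {"title": "A", "year": 1994, "genres": ["crime"], "themes": {"anger": 0.4, "swear": 0.2}},
    {"title": "B", "year": 1997, "genres": ["crime", "drama"], "themes": {"anger": 0.2, "swear": 0.4}},
    {"title": "C", "year": 2003, "genres": ["drama"], "themes": {"anger": 0.1, "swear": 0.0}},
]
heatmap_1990s = theme_means_by_genre(movies, decade=1990)
print(heatmap_1990s)
```

Calling the function without a decade returns the heat-map values for the whole collection.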

Selection of parameter to color by - the movies in the scatter plot could be colored by different measures, like Bechdel test, or various conversational themes.

Brush and link - Since the scatter plots showed all the movies distributed by percentage of male-male, male-female and female-female conversations, we wanted users to explore how a movie fared across all three. Hence, brush and link allows the users to select a set of movies with, say, a high share of male-male conversations and see how they fare in female-female conversations.

Selection of genre - Selecting a genre filters the scatter plot to the movies in that genre.

Select movie/use dropdown - Selecting a movie takes the user into the detail view. We provided a dropdown in addition to selection from the scatter plot because navigating through multiple movies would be difficult on the scatter plot alone. Selecting a movie selects it across all three scatter plots.

View conversation on hover - Hovering over an arc highlights it and shows the conversation it represents.
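The logic of brush and link can be sketched independently of the rendering: a brush on one measure yields a set of selected movies, and every other plot highlights that same set. The snippet below is a minimal Python illustration with hypothetical field names and made-up values; in the actual app this interaction was built with D3.js.

```python
def brush(movies, measure, low, high):
    """Return the titles of movies whose `measure` falls inside the brushed range."""
    return {m["title"] for m in movies if low <= m[measure] <= high}


def linked_values(movies, selected, measure):
    """Values of `measure` for the selected movies, as a linked plot would highlight them."""
    return {m["title"]: m[measure] for m in movies if m["title"] in selected}


# Made-up conversation shares purely for illustration.
movies = [
    {"title": "A", "mm": 0.80, "mf": 0.15, "ff": 0.05},
    {"title": "B", "mm": 0.30, "mf": 0.40, "ff": 0.30},
    {"title": "C", "mm": 0.70, "mf": 0.20, "ff": 0.10},
]

# Brush the movies with a high share of male-male conversations...
selected = brush(movies, "mm", 0.6, 1.0)
# ...and look up the same movies in the female-female plot.
highlighted_ff = linked_values(movies, selected, "ff")
print(sorted(selected), highlighted_ff)
```

Keeping the selection as a set of titles means each plot only needs to check membership to decide what to highlight.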

Theme filter - Selecting any of the themes filters the arcs to show only those conversations.

Interesting Insights

I'd love for you to explore the visualization and discover insights on your own, but here are a few of the interesting ones we found (not giving them all away!)

  • The female-female ratio of conversations in movies is far lower than we expected, and hasn't improved much with time.
  • Horror movies have the highest proportion of female-female conversations, while action, adventure and sci-fi have the lowest.
  • The crime genre reported the highest figures for swearing, anger and sexual themes.


  • In order to achieve what you want, some amount of data preprocessing and analysis is required. From this project I gained not only design & development experience, but also learnt how to understand and analyze data.
  • For an analytical visualization, think about the questions that the user group will be interested in answering. There were many ways we could have gone about this, but what mattered was what the user wanted to see.
  • It is important for the flow between design and development to be cohesive. This project worked because design and development did not happen in a vacuum, rather we all worked together and did both. I not only had to play the role of designer and communicate the designs, but also step in to implement the visualization.

Tags Information Visualization, Visual Design, Interaction Design, Development