Elaina Boyle Midterm

For my project, I created a word cloud showing how often different words appear in Shakespeare’s Romeo and Juliet. Shakespeare frequently uses repetition for emphasis, and as a Theater and Computer Science double major, I wanted to apply my computer science skills to the most celebrated script in history and see whether visualizing its most repeated words would let me see the play in a more nuanced way. My dataset was simply the text of Romeo and Juliet. It needed a little cleaning: I created a .txt file of stop words and trimmed the text so that it included only the script, not the publishing or licensing information. Then I wrote a Java program that counts how often each word appears by sorting the words into a tree. Once the text has been analyzed, the counts are sent to a word cloud maker, which sizes each word according to its frequency, assigns it a random color from a color roster, and outputs an HTML file containing the word cloud. From there, I just had to embed that HTML in my subdomain, and here’s the final product:
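The counting and rendering steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the original program: it assumes Java’s built-in `TreeMap` (a red-black tree) as the tree, a stop-word set that has already been loaded into memory, and a linear mapping from count to font size. The class name `WordCloudSketch`, the color roster, and the 12 to 64 px size range are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: count word frequencies in a tree-backed map, then render
// the most frequent words as a simple HTML word cloud.
class WordCloudSketch {

    // Hypothetical color roster; the original program's colors are not known.
    static final String[] COLORS = {"#8b0000", "#1f4e79", "#2e7d32", "#6a1b9a", "#ef6c00"};

    // Assumed size range for the least and most frequent words, in pixels.
    static final int MIN_PX = 12;
    static final int MAX_PX = 64;

    // Lower-cases the text, splits on anything that is not a letter or an
    // apostrophe (straight or curly), trims apostrophes at the edges of each
    // token, skips stop words, and tallies the rest. TreeMap is a red-black
    // tree, so the keys stay in alphabetical order.
    static TreeMap<String, Integer> countWords(String text, Set<String> stopWords) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}'\\u2019]+")) {
            String word = raw.replaceAll("^['\\u2019]+|['\\u2019]+$", "");
            if (word.isEmpty() || stopWords.contains(word)) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Keeps the topN most frequent words, scales each one's font size linearly
    // between MIN_PX and MAX_PX by its count, gives it a random roster color,
    // and shuffles the order so the biggest words are not all bunched first.
    static String toHtml(Map<String, Integer> counts, int topN, Random rng) {
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so ties keep the map's (alphabetical) order.
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        List<Map.Entry<String, Integer>> shown =
                new ArrayList<>(ranked.subList(0, Math.min(topN, ranked.size())));

        StringBuilder html = new StringBuilder("<!DOCTYPE html>\n<html><body><div class=\"cloud\">\n");
        if (!shown.isEmpty()) {
            int maxCount = shown.get(0).getValue();
            int minCount = shown.get(shown.size() - 1).getValue();
            Collections.shuffle(shown, rng);
            for (Map.Entry<String, Integer> e : shown) {
                double t = (maxCount == minCount) ? 1.0
                        : (double) (e.getValue() - minCount) / (maxCount - minCount);
                int px = (int) Math.round(MIN_PX + t * (MAX_PX - MIN_PX));
                String color = COLORS[rng.nextInt(COLORS.length)];
                html.append(String.format(Locale.ROOT,
                        "<span style=\"font-size:%dpx;color:%s\">%s</span>\n", px, color, e.getKey()));
            }
        }
        html.append("</div></body></html>\n");
        return html.toString();
    }
}
```

A stop-word file could be loaded into a set with `new HashSet<>(Files.readAllLines(Path.of("stopwords.txt")))`, where the file name is again hypothetical. Keeping the counts in a tree leaves the full vocabulary alphabetized for inspection, while the sort inside `toHtml` is what picks out the most frequent words for the cloud.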

[Word cloud (embedded HTML), with each word sized by its frequency. Words shown: tell, much, cap, word, art, again, prince, hath, montague, speak, give, both, two, day, i’ll, exeunt, like, think, night, must, young, house, call, till, let, being, gone, away, make, death, bid, hence, mother, par, enter, dead, marry, romeo, sir, hear, fair, shall, back, capulet, paris, scene, nurse, life, die, mine, father, juliet, dear, wife, hast, heart, time, watch, friar, tis, lord, too, light, hand, find, lady, part, sweet, come, ay, well, up, eyes, heaven, say, man, take, look, know, stand, good, face, see, go, old, very, therefore, now, stay, madam, love, exit, comes, tybalt, god, name, true, wilt, why, bed]

Though I did clean the text file, I still had some trouble getting it to work nicely. For example, each character’s lines in this script are marked with the first three letters of their name, so Romeo’s lines begin with “Rom.” I addressed this by adding the first three letters of most characters’ names to the list of stop words. However, I didn’t want to remove “cap” (for Capulet), because “cap” is also a word spoken in the script, so removing it would alter the data too much. If I could redo the cleaning process, I would find a way to remove the speaker’s name from the start of every line, so that we wouldn’t have a leftover “cap” right at the top of the cloud, much bigger than it should be. I would also consider removing the stage directions: words like “exit” and “exeunt” appear only in stage directions and are never spoken in the play. However, there is a lot of debate about whether stage directions should be analyzed alongside the text of a play (I firmly believe they should be), so I’m not sure what I would do!

Aside from that, I am very happy with how my word cloud turned out, and I think it has plenty of potential for analysis. For example, the fact that “wife” is one of the more frequent words in this play but “husband” isn’t could support an argument that a woman’s worth in this play is determined by her marriage, while a man’s worth is determined by much more: “prince,” “sir,” and “man” all show up as frequently used words as well. All in all, I think my word cloud (or, more importantly, the program I wrote to create word clouds) worked really well, and I created an intuitive way to visualize the frequency of words in a text.
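The per-line fix described above, removing each speaker’s name before counting, could look something like this Java sketch. It assumes every speech starts on a new line with the speaker’s abbreviation followed by a period (for example `Rom.`), which is a guess about the edition’s formatting; the class name `SpeechPrefixStripper` and the prefix list passed in are hypothetical. Only a prefix at the very start of a line is removed, so a spoken “cap” elsewhere in a line is left alone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: remove the speech prefix (e.g. "Rom.") from the start
// of each line, so abbreviations like "Cap." no longer inflate the count for "cap".
class SpeechPrefixStripper {

    // Returns the text with any leading speech prefix removed from each line.
    // Prefixes are tried longest-first so a two-word prefix wins over a
    // shorter prefix that happens to match its beginning.
    static String stripPrefixes(String text, List<String> prefixes) {
        List<String> ordered = new ArrayList<>(prefixes);
        ordered.sort((a, b) -> Integer.compare(b.length(), a.length()));

        StringBuilder out = new StringBuilder();
        for (String line : text.split("\\R", -1)) {
            String trimmed = line.stripLeading();
            for (String prefix : ordered) {
                // Match the prefix only as a whole leading token.
                if (trimmed.startsWith(prefix + " ") || trimmed.equals(prefix)) {
                    trimmed = trimmed.substring(prefix.length()).stripLeading();
                    break;
                }
            }
            out.append(trimmed).append('\n');
        }
        return out.toString();
    }
}
```

Running the text through a step like this before counting would let the three-letter abbreviations come off the stop-word list, so a spoken “cap” is still tallied while the “Cap.” speech prefixes are not. The actual prefixes would need to be read off the edition being used.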

Tags: Midterm, Shakespeare

Hacking the Humanities 2022F