How to Create Network Graphs in Python

A network graph is an interesting and easy way to view connections between characters, objects, or other things. Network graphs consist of nodes and edges. Nodes are the “things” that interact: think characters in a TV show or countries trading with each other. Edges connect nodes and represent an interaction between them.

Example of a network graph

The graph above doesn’t have any labels, but if we imagine each node (the circles) as people on our social media page and the edges (lines) representing each person following the other we can start to develop a story around this graph.

There are plenty of network graph tools online, but making one in Python offers greater flexibility and avoids the learning curve of third-party software. Network graphs have many uses in DH: visualizing character interactions in X-Men episodes, solving Wordle, and finding the shortest path from one point to another are all possible!

In this tutorial, we will create an interactive network graph using data from X-Men issues 150-199. The Python program will take user input and generate a graph based on that input.

Step 0: Set up Python and Download Code + Data

If you would like to follow along with this tutorial you can download the code and dataset here.

Keep in mind that for the readability of this blog I decided to shorten some of my code snippets. This should not take away from the value of the tutorial, but if you are trying to build my exact program I recommend looking at all of the code in the download linked above.

If you already have Python & a code editor installed you can skip the rest of this section!

Before starting this tutorial, make sure Python is correctly installed on your device. You can install Python here! If you run into any trouble, this tutorial from FreeCodeCamp goes over installing Python and setting up an editor, and covers most of the coding skills you need to complete this tutorial! Let’s move on!

Step 1: Import Packages

The Python community is one of the biggest benefits of using the language. There are tons of free libraries available to download that make projects like this possible. Here is a list of 10 of the coolest libraries, but there are many more! We will be using the libraries matplotlib and networkx in this tutorial. Matplotlib is a library that makes graphing in Python incredibly easy, and NetworkX is the library we will use to build and manage our network graph.

In order to install matplotlib and networkx we will use pip! In your editor’s terminal type:

pip install matplotlib

and…

pip install networkx

This is what it looks like in my terminal:

Press Enter/Return and the packages should start downloading! If you run into any issues Google is your best bet!

When all your packages are installed it is time to import them into our Python file. If you haven’t already, make a new file called ‘XMenStats.py’ and import our libraries.

# Fun Network tool for Xmen csv file
import networkx as nx
import matplotlib.pyplot as plt

Notice the different wording we used for each import statement. With matplotlib we want the ‘pyplot’ module inside the ‘matplotlib’ library, so we name it explicitly and give it the shorter alias ‘plt’. Later on in our code, we can use it by typing ‘plt.’ followed by what we want to do. Likewise, we can refer to ‘networkx’ as ‘nx’ later on in our code.

Congrats! You have imported all the libraries needed for this project. Let’s move on!

Step 2: Make a Simple Network Graph

Making the graph is simple. The library does most of the work for us; we just need to make sure we are giving it good data. Let’s make a simple network graph:

# Remember, nx is the name we use to access the networkx library

x = "Cyclops"
y = "Wolverine"

#Create a blank graph page
nxG = nx.Graph()

#Add node
nxG.add_node(x)

#Add edges
nxG.add_edge(x, y)

#Draw the graph
nx.draw(nxG, with_labels=True)

#Show the graph
plt.show()

Network graph made with the code shown above

Notice how adding an edge automatically added the y node (“Wolverine”) even though we never explicitly added it ourselves. We can use this to our advantage when creating more complicated graphs.
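If you want to confirm this behavior without drawing anything, here is a minimal sketch that inspects the graph directly. It uses the same nx.Graph(), add_node, and add_edge calls as the snippet above; the variable names G, node_names, and edge_pairs are just illustrative.

```python
import networkx as nx

G = nx.Graph()
G.add_node("Cyclops")
# add_edge creates any endpoint that is not in the graph yet
G.add_edge("Cyclops", "Wolverine")

node_names = list(G.nodes)
edge_pairs = list(G.edges)
print(node_names)   # ['Cyclops', 'Wolverine']
print(edge_pairs)   # [('Cyclops', 'Wolverine')]
```

Because edges bring their nodes along, a larger graph can usually be built from edges alone.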

To change the color and size of the nodes, we pass lists to the nx.draw function. Each list needs one entry per node, in the same order the nodes were added to the graph.

x = "Cyclops"
y = "Wolverine"

#Create a blank graph page
nxG = nx.Graph()

#Add nodes
nxG.add_node(x)

#Add edges
nxG.add_edge(x, y)

#Lists
nodeSizes = [100, 20]
color_map = ['red', 'blue']

nx.draw(nxG, with_labels=True, font_size=14, node_size=nodeSizes, node_color=color_map)
plt.show()

You can explore all the fun formatting options in the networkx docs, but for now, the code above should help you make simple network graphs!
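Hand-written size and color lists get hard to keep in sync once a graph has more than a few nodes. One option, sketched here under the assumption that you want bigger nodes for better-connected characters, is to build both lists by looping over the graph’s own nodes. The multiplier 300 and the color choices are arbitrary illustration values.

```python
import networkx as nx

G = nx.Graph()
G.add_edge("Cyclops", "Wolverine")
G.add_edge("Cyclops", "Colossus")

# Loop over G.nodes so every entry lines up with its node
node_sizes = [300 * G.degree[node] for node in G.nodes]
color_map = ["red" if node == "Cyclops" else "blue" for node in G.nodes]

print(node_sizes)   # [600, 300, 300]
print(color_map)    # ['red', 'blue', 'blue']
```

These lists can then be passed to nx.draw through node_size and node_color, exactly like the hand-written lists above.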

If you are interested in learning how to make an interactive network graph, keep reading. If you came here for a quick and simple tutorial, now would be a good time to stop!

Step 3: Open CSV

The data we will be using to make our network graph is stored in a .csv file. CSV is a plain-text format that spreadsheet programs can open and export, and Python can read it easily.

with open("uncanny-xmen-150-199-characters.csv", "r") as x:

In this line, we are opening the CSV file! Similar to the libraries, we can access this file later on using ‘x’. The “r” tells Python that we only plan on reading the data from the file. You can also write (“w”) which clears the file and then allows you to add your own data, or append (“a”) which adds data to the end of the file without clearing any of its current data. Let’s stick to reading for now.
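To see the three modes side by side, here is a small sketch that works on a throwaway file rather than the X-Men data. The file name modes_demo.txt is made up for this example. Using ‘with’ also means Python closes the file for us at the end of each block.

```python
import os
import tempfile

# Work in a temporary folder so no real data is touched
folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes_demo.txt")

with open(path, "w") as f:   # "w" creates the file or wipes an existing one
    f.write("first line\n")

with open(path, "a") as f:   # "a" keeps what is there and adds to the end
    f.write("second line\n")

with open(path, "r") as f:   # "r" only reads
    lines = f.readlines()

print(lines)   # ['first line\n', 'second line\n']
```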

We can now play with the data in our CSV! Try:

with open("uncanny-xmen-150-199-characters.csv", "r") as x:
    for line in x:
        print(line)

This should print every line in the file!

Python has a built-in module called csv that is specifically built for reading .csv files, but I decided not to use it in this project. Don’t worry, it’s still very easy to gather our data from the CSV.
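For comparison, here is a rough sketch of what the built-in csv module does differently from a plain split on commas. The sample row is a shortened, hypothetical one and not a real line from the dataset. The difference only shows up when a field contains commas of its own and is therefore wrapped in quotation marks.

```python
import csv
import io

# Hypothetical, shortened row: the third field contains a comma, so it is quoted
sample = '150,Colossus,"Nightcrawler, Wolverine",Wolverine\n'

naive_fields = sample.strip().split(",")
parsed_fields = next(csv.reader(io.StringIO(sample)))

print(len(naive_fields))    # 5, the quoted field was cut in two
print(len(parsed_fields))   # 4
print(parsed_fields[2])     # Nightcrawler, Wolverine
```

This tutorial sticks with the split approach, so quoted fields like this one come through in pieces.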

Step 4: Get Data

Now that our file is open we can start collecting data! Each column of our CSV is a different type of data. You can explore the column headers below.

Issue #,Character,Rendered Unconcious,Captured,Declared Dead,Redressed,Depowered,Clothing Torn,Subject to Torture,Quits Team,Surrenders,# of Kills (Humans),# of Kills (Non-humans),Initiates Physical Conflict,Expresses Reluctance to Fight,On a date (with which character),Kiss (with which character),Hand-holding (with which character),Dancing (with which character),Flying with another character,Arm-in-Arm (with which character),Hugging  (with which character),Physical Contact - Other,Carrying (with which character),Shared Bed (with which character),Shared Room (domestically - with which character),"Explicitly States ""I love you"" (to whom)",Shared Undress,Shower (# of panels shower lasts),Bath (# of panels bath lasts),Depicted Eating Food,Visible Tears - # of Panels,Visible Tears - # of Intances,Special Notes,Column,Column2,Column3,Column4,Column5,Column6,Column7,Column8,Column9,Column10,Column11,Column12,Column13,Column14,Column15,Column16

There are some “junk” columns in the .csv: everything to the right of “Special Notes” has nothing below it, and we won’t be using it. Explore the columns and find some interesting ones!

Because we know what every column of the CSV represents, we can iterate through each row and assign values to variables based on their column position. Before we do this, we must split up the line using ‘.split(",")’. This turns the line into a list in which each item is the text between two commas. For example, if we had a line like:

cool_line = "bob,gene,judy,mike,marci"
new_list = cool_line.split(",")

After splitting the line by commas, we end up with a list like:

new_list = ["bob", "gene", "judy", "mike", "marci"]

Lists are useful because it is very easy to access values in them. In this case, if we wanted to access “bob” we can use new_list[0]. “gene” would be new_list[1], and so on…

print(new_list[0])
#prints "bob"

This is what it looks like in my code.

    with open("uncanny-xmen-150-199-characters.csv", "r") as x:
        # Move through each line
        for line in x:
            #Split up the data
            newline = line.split(",")
            #Assign values
            Issue = newline[0]

In this code snippet, we only assign newline[0] to Issue, but we could easily assign more columns if we wanted!
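Counting commas to find a column number gets tedious with this many columns. One alternative, sketched below, is to split the header row once and look positions up by name. The three-column sample is hypothetical and only mimics the shape of the real file; the names issue_col, character_col, and appearances are made up for this example.

```python
# Hypothetical three-column sample in the same shape as the real file
sample_lines = [
    "Issue #,Character,Captured",
    "150,Cyclops,1",
    "151,Wolverine,",
]

# Split the header once, then ask it where each column lives
header = sample_lines[0].split(",")
issue_col = header.index("Issue #")
character_col = header.index("Character")

appearances = []
for line in sample_lines[1:]:
    newline = line.strip().split(",")
    appearances.append((newline[issue_col], newline[character_col]))

print(appearances)   # [('150', 'Cyclops'), ('151', 'Wolverine')]
```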

Step 5: Clean CSV

If you explore the CSV file you will quickly see how inconsistent the data is. Take line 7 of the CSV for example:

150,Colossus = Peter (Piotr) Rasputin,,,,,1,,,,,,,,,,,,,,,,"Nightcrawler, x2 Wolverine x4, Kitty Pryde, Cyclops",Wolverine,,,,,,,,,,,,,,,,,,,,,,,,,,

The character field has both the character’s codename and his real name, something that the author definitely should’ve split up into separate columns. Or, take the column that has Wolverine in it. What do x2 and x4 mean? Why are they on either side of the character? Why are they included at all? We have no way of knowing what the author meant when putting these in the CSV, so we are going to do our best to ignore them. The same goes for the quotation marks around that field. They are most likely standard CSV quoting, which spreadsheet programs add when a field itself contains commas, but because we split each line on every comma, they end up as stray characters in our values. We will use some fun Python tricks to ignore these inconsistencies in our graph.

The function below is the cleaner function I used to “clean” the CSV. Simply put, it takes in a messy value, like “x2 Cyclops x4”, and returns “Cyclops”. Without this step, our code would interpret “x2 Cyclops x4” and “Cyclops” as different characters when in reality they are the same and should be placed on the same node. I bet the function doesn’t work the way you think it would, though.

#Cleans character name
def cleaner(person):
    CharList = [*List of all characters formatted nicely*]
    for char in CharList:
        if char in person:
            return char
    return person

Instead of cleaning the entire CSV file, I decided to clean the values that come out of the CSV file. To do this I got a list of every unique character and manually formatted it to remove all of the random markings. The list of those characters is “CharList” in the function. When the function receives a person it checks to see if any of the nicely formatted character names are in the person string. Because “Cyclops” is technically a part of “x2 Cyclops x4” we know that these are the same character even though their strings are different. The function returns the clean version of the character name and our dirty CSV problems are solved!
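Here is a runnable version of the same idea that you can experiment with. CHAR_LIST below is a hypothetical, heavily shortened stand-in for the full hand-formatted list, so treat it as a sketch rather than the exact function from the project.

```python
# Hypothetical, shortened stand-in for the full list of formatted names
CHAR_LIST = ["Cyclops", "Wolverine", "Colossus", "Kitty Pryde"]

def cleaner(person):
    """Return the first known name found inside person, else person unchanged."""
    for char in CHAR_LIST:
        if char in person:
            return char
    return person

print(cleaner("x2 Wolverine x4"))                    # Wolverine
print(cleaner("Colossus = Peter (Piotr) Rasputin"))  # Colossus
print(cleaner("Someone Else"))                       # Someone Else
```

One thing to keep in mind: if a value mentions several known characters, this approach returns only the one that appears earliest in CHAR_LIST, not the one that appears first in the value.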

It may be worth noting that this function doesn’t change anything in the CSV file or change the strings that are passed into it. It makes a “guess” at what the input string really means and returns that.

I decided to “clean” the CSV file this way because I wanted to only use Python for this project. It should be easy to use OpenRefine or another CSV cleaning software to more thoroughly clean the data.

Step 6: Get User Input

I wanted to make the graph interactive, so there needs to be some way for the user to interact with the data. I decided to have the user select a character, select an action, and then show them the network graph for that combination. The characters will be a list of all the characters in the CSV, and the actions will be most of the column titles in the CSV. Again, using lists allows us to easily access the data inside of them. We can ask the user for a number that corresponds to a specific character or action and use that number to get the value from each list. This is what I’m envisioning:
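A numbered menu like this could be sketched as follows. The helper names show_menu and pick are made up for this example and are not taken from the project code, and the character list is shortened. The answer is hard-coded here so the snippet runs on its own; in a real program it would come from input().

```python
def show_menu(options):
    """Print each option with a number, starting at 1."""
    for number, option in enumerate(options, start=1):
        print(f"{number}. {option}")

def pick(options, answer):
    """Turn the text the user typed into one of the options, or None if invalid."""
    try:
        index = int(answer) - 1
    except ValueError:
        return None
    if 0 <= index < len(options):
        return options[index]
    return None

characters = ["Cyclops", "Wolverine", "Colossus"]
show_menu(characters)
choice = pick(characters, "2")   # in a real program: pick(characters, input("Number: "))
print(choice)   # Wolverine
```

Returning None for bad input lets the program ask again instead of crashing on a typo.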

Screen Capture of terminal

In the example above, the user wants to see all the characters Cyclops carried, so we would need to go through all of the rows in the CSV whose character is Cyclops. Then, we would need to look in the “Carrying” column and see if there is a character in it. If there is, we add an edge between Cyclops and the character we found. If there is not, we move on to the next row. Let’s see what that would look like in code:

# This is example code and not meant to be run

character = "Cyclops"

for line in CSV_FILE:
    # Assumes each line has already been split into a list of values
    if line[1] == character:
        # Counting from 0 in the header row above, index 23 is the
        # "Carrying (with which character)" column
        if line[23] != "":
            # Don't forget the cleaner function!
            nxG.add_edge(character, cleaner(line[23]))

That’s exactly what we wanted! Cyclops carries five people throughout our dataset! This isn’t the exact code I used in my project, but it should give you an idea of what you need to do to create something similar. Go try it for yourself!
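If you would like something you can actually run, here is a self-contained sketch of the same idea. To keep it short it makes a few assumptions that differ from my project: it uses a tiny, hypothetical in-memory dataset instead of the real file, it reads rows with the built-in csv.DictReader rather than split, and it skips the cleaner step. The function name collect_edges is made up for this example.

```python
import csv
import io

# Hypothetical miniature dataset with only three columns
sample = io.StringIO(
    "Issue #,Character,Carrying\n"
    "150,Cyclops,Wolverine\n"
    "151,Cyclops,\n"
    "152,Colossus,Wolverine\n"
    "153,Cyclops,Kitty Pryde\n"
)

def collect_edges(rows, character, action):
    """Return (character, partner) pairs for rows where the action cell is filled."""
    edges = []
    for row in rows:
        if row["Character"] == character and row[action] != "":
            edges.append((character, row[action]))
    return edges

edges = collect_edges(csv.DictReader(sample), "Cyclops", "Carrying")
print(edges)   # [('Cyclops', 'Wolverine'), ('Cyclops', 'Kitty Pryde')]
```

The resulting list can be handed to networkx in one go with nxG.add_edges_from(edges) and then drawn as in Step 2.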

Special Notes & Final Thoughts

The code I based this tutorial on is more advanced than what I covered in this article, but this article should give you a head start when trying to read my code and should have taught you how to make your own simple network graphs.

Two resources that helped me learn how to do this are:

This video about networkx

The networkx docs – specifically the part about how to color your nodes

I think that network graphs can be very useful when used in moderation. Making simple network graphs in Python is quick and easy, and it is a valuable tool to have in your toolbox.

3 thoughts on “How to Create Network Graphs in Python”

  1. I am glad you posted a tutorial on this. I tried to use this dataset for my midterm but ran into too many formatting issues with the names of characters in the dataset. I had been trying to use OpenRefine to clean the data and it was miserable. Definitely using a simple “if in” logic statement is what I needed. It is really cool to follow along with this and see a totally different approach than I was trying.

  2. Your tutorial is really thorough and easy to understand. I’ve always found using other libraries in programming to be a bit daunting, especially ones related to graphics, so I like that you highlighted how libraries tend to do most of the difficult work and it’s mainly up to the user to manipulate the data and content. It’s also great that you made it interactive and limited the network since in the past I’ve found that network graphs can be a bit overwhelming with how much information they can present.

  3. This is a really cool and intuitive tutorial, especially since I did something very similar for my own. The only difference between network visualization in Python and Flourish, the application that I use to create network graphs, is that Flourish does all the back-end work for the user. There is also so much more customization that is possible with Python, as one would have to assign a node to multiple groups (action equivalents) in Flourish to see the colors change, and even then one cannot select a single group to see at a time. This goes to show that there are trade-offs between accessibility/skill required, functionality, and price.
