Week 9: Tutorial

This tutorial walks you through using Recogito to identify places, people, and events in a large text dataset. Recogito is open source software that was voted Best DH Tool in the 2018 DH Awards. The platform lets you upload a large body of text and easily tag specific words with a designated category. I used this tool in my midterm to select all of the locations mentioned in a text and create a map. Recogito is a helpful first step in textual analysis that can lead to many other uses in DH work. For instance, it could be used to build a network of characters or people in a text, identify events for analysis, or help create a timeline.

Recogito can be used in a variety of ways to annotate and analyze texts. The following tutorial walks you through the steps to annotate people, places, and events and download a CSV file with this information.

1. Recogito requires an account, so either log in with the button at the top right or create an account on the main web page (seen below).
Recogito home page with account registration

2. The next step is to upload your text file. After you log in, the page will look like the one below. Click the large blue “+ New” button on the left bar, choose “File upload”, and select the text file on your computer. The uploaded file will appear under “My Documents”.

Upload files
Select file upload option

3. Select your file and click the “Options” drop-down menu at the top right. Select “Named Entity Recognition” (NER) at the bottom of the menu. NER automatically identifies places and people in the text.

Select “Named Entity Recognition” from the “Options” drop-down.

4. Recogito has several recognition engines that allow you to analyze texts in languages other than modern English. Select the engine that is most relevant to your text.

The recognition engines for use with the NER software

5. You can also select which database the NER uses to identify places. Deselect the “identify entities against all available authority files” check box and select the authority files most relevant to your file. This feature lets you analyze historical texts that use place names not necessarily in use today. GeoNames works best for most relatively modern texts.

All the available authority files that can be selected to identify entities.

6. Open the file once the NER has finished parsing it. The program will have gone through the text automatically and annotated the words it thinks are names of people and places.

A fully parsed file

7. The next step is to manually confirm the annotations the program has identified. If you change an annotation, Recogito will ask whether you want to merge it with the other instances it has identified in the text. Click Yes to merge the annotations.

8. Now you have an annotated document! There are a variety of ways to download this information; here we will select the CSV format. The CSV file has one row per entity the program identified, with the corresponding entity type and other identifying data for each annotation. You can then clean and analyze this data further to create a visualization. This makes Recogito a great tool for automating part of the analysis of a text.

Download options
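As a starting point for that further analysis, here is a minimal Python sketch that counts the annotations by type and collects the places that have coordinates, which is roughly what you need before building a map. The column names used here (QUOTE_TRANSCRIPTION, TYPE, LAT, LNG) and the sample rows are assumptions for illustration, so check them against the header row of your own download and adjust as needed.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a Recogito CSV download.
# The column names are assumptions; compare them with the header
# row of your own file before reusing this code.
SAMPLE_CSV = """QUOTE_TRANSCRIPTION,TYPE,LAT,LNG
Athens,PLACE,37.98,23.73
Socrates,PERSON,,
Athens,PLACE,37.98,23.73
Sparta,PLACE,37.07,22.43
Battle of Marathon,EVENT,,
"""


def load_rows(handle):
    """Read every annotation row into a list of dictionaries."""
    return list(csv.DictReader(handle))


def count_by_type(rows):
    """Count how many annotations fall under each entity type."""
    return Counter(row["TYPE"] for row in rows)


def unique_places(rows):
    """Map each place name to its (latitude, longitude), skipping rows without coordinates."""
    places = {}
    for row in rows:
        if row["TYPE"] == "PLACE" and row["LAT"] and row["LNG"]:
            places[row["QUOTE_TRANSCRIPTION"]] = (float(row["LAT"]), float(row["LNG"]))
    return places


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(count_by_type(rows))
print(unique_places(rows))
```

To run this on a real download, replace io.StringIO(SAMPLE_CSV) with open("your_file.csv", newline="", encoding="utf-8"), where the file name is whatever you saved the CSV as.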

Further Resources:

10 minute tutorial by Recogito

A comprehensive guide to the program

1 thought on “Week 9: Tutorial”

  1. This is one of the most interesting tools I have learned about so far! Are the databases limited to the ones that you showed and real life entities or could somebody upload a fictional work? It’s crazy to me that now a computer can do all the reading and compile easy-to-comprehend entity information. The amount of time that could be saved for historians and anthropologists is amazing.
