I knew that it was important to present data accurately, precisely, and clearly. But I never knew that data visualization was a field of study of its own, or that so many critiques of it can be found online. What surprised me upon reading the sources linked on the course website was that some mistakes in data visualization are so common that they have names. For example, the “Lie Factor” measures an extremely common one, and it is a mistake that I had not thought much about before reading about it. Edward Tufte, a professor at Yale University, defines the Lie Factor as the ratio of the size of the effect shown in the graphic to the size of the effect in the data. We can mathematically define what we mean by size of effect, but I will not go that far. I just want to express my wonderment at the fact that, from now on, I will be able to calculate the Lie Factor of my graphics and make sure that it is neither too large nor too small, that is, that it stays close to 1. Another mistake that I had not given much thought to is the “compared to what?” mistake. For example, if a graph shows a downward trend in car accidents after stricter laws are enforced, so what? The conclusion is only valid if the graph also shows what happened when the law was not in place, because only then can we compare.
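To make the Lie Factor a little more concrete, here is a minimal sketch in Python. It assumes that "size of effect" means the relative change between two values, which is how Tufte usually works it out. The function names and the numbers in the example are made up for illustration and do not come from the course readings.

```python
def relative_change(first, second):
    """Size of effect as a relative change: (second - first) / |first|."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return (second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Ratio of the effect shown in the graphic to the effect in the data."""
    data_effect = relative_change(data_first, data_second)
    if data_effect == 0:
        raise ValueError("the data shows no change, so the ratio is undefined")
    return relative_change(graphic_first, graphic_second) / data_effect


# Hypothetical example: the data grows from 100 to 150 (a 50% increase),
# but the bars are drawn 2.0 cm and 5.0 cm tall (a 150% increase).
print(lie_factor(2.0, 5.0, 100.0, 150.0))
```

A value close to 1 means the graphic shows the change at about its true size. In this made-up example the graphic shows the increase as three times larger than it really is.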
With regard to my projects, I will try my best to present graphical data as accurately, precisely, and clearly as possible. For instance, I will definitely try to implement what Tufte calls “visual explanation”, which is about the representation of mechanism and motion, of process and dynamics, of causes and effects, of explanation and narrative. It is much like an animation or a video, which stays in our minds longer than a dull static picture. This approach makes particular sense to me given that I am a physics major.
Concerning the lecture given by Lin, what was particularly helpful was the distinction she made between exploration and explanation. According to her, exploration is the process of analyzing the data and experimenting with the different ways in which it could be represented. Explanation, on the other hand, is when we present the data to an audience, and in that process we have to think about human psychology. She gave us a lot of ideas on how to make data digestible and not misleading. For example:
- When making a scatter plot, the dots must not be too large, as they can create a mess around the line of best fit.
- A line graph makes sense only when we need to see a progression from one event to the next.
- A histogram is drawn when we are analyzing quantitative data, not categorical data.
- It can be misleading to use area for showing the amount of a quantity because our eyes may not perceive a difference in area if that difference is too small.
- And many more.