Introduction

Over the pandemic winter holidays, I took advantage of being stuck at home to create a visualization for the data viz challenge held by the Data Visualization Society. Every year, they invite members to visualize the data from their annual survey. While bragging rights are the only prize, I thought it would be a good opportunity to push myself to practice my data visualization and web visualization skills. I found out about it a little on the late side, giving me just 7 days to come up with something.

Here’s a sneak peek of the end result, and you can click here to try it out for yourself!

Exploratory data analysis

My first and biggest mistake was leaning too heavily on my data science background. I spent most of that week cleaning the data and searching for the most interesting associations. It was a mistake, because my goal wasn’t to do more of what I already knew how to do. Plus, it was difficult to find meaningful associations with so many missing values, unspecified “other” responses, and free-text fields to parse. I did a massive amount of pattern matching, then threw all sorts of interesting questions at the data.

I found myself with just over 24 hours until the deadline, and absolutely no idea what I wanted to submit visualization-wise.

Planning the viz

Out of a bit of desperation, I ended up using very simple data: which country each respondent is from. I combined this with country population estimates from the United Nations Population Division (as arranged by Worldometers.info) to compare the number of observed vs expected responses per country. After struggling with the visual impact, I realized that providing the right perspective would require two different ratio axes.

If expected counts were always in the denominator, the scale would range from 0.015 to 43.6, with 1.00 signifying observed = expected. Plotted numerically, this would obscure the fact that 0.015 actually represents a greater magnitude of under-representation (1/0.015 ≈ 66.7) than 43.6 represents over-representation, because 0.015 is only 0.985 units away from 1.00, while 43.6 is a whopping 42.6 units away. A similar issue occurs when observed counts are used as the denominator. The solution was to use expected as the denominator for over-represented countries and observed as the denominator for under-represented countries, making sure to subtract 1 from each so that the axis could be centered at zero.
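To make the two-sided ratio concrete, here is a minimal sketch in plain JavaScript. It is not the code from the actual project: the function names are hypothetical, the expected count assumes responses would be spread in proportion to each country’s share of world population, and flipping the sign for under-represented countries is my assumption about how the two halves end up on opposite sides of zero.

```javascript
// Hypothetical helpers for illustration; names are not from the original project.

// Expected responses for a country, ASSUMING respondents were distributed
// in proportion to that country's share of the world population.
function expectedCount(totalResponses, countryPopulation, worldPopulation) {
  return totalResponses * (countryPopulation / worldPopulation);
}

// Signed representation score, centered at zero:
//   over-represented  -> observed / expected - 1     (positive side)
//   under-represented -> -(expected / observed - 1)  (negative side; the
//                        sign flip is an assumption made for plotting)
function representationScore(observed, expected) {
  if (expected <= 0) {
    throw new RangeError("expected must be positive");
  }
  if (observed >= expected) {
    return observed / expected - 1;
  }
  if (observed === 0) {
    // No responses at all: the ratio has no finite value.
    return -Infinity;
  }
  return -(expected / observed - 1);
}
```

With this scoring, a country whose observed/expected ratio is 43.6 lands at 42.6, while one at 0.015 lands at roughly -65.7, so the more extreme under-representation is no longer squeezed into the sliver between 0 and 1.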

Making the viz

The actual D3 code involved a ton of googling, copying and pasting, and debugging to get things to work as I wanted them to (or work at all). As someone completely new to any kind of web development, I found the whole DOM thing to be super finicky and arbitrary-feeling, but it was incredibly satisfying when I started to get a feel for it and wrote some code that magically worked right away.

By the competition deadline, I had only managed a simple bar chart, sorted by continent, with no animation other than a tooltip. Not what I had hoped to achieve, but at least it was fully functional. Then they extended the deadline by a week, so I started thinking about what bells and whistles I could add. My first idea was to offer more than one view of the data, which involved calculating a couple of additional orderings of the data externally in R, then modifying my code to redo the bar chart whenever a new option is selected from a menu. It worked, but looked less than polished, since it wasn’t so much re-sorting the bars as redrawing the whole thing. I realized I had to recalculate element positions and use transitions to elegantly move the existing bars and text around. To make it even fancier, I staggered the transitions over a period of time using delays. Then, by accident or luck, I discovered that all of these great transitions broke if I tried to scroll or click anywhere while they were playing. After trying hacky things that should probably not be tried, I stumbled across the concept of naming transitions, which somehow did the trick. I’m pretty happy with the end product, and feel like I learned a lot.
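For anyone curious what the re-sorting step looks like, here is a rough sketch rather than my actual code. The helper name, the `country` and `score` fields, and the pixel sizes are all made up for illustration, and it assumes the bars are stacked vertically. The function itself is plain JavaScript that works out where each bar should end up and how long it should wait before moving; the comment at the end shows how that plan could feed a named D3 transition. My best guess as to why naming helped is that D3 only lets a new transition interrupt a running one on the same element when they share a name, so a separately named sort animation is left alone by other transitions.

```javascript
// Hypothetical helper: computes the target position and staggered delay for
// each bar under a new sort order. Field names and sizes are assumptions.
function layoutBars(bars, compare, { barHeight = 20, gap = 4, stagger = 30 } = {}) {
  // Sort a copy so the caller's array keeps its original order.
  const ordered = [...bars].sort(compare);
  const plan = new Map();
  ordered.forEach((bar, i) => {
    plan.set(bar.country, {
      y: i * (barHeight + gap), // vertical offset of the bar in pixels
      delay: i * stagger,       // milliseconds to wait before this bar moves
    });
  });
  return plan;
}

// Example: order bars from highest to lowest score.
const examplePlan = layoutBars(
  [
    { country: "A", score: 1 },
    { country: "B", score: 3 },
    { country: "C", score: 2 },
  ],
  (a, b) => b.score - a.score
);

// In the browser, with D3 loaded and bars bound to the same data, the plan
// could drive a named transition along these lines:
//
//   d3.selectAll("rect.bar")
//     .transition("sort") // the name keeps it separate from other transitions
//     .delay(d => examplePlan.get(d.country).delay)
//     .duration(750)
//     .attr("y", d => examplePlan.get(d.country).y);
```

Keeping the position math in its own function also means each new ordering from the menu only needs a different comparison function.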

Time spent/tools:

Approximate time breakdown:

Languages/tools used: R, RStudio, D3, HTML/CSS, Visual Studio Code