1/11/2016

Heatmap

We launched Cirrus.js a few weeks ago to share some technologies we are developing at Planet OS (now Intertrust). We recently needed to visualize some datasets as a heatmap, so we added one to Cirrus.js. What is a heatmap? What kind of data can it visualize? Read on.

One of the challenges of building a tool to index all environmental sensor data is that we have to work with a huge variety of data formats and structures. One of these structures is two-dimensional data (2D data). We tend to think of 2D data as any data that can fit in a table, where each cell, at the intersection of a row and a column, contains a value. But 2D data can mean multiple things, and only some 2D datasets are suitable for a heatmap. Here are some examples from the environmental sensor data we work with daily at Planet OS.

Let's say we have a table showing a temperature value for an array of sensors for each day of the week. To better visualize this grid of data, we can color each cell according to its value. This is called a "color matrix", and it is sometimes also called a heatmap.
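
To make the idea concrete, here is a minimal TypeScript sketch of a color matrix, assuming a simple sequential scale that interpolates linearly between two endpoint colors. The function names and colors are illustrative only, not the Cirrus.js API.

```typescript
// Minimal color matrix sketch (hypothetical helpers, not Cirrus.js code):
// each cell's value is mapped to a color on a two-color sequential scale.

type RGB = [number, number, number];

// Linear interpolation between two RGB colors; t is expected in [0, 1].
function lerpColor(a: RGB, b: RGB, t: number): RGB {
  return [0, 1, 2].map((i) => Math.round(a[i] + (b[i] - a[i]) * t)) as RGB;
}

// grid[row][col], e.g. rows = sensors and columns = days of the week.
// Values are scaled linearly between the grid's minimum and maximum.
function colorMatrix(grid: number[][], low: RGB, high: RGB): RGB[][] {
  const values = ([] as number[]).concat(...grid);
  const min = Math.min(...values);
  const max = Math.max(...values);
  const span = max - min || 1; // a constant grid would otherwise divide by zero
  return grid.map((row) => row.map((v) => lerpColor(low, high, (v - min) / span)));
}
```

For example, `colorMatrix([[10, 12], [14, 18]], [255, 255, 255], [0, 0, 255])` colors the coldest reading white, the warmest blue, and 12 a pale blue, `[191, 191, 255]`.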


Others think of a heatmap as data visualized as gradients on a map, for example surface temperature for each latitude/longitude. In this case, since the dimensions are spatial, it makes sense to show them on a map. My friend and colleague Ilya made me realize that if you show this data on a color grid, it will look the same as a map drawn with an equirectangular (plate carrée) projection; a Mercator map would look similar, but with rows stretched toward the poles.
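
A quick way to check the projection point: with equal-height rows, a regular latitude grid lines up with an equirectangular projection, whereas Mercator spaces latitudes further apart toward the poles. The helpers below are hypothetical and assume a Web Mercator-style clip at ±85.0511°.

```typescript
// Where does a latitude land vertically, as a fraction of map height (0 = top)?
// Hypothetical helpers for comparing two projections of a lat/lon grid.

// Web Mercator maps are usually clipped near ±85.0511° because y diverges at the poles.
const MAX_LAT = 85.0511;

// Equirectangular (plate carrée): y is linear in latitude, so equal-height rows
// of a regular latitude grid line up exactly with the map.
function equirectangularY(latDeg: number): number {
  return (90 - latDeg) / 180;
}

// Mercator: y = ln(tan(pi/4 + phi/2)), normalized to [0, 1] over ±MAX_LAT.
function mercatorY(latDeg: number): number {
  const merc = (deg: number) => Math.log(Math.tan(Math.PI / 4 + (deg * Math.PI) / 360));
  const yMax = merc(MAX_LAT);
  const lat = Math.max(-MAX_LAT, Math.min(MAX_LAT, latDeg));
  return (yMax - merc(lat)) / (2 * yMax);
}

// Height of the 10° band starting at latitude `from`, under projection `y`.
const band = (y: (lat: number) => number, from: number) => y(from) - y(from + 10);
```

With these helpers, the bands 0°–10° and 60°–70° have the same height under equirectangular, while under Mercator the second one is more than twice as tall.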


And now for the definition. What I call a "cartesian heatmap" is a color matrix showing a grid of discrete values from a continuous 2D space. What does that mean?
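
One way to read this definition in code: take a continuous 2D field and sample it at regular steps along both dimensions. The TypeScript sketch below assumes equal-sized cells sampled at their centers; `sampleGrid` and its parameters are hypothetical names, not part of Cirrus.js.

```typescript
// Sketch: turn a continuous 2D field f(x, y) into a grid of discrete values
// by sampling it at the centers of nx × ny equal cells (an assumption; other
// sampling schemes are possible).

type Domain = [number, number]; // [min, max] along one continuous dimension

function sampleGrid(
  f: (x: number, y: number) => number,
  xDomain: Domain,
  yDomain: Domain,
  nx: number,
  ny: number
): number[][] {
  const dx = (xDomain[1] - xDomain[0]) / nx;
  const dy = (yDomain[1] - yDomain[0]) / ny;
  const rows: number[][] = [];
  for (let j = 0; j < ny; j++) {
    const y = yDomain[0] + (j + 0.5) * dy; // center of row j
    const row: number[] = [];
    for (let i = 0; i < nx; i++) {
      const x = xDomain[0] + (i + 0.5) * dx; // center of column i
      row.push(f(x, y));
    }
    rows.push(row);
  }
  return rows;
}
```

For instance, `sampleGrid((x, y) => x + y, [0, 2], [0, 2], 2, 2)` samples the points (0.5, 0.5), (1.5, 0.5), (0.5, 1.5) and (1.5, 1.5) and returns `[[1, 2], [2, 3]]`.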

One way to understand which data is suitable for a given chart type is to describe its dimensions and measures as continuous or discrete. To learn more about this conceptual framework, this paper may help. Let's take the example dataset used for the first figure: the temperature of 7 sensors over 7 days. Temperature is continuous, while sensor ID and day are both discrete (categorical). This grid of data could be shown in a table or as a color matrix.
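
The framework can be written down as types. This is a minimal sketch, assuming a dataset is described by one measure and two dimensions; `Field`, `GridSchema` and `fitsColorMatrix` are hypothetical names used only to restate the example above: a continuous measure with two discrete dimensions fits a color matrix.

```typescript
// Hypothetical schema types for the continuous/discrete framework.
type Kind = "continuous" | "discrete";

interface Field {
  name: string;
  kind: Kind;
}

interface GridSchema {
  measure: Field; // the value shown in each cell
  dimensions: [Field, Field]; // the row and column dimensions
}

// A color matrix needs a continuous measure and two discrete dimensions.
function fitsColorMatrix(schema: GridSchema): boolean {
  return (
    schema.measure.kind === "continuous" &&
    schema.dimensions.every((d) => d.kind === "discrete")
  );
}

// The first example: temperature of 7 sensors over 7 days.
const sensorsByDay: GridSchema = {
  measure: { name: "temperature", kind: "continuous" },
  dimensions: [
    { name: "sensorId", kind: "discrete" },
    { name: "day", kind: "discrete" },
  ],
};
```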

Let's take another example: water temperature for 7 sensors at 7 depth steps. Temperature is still continuous and sensor ID is still discrete (categorical), but depth is also discrete. This can be confusing, since depth is a continuum, but the data we have is a series of temperature readings taken at each depth step. So we treat temperature as continuous, because it can take any value along a continuum, and depth as discrete, because it is a series of distinct depth levels. This data can be visualized as a color matrix, with each cell's temperature encoded as color at the intersection of a sensor ID on the X axis and a depth step on the Y axis. It could also be visualized as a line chart, with one line per sensor, depth on the X axis and temperature on the Y axis.
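
The same readings can feed either chart; only the shape of the data changes. Here is a hedged TypeScript sketch, assuming the readings are stored as a sensor × depth array (the `DepthProfiles` shape and both functions are hypothetical):

```typescript
// Hypothetical data shape: temperature[s][d] is the reading of sensor s at depth step d.
interface DepthProfiles {
  sensorIds: string[];
  depths: number[]; // discrete depth steps, e.g. meters below the surface
  temperature: number[][];
}

// Color matrix view: one cell per (sensor, depth) pair,
// sensor ID on X, depth step on Y, temperature encoded as color.
interface Cell {
  sensorId: string;
  depth: number;
  temperature: number;
}

function toCells(data: DepthProfiles): Cell[] {
  const cells: Cell[] = [];
  data.sensorIds.forEach((sensorId, s) => {
    data.depths.forEach((depth, d) => {
      cells.push({ sensorId, depth, temperature: data.temperature[s][d] });
    });
  });
  return cells;
}

// Line chart view: one series per sensor, depth on X, temperature on Y.
interface Series {
  sensorId: string;
  points: { depth: number; temperature: number }[];
}

function toLineSeries(data: DepthProfiles): Series[] {
  return data.sensorIds.map((sensorId, s) => ({
    sensorId,
    points: data.depths.map((depth, d) => ({ depth, temperature: data.temperature[s][d] })),
  }));
}
```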


We tend to see a line chart as multiple lines, each showing a measure varying over a dimension. Comparing values along one axis is different from comparing them along the other. For example, we can follow how temperature varies over time for a single sensor, but comparing temperatures across multiple sensors at the same time slice is a different perceptual task.

A heatmap is a bit closer to a "permutation matrix" or to "stacked sparklines".



We know how to build a line chart (continuous measure on the Y axis, discrete ordinal dimension on the X axis) and a color matrix (continuous measure as color, discrete categorical or ordinal dimensions on both the X and Y axes). But how do we build a heatmap? The measure is continuous and encoded as color, and both the X and Y axes are discrete dimensions, preferably discrete steps taken from a continuous dimension. One example is temperature for each latitude/longitude; another is temperature for each depth step on each day of the month. Unlike a line chart, a heatmap uses the same visual encoding for both axes, so comparing along either dimension is the same perceptual task: comparing colors. The visual metaphor is closer to the idea of a homogeneous 2D grid of values.
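
A minimal sketch of the geometry, assuming an evenly spaced grid drawn in a `width` × `height` area: both axes are split into equal bands, and the measure only reaches the screen through color. `layoutHeatmap` and `HeatmapCell` are hypothetical names, not the Cirrus.js API; the normalized value `t` would be passed to whatever color scale the chart uses.

```typescript
// Hypothetical output shape: one rectangle per grid cell.
interface HeatmapCell {
  x: number;
  y: number;
  width: number;
  height: number;
  value: number;
  t: number; // value normalized to [0, 1], ready to feed a color scale
}

// grid[row][col]; row 0 is drawn at the top of a width × height area.
// X and Y get the same encoding: equal-size bands over the discrete steps.
function layoutHeatmap(grid: number[][], width: number, height: number): HeatmapCell[] {
  const rows = grid.length;
  const cols = rows > 0 ? grid[0].length : 0;
  const values = ([] as number[]).concat(...grid);
  const min = Math.min(...values);
  const max = Math.max(...values);
  const span = max - min || 1; // avoid dividing by zero on a constant grid
  const cellW = width / cols;
  const cellH = height / rows;
  const cells: HeatmapCell[] = [];
  grid.forEach((row, r) =>
    row.forEach((value, c) =>
      cells.push({
        x: c * cellW,
        y: r * cellH,
        width: cellW,
        height: cellH,
        value,
        t: (value - min) / span,
      })
    )
  );
  return cells;
}
```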



Another visual metaphor that comes for free with the name "heatmap" is the idea of heat. Users often picture rainbow-colored thermal imagery when they look at a heatmap, and they expect to see colored zones with smooth transitions between them, which is not always the case with grid data. We also have to pick a color scheme that does not mislead users into thinking, for example, that red means very hot and blue very cold. Choosing the right color scale for a heatmap is more difficult than it looks. Maybe it could be the subject of a future blog post?

In the meantime, please enjoy the heatmap we are sharing today as a new chart type in Cirrus.js. In conclusion, a heatmap is perfect for visualizing a 2D grid of data representing discrete values from a continuous 2D space. With the current datavis pipeline architecture, extending the library with a grid component was simple: we only had to add a color scale and a few minor tweaks, such as a single tooltip. Cirrus.js is under heavy development, but we hope that sharing insights about how we build it gives you an idea of how we are solving visualization problems at Planet OS.
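
One way to implement a single tooltip on a grid, sketched here under our own assumptions rather than taken from the Cirrus.js source, is to convert the pointer position back into a row and column and show one tooltip for that cell. `cellAt` is a hypothetical helper:

```typescript
// Hypothetical result of a pointer lookup.
interface CellHit {
  row: number;
  col: number;
  value: number;
}

// Invert the band layout: pointer (px, py), relative to the heatmap's top-left
// corner, is mapped to the grid cell underneath it, or null if outside.
function cellAt(grid: number[][], width: number, height: number, px: number, py: number): CellHit | null {
  const rows = grid.length;
  const cols = rows > 0 ? grid[0].length : 0;
  if (rows === 0 || cols === 0) return null;
  if (px < 0 || py < 0 || px >= width || py >= height) return null;
  // Clamp guards against floating-point rounding right at the far edge.
  const col = Math.min(cols - 1, Math.floor((px / width) * cols));
  const row = Math.min(rows - 1, Math.floor((py / height) * rows));
  return { row, col, value: grid[row][col] };
}
```

With a 2 × 2 grid drawn in a 100 × 50 area, a pointer at (75, 30) falls in row 1, column 1. This approach needs only one tooltip element and one index computation per pointer move, however many cells the grid has.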