What do people mean when they talk about multidimensional vs. multivariate data?

Mi Feng (Mia)
May 18, 2021


Sometimes we (mostly I) get confused when people talk about multidimensional and multivariate data, partly because these terms are defined and used differently across areas such as data analysis & visualization, data engineering, statistics, and mathematics. This post aims to describe these terms in a (hopefully) less ambiguous way using an example data structure.

(Please note that the main purpose of this blog post is self-education; any corrections and suggestions are appreciated.)

Data Analysis & Visualization

The figure below shows a data structure that is both multidimensional and multivariate. The cube has 3 dimensions (width, height, and depth), which makes it a multidimensional data structure. Each dimension has 3 levels, dividing the cube into 3 × 3 × 3 = 27 cells. Each cell can therefore be located by a tuple of keys (i, j, k); the marked cell, for example, has the keys (0, 2, 0). Each cell stores a data element with two value attributes (a, b), which makes it a multivariate data structure.
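A minimal Python sketch of the cube, assuming the value attributes are plain numbers. The figure's actual values are not given, so the (a, b) pairs below are made-up placeholders computed from the keys, and the names `cube` and `LEVELS` are hypothetical. The point is only the shape: every key triple (i, j, k) maps to one value pair (a, b).

```python
from itertools import product

LEVELS = 3  # each of the 3 dimensions has 3 levels: 0, 1, 2

# Map every key triple (i, j, k) to a value pair (a, b).
# The values are placeholders derived from the keys, not real data.
cube = {
    (i, j, k): (i + j + k, i * j * k)
    for i, j, k in product(range(LEVELS), repeat=3)
}

print(len(cube))        # 27 cells
print(cube[(0, 2, 0)])  # the marked cell's (a, b) pair: (2, 0)
```

Three keys locate a cell (multidimensional); two attributes live inside it (multivariate).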

In the book Visualization Analysis and Design, Professor Munzner differentiates the two terms as follows, “their multivariate structure depends on the number of value attributes, and their multidimensional structure depends on the number of keys.” [1]
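One way to make this distinction concrete is a small summary type that only counts keys and value attributes. This is a sketch: `DatasetShape` is a hypothetical helper, and treating "multi" as "more than one" is my own reading, since the quote only says each structure depends on the respective count.

```python
from dataclasses import dataclass


@dataclass
class DatasetShape:
    """Hypothetical summary of a dataset's keys and value attributes."""
    keys: tuple[str, ...]        # names of the key attributes (index dimensions)
    attributes: tuple[str, ...]  # names of the value attributes stored per cell

    @property
    def is_multidimensional(self) -> bool:
        # Assumption: "multidimensional" means more than one key.
        return len(self.keys) > 1

    @property
    def is_multivariate(self) -> bool:
        # Assumption: "multivariate" means more than one value attribute.
        return len(self.attributes) > 1


cube_shape = DatasetShape(keys=("i", "j", "k"), attributes=("a", "b"))
print(cube_shape.is_multidimensional, cube_shape.is_multivariate)  # True True
```

The two properties vary independently: a single-key table with many columns is multivariate but not multidimensional, and vice versa.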

Data Storage

During data processing, there is more than one way to store this type of data object. First, it can be stored in a nested structure: a 3-dimensional array whose elements are 2-tuples.

1. (Nested) List Structure
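In Python, the nested option might look like the sketch below. It assumes numeric attributes; `cube` and the placeholder values (computed from the keys) are illustrative only. Note that the keys are not stored anywhere: they are implied by each element's position in the nesting.

```python
LEVELS = 3

# A 3-dimensional nested list: cube[i][j][k] holds a 2-tuple (a, b).
# Placeholder values are derived from the keys, not taken from real data.
cube = [
    [
        [(i + j + k, i * j * k) for k in range(LEVELS)]
        for j in range(LEVELS)
    ]
    for i in range(LEVELS)
]

a, b = cube[0][2][0]  # keys are implicit in the positions: i, then j, then k
print(a, b)  # 2 0
```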

The second option is a flat structure: a table with 5 columns and 27 rows. The first three columns hold the keys, and the last two columns hold the values. Note that in a tabular data structure where the columns carry no semantic meaning, all of the columns can also be called dimensions.

2. (Flat) Tabular Structure
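Flattening the cube into this table amounts to writing each cell's keys out as explicit columns next to its values. A sketch, again with hypothetical placeholder values from `cell_value` and hypothetical column names i, j, k, a, b:

```python
from itertools import product

LEVELS = 3


def cell_value(i, j, k):
    """Hypothetical placeholder (a, b) pair for the cell at (i, j, k)."""
    return (i + j + k, i * j * k)


HEADER = ("i", "j", "k", "a", "b")  # 3 key columns followed by 2 value columns

# One row per cell: the keys become explicit columns next to the values.
rows = [
    (i, j, k, *cell_value(i, j, k))
    for i, j, k in product(range(LEVELS), repeat=3)
]

print(len(rows), len(HEADER))  # 27 5
print(rows[0])  # (0, 0, 0, 0, 0)
```

Unlike the nested form, each row is self-describing, so the rows can be filtered or reordered without losing which cell they came from.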

Statistical & Mathematical Definition

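In statistics, the definition of multidimensional data is similar to the tabular structure above, but it omits the key columns, i.e., the data form an n × p matrix

$$X = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix} \in \mathbb{R}^{n \times p}$$

where n = 27 is the number of samples (i.e., rows in the table), and p = 2 is the number of dimensions (i.e., the last two columns in the table).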
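Going from the table to this statistical view just means dropping the key columns. A sketch, rebuilding the table with the same kind of hypothetical placeholder values; `X`, `n`, and `p` mirror the symbols above:

```python
from itertools import product

LEVELS = 3

# Rebuild the flat table with placeholder values: (i, j, k, a, b).
rows = [
    (i, j, k, i + j + k, i * j * k)
    for i, j, k in product(range(LEVELS), repeat=3)
]

# Statistical view: keep only the value columns, discarding the keys.
X = [row[3:] for row in rows]  # n x p data matrix as a list of rows

n = len(X)     # number of samples
p = len(X[0])  # number of dimensions (value columns)
print(n, p)  # 27 2
```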

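On the other hand, the mathematical definition of such a structure is

$$f : \{0, \dots, m(1)-1\} \times \{0, \dots, m(2)-1\} \times \cdots \times \{0, \dots, m(l)-1\} \to \mathbb{R}^{p}$$

Here f is a function that maps each tuple of keys to a tuple of values, l = 3 is the number of dimensions, and m(i) is the number of levels of the i-th dimension (in the cube, m(1) = m(2) = m(3) = 3).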
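Read as code, this definition is just a function whose domain is the Cartesian product of the level sets. The sketch below assumes 0-based levels (matching the marked key (0, 2, 0)); `m`, `f`, `num_dims` (standing in for l), and the placeholder return values are hypothetical.

```python
from itertools import product
from math import prod

m = (3, 3, 3)      # m(i): number of levels for each dimension
num_dims = len(m)  # l in the formula


def f(*key):
    """Map a key tuple (k_1, ..., k_l) to a placeholder value pair (a, b)."""
    if len(key) != num_dims or not all(0 <= k < m_i for k, m_i in zip(key, m)):
        raise ValueError(f"key {key} is outside the domain")
    return (sum(key), prod(key))


# The domain is the Cartesian product of the level sets.
domain = list(product(*(range(m_i) for m_i in m)))
print(num_dims, len(domain), prod(m))  # 3 27 27
print(f(0, 2, 0))  # (2, 0)
```

The size of the domain, m(1) × m(2) × m(3) = 27, matches the number of cells in the cube and the number of rows in the table.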

References

[1] Munzner, Tamara. Visualization Analysis and Design. CRC Press, 2014.
