Data reduction (a.k.a. dimensionality reduction) is the task of consolidating a large set of metrics into a smaller set. It is a wrangling and analytical operation that is useful in at least two situations. First, reduction can simplify the information conveyed by our data, making it easier to comprehend and discuss. Second, it is one possible method for dealing with collinearity problems in regression analysis.
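To make the collinearity point concrete, here is a minimal sketch using only the Python standard library. Everything in it is hypothetical: the variables `metric_a` and `metric_b`, the noise levels, and the choice of a simple average as the reduction are illustrative assumptions, not a method prescribed by this module.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: one underlying quantity measured twice with small noise.
latent = [rng.gauss(5, 2) for _ in range(400)]
metric_a = [v + rng.gauss(0, 0.5) for v in latent]
metric_b = [v + rng.gauss(0, 0.5) for v in latent]

# The two metrics are nearly collinear, which would be a problem
# if both were entered as predictors in the same regression.
r_ab = pearson(metric_a, metric_b)

# One possible reduction: replace the pair with their average.
composite = [(a + b) / 2 for a, b in zip(metric_a, metric_b)]
r_latent = pearson(composite, latent)

print(f"correlation between the two metrics: {r_ab:.2f}")
print(f"correlation of composite with the underlying quantity: {r_latent:.2f}")
```

In this contrived setup the composite tracks the underlying quantity about as well as either metric alone, so the regression loses a collinear pair of predictors while keeping most of their information. Real data will rarely be this tidy.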
Analysts often reduce data in the course of their work. Sometimes the reduction is performed thoughtlessly, without careful consideration of the reduction choices being made or of their implications for the rigor and meaning of the analysis.
The methods described in this module focus on supervised reduction, which occurs when the analyst actively and thoughtfully makes choices about how to reduce data. There are also passive, algorithmically implemented methods (often associated with the field of data mining) that do not demand much analyst attention. Before delving into these automated methods, it is probably useful to understand what the algorithms are doing.
Simulated Psychological Data. A generated data set that scores 400 respondents on 15 traits: happiness, optimism, sociability, anxiety, anger, jealousy, resentment, fear, boredom, tiredness, annoyance, irritability, hopefulness, friendliness, and ambition. Each trait is rated on a scale from zero (inapplicable) to ten (fully applicable). Data can be downloaded here.
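For readers who want to experiment without the file, the sketch below builds a rough stand-in with the same shape (400 respondents, 15 traits, ratings from 0 to 10) and then applies one analyst-chosen reduction. It is not the module's data set. The integer ratings, the two latent tendencies, the noise levels, and the `POSITIVE` grouping of traits are all assumptions made purely for illustration.

```python
import random
from statistics import mean

TRAITS = [
    "happiness", "optimism", "sociability", "anxiety", "anger",
    "jealousy", "resentment", "fear", "boredom", "tiredness",
    "annoyance", "irritability", "hopefulness", "friendliness", "ambition",
]

# Hypothetical grouping chosen by the analyst, for illustration only.
POSITIVE = {
    "happiness", "optimism", "sociability",
    "hopefulness", "friendliness", "ambition",
}


def clamp_rating(value):
    """Round to an integer and keep it on the 0-10 scale."""
    return max(0, min(10, round(value)))


rng = random.Random(42)  # fixed seed so the sketch is reproducible

respondents = []
for _ in range(400):
    pos = rng.gauss(6, 2)  # assumed latent "positive" tendency
    neg = rng.gauss(4, 2)  # assumed latent "negative" tendency
    row = {}
    for trait in TRAITS:
        base = pos if trait in POSITIVE else neg
        row[trait] = clamp_rating(base + rng.gauss(0, 1.5))
    respondents.append(row)

# Analyst-driven reduction: 15 traits become two composite scores.
reduced = [
    {
        "positive_index": mean(r[t] for t in TRAITS if t in POSITIVE),
        "negative_index": mean(r[t] for t in TRAITS if t not in POSITIVE),
    }
    for r in respondents
]

print(reduced[0])
```

Averaging traits into an index is only one way to consolidate them, and the grouping itself is a judgment call that the analyst must be able to defend. A different grouping of the same 15 traits would yield different composites and potentially different conclusions.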