Joseph Nathan Cohen

Associate Professor of Sociology, Queens College in the City University of New York

6530 Kissena Boulevard, Queens, New York, 11367 

Data Reduction


Data reduction (a.k.a. dimensionality reduction) is the task of consolidating a large set of metrics into a smaller set.  It is a wrangling and analytical operation that is useful in at least two situations.  First, reduction can simplify the information conveyed in our data, making it easier to comprehend and discuss.  Second, it is one possible method for dealing with collinearity problems in regression analysis.

Analysts often reduce data in the course of their analysis.  Sometimes the reduction is performed thoughtlessly, without careful consideration of the reduction choices or their implications for an analysis’ rigor and meaning.

The methods described in this module focus on supervised reduction, which occurs when the analyst actively and thoughtfully engages in choices about how to reduce data.  There are also passive, algorithmically implemented methods (often associated with the field of data mining) that do not demand much analyst attention.  Before delving into these automated methods, it is probably useful to understand what the algorithms are doing.


An Introduction to Data Reduction

An introduction to the task of data reduction

Simple Indexes

The most common method. It has uses and limits.

Cronbach's Alpha

A simple test of the plausibility that a set of variables measures the same underlying construct

Exploratory Factor Analysis

A method to find relationships in a variable set

Data Sets Used in these Modules

OECD Better Life.  The OECD Better Life Index measures 41 countries’ overall quality of life using 24 different variables. The data used in these exercises can be downloaded here.

Simulated Psychological Data.  A generated data set that scores 400 respondents over 15 traits: happiness, optimism, sociability, anxiety, anger, jealousy, resentment, fear, boredom, tiredness, annoyance, irritability, hopefulness, friendliness, and ambition. These traits are all rated on a scale from zero (inapplicable) to ten (fully applicable). Data can be downloaded here.
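To give a rough sense of what the simple index and Cronbach's alpha modules do, the sketch below builds a small stand-in data set and reduces three items to one score. It is only an illustration under stated assumptions: the data are generated on the spot (they are not the downloadable file), the three items and the shared latent trait are hypothetical, and the alpha function uses the standard textbook formula rather than any particular package's implementation.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an array shaped (respondents, items)."""
    k = items.shape[1]
    # Sample variance of each individual item.
    item_variances = items.var(axis=0, ddof=1)
    # Sample variance of each respondent's summed score.
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)


# Hypothetical stand-in data: 400 respondents, three items that all
# reflect one shared latent trait plus item-specific noise.
rng = np.random.default_rng(0)
n_respondents = 400
latent = rng.normal(size=n_respondents)
noise = rng.normal(scale=0.7, size=(n_respondents, 3))
raw = latent[:, None] + noise

# Rescale to the zero-to-ten rating scale and round to whole ratings.
items = np.clip(np.round(5 + 2 * raw), 0, 10)

# Simple index: the mean of the items for each respondent.
simple_index = items.mean(axis=1)

# Alpha near 1 suggests the items plausibly track one construct.
alpha = cronbach_alpha(items)
print(f"Cronbach's alpha: {alpha:.2f}")
print(f"Index mean: {simple_index.mean():.2f}")
```

Averaging the items keeps the index on the same zero-to-ten scale as the original ratings, which makes it easier to discuss; summing them would work equally well for computing alpha.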

Study Materials
