Net Worth by Age Group

Wealth varies by age. What counts as "rich" versus "poor" in one's 30s versus their 60s?

In my last project analyzing household wealth using the Survey of Consumer Finances (Cohen 2017), I found that cross-sectional regression analyses predict age to be the strongest predictor of wealth. The relationship makes sense. People save money over time and are more likely to inherit familial property later in life.

So what are the typical differences between in household finance across age groups? What counts as “rich”, “middling”, or “poor” for a twenty-something versus a fifty-something? And what do these differences tell us about household finances? In this analysis, I examine differences in net worth across age groups in the 2022 data. It looks at the size of wealth differences across age groups and the incidence of low net worth over the life course.

Data and Methods

I use data from the Survey of Consumer Finances, a triennial survey of several thousand U.S. households (with a high-wealth oversample) (Federal Reserve 2023). In this analysis, we will examine the distribution. Readers can see the details of my acquisition and wrangling of this data in the folds of this document or by consulting its associated R Markdown file. I discuss the data and its analysis at greater length in this post.

# Generic session set up

# First, start your session with an empty memory
rm(list=ls())
gc()

# Set your working directory
directory <- "D:/Dropbox/Research/Household Finance/Net Worth - Age"
setwd(directory)

# Load the libraries used in this session
library(httr)
library(jsonlite)
library(survey)
library(mitools)
library(haven)
library(scales)
library(ggplot2)
library(kableExtra)
library(knitr)
library(tidyverse)

# Set the random seed
set.seed(123)

# Turn off scientific notation
options(scipen=999)

# PART ONE: DOWNLOAD DATA
# Download Main 2022 Files
response_0 <- GET("https://www.federalreserve.gov/econres/files/scf2022s.zip", write_disk("scf2022s.zip", overwrite = TRUE))
response_1 <- GET("https://www.federalreserve.gov/econres/files/scfp2022s.zip", write_disk("scfp2022s.zip", overwrite = TRUE))
response_2 <- GET("https://www.federalreserve.gov/econres/files/scf2022rw1s.zip", write_disk("scf2022rw.zip", overwrite = TRUE))

rm(list=ls(pattern = "response")) # Clean up objects

# PART TWO: UNZIP DATA
unzip("scf2022s.zip")
unzip("scfp2022s.zip")
unzip("scf2022rw.zip")

file.remove("scf2022s.zip")  # Erase the zip files because I don't need them and they take up space.
file.remove("scfp2022s.zip")
file.remove("scf2022rw.zip")

# PART THREE:  CONVERT DATA TO R FORMAT
# The data are distributed in Stata format.  Below, I import data from its Stata format.
scf2022 <- read_dta("p22i6.dta")
scf2022s <- read_dta("rscfp2022.dta")
scf2022rw <- read_dta("p22_rw1.dta")

# PART FOUR: DATA QUALITY CHECK
# Data quality check: Ensure all three files have corresponding rows
stopifnot( nrow( scf2022 ) == nrow( scf2022rw ) * 5 )  # One RW score per household
stopifnot( nrow( scf2022 ) == nrow( scf2022s ) )

#Confirm only the primary economic unit and the five implicate identifiers overlap:
stopifnot( all( sort( intersect( names( scf2022 ) , names( scf2022s ) ) ) == c( 'y1' , 'yy1' ) ) )
stopifnot( all( sort( intersect( names( scf2022 ) , names( scf2022rw ) ) ) == c( 'y1' , 'yy1' ) ) )
stopifnot( all( sort( intersect( names( scf2022s ) , names( scf2022rw ) ) ) == c( 'y1' , 'yy1' ) ) )

# Convert column names to lower case in all sets, per Damico script
names(scf2022) <- tolower(names(scf2022))
names(scf2022rw) <- tolower(names(scf2022rw))
names(scf2022s) <- tolower(names(scf2022s))

# Per Damico script
# Remove implicate identifier from RW table, and add column of fives for weighting
scf2022rw[, 'y1'] <- NULL
scf2022[,'five'] <- 5

save(scf2022, scf2022s, scf2022rw, file = "SCF 2022 Raw Data Tables.RData") # Save the data

# PART FIVE: MERGE MAIN AND SUMMARY DATA
# Merge Summary and Raw Data Tables by 'y1'
scf2022 <- merge(scf2022, scf2022s, by = "y1", sort = T)

# PART SIX: RECAST DATA AS FIVE SEPARATE IMPLICATES
# Splitting data set into five separate sets of individual implicates
scf_1 <- subset(scf2022, as.numeric(substr(scf2022$y1, nchar(scf2022$y1), nchar(scf2022$y1))) == 1)
scf_2 <- subset(scf2022, as.numeric(substr(scf2022$y1, nchar(scf2022$y1), nchar(scf2022$y1))) == 2)
scf_3 <- subset(scf2022, as.numeric(substr(scf2022$y1, nchar(scf2022$y1), nchar(scf2022$y1))) == 3)
scf_4 <- subset(scf2022, as.numeric(substr(scf2022$y1, nchar(scf2022$y1), nchar(scf2022$y1))) == 4)
scf_5 <- subset(scf2022, as.numeric(substr(scf2022$y1, nchar(scf2022$y1), nchar(scf2022$y1))) == 5)

# Clean Up Subject Identifier in Individual Implicates
for (i in 1:5){
  temp <- get(paste0("scf_", i))
  temp$yy1 <- temp$yy1.x
  temp$yy1.x <- NULL
  temp$yy1.y <- NULL
  assign(paste0("scf_", i), temp)
}

# Compile Individual Implicates to a List
scf_data_list <- list(scf_1, scf_2, scf_3, scf_4, scf_5)

# Removing unnecessary objects to save memory and space in Environment window
rm(scf_1, scf_2, scf_3, scf_4, scf_5)
rm(i, temp)
gc()

# PART SIX: CLEANING DATA
# Replace missing replicate weights with zeros to prevent downstream bugs
scf2022rw[ is.na( scf2022rw ) ] <- 0

# Rescale weights, per documentation
scf2022rw[ , paste0( 'wgt' , 1:999 ) ] <-
    scf2022rw[ , paste0( 'wt1b' , 1:999 ) ] * scf2022rw[ , paste0( 'mm' , 1:999 ) ]

# Using Damico's strategy of storing as a data table with y1 and wgts*
scf2022rw <- scf2022rw[ , c( 'yy1' , paste0( 'wgt' , 1:999 ) ) ]

# Check if yy1 values match across the datasets and RW table
all(scf_data_list[[1]]$yy1 == scf2022rw$yy1)
all(scf_data_list[[2]]$yy1 == scf2022rw$yy1)
all(scf_data_list[[3]]$yy1 == scf2022rw$yy1)
all(scf_data_list[[4]]$yy1 == scf2022rw$yy1)
all(scf_data_list[[5]]$yy1 == scf2022rw$yy1)

# Survey Design Object following Damico
scf_design <-
  svrepdesign(
    weights = ~wgt,
    repweights = scf2022rw[ , -1 ] ,
        data = imputationList( scf_data_list ) ,
        scale = 1 ,
        rscales = rep( 1 / 998 , 999 ) ,
        mse = FALSE ,
        type = "other" ,
        combined.weights = TRUE
  )

scf_design_single <-
  svrepdesign(
    weights = ~wgt,
    repweights = scf2022rw[ , -1 ] ,
        data = scf_data_list[[1]] ,
        scale = 1 ,
        rscales = rep( 1 / 998 , 999 ) ,
        mse = FALSE ,
        type = "other" ,
        combined.weights = TRUE
  )
 
# Damico's function to combine implicates to give summary estimate from each of the five sets.
scf_MIcombine <-
    function (results, variances, call = sys.call(), df.complete = Inf, ...) {
        m <- length(results)
        oldcall <- attr(results, "call")
        if (missing(variances)) {
            variances <- suppressWarnings(lapply(results, vcov))
            results <- lapply(results, coef)
        }
        vbar <- variances[[1]]
        cbar <- results[[1]]
        for (i in 2:m) {
            cbar <- cbar + results[[i]]
            # MODIFICATION:
            # vbar <- vbar + variances[[i]]
        }
        cbar <- cbar/m
        # MODIFICATION:
        # vbar <- vbar/m
        evar <- var(do.call("rbind", results))
        r <- (1 + 1/m) * evar/vbar
        df <- (m - 1) * (1 + 1/r)^2
        if (is.matrix(df)) df <- diag(df)
        if (is.finite(df.complete)) {
            dfobs <- ((df.complete + 1)/(df.complete + 3)) * df.complete *
            vbar/(vbar + evar)
            if (is.matrix(dfobs)) dfobs <- diag(dfobs)
            df <- 1/(1/dfobs + 1/df)
        }
        if (is.matrix(r)) r <- diag(r)
        rval <- list(coefficients = cbar, variance = vbar + evar *
        (m + 1)/m, call = c(oldcall, call), nimp = m, df = df,
        missinfo = (r + 2/(df + 3))/(r + 1))
        class(rval) <- "MIresult"
        rval
    }
 
save(scf_design, scf_design_single, scf_data_list, scf_MIcombine, file = "SCF22.RData")

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

# Generic session set up

# First, start your session with an empty memory

rm(list=ls())

gc()

# Set your working directory

directory <- "D:/Dropbox/Research/Household Finance/Net Worth - Age"

setwd(directory)

# Load the libraries used in this session

library(httr)

library(jsonlite)

library(survey)

library(mitools)

library(haven)

library(scales)

library(ggplot2)

library(kableExtra)

library(knitr)

library(tidyverse)

# Set the random seed

set.seed(123)

# Turn off scientific notation

options(scipen=999)

# PART ONE: DOWNLOAD DATA

# Download Main 2022 Files

response_0 <- GET("https://www.federalreserve.gov/econres/files/scf2022s.zip", write_disk("scf2022s.zip", overwrite = TRUE))

response_1 <- GET("https://www.federalreserve.gov/econres/files/scfp2022s.zip", write_disk("scfp2022s.zip", overwrite = TRUE))

response_2 <- GET("https://www.federalreserve.gov/econres/files/scf2022rw1s.zip", write_disk("scf2022rw.zip", overwrite = TRUE))

rm(list=ls(pattern = "response")) # Clean up objects

# PART TWO: UNZIP DATA

unzip("scf2022s.zip")

unzip("scfp2022s.zip")

unzip("scf2022rw.zip")

file.remove("scf2022s.zip") # Erase the zip files because I don't need them and they take up space.

file.remove("scfp2022s.zip")

file.remove("scf2022rw.zip")

# PART THREE: CONVERT DATA TO R FORMAT

# The data are distributed in Stata format. Below, I import data from its Stata format.

scf2022 <- read_dta("p22i6.dta")

scf2022s <- read_dta("rscfp2022.dta")

scf2022rw <- read_dta("p22_rw1.dta")

# PART FOUR: DATA QUALITY CHECK

# Data quality check: Ensure all three files have corresponding rows

stopifnot( nrow( scf2022 ) == nrow( scf2022rw ) * 5 ) # One RW score per household

stopifnot( nrow( scf2022 ) == nrow( scf2022s ) )

#Confirm only the primary economic unit and the five implicate identifiers overlap:

stopifnot( all( sort( intersect( names( scf2022 ) , names( scf2022s ) ) ) == c( 'y1' , 'yy1' ) ) )

stopifnot( all( sort( intersect( names( scf2022 ) , names( scf2022rw ) ) ) == c( 'y1' , 'yy1' ) ) )

stopifnot( all( sort( intersect( names( scf2022s ) , names( scf2022rw ) ) ) == c( 'y1' , 'yy1' ) ) )

# Convert column names to lower case in all sets, per Damico script

names(scf2022) <- tolower(names(scf2022))

names(scf2022rw) <- tolower(names(scf2022rw))

names(scf2022s) <- tolower(names(scf2022s))

# Per Damico script

# Remove implicate identifier from RW table, and add column of fives for weighting

scf2022rw[, 'y1'] <- NULL

scf2022[,'five'] <- 5

save(scf2022, scf2022s, scf2022rw, file = "SCF 2022 Raw Data Tables.RData") # Save the data

# PART FIVE: MERGE MAIN AND SUMMARY DATA

# Merge Summary and Raw Data Tables by 'y1'

scf2022 <- merge(scf2022, scf2022s, by = "y1", sort = T)

# PART SIX: RECAST DATA AS FIVE SEPARATE IMPLICATES

# Splitting data set into five separate sets of individual implicates

scf_1 <- subset(scf2022, as.numeric(substr(scf2022$y1, nchar(scf2022$y1), nchar(scf2022$y1))) == 1)

scf_2 <- subset(scf2022, as.numeric(substr(scf2022$y1, nchar(scf2022$y1), nchar(scf2022$y1))) == 2)

scf_3 <- subset(scf2022, as.numeric(substr(scf2022$y1, nchar(scf2022$y1), nchar(scf2022$y1))) == 3)

scf_4 <- subset(scf2022, as.numeric(substr(scf2022$y1, nchar(scf2022$y1), nchar(scf2022$y1))) == 4)

scf_5 <- subset(scf2022, as.numeric(substr(scf2022$y1, nchar(scf2022$y1), nchar(scf2022$y1))) == 5)

# Clean Up Subject Identifier in Individual Implicates

for (i in 1:5){

temp <- get(paste0("scf_", i))

temp$yy1 <- temp$yy1.x

temp$yy1.x <- NULL

temp$yy1.y <- NULL

assign(paste0("scf_", i), temp)

}

# Compile Individual Implicates to a List

scf_data_list <- list(scf_1, scf_2, scf_3, scf_4, scf_5)

# Removing unnecessary objects to save memory and space in Environment window

rm(scf_1, scf_2, scf_3, scf_4, scf_5)

rm(i, temp)

gc()

# PART SIX: CLEANING DATA

# Replace missing replicate weights with zeros to prevent downstream bugs

scf2022rw[ is.na( scf2022rw ) ] <- 0

# Rescale weights, per documentation

scf2022rw[ , paste0( 'wgt' , 1:999 ) ] <-

scf2022rw[ , paste0( 'wt1b' , 1:999 ) ] * scf2022rw[ , paste0( 'mm' , 1:999 ) ]

# Using Damico's strategy of storing as a data table with y1 and wgts*

scf2022rw <- scf2022rw[ , c( 'yy1' , paste0( 'wgt' , 1:999 ) ) ]

# Check if yy1 values match across the datasets and RW table

all(scf_data_list[[1]]$yy1 == scf2022rw$yy1)

all(scf_data_list[[2]]$yy1 == scf2022rw$yy1)

all(scf_data_list[[3]]$yy1 == scf2022rw$yy1)

all(scf_data_list[[4]]$yy1 == scf2022rw$yy1)

all(scf_data_list[[5]]$yy1 == scf2022rw$yy1)

# Survey Design Object following Damico

scf_design <-

svrepdesign(

weights = ~wgt,

repweights = scf2022rw[ , -1 ] ,

data = imputationList( scf_data_list ) ,

scale = 1 ,

rscales = rep( 1 / 998 , 999 ) ,

mse = FALSE ,

type = "other" ,

combined.weights = TRUE

)

scf_design_single <-

svrepdesign(

weights = ~wgt,

repweights = scf2022rw[ , -1 ] ,

data = scf_data_list[[1]] ,

scale = 1 ,

rscales = rep( 1 / 998 , 999 ) ,

mse = FALSE ,

type = "other" ,

combined.weights = TRUE

)

# Damico's function to combine implicates to give summary estimate from each of the five sets.

scf_MIcombine <-

function (results, variances, call = sys.call(), df.complete = Inf, ...) {

m <- length(results)

oldcall <- attr(results, "call")

if (missing(variances)) {

variances <- suppressWarnings(lapply(results, vcov))

results <- lapply(results, coef)

}

vbar <- variances[[1]]

cbar <- results[[1]]

for (i in 2:m) {

cbar <- cbar + results[[i]]

# MODIFICATION:

# vbar <- vbar + variances[[i]]

}

cbar <- cbar/m

# MODIFICATION:

# vbar <- vbar/m

evar <- var(do.call("rbind", results))

r <- (1 + 1/m) * evar/vbar

df <- (m - 1) * (1 + 1/r)^2

if (is.matrix(df)) df <- diag(df)

if (is.finite(df.complete)) {

dfobs <- ((df.complete + 1)/(df.complete + 3)) * df.complete *

vbar/(vbar + evar)

if (is.matrix(dfobs)) dfobs <- diag(dfobs)

df <- 1/(1/dfobs + 1/df)

}

if (is.matrix(r)) r <- diag(r)

rval <- list(coefficients = cbar, variance = vbar + evar *

(m + 1)/m, call = c(oldcall, call), nimp = m, df = df,

missinfo = (r + 2/(df + 3))/(r + 1))

class(rval) <- "MIresult"

rval

}

save(scf_design, scf_design_single, scf_data_list, scf_MIcombine, file = "SCF22.RData")

Here, the focus is on the household net worth, which is the market value of a household’s assets, less the costs of settling their outstanding debts. This can be treated as a metric that captures a family’s ability to come up with money. The metric is less useful for making fine distinctions between households with similar wealth levels and better for assessing differences on the order of hundreds of thousands or millions of dollars.

We are examining how the distribution of net worth varies according to the age of the household head. In the SCF, the household head is the individual or partners who are described as “economically dominant” in a primary economic unit. It includes the person(s) who cover the household’s basic living costs for themselves and all economic dependents who live within that household. We group households according to the age of the SCF’s “reference person”, whom the Survey considers to be the male in a mixed-sex pair and the older person in a single-sex couple. This convention is likely maintained to create a constancy over time in the measurement of the SCF concept of “reference person”. My own expectation is that the choice of reference person does not strongly affect the wealth accumulation differentials that we observe, but I would test different ways of counting the reference person (i.e., the oldest in any relationship, the mean age when people are paired) if there were interest. I chose to group them by decades, mostly so that the research findings conform to commonplace terms used to discuss personal finance and age.

load("SCF22.RData")<br><br>scf_design <- update(scf_design,<br>                     age_group = cut(age, breaks = c(0, 30, 40, 50, 60, 70, 999),<br>                                     labels = c("<30", "30s", "40s", "50s", "60s", "70s+")))

1	load("SCF22.RData")<br><br>scf_design <- update(scf_design,<br> age_group = cut(age, breaks = c(0, 30, 40, 50, 60, 70, 999),<br> labels = c("<30", "30s", "40s", "50s", "60s", "70s+")))

Analysis

The table below describes the distribution of net worth across our age groups. The columns represent age groups, and the rows represent different percentile net worth values for each age group.

Percentile Net Worth Values by Age Group

p10 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,  
                                              0.1 , se = TRUE , interval.type = 'quantile')))
p25 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,  
                                              0.25 , se = TRUE , interval.type = 'quantile')))
p50 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,  
                                              0.5 , se = TRUE , interval.type = 'quantile')))
p75 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,  
                                              0.75 , se = TRUE , interval.type = 'quantile')))
p90 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,  
                                              0.9 , se = TRUE , interval.type = 'quantile')))
p95 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,  
                                              0.95 , se = TRUE , interval.type = 'quantile')))
p99 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,  
                                              0.99 , se = TRUE , interval.type = 'quantile')))
pcttab_10  <- p10$coefficients
pcttab_25  <- p25$coefficients
pcttab_50  <- p50$coefficients
pcttab_75  <- p75$coefficients
pcttab_90  <- p90$coefficients
pcttab_95  <- p95$coefficients
pcttab_99  <- p99$coefficients

pct_tab <- data.frame(cbind(pcttab_10, pcttab_25, pcttab_50, pcttab_75, pcttab_90, pcttab_95, pcttab_99))
pct_tab$age_group <- c("<30", "30s", "40s", "50s", "60s", "70s+")
pct_tab <- pct_tab[,c(8, 1:7)]
rownames(pct_tab) <- NULL

t_pct_tab <- data.frame(t(pct_tab))

colnames(t_pct_tab) <- t_pct_tab[1,]
t_pct_tab <- t_pct_tab[-1,]

t_pct_tab <- apply(t_pct_tab, 2, function(x) scales::dollar(as.numeric(as.character(x)), accuracy = 1, trim = TRUE))
rownames(t_pct_tab) <- c("10th Percentile", "25th Percentile", "50th Percentile", "75th Percentile", "90th Percentile", "95th Percentile", "99th Percentile")

kable_table <- kable(t_pct_tab, format = "pipe", caption = "Percentile Net Worth Values by Age Group")

kable_table

p10 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,

0.1 , se = TRUE , interval.type = 'quantile')))

p25 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,

0.25 , se = TRUE , interval.type = 'quantile')))

p50 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,

0.5 , se = TRUE , interval.type = 'quantile')))

p75 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,

0.75 , se = TRUE , interval.type = 'quantile')))

p90 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,

0.9 , se = TRUE , interval.type = 'quantile')))

p95 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,

0.95 , se = TRUE , interval.type = 'quantile')))

p99 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,

0.99 , se = TRUE , interval.type = 'quantile')))

pcttab_10 <- p10$coefficients

pcttab_25 <- p25$coefficients

pcttab_50 <- p50$coefficients

pcttab_75 <- p75$coefficients

pcttab_90 <- p90$coefficients

pcttab_95 <- p95$coefficients

pcttab_99 <- p99$coefficients

pct_tab <- data.frame(cbind(pcttab_10, pcttab_25, pcttab_50, pcttab_75, pcttab_90, pcttab_95, pcttab_99))

pct_tab$age_group <- c("<30", "30s", "40s", "50s", "60s", "70s+")

pct_tab <- pct_tab[,c(8, 1:7)]

rownames(pct_tab) <- NULL

t_pct_tab <- data.frame(t(pct_tab))

colnames(t_pct_tab) <- t_pct_tab[1,]

t_pct_tab <- t_pct_tab[-1,]

t_pct_tab <- apply(t_pct_tab, 2, function(x) scales::dollar(as.numeric(as.character(x)), accuracy = 1, trim = TRUE))

rownames(t_pct_tab) <- c("10th Percentile", "25th Percentile", "50th Percentile", "75th Percentile", "90th Percentile", "95th Percentile", "99th Percentile")

kable_table <- kable(t_pct_tab, format = "pipe", caption = "Percentile Net Worth Values by Age Group")

kable_table

	<30	30s	40s	50s	60s	70s+
10th Percentile	-$13,278	-$3,274	$736	$4,086	$5,420	$9,536
25th Percentile	$1,122	$15,552	$31,326	$69,840	$78,254	$99,192
50th Percentile	$21,054	$107,752	$189,746	$306,886	$394,394	$367,860
75th Percentile	$117,796	$323,606	$600,252	$1,074,428	$1,106,748	$1,035,780
90th Percentile	$280,300	$750,544	$1,388,340	$2,711,830	$2,962,016	$2,976,530
95th Percentile	$411,880	$1,223,800	$2,704,490	$5,382,202	$6,639,864	$5,875,320
99th Percentile	$2,128,300	$5,026,540	$7,670,380	$15,530,000	$18,921,600	$18,552,600

First, note that the distribution of household wealth is much more compressed before age 30. Many young people are still in school, which often involves accumulating debt without earning money. Many of the poorer households in this age category will ultimately end up among the wealthier later in life. Households that enter the workforce immediately may get an earlier start accumulating property, but my sense of the data and evidence is that advanced training has much higher returns.

Differences open up over people’s 30s and 40s, and eventually stabilize in the 50s onward. A household’s 30s and 40s may be a period in which they experience the run-up of their career and property accumulation, setting them on the trajectory into which they will settle in the 50s onward. Households that spend their 30s and 40s with the privilege of creating a successful business, ascending quickly and going far in their professional field, getting an early start on their retirement savings, or having successful first forays into real estate. Households accumulate earning power in part through business or professional success, and a growing portfolio of money-making or price-appreciating property.

The distribution of wealth seems to steady when we get beyond people’s fifties. It does not mean that people’s personal fates are static. Some people will run up professional success and savings in their 30s and 40s, and then find themselves pressed out of the workforce when they hit their 50s. Others will have earned more steadily and will be able to stretch out their working years into their 70s. But the general distribution of wealth remains pretty static in the aggregate beyond the 50s. Another reason that wealth may stabilize after the 50s is that people are most likely to receive intergenerationally-transmitted wealth by this decade. These transmissions may also propel the run-up of wealth in people’s 30s and 40s, as the incidence of parental death rises.

I close the analysis with a visualization that resembles the age-weight charts at the pediatrician. It offers a way to look up wealth levels over the life course. I can create a year-by-year one if there’s interest. The first version is on a log scale because we would not be able to see clear differences in non-millionaires when we include multi-millionaires in the figure.

p20 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,  
                                              0.2 , se = TRUE , interval.type = 'quantile')))
p30 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,  
                                              0.3 , se = TRUE , interval.type = 'quantile')))
p40 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,  
                                              0.4 , se = TRUE , interval.type = 'quantile')))
p60 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,  
                                              0.6 , se = TRUE , interval.type = 'quantile')))

p70 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,  
                                              0.7 , se = TRUE , interval.type = 'quantile')))
p80 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,  
                                              0.8 , se = TRUE , interval.type = 'quantile')))


pcttab_20  <- p20$coefficients
pcttab_30  <- p30$coefficients
pcttab_40  <- p40$coefficients
pcttab_60  <- p60$coefficients
pcttab_70  <- p70$coefficients
pcttab_80  <- p80$coefficients

pct_fig <- data.frame(rbind(pcttab_10, pcttab_20, pcttab_30, pcttab_40, pcttab_50, 
                            pcttab_60, pcttab_70, pcttab_80, pcttab_90, pcttab_95,
                            pcttab_99))
names(pct_fig) <- paste(c("Twenties","Thirties","Forties", "Fifties", "Sixties", "Seventies"))
rownames(pct_fig) <- c("P10", "P20", "P30", "P40", "P50", "P60", "P70", "P80", "P90", "P95", "P99")
pct_fig <- round(pct_fig, 0)
pct_table <- pct_fig

pct_long <- pct_fig %>%
  t() %>%
  as.data.frame() %>%
  rownames_to_column("AgeGroup") %>%
  gather(key = "Percentile", value = "NetWorth", -AgeGroup)

# Convert the AgeGroup and Percentile to factors for proper ordering in the plot
pct_long$AgeGroup <- factor(pct_long$AgeGroup, levels = c("Twenties","Thirties","Forties", "Fifties", "Sixties", "Seventies"))
pct_long$Percentile <- factor(pct_long$Percentile, levels = c("P10", "P20", "P30", "P40", "P50", "P60", "P70", "P80", "P90", "P95", "P99"))


# Need log y-scale
pct_long$NetWorth <- ifelse(pct_long$NetWorth <= 0, 1, pct_long$NetWorth)


# Create the plot
library(ggplot2)
library(dplyr)

# Extend the plot space to the right and reduce the label offset
fig1 <- ggplot(pct_long, aes(x = AgeGroup, y = NetWorth, group = Percentile, color = Percentile)) +
  geom_point() +
  geom_line() +
  stat_smooth(se = FALSE, method = 'loess') +
  scale_y_log10(breaks = c(10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000, 250000, 500000, 1e6, 2.5e6, 5e6, 1e7, 2.5e7, 5e7, 1e8, 2.5e8, 5e8),
                labels = scales::dollar_format()) +
  labs(title = "Net Worth Percentiles by Age Group, US Households, 2022",
       x = "Age Group",
       y = "Net Worth (Log Scale)",
       color = "Percentile") +
  theme_minimal() +
  theme(legend.position = "none", aspect.ratio = 3/2) +
  scale_x_discrete(limits = c("Twenties", "Thirties", "Forties", "Fifties", "Sixties", "Seventies", ""), expand = c(0.01, 0.8)) +
  geom_text(data = pct_long %>% group_by(Percentile) %>% slice(n()),
            aes(label = paste0(sub("P", "", Percentile), "th percentile"), x = AgeGroup, y = NetWorth), 
            hjust = 0, vjust = 0.5, size = 2, nudge_x = 0.2, check_overlap = TRUE) +
  annotate("text", x = "", y = 3, label = "J. Cohen, Queens College (josephnathancohen.info)", hjust = 1, vjust = 0, size = 3.5, color = "black")

ggsave(fig1, file = "Fig1.png", device = "png")
## Saving 5 x 4 in image
## `geom_smooth()` using formula = 'y ~ x'
fig1
## `geom_smooth()` using formula = 'y ~ x'

p20 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,

0.2 , se = TRUE , interval.type = 'quantile')))

p30 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,

0.3 , se = TRUE , interval.type = 'quantile')))

p40 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,

0.4 , se = TRUE , interval.type = 'quantile')))

p60 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,

0.6 , se = TRUE , interval.type = 'quantile')))

p70 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,

0.7 , se = TRUE , interval.type = 'quantile')))

p80 <- scf_MIcombine( with( scf_design, svyby(~networth, ~age_group, svyquantile,

0.8 , se = TRUE , interval.type = 'quantile')))

pcttab_20 <- p20$coefficients

pcttab_30 <- p30$coefficients

pcttab_40 <- p40$coefficients

pcttab_60 <- p60$coefficients

pcttab_70 <- p70$coefficients

pcttab_80 <- p80$coefficients

pct_fig <- data.frame(rbind(pcttab_10, pcttab_20, pcttab_30, pcttab_40, pcttab_50,

pcttab_60, pcttab_70, pcttab_80, pcttab_90, pcttab_95,

pcttab_99))

names(pct_fig) <- paste(c("Twenties","Thirties","Forties", "Fifties", "Sixties", "Seventies"))

rownames(pct_fig) <- c("P10", "P20", "P30", "P40", "P50", "P60", "P70", "P80", "P90", "P95", "P99")

pct_fig <- round(pct_fig, 0)

pct_table <- pct_fig

pct_long <- pct_fig %>%

t() %>%

as.data.frame() %>%

rownames_to_column("AgeGroup") %>%

gather(key = "Percentile", value = "NetWorth", -AgeGroup)

# Convert the AgeGroup and Percentile to factors for proper ordering in the plot

pct_long$AgeGroup <- factor(pct_long$AgeGroup, levels = c("Twenties","Thirties","Forties", "Fifties", "Sixties", "Seventies"))

pct_long$Percentile <- factor(pct_long$Percentile, levels = c("P10", "P20", "P30", "P40", "P50", "P60", "P70", "P80", "P90", "P95", "P99"))

# Need log y-scale

pct_long$NetWorth <- ifelse(pct_long$NetWorth <= 0, 1, pct_long$NetWorth)

# Create the plot

library(ggplot2)

library(dplyr)

# Extend the plot space to the right and reduce the label offset

fig1 <- ggplot(pct_long, aes(x = AgeGroup, y = NetWorth, group = Percentile, color = Percentile)) +

geom_point() +

geom_line() +

stat_smooth(se = FALSE, method = 'loess') +

scale_y_log10(breaks = c(10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000, 250000, 500000, 1e6, 2.5e6, 5e6, 1e7, 2.5e7, 5e7, 1e8, 2.5e8, 5e8),

labels = scales::dollar_format()) +

labs(title = "Net Worth Percentiles by Age Group, US Households, 2022",

x = "Age Group",

y = "Net Worth (Log Scale)",

color = "Percentile") +

theme_minimal() +

theme(legend.position = "none", aspect.ratio = 3/2) +

scale_x_discrete(limits = c("Twenties", "Thirties", "Forties", "Fifties", "Sixties", "Seventies", ""), expand = c(0.01, 0.8)) +

geom_text(data = pct_long %>% group_by(Percentile) %>% slice(n()),

aes(label = paste0(sub("P", "", Percentile), "th percentile"), x = AgeGroup, y = NetWorth),

hjust = 0, vjust = 0.5, size = 2, nudge_x = 0.2, check_overlap = TRUE) +

annotate("text", x = "", y = 3, label = "J. Cohen, Queens College (josephnathancohen.info)", hjust = 1, vjust = 0, size = 3.5, color = "black")

ggsave(fig1, file = "Fig1.png", device = "png")

## Saving 5 x 4 in image

## `geom_smooth()` using formula = 'y ~ x'

fig1

## `geom_smooth()` using formula = 'y ~ x'

Net Worth Percentiles by Age Group. By Joseph Cohen, Dept of Sociology, CUNY Queens College.

Keep the log scale in mind when interpreting the data. Wealth levels get larger in multiplicative rather than additive terms as we move up the y-axis. The differences look pretty evenly spread out on this scale, but this is the figure without the logged axis (showing only up to the 90^th percentile):

pct_long_b <- subset(pct_long, Percentile != "P95" & Percentile != "P99")
ggplot(pct_long_b, aes(x = AgeGroup, y = NetWorth, group = Percentile, color = Percentile)) +
  geom_point() +
  geom_line() +
  stat_smooth(se = FALSE, method = 'loess') +
  scale_y_continuous(labels = scales::dollar_format(), breaks = seq(0, max(pct_long_b$NetWorth, na.rm = TRUE), by = 250000)) +
  labs(title = "Net Worth Percentiles by Age Group, US Households, 2022",
       x = "Age Group",
       y = "Net Worth",
       color = "Percentile") +
  theme_minimal() +
  theme(legend.position = "none",
        aspect.ratio = 7/5,
        plot.title = element_text(hjust = 0),
        plot.margin = margin(6, 6, 20, 6)) +
  scale_x_discrete(limits = c("Twenties", "Thirties", "Forties", "Fifties", "Sixties", "Seventies", ""), expand = c(0.01, 0.8)) +
  geom_text(data = pct_long_b %>% group_by(Percentile) %>% slice(n()),
            aes(label = paste0(sub("P", "", Percentile), "th percentile"), x = AgeGroup, y = NetWorth),
            hjust = 0, vjust = 0.5, size = 3, nudge_x = 0.2, check_overlap = TRUE)

pct_long_b <- subset(pct_long, Percentile != "P95" & Percentile != "P99")

ggplot(pct_long_b, aes(x = AgeGroup, y = NetWorth, group = Percentile, color = Percentile)) +

geom_point() +

geom_line() +

stat_smooth(se = FALSE, method = 'loess') +

scale_y_continuous(labels = scales::dollar_format(), breaks = seq(0, max(pct_long_b$NetWorth, na.rm = TRUE), by = 250000)) +

labs(title = "Net Worth Percentiles by Age Group, US Households, 2022",

x = "Age Group",

y = "Net Worth",

color = "Percentile") +

theme_minimal() +

theme(legend.position = "none",

aspect.ratio = 7/5,

plot.title = element_text(hjust = 0),

plot.margin = margin(6, 6, 20, 6)) +

scale_x_discrete(limits = c("Twenties", "Thirties", "Forties", "Fifties", "Sixties", "Seventies", ""), expand = c(0.01, 0.8)) +

geom_text(data = pct_long_b %>% group_by(Percentile) %>% slice(n()),

aes(label = paste0(sub("P", "", Percentile), "th percentile"), x = AgeGroup, y = NetWorth),

hjust = 0, vjust = 0.5, size = 3, nudge_x = 0.2, check_overlap = TRUE)

The graph gets much taller when we include 95^th and 99^th percentile scores. All of this is to clarify the fact that the run-up of wealth that occurs during people’s working years differs on multiplicative scales.

Joseph Nathan Cohen

Department of Sociology, CUNY Queens College, New York, NY

Net Worth by Age Group

Data and Methods

Analysis

Related

Leave a Reply Cancel reply

Tags

Latest Stuff

Creating Podcast Episode Pages

Fast Transcription with Adobe Premiere

Notes: Analyzing Complex Survey Data

Assignment: Political Messaging Strategy

Notes: Basic Linear Regression

Contact

About this Site