Department of Sociology, CUNY Queens College, New York, NY

Download All Years of the SCF Locally

Download SCF data to your device using R.

Version 1.10

This script downloads every year of the United States Federal Reserve’s Survey of Consumer Finances (SCF) data set. This post walks through the process step by step, with explanations. The script builds on earlier work by Anthony Damico.

Setting Up Your Session

I start every R script with the same block of code. It ensures that the script runs from a clean memory, reads and writes files in a project-dedicated folder, and sets a random seed so that others can replicate my results. A random seed is a number that initializes R’s pseudo-random number generator. Two analyses that use the same seed will draw the same sequence of “random” numbers whenever they perform an analytical operation that requires random numbers, which makes it possible to replicate analyses.
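To make the role of the seed concrete, here is a minimal sketch. The seed value 1234 is an arbitrary choice for illustration, not one taken from the original script.

```r
# Draw three uniform random numbers after setting a seed.
set.seed(1234)   # 1234 is an arbitrary, illustrative seed
first_draw <- runif(3)

# Resetting the generator to the same seed restarts the same sequence.
set.seed(1234)
second_draw <- runif(3)

# Both draws contain exactly the same "random" numbers.
identical(first_draw, second_draw)  # TRUE
```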

Do not forget to substitute your working directory path in this chunk, and make sure that it has a “Data” subdirectory:
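A setup block along these lines might look like the sketch below. The object name project_directory, the use of getwd() as a stand-in path, and the seed value are my assumptions for illustration; data_directory is the object name used later in this post.

```r
# Clear the workspace so the script starts from a clean memory.
rm(list = ls())

# ASSUMPTION: getwd() is only a placeholder. Replace it with the path to
# your own project folder, e.g. "C:/Users/you/Documents/scf_project".
project_directory <- getwd()
setwd(project_directory)

# Path to the "Data" subdirectory that will hold the downloaded files.
data_directory <- file.path(project_directory, "Data")

# Warn early if the subdirectory is missing, rather than failing later.
if (!dir.exists(data_directory)) {
  message("No 'Data' subdirectory found in ", project_directory)
}

# Set the random seed (1234 is an arbitrary, illustrative value).
set.seed(1234)
```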

Download the Data

First, we will download the data from the Federal Reserve servers. I am going to download the SAS format data, because the SAS data sets have converted all years’ values to 2022 real dollars. I copied the link addresses of the data sets and looked for a pattern in the way that the data files are named. Recall that the SCF is a triennial survey, so the survey years in the file names are three years apart.
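The idea of building the download links from a naming pattern can be sketched as follows. Everything specific here is an assumption: the year range (1989 to 2022), the helper name download_scf_zips, and especially url_template, which is a placeholder on example.org rather than the real Federal Reserve address. Substitute the pattern you find in the link addresses you copied.

```r
# ASSUMPTION: survey waves every three years from 1989 through 2022.
scf_years <- seq(from = 1989, to = 2022, by = 3)

# PLACEHOLDER pattern, not the real address; %d is replaced by the year.
url_template <- "https://example.org/files/scf%d.zip"

# Hypothetical helper: download one zip file per survey year into dest_dir.
download_scf_zips <- function(years, url_template, dest_dir) {
  destinations <- character(0)
  for (year in years) {
    url <- sprintf(url_template, year)
    destination <- file.path(dest_dir, basename(url))
    # mode = "wb" writes the archive as binary, which matters on Windows.
    download.file(url, destfile = destination, mode = "wb")
    destinations <- c(destinations, destination)
  }
  invisible(destinations)
}

# Example call (commented out so nothing is downloaded by accident):
# download_scf_zips(scf_years, url_template, data_directory)
```

Older survey years may not follow the same naming pattern as recent ones, so it is worth checking each generated link before running the loop.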

Unzip Files

The next step is to unzip these files; the data files within them will then be converted from SAS COPY to RDS format. The script uses the unzip function to extract the files and then the file.remove function to delete the original zip files after extraction. Make sure that the working directory is set to the directory where the zip files are located, which here is the folder whose path is stored in the object data_directory.
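A minimal sketch of this extract-then-delete step is below. The function name extract_and_remove_zips is hypothetical, and passing the folder explicitly (rather than relying on the working directory) is my design choice, not necessarily how the original script does it.

```r
# Hypothetical helper: unzip every .zip file in zip_dir, then delete the
# archive once its contents have been extracted.
extract_and_remove_zips <- function(zip_dir) {
  zip_files <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)
  extracted <- character(0)
  for (zip_file in zip_files) {
    # unzip() returns the paths of the files it extracted.
    new_files <- unzip(zip_file, exdir = zip_dir)
    extracted <- c(extracted, new_files)
    # Only delete the archive if something was actually extracted.
    if (length(new_files) > 0) {
      file.remove(zip_file)
    }
  }
  extracted
}

# Example call (commented out; assumes data_directory is already defined):
# extracted_files <- extract_and_remove_zips(data_directory)
```

Deleting the archive only after a successful extraction means a corrupt download stays on disk, where it is easy to spot and re-download.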

Convert to RDS Format

The next step is to convert these individual data tables, which are encoded in SAS format, to RDS format. I will save the new files with a standardized file naming scheme to facilitate iterative operations on the data in the next step.
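The conversion can be sketched roughly as follows. Several things here are assumptions: that the extracted files are SAS transport files that foreign::read.xport can read, the helper names, and the "scf_<year>_<type>.rds" naming scheme, which only illustrates what a standardized scheme could look like. The demonstration at the end uses a toy CSV file so that it runs without any SCF data.

```r
# Hypothetical naming scheme: one predictable file name per year and table.
standard_rds_name <- function(year, table_type) {
  paste0("scf_", year, "_", table_type, ".rds")
}

# Hypothetical helper: read one file with `reader` and save it as RDS.
# ASSUMPTION: the default reader suits the SAS files; swap it if it does not.
convert_to_rds <- function(input_path, output_path,
                           reader = foreign::read.xport) {
  data_table <- reader(input_path)
  saveRDS(data_table, file = output_path)
  invisible(output_path)
}

# Toy demonstration with a CSV file standing in for a real data file.
toy_input <- tempfile(fileext = ".csv")
write.csv(data.frame(ID = c(11, 12), VALUE = c(100.5, 200.25)),
          toy_input, row.names = FALSE)
toy_output <- file.path(tempdir(), standard_rds_name(2022, "main"))
convert_to_rds(toy_input, toy_output, reader = read.csv)

# Reading the RDS file back returns the same table.
toy_table <- readRDS(toy_output)
```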

Data Cleaning and Diagnostics

The next step is to check the data and ensure that it looks as we would expect from the documentation. First, I am going to change all the variable names to lower case, because I find them easier to work with.
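Lower-casing the variable names takes one line per table. The sketch below uses a small toy data frame with made-up column names rather than the real SCF tables.

```r
# Toy table standing in for one year of data (column names are made up).
toy_scf <- data.frame(ID = c(11, 12), INCOME = c(50000, 62000))

# Convert every variable name to lower case.
names(toy_scf) <- tolower(names(toy_scf))
names(toy_scf)  # "id" "income"

# Quick diagnostics: dimensions and column types.
dim(toy_scf)
str(toy_scf)

# The same idea applied to a list of tables, one per year.
toy_tables <- list(a = data.frame(X = 1), b = data.frame(Y = 2))
toy_tables <- lapply(toy_tables, function(tbl) {
  names(tbl) <- tolower(names(tbl))
  tbl
})
```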

The following code cleans up the data and replicate weights tables. It is adapted from Anthony Damico’s scripts, with some modifications:

Rescale Weights

Next, we adjust the replicate weights. The replicate weights file has missing values, and the documentation states that observations are intended to be left out of the analysis when these values are missing. As such, we set missing replicate weights to zero, so that those observations contribute nothing to the corresponding replicate. The weights are then scaled by a factor that accounts for the number of times each case appears in the sample replicates. This procedure is known as “replicate weight adjustment”.
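The two steps described above (zero out missing values, then scale) can be sketched on a toy table. The column names are assumptions: I use wt1b1, wt1b2 for the replicate weights and mm1, mm2 for the multiplicity factors, with only two replicates instead of the full set, and the numbers are invented.

```r
# Toy replicate weights table with two replicates (values are made up).
# ASSUMPTION: wt1b* hold replicate weights, mm* hold multiplicity factors.
rw <- data.frame(
  id    = c(1, 2, 3),
  wt1b1 = c(100, NA, 300),
  mm1   = c(1, NA, 2),
  wt1b2 = c(NA, 200, 300),
  mm2   = c(NA, 2, 1)
)
n_replicates <- 2

# Step 1: set missing values to zero so those cases drop out of a replicate.
rw[is.na(rw)] <- 0

# Step 2: multiply each replicate weight by its multiplicity factor.
weight_cols <- paste0("wt1b", seq_len(n_replicates))
factor_cols <- paste0("mm", seq_len(n_replicates))
scaled_cols <- paste0("wgt", seq_len(n_replicates))
rw[scaled_cols] <- rw[weight_cols] * rw[factor_cols]

rw[c("id", scaled_cols)]
```

In this toy table the second case has a missing first replicate weight, so its scaled weight for that replicate is zero.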
