This data set contains season-level MLB batting data for every season from 2000 to 2019.

Download a Zip Archive of the Data and Script (includes CSV and RDS)

Click here for an explanation of the variables

Wrangling Operation

The operation requires the following packages, particularly Bill Petti’s excellent baseballr package for wrangling MLB data:

# Download the Baseball R Package:
# devtools::install_github(repo = "BillPetti/baseballr")

library(baseballr)
library(rvest)
library(plyr)

First, I create a table of player identifiers from the Chadwick Baseball Bureau using get_chadwick_lu() in baseballr. These identifiers will help users merge this data table to other baseball data.

# Player Identifiers
dat.playerid <- get_chadwick_lu()
dat.playerid <- dat.playerid[c(1:7,13:15,19,25:28)]
saveRDS(dat.playerid, "Player Identifiers.RDS")
identifiers <- readRDS("Player Identifiers.RDS")
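Selecting columns by position, as above, depends on the lookup table keeping its column order. As a hedged alternative, the sketch below selects columns by name and fails loudly if one is missing. The toy table and the particular columns kept are assumptions for illustration only; they are not the exact columns that the index vector above picks out.

```r
# Toy stand-in for the Chadwick lookup table (illustrative columns only).
toy_lu <- data.frame(
  key_person    = c("a1", "b2"),
  key_mlbam     = c(1001, 1002),
  key_fangraphs = c(1, 2),
  name_last     = c("Doe", "Roe"),
  name_first    = c("Jan", "Sam"),
  birth_year    = c(1980, 1985),
  stringsAsFactors = FALSE
)

# Columns to keep, chosen by name rather than by position (hypothetical choice).
keep <- c("key_mlbam", "key_fangraphs", "name_last", "name_first")

# Stop early if the source table no longer has a column we rely on.
missing_cols <- setdiff(keep, names(toy_lu))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

toy_ids <- toy_lu[, keep]
```

Selecting by name costs a few more keystrokes but keeps the script working if columns are added or reordered upstream.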

Next, I download seasonal MLB batting leader tables from Fangraphs using fg_bat_leaders(), one table per season:

# Scraping Batting Data
for (i in 2000:2019){
  temp <- fg_bat_leaders(i, i, league = "all", qual = "n", ind = 1)
  assign(paste0("fg_bat_", i), temp)
}

Because these tables all share the same columns and differ only in the season they cover, I can stack them into one table using rbind():

# Start with the 2000 table, then append each later season in turn
dat.bat <- fg_bat_2000
for (i in 2001:2019){
  dat.bat <- rbind(dat.bat, get(paste0("fg_bat_", i)))
}
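The loop above grows dat.bat one season at a time. A common alternative, sketched here under assumptions, is to hold the seasonal tables in a list and stack them in a single call. The function fake_leaders() is a hypothetical stand-in for the scraping call so that the example runs without network access.

```r
# Hypothetical stand-in for the scraper: returns a small table for one season.
fake_leaders <- function(season) {
  data.frame(
    playerid = c(1, 2),
    Season   = season,
    HR       = c(10, 20) + (season - 2000),
    stringsAsFactors = FALSE
  )
}

seasons <- 2000:2003

# Collect one data frame per season in a list rather than in separate variables.
season_tables <- lapply(seasons, fake_leaders)

# Stack every table in a single call; all tables must share the same columns.
stacked <- do.call(rbind, season_tables)
```

Keeping the tables in a list avoids assign() and get(), and there are no per-season variables to clean up afterwards.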

Next, I rename the identifier column in the Fangraphs table to match the name used in the Chadwick Bureau identifier data. I then merge the two tables so that the batting data can more readily be joined to other sources.

names(dat.bat)[1] <- "key_fangraphs"

temp <- merge(dat.bat, identifiers, by = "key_fangraphs")

# Clean Up Data Types and Sort
temp$key_fangraphs <- as.numeric(temp$key_fangraphs)
temp$Season <- as.numeric(temp$Season)
temp <- arrange(temp, Name, Season)

# Write data
write.csv(temp, "Fangraphs Batting Leaders 2000 - 2019.csv")
saveRDS(temp, "Fangraphs Batting Leaders 2000 - 2019.RDS")

# Clean up memory 
rm(temp)
rm(list=ls(pattern = "fg_bat"))
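Note that merge() performs an inner join by default, so any batting row whose key_fangraphs value has no match in the identifier table is dropped from the result. The sketch below uses made-up toy data (not real player IDs) to show the difference between the default and a left join with all.x = TRUE, and one way to list the unmatched rows.

```r
# Toy batting table: player 3 has no entry in the identifier table.
batting <- data.frame(key_fangraphs = c(1, 2, 3), HR = c(30, 12, 5))

# Toy identifier table with a second key to carry over.
ids <- data.frame(key_fangraphs = c(1, 2), key_mlbam = c(1001, 1002))

# Default merge: inner join, unmatched batting rows are dropped.
inner <- merge(batting, ids, by = "key_fangraphs")

# Left join: keep every batting row, filling missing identifiers with NA.
left <- merge(batting, ids, by = "key_fangraphs", all.x = TRUE)

# Rows that failed to match, useful as a quick sanity check.
unmatched <- left[is.na(left$key_mlbam), ]
```

Comparing nrow() before and after the merge is a cheap way to see whether any player-seasons were lost.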

Using the Data

To load the data in your analysis:

# To load the CSV
dat.bat <- read.csv(file.path(data.directory, "Fangraphs Batting Leaders 2000 - 2019.csv"))

# To load the RDS
dat.bat <- readRDS(file.path(data.directory, "Fangraphs Batting Leaders 2000 - 2019.RDS"))

Where data.directory is the path to the folder containing the data files.
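As a minimal, self-contained sketch of the same pattern, the example below writes a tiny made-up table to a temporary folder and reads it back in both formats. The file name "example batting" and the use of tempdir() are assumptions for illustration; in practice data.directory would point at the folder where you unzipped the archive.

```r
# Assumption: use a temporary folder in place of your real data folder.
data.directory <- tempdir()

# Tiny made-up table standing in for the batting data.
example <- data.frame(
  Name   = c("A", "B"),
  Season = c(2000, 2001),
  stringsAsFactors = FALSE
)

# file.path() inserts the path separator, so no trailing slash is needed.
csv_path <- file.path(data.directory, "example batting.csv")
rds_path <- file.path(data.directory, "example batting.RDS")

write.csv(example, csv_path, row.names = FALSE)
saveRDS(example, rds_path)

from_csv <- read.csv(csv_path)
from_rds <- readRDS(rds_path)
```

The RDS copy preserves column types exactly, whereas read.csv() re-infers them from the text, so the RDS file is usually the safer choice when staying within R.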
