This data set contains seasonal MLB pitching data from 2000 to 2019.

Download a Zip Archive of the Data and Script (includes CSV and RDS)

Click here for an explanation of the variables

Wrangling Operation

The operation requires the following packages, particularly Bill Petti’s excellent baseballr package for wrangling MLB data:

# Download the Baseball R Package:
# devtools::install_github(repo = "BillPetti/baseballr")

library(baseballr)
library(rvest)
library(plyr)

First, I create a table of player identifiers from the Chadwick Baseball Bureau using get_chadwick_lu() in baseballr. These identifiers will help users merge this data table with other baseball data.

# Player Identifiers
dat.playerid <- get_chadwick_lu()
dat.playerid <- dat.playerid[c(1:7,13:15,19,25:28)]
saveRDS(dat.playerid, "Player Identifiers.RDS")
identifiers <- readRDS("Player Identifiers.RDS")
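
Selecting columns by position, as above, depends on the lookup table keeping its column order. Below is a minimal sketch of selecting by name instead. It uses a toy data frame; the column subset and all values are illustrative assumptions, not the full Chadwick layout.

```r
# Toy stand-in for the Chadwick lookup table (made-up values, assumed columns).
lookup <- data.frame(
  key_mlbam     = c(1L, 2L),
  key_fangraphs = c(101L, 102L),
  name_last     = c("Alpha", "Beta"),
  name_first    = c("Ann", "Bob"),
  unused_col    = c(NA, NA),
  stringsAsFactors = FALSE
)

# Columns to keep, listed by name (an assumed subset, for illustration only).
keep <- c("key_mlbam", "key_fangraphs", "name_first", "name_last")

# intersect() skips any requested name that is absent from the table,
# and returns the surviving names in the order given in `keep`.
ids <- lookup[, intersect(keep, names(lookup)), drop = FALSE]
```

Name-based selection keeps working if columns are added or reordered upstream, at the cost of a longer selection vector.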

Next, I download seasonal MLB pitching performance data from Fangraphs through fg_pitch_leaders():

# Scraping Pitching Data
for (i in 2000:2019){
  temp <- fg_pitch_leaders(i, i, league = "all", qual = "n", ind = 1)
  assign(paste0("fg_pitch_", i), temp)
}

As these tables all share the same structure and differ only in the year they cover, I can stack them together using rbind():

dat.pit <- fg_pitch_2000
# Start at 2001: the 2000 table already seeds dat.pit,
# so looping from 2000 would stack that season twice
for (i in 2001:2019){
  dat.pit <- rbind(dat.pit, get(paste0("fg_pitch_", i)))
}
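
The download-then-stack pattern can also be sketched with a list instead of one variable per season. Here fetch_season() is a hypothetical stand-in that returns toy rows so the example runs offline; in the real script its place would be taken by the Fangraphs call.

```r
# Hypothetical stand-in for a per-season download; returns a tiny data frame.
fetch_season <- function(year) {
  data.frame(playerid = c(1, 2), Season = year, W = c(10, 5))
}

seasons <- 2000:2019

# One data frame per season, held in a list rather than separate variables.
season_list <- lapply(seasons, fetch_season)

# Stack every season in a single call; all elements share the same columns.
stacked <- do.call(rbind, season_list)
```

Because each season appears exactly once in the list, there is no seed table to initialize and no risk of stacking a season twice.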

Rename the identifier in the Fangraphs table so that it matches the name used in the Chadwick Bureau identifier data. With a shared key, the pitching data can more readily be merged with the identifiers and with other sources.

names(dat.pit)[1] <- "key_fangraphs"

# Clean Up Data Types and Sort
dat.pit$key_fangraphs <- as.numeric(dat.pit$key_fangraphs)
dat.pit$Season <- as.numeric(dat.pit$Season)
dat.pit <- arrange(dat.pit, Name, Season)
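
Once both tables share a numeric key_fangraphs column, joining them is a single merge() call. The sketch below uses toy data frames with made-up values; the key_mlbam column is only an example of an identifier that might be carried over.

```r
# Toy pitching rows keyed by key_fangraphs (values are made up).
pitching <- data.frame(
  key_fangraphs = c(101, 102, 103),
  Season        = c(2018, 2018, 2019),
  W             = c(12, 4, 1)
)

# Toy identifier table; player 103 is deliberately missing.
ids <- data.frame(
  key_fangraphs = c(101, 102),
  key_mlbam     = c(9001, 9002)
)

# all.x = TRUE keeps every pitching row and fills NA where no identifier matches.
merged <- merge(pitching, ids, by = "key_fangraphs", all.x = TRUE)
```

A left join of this kind avoids silently dropping pitchers who are absent from the identifier table.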

# Write data
write.csv(dat.pit, "Fangraphs Pitching Leaders 2000 - 2019.csv")
saveRDS(dat.pit, "Fangraphs Pitching Leaders 2000 - 2019.RDS")

# Clean up memory 
rm(list=ls(pattern = "fg_pitch"))

Photo Credit: Derivative work by Amineshaker of Nolan_Ryan_in_Atlanta.jpg by Wahkeenah, Public Domain, https://commons.wikimedia.org/w/index.php?curid=5022538
