Joseph Nathan Cohen

Department of Sociology, CUNY Queens College, New York, NY

Analytics Assignment: Creating Empirical Baseball Knowledge

This assignment asks students to create an original item of baseball knowledge.


In this assignment, students will write a 1,500–3,000-word article that creates and conveys original empirical information using baseball data obtained through the baseballr package in the R software platform.

Students can choose to create information on any topic or combination of topics related to data delivered through baseballr. The only conditions are (1) that it creates useful information for a definable audience, (2) that it is built on original empirical data analysis, and (3) that it conforms to quality standards described in the General Assignment Guidelines and discussed throughout the semester.

Your report should follow this outline:

  • Abstract: In 50 words or fewer, summarize the entire article.
  • Introduction: Open with a direct, concise summary of your analysis, its findings, and the insights or uses of those findings. Summarize the entire report in the first paragraph. In the second paragraph, outline and briefly preview the rest of the paper. Optionally, you may begin this section with a lead paragraph that hooks the reader or sets up the piece. (approx. 150 words)
  • Background: Give the reader background information to understand the analysis and its uses. You might consider discussing topics like: Who will use this piece of information? How will they use this information? Which beliefs, decisions, or practical actions does your analysis probe or inform? What have other people said on this topic, and how might your analysis contribute to the discussion? What theoretical concepts or relationships are implicit in the beliefs that you engage? These types of questions help your audience understand the meaning and stakes of your project.
  • Data and Methods: Which data did you use? Whom or what do the data measure? Which variables are the focus of your analysis? What are these variables supposed to measure? What information are you trying to extract from the data? Which analytical operation is appropriate, given your goals and data? Are there issues with the data or analysis that could affect the audience's ability to interpret or replicate your results?
  • Analytical Results: This section provides key information extracted from the data, so that the audience can see the details of your findings and lines of reasoning. Topics here might include univariate statistics, bivariate statistics, inferential statistics, data visualizations, regression models, multivariate analyses, data reduction operations, and other topics taught in this class and throughout the Analytics curriculum.
  • Conclusion: Summarize the empirical findings and their meaning. Help the reader make sense of everything they encountered in your report and identify the conclusions they should take away from it. This is also a place to discuss future directions or offer another closing thought.


Please submit the following to Blackboard in a single zip file whose name includes your last name and “333-1” (e.g., “Lastname 333-1.zip”):

  • Your R Markdown file
  • A copy of the data that you used
  • A Word document with a polished version of your report

All submissions should adhere to the class’s General Assignment Guidelines.

The due date is Wednesday, March 27, 2024.

Learning Goals

  • To execute a basic analysis in the R statistical platform
  • To generate a research report using R Markdown
  • To create a visualization using the widely used ggplot2 package
  • To practice explaining the underlying concept, empirical measurement, empirical distribution, and practical application of analytic metrics


