Each offering of this course approaches the study of variability using a particular set of statistical tools (such as Bayesian analysis, biostatistics, sports analytics, experimental design, or statistical machine learning). Specific statistical methodology within a subfield of the discipline will be examined. One topic from the following list is offered each spring.
Bayesian Analysis
This course will focus on statistical inference using a Bayesian framework. Unlike many other statistical tools, Bayesian methods combine prior information with information derived through experimentation or observation. The course will begin with a review of the basic concepts of probability, which are critical to understanding the foundations of the subject. Other topics will include decision theory, loss functions, subjective and objective prior distributions, posterior distributions, estimation, testing, prediction, sensitivity analysis, and hierarchical modeling. We will also compare and contrast Bayesian methods with classical methods. Students will use R regularly to make Bayesian inferences from data.
Biostatistics
Statistical methods are often used in medical studies. All of the examples, exercises, and projects will deal with data from public health sectors, the World Health Organization, the CDC, the FDA, or prospective or retrospective studies on patients. Survival functions will be introduced and inference methods based on the Kaplan-Meier estimator will be studied. Cox regression models and accelerated failure time models will be examined if time permits. R statistical software will be used heavily throughout the course.
Sports Analytics
Sports analytics are being used more frequently to help managers and owners make important decisions. Billy Beane was one of the first general managers to apply statistical methods and models in MLB. Now, similar models and methods are being used in basketball, football, hockey, soccer, golf, swimming, and other sports. Using data science techniques to scrape data from appropriate sources has been a game changer for many analysts, who are always trying to gain an advantage over their competitors. We will carefully examine the statistical methods that are being used. In addition to analyzing individual and team performance over time, we will look at the impact of rule changes and new guidelines or draft policies. Students will read current articles from sports statistics journals and analyze data to address open questions of interest. Oral and written communication about these technical models will be a regular part of the course. Students will regularly use R to analyze data and make inferences. Statistical methods for analyzing time series data will be a major part of this course.
Experimental Design
This course will focus on the design and analysis of experiments. Complete and fractional factorial designs, completely randomized designs, and randomized complete block designs will be discussed. Students will use R regularly to analyze data from such experiments. Additionally, students will be asked to design an experiment and to collect and analyze the resulting data. Other topics such as split-plot designs, Latin hypercube designs, and computational methods for constructing designs will be discussed as time permits.
Statistical Machine Learning
The course provides an introduction to statistical models and algorithms for supervised and unsupervised learning. Topics will include regression, classification, clustering, dimensionality reduction, and feature extraction. Using R, students will regularly implement the machine learning techniques being discussed.