Course Schedule

Part I - Using R and RStudio

An Introduction to R

Modules

Topics

  • History of R
    • Relation to other languages and statistics software
  • Installing R and RStudio
  • Using R and RStudio through the cloud with Posit
  • Setting up your RStudio workspace
    • Panels: Source, Console, Environment/History, Other Tabs
      • Configuration and customization
    • Setting the working directory
    • Saving workspaces
  • R documentation and getting help
  • R Basics
    • Using R interactively
    • Variables and assignment
    • Packages
      • Installing and updating
      • Dependencies
    • R objects
      • Object types - Vectors, simple functions, and environments
      • Classes and attributes of objects
      • Scripting and sourcing scripts
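
A minimal sketch of the R basics listed above, meant only as orientation. The object names (x, square, e) and the script path are hypothetical, and the install and source lines are commented out so nothing is downloaded or run by accident.

```r
x <- c(2, 4, 6)               # assignment with <- ; c() builds a vector
class(x)                      # "numeric"
names(x) <- c("a", "b", "c")
attributes(x)                 # names are stored as an attribute

square <- function(v) v^2     # a simple function is also an object
class(square)                 # "function"

e <- new.env()                # environments hold name-value bindings
assign("y", 10, envir = e)
get("y", envir = e)

# install.packages("tidyverse")   # run once to install a package and its dependencies
# source("my_script.R")           # hypothetical script path
```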

Required Readings

  • Introduction to Data Science
    • Chapter 1 - Introduction
    • Chapter 2 - R Basics
  • R in Action, Second Edition
    • Chapter 1 - Getting Started
    • Chapter 2 - Creating a Dataset

Other Useful Readings

  • The Book of R
    • Chapter 1 - Getting Started
    • Chapter 2 - Numerics, Arithmetic, Assignment, and Vectors
  • R Programming for Data Science
    • Chapter 3 - History and Overview of R
    • Chapter 5 - R Nuts and Bolts
  • Statistics: An Introduction Using R
    • Chapter 1 - Fundamentals
    • Appendix: Essentials of the R Language
  • Advanced R, First Edition
    • Chapter 2 - Data Structures
  • Modern Data Science with R
    • Appendix B: An Introduction to R and RStudio

Version Control and Reproducibility

Modules

Topics

  • Good programming practices
    • Version control with git and GitHub
    • Data workflow with R projects using local and remote repositories
    • Reproducible research using Quarto or RMarkdown documents and notebooks
    • Programming conventions and style

Required Readings

  • Introduction to Data Science
    • Chapter 39 - Git and GitHub
  • Essentials of Data Science
    • Chapter 11 - R with Style

Other Useful Readings

  • Happy Git and GitHub for the useR
  • Introduction to Data Science
    • Chapter 37 - Accessing the terminal and installing Git
    • Chapter 38 - Organizing with Unix
    • Chapter 40 - Reproducible projects with RStudio and Quarto/RMarkdown

Data Science Preliminaries

Modules

Topics

  • Working with data
    • The Tao of text
    • More object types - matrices, n-dimensional arrays, lists, data frames, and other tabular structures (e.g., data tables and “tibbles”)
    • Subsetting and filtering data structures
      • Single bracket ([]) notation
      • Double bracket ([[]]) notation
      • $ notation
    • Factors
    • Class coercion and conversion
    • Special data values - NA, NaN, Inf
    • Getting data in and out of R
      • From “.csv” files - {readr}
      • From Excel - {readxl} and others
      • From Dropbox, Box, and other cloud file storage
      • From other online resources - {curl}
      • From databases - {RMySQL}, {RSQLite}, {RPostgreSQL} and others

Required Readings

  • The Book of R
    • Chapter 3 - Matrices and Arrays
    • Chapter 5 - Lists and Data Frames
  • R in Action
    • Chapter 4 - Basic Data Management

Other Useful Readings

  • The Book of R
    • Chapter 4 - Non-Numeric Values
    • Chapter 6 - Special Values, Classes, and Coercion
    • Chapter 8 - Reading and Writing Files
  • Advanced R
    • Chapter 4 - Subsetting
  • R for Data Science
    • Chapter 7 - Data Import

Exploratory Data Analysis

Modules

Topics

  • Summarizing and visualizing data
    • Basic descriptive statistics
    • Tidying and reshaping data with {tidyr}
    • Simple plotting (boxplots, histograms, scatterplots) with {base} R, {ggplot2}, and others

Required Readings

  • Introduction to Data Science
    • Chapter 5 - The {tidyverse}
  • R in Action
    • Chapter 6 - Basic Graphs
    • Chapter 7 - Basic Statistics

Other Useful Readings

  • The Book of R
    • Chapter 13 - Elementary Statistics
    • Chapter 14 - Basic Data Visualization
  • R for Data Science
    • Chapter 5 - Data Tidying

Data Wrangling and Programming

Modules

Topics

  • Manipulating data
    • {dplyr} functions - select(), filter(), arrange(), rename(), mutate(), group_by(), summarize()
    • Chaining and piping data with pipe operators (e.g., |>, %>%)
  • R programming practices
    • Writing functions
      • Argument lists
      • Default values
    • Program flow control
      • Conditional statements (e.g., if () { } else { })
      • for() loops
      • while() loops

Required Readings

  • Introduction to Data Science
    • Chapter 4 - Programming Basics

Other Useful Readings

  • The Book of R
    • Chapter 9 - Calling Functions
    • Chapter 10 - Conditions and Loops
    • Chapter 11 - Writing Functions
  • R for Data Science
    • Chapter 10 - Relational Data with {dplyr}
  • R in Action
    • Chapter 5 - Advanced Data Management

Part II - Statistics and Inference

Beginning Statistics

Modules

Topics

  • Populations and samples, parameters and statistics
  • Describing central tendency, spread, and skew
  • Standard errors and quantiles

Required Readings

  • Modern Data Science with R
    • Chapter 7 - Statistical Foundations
  • Introduction to Data Science
    • Chapter 15 - Random Variables
  • Statistical Inference via Data Science
    • Chapter 7 - Sampling

Other Useful Readings

  • Statistics: An Introduction Using R
    • Chapter 3 - Central Tendency
    • Chapter 4 - Variance

Probability and Distributions

Modules

Topics

  • Probability and conditional probability
  • Random variables - discrete and continuous
  • Probability mass functions, probability density functions
  • Cumulative probability function
  • Some useful distributions and their properties
    • Distribution functions
      • Density (d)
      • Cumulative probability (p)
      • Quantile (q)
      • Random (r)
    • Discrete distributions
      • Bernoulli
      • Poisson
      • Binomial
    • Continuous distributions
      • Beta
      • Uniform
      • Normal
  • Q-Q plots
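
The d/p/q/r naming convention above applies across the distributions in base R. A short sketch, with an arbitrary seed and made-up parameter values:

```r
set.seed(1)  # arbitrary seed so the random draws are repeatable

dnorm(0)                             # density of the standard normal at 0
pnorm(1.96)                          # P(Z <= 1.96), about 0.975
qnorm(0.975)                         # quantile, the inverse of pnorm(); about 1.96
x <- rnorm(100, mean = 10, sd = 2)   # 100 random draws

dbinom(3, size = 10, prob = 0.5)     # P(X = 3) for a binomial
ppois(2, lambda = 4)                 # P(X <= 2) for a Poisson

# Q-Q plot of the simulated draws against a theoretical normal
qqnorm(x)
qqline(x)
```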

Required Readings

  • Introduction to Data Science
    • Chapter 14 - Probability
  • The Book of R
    • Chapter 15 - Probability
    • Chapter 16 - Common Probability Distributions

Confidence Intervals

Modules

Topics

  • Standard errors and confidence intervals
    • CIs based on a theoretical distribution
    • The Central Limit Theorem
    • CIs based on bootstrapping
    • CIs for proportions
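
One way to compare the approaches above is to compute both kinds of interval on the same sample. The sketch below uses simulated data and a percentile bootstrap; the sample size, seed, number of resamples, and the counts passed to prop.test() are arbitrary choices for illustration.

```r
set.seed(42)
x <- rnorm(50, mean = 5, sd = 2)   # simulated sample, for illustration only

# CI based on a theoretical distribution (t with n - 1 degrees of freedom)
n  <- length(x)
se <- sd(x) / sqrt(n)              # standard error of the mean
ci_t <- mean(x) + qt(c(0.025, 0.975), df = n - 1) * se

# CI based on bootstrapping: resample with replacement, take percentiles
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
ci_boot <- quantile(boot_means, c(0.025, 0.975))

# CI for a proportion (30 successes in 100 trials, made-up counts)
ci_prop <- prop.test(30, 100)$conf.int
```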

Required Readings

  • Introduction to Data Science
    • Chapter 16 - Statistical Inference
  • The Book of R
    • Chapter 17 - Sampling Distributions and Confidence

Other Useful Readings

  • R Programming for Data Science
    • Chapter 22 - Simulation
  • Statistical Inference via Data Science
    • Chapter 8 - Bootstrapping and Confidence Intervals

Hypothesis Testing

Modules

Topics

  • p values and “significance”
  • Classic null hypothesis significance testing (NHST)
    • One- and two-sample \(T\) and \(Z\) tests
  • Permutation and randomization tests
  • Type I and Type II error
  • Statistical power, effect sizes
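
A hedged sketch contrasting a classic two-sample test with a permutation test on the same simulated data. The group sizes, effect size, seed, and number of permutations are assumptions made for the example.

```r
set.seed(123)
g1 <- rnorm(20, mean = 0)     # simulated groups, for illustration only
g2 <- rnorm(20, mean = 1)

t.test(g1, g2)                # classic two-sample (Welch) t test

# Permutation test on the difference in means
obs    <- mean(g2) - mean(g1)
pooled <- c(g1, g2)
perm <- replicate(5000, {
  idx <- sample(length(pooled), length(g1))   # random relabeling of groups
  mean(pooled[-idx]) - mean(pooled[idx])
})
p_perm <- mean(abs(perm) >= abs(obs))         # two-sided p value
```

The permutation p value is the proportion of relabeled datasets whose difference in means is at least as extreme as the observed one.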

Required Readings

  • The Book of R
    • Chapter 18 - Hypothesis Testing
  • Statistical Inference via Data Science
    • Chapter 9 - Hypothesis Testing
  • Legendre & Legendre (2012). Chapter 1.2. Statistical testing by permutation. Numerical Ecology, 3rd Edition. Elsevier.

Other Useful Readings

  • Statistics Done Wrong
    • Chapter 1 - An Introduction to Statistical Significance
  • Statistics: An Introduction Using R
    • Chapter 5 - Single Samples
    • Chapter 6 - Two Samples

Relevant DataCamp Material

  • Foundations of Inference - Introduction to Ideas of Inference
  • Foundations of Inference - Confidence Intervals
  • Foundations of Inference - Completing a Randomization Test
  • Foundations of Inference - Hypothesis Testing Errors

Introduction to Linear Modeling

Modules

Topics

  • Correlation and covariation
  • Basic linear modeling
    • Continuous random predictor and response variables
    • Simple linear regression (1 predictor and 1 response variable)
    • Estimating and interpreting regression coefficients
    • Model I versus Model II regression
    • The lm() function
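
A small example of simple linear regression with lm(), using the mtcars data that ships with R. The choice of variables is arbitrary and only for illustration.

```r
# Simple linear regression: one predictor (wt), one response (mpg)
m <- lm(mpg ~ wt, data = mtcars)
coef(m)                          # intercept and slope estimates
cor(mtcars$wt, mtcars$mpg)       # correlation between predictor and response

# For one predictor, the least-squares slope equals cov(x, y) / var(x)
b1 <- cov(mtcars$wt, mtcars$mpg) / var(mtcars$wt)
```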

Required Readings

  • The Book of R
    • Chapter 20 - Simple Linear Regression

Other Useful Readings

  • Statistics: An Introduction Using R
    • Chapter 7 - Regression
  • Statistical Inference via Data Science
    • Chapter 5 - Basic Regression

Relevant DataCamp Material

  • Correlation and Regression in R - Visualizing Two Variables
  • Correlation and Regression in R - Correlation
  • Correlation and Regression in R - Simple Linear Regression
  • Correlation and Regression in R - Interpreting Regression Models

Elements of Regression Analysis

Modules

Topics

  • Inference in regression
    • Estimating standard errors for regression coefficients
    • Confidence intervals and prediction intervals
    • Residuals
  • Model checking
  • Partitioning of variance in linear models
  • Data transformations
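
The inference and checking steps above map onto a handful of base R functions. A sketch on the built-in mtcars data, where the new observation (wt = 3) is a hypothetical value:

```r
m <- lm(mpg ~ wt, data = mtcars)

summary(m)$coefficients      # estimates, standard errors, t and p values
confint(m, level = 0.95)     # confidence intervals for the coefficients

new <- data.frame(wt = 3)    # hypothetical new observation
ci_mean <- predict(m, new, interval = "confidence")   # CI for the mean response
pi_new  <- predict(m, new, interval = "prediction")   # interval for a new value

plot(fitted(m), resid(m))    # residuals vs. fitted values, a basic model check
anova(m)                     # partitioning of variance
```

The prediction interval is wider than the confidence interval because it also accounts for the residual variation around the fitted line.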

Required Readings

  • Statistical Inference via Data Science
    • Chapter 10 - Inference for Regression

Other Useful Readings

  • Gotelli, N.J. & Ellison, A.M. (2012). Chapter 9. Regression. A Primer of Ecological Statistics, 2nd Edition. Sunderland, MA: Sinauer Associates, Inc.

Extending Linear Regression

Modules

Topics

  • Regression with categorical predictors
    • One- and multiple-factor ANOVA
    • Type I, Type II, Type III sums of squares
    • Interaction plots to visualize changes across groups
  • Simple categorical data analysis
    • Kruskal-Wallis tests
    • Chi-Square tests of goodness-of-fit and independence
  • Generating mock data with a defined correlation structure
  • Regression with multiple predictors
    • More than one continuous predictor
    • Combinations of continuous and categorical predictors
    • Visualizing linear models with more than one predictor
    • Confidence intervals and prediction in multiple regression
    • Interactions between predictors
    • Interaction plots to visualize changes across groups

Required Readings

  • The Book of R
    • Chapter 19 - Analysis of Variance
    • Chapter 21 - Multiple Linear Regression

Other Useful Readings

  • Gotelli, N.J. & Ellison, A.M. (2012). Chapter 10. The analysis of variance. A Primer of Ecological Statistics, 2nd Edition. Sunderland, MA: Sinauer Associates, Inc.
  • Statistics: An Introduction Using R
    • Chapter 8 - Analysis of Variance
    • Chapter 9 - Analysis of Covariance

Model Selection

Modules

Topics

  • Model simplification and selection
    • Partial F tests for comparing models
    • Forward and backward selection
    • Information criteria considerations for comparing models
  • The Akaike Information Criterion (AIC) and others
    • {stats} step()
    • {MASS} stepAIC()
    • {AICcmodavg}
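
A sketch of comparing nested models and running backward selection, assuming the built-in mtcars data and an arbitrary set of predictors. Only {stats} and {MASS} (which ships with standard R installations) are used; the {MASS} function for stepwise selection by AIC is stepAIC().

```r
full    <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
reduced <- lm(mpg ~ wt + hp, data = mtcars)

anova(reduced, full)     # partial F test for the nested models
AIC(reduced, full)       # lower AIC indicates the preferred model

# Backward selection by AIC with {stats}
best <- step(full, direction = "backward", trace = 0)
formula(best)

# The same search with {MASS}
best_mass <- MASS::stepAIC(full, direction = "backward", trace = 0)
```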

Required Readings

  • The Book of R
    • Chapter 22 - Linear Model Selection and Diagnostics

Other Useful Readings

  • Package descriptions for {AICcmodavg} and {MuMIn}

Relevant DataCamp Material

  • Correlation and Regression in R - Model Fit

Linear and Mixed Effects Modeling

Modules

Topics

  • Generalized linear models
    • Other response variable types (e.g., counts, binary responses)
    • Logistic regression, multiple logistic regression
    • Log-linear modeling
    • Likelihood ratio tests
  • Introduction to mixed effects modeling
    • Combining fixed and random factors
  • Assessing model fit for GLMs and mixed models
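
A brief glm() sketch for the binary and count responses mentioned above, using the built-in mtcars data with arbitrarily chosen variables. Mixed effects models need an add-on package and are not shown here.

```r
# Logistic regression: transmission type (am, coded 0/1) as a function of weight
m0 <- glm(am ~ 1,  data = mtcars, family = binomial)
m1 <- glm(am ~ wt, data = mtcars, family = binomial)

summary(m1)$coefficients         # estimates on the log-odds scale
anova(m0, m1, test = "Chisq")    # likelihood ratio test of the nested models

# Log-linear (Poisson) model for a count response
m2 <- glm(carb ~ hp, data = mtcars, family = poisson)
AIC(m2)
```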

Required Readings

  • R in Action
    • Chapter 13 - Generalized Linear Models
  • Bolker, B.M., Brooks, M.E., Clark, C.J., Geange, S.W., Poulsen, J.R., Stevens, M.H.H., & White, J.-S.S. (2009). Generalized linear mixed models: A practical guide for ecology and evolution. Trends in Ecology and Evolution 24: 127-135.

Other Useful Readings

  • Statistics: An Introduction Using R
    • Chapter 12 - Other Response Variables
  • Bolker, B.M. (2008). Chapter 9. Standard statistics revisited. In: Ecological Models and Data in R. Princeton, NJ: Princeton University Press.
  • Quinn, G.P. & Keough, M.J. (2002). Chapter 13. Generalized linear models and logistic regression. Experimental Design and Data Analysis for Biologists. Cambridge, UK: Cambridge University Press.