STA303: Methods of Data Analysis II
1
How to use this course guide
1.1
Communication policy reminder
1.2
Intellectual property statement
1.2.1
Tutoring companies
1.3
Contributors
2
Syllabus
2.1
Course information
2.1.1
Course description
2.1.2
Class times
2.1.3
Materials
2.1.4
Teaching team
2.2
Land acknowledgment
2.3
Prerequisites
2.4
Course format and organization
2.5
Learning objectives
2.6
Textbooks
2.7
Computing and minimum technical requirements
2.8
Course outline
2.9
Assessments
2.9.1
Knowledge basket
2.10
Hours expectations
2.11
Marking concerns / regrade requests
2.12
Missed work policies
2.12.1
Exceptions
2.12.2
IMPORTANT NOTES
2.13
Communication policy
2.14
Accommodations and accessibility
2.14.1
Accessibility services
2.14.2
Religious Accommodation
2.15
Recognized study groups
2.16
Meet to complete
2.17
Feeling distressed?
2.18
Intellectual property statement
2.19
Academic integrity
2.19.1
Plagiarism
2.19.2
Specific advice on untimed assessments
2.19.3
Rules for timed assessments (e.g., mixed assessments)
2.19.4
NOTE: BE CAREFUL ABOUT PRIVATE TUTORING COMPANIES
2.20
Course design principles
2.20.1
Humans learn better ‘little and often’, but everyone is burnt out after two years of a pandemic
2.20.2
Writing is good for statisticians
2.20.3
Course content is accessible
3
Start here!
3.1
Introductions
3.1.1
Professor Liza Bolton, Instructor
3.1.2
Amin Banihashemi, Head TA
3.2
A few things to know upfront
3.2.1
Students joining off the waitlist
3.3
How this course works
3.4
Hours expectations
3.4.1
Communication
3.5
To do now
Assessments overview
4
Assessment overview
4.1
Graduate student modification (1002H)
5
Mini-portfolio
5.1
General instructions
5.1.1
Template
5.2
Submission instructions
5.3
Cover page
5.4
Introduction
5.5
Statistical skills sample
5.5.1
Task 1: Setting up libraries
5.5.2
Task 2: Visualizing the variance of a Binomial random variable for varying proportions
5.5.3
Task 3: Demonstrating frequentist confidence intervals as long-run probabilities of capturing a population parameter
5.5.4
Task 4: Investigating whether there is an association between cGPA and STA303/1002 students correctly answering a question on global poverty rates
5.6
Writing sample
5.7
Reflection
5.8
Rubric
6
Portfolio
6.1
General instructions
6.1.1
Template
6.2
Submission instructions
6.3
Cover page
6.4
Introduction
6.5
Statistical skills sample
6.5.1
A note on time management.
6.5.2
Task 1: Setting up libraries and seed value
6.5.3
Task 2a: Return to Statdew Valley: exploring sources of variance in a balanced experimental design (teaching and learning world)
6.5.4
Task 2b: Applying linear mixed models for the strawberry data (practical world)
6.5.5
Task 3a: Building a confidence interval interpreter
6.5.6
Task 3b: Building a p value interpreter
6.5.7
Task 3c: User instructions and disclaimer
6.5.8
Task 4: Creating a reproducible example (reprex)
6.5.9
Task 5: Simulating p values
6.6
Writing sample
6.6.1
Prompt
6.7
Reflection
6.8
Rubric
7
Mini-mixed assessment
7.0.1
Instructions
8
Mixed assessment
8.0.1
Instructions
8.1
How your grade is calculated
8.2
Getting help
8.2.1
Academic integrity
8.3
Untimed component
8.4
Timed components
9
Final project
9.0.1
Instructions
10
Knowledge basket overview
10.0.1
Planning and tips
10.0.2
Examples
11
Knowledge basket: Writing and peer feedback
11.1
General instructions
11.1.1
Create phase
11.1.2
Assess phase
11.1.3
Reflect phase
11.1.4
General instructions
11.2
Module 1 writing task
11.2.1
M1 instructions
11.3
Module 2 writing task
11.3.1
M2 instructions
11.4
Module 3 writing task
11.4.1
M3 instructions
11.5
Module 4 writing task
11.5.1
M4 instructions
11.5.2
M4 Rubric
11.5.3
M4 Reference
11.6
Module 5 writing task
11.6.1
M5 instructions
11.6.2
M5 Rubric
11.6.3
M5 References
12
Knowledge basket: Professional development task
12.1
Professional development proposal
12.1.1
Instructions
12.1.2
Submission requirements
12.1.3
Rubric
12.1.4
Checklist
12.1.5
Things to keep in mind as you start working towards your professional development goal
12.1.6
Example SMART goal
12.2
Professional development evidence and reflection
12.2.1
Templates
12.2.2
Reflection and evidence components
12.2.3
Recommended structure
12.2.4
Rubric
13
Knowledge basket: Other
13.1
‘Getting to know you’ survey
13.1.1
Instructions
13.2
Pre-knowledge check
13.2.1
Instructions
13.2.2
Workshop attendance
13.3
Writing workshop
13.3.1
Workshop attendance
13.4
5 Ways to Well-being workshop attendance and reflection
13.4.1
5WtW information
13.4.2
Workshop attendance
13.5
Academic resilience workshop attendance and reflection
13.5.1
Workshop attendance
13.6
Module check-ins
13.6.1
Instructions
13.7
Team Up! activities
13.8
Graduate school info session
13.9
Punctuation art
13.9.1
Instructions
13.10
Breathing exercise
13.10.1
Instructions
13.11
Linear Mixed Models study guide
13.11.1
Instructions
13.12
Generalized Linear Models study guide
13.12.1
Instructions
13.13
Hack your class workshop attendance and reflection
13.13.1
Instructions
13.14
Sports analytics workshop attendance and reflection
13.14.1
Instructions
Modules
14
Module 1
14.1
Learning checklist
14.2
Instructor information
14.3
Upward management tips
14.3.1
Communicate using the tools your manager prefers
14.3.2
Write good emails (when emails are appropriate)
14.3.3
Understand your manager’s goals
14.3.4
Demonstrate self-management and resilience while also asking for help and flagging problems early.
14.4
Recap of linear models
14.4.1
Why model?
14.4.2
Linear models
14.4.3
Linear regression assumptions
14.4.4
What makes it a linear model?
14.4.5
Optional refresher reading
14.5
The data for module 1
14.5.1
Let’s meet the penguins
14.5.2
The variables
14.6
Common statistical tests as linear regression
14.6.1
Introduction
14.6.2
One-sample t-test
14.6.3
Dummy variables
14.6.4
Two means
14.6.5
ANOVA
14.6.6
Credits
14.7
Reproducible examples (reprex)
14.7.1
What is a reproducible example?
14.7.2
Why should you care about creating reproducible examples?
14.7.3
Watch the creator of the reprex package explain it
14.7.4
Using reprexes on Piazza
14.8
Reading strategy: previewing and skimming
14.9
Getting ahead on Module 2
15
Module 2
15.1
Learning checklist
15.1.1
Key functions
15.2
Introduction to Module 2
15.3
Readings for this module
15.3.1
R for Data Science (+ story time)
15.3.2
Common misconceptions about data analysis and statistics
15.3.3
To predict and serve?
15.3.4
Science isn’t broken
15.4
Data wrangling and visualization
15.4.1
Tidy data
15.4.2
Student grades case study
15.5
Ethical professional practice for statisticians
15.5.1
Why should statisticians be ethical?
15.5.2
Getting data
15.5.3
Analyzing data
15.5.4
Making decisions with data
15.6
Statistical communication
15.6.1
Prioritize statistical communication as part of your toolbox
15.6.2
What do statisticians usually write?
15.6.3
Common report components
15.6.4
Paraphrasing
16
Module 3
16.1
Learning checklist
16.2
Introduction
16.2.1
How deep are we going on likelihoods?
16.2.2
Model comparison more generally
16.3
Correlated data
16.3.1
Reading
16.3.2
Key vocabulary
16.4
Statdew Valley interactive
16.4.1
Welcome to Statdew Valley!
16.4.2
Optional: Create your 16 bit character
16.4.3
Task 1: Tomatoes (part 1)
16.4.4
Task 1: Tomatoes (part 2)
16.4.5
Task 2: Life is sweet as honey
16.4.6
Where to next?
16.4.7
Credits
16.5
Vocal pitch case study: Part 1 (LMMs)
16.5.1
Motivation
16.5.2
Design of the experiment
16.5.3
Read in the data and explore it
16.5.4
Recall: Linear regression assumptions
16.6
Linear mixed models
16.6.1
Thoughts on plots for hierarchical/correlated data generally
16.6.2
Assumptions
16.6.3
Our model set up
16.6.4
What can correlated errors look like?
16.6.5
How do we tell R which situation we’re in?
16.6.6
Additional considerations
16.7
Vocal pitch case study: Part 2
16.7.1
Modelling individual means with random intercepts
16.7.2
Scenario random slopes
16.8
Interactions between random effects and fixed effects
16.8.1
Model formula
16.8.2
How would we fit this with lmer?
16.9
More fixed vs random effects practice
16.9.1
Answers
16.10
Advice for polishing your writing
17
Module 4
17.1
Learning checklist
17.2
Introduction
17.3
Distributions (recap)
17.3.1
Sections to preview and skim
17.3.2
Reading guide
17.3.3
Cheat sheet template
17.4
Ontario COVID hospitalizations
17.4.1
Creating tables in R
17.4.2
Creating tables
17.4.3
Calculations with tables
17.4.4
Joint probabilities
17.4.5
Marginal probabilities
17.4.6
Conditional probabilities
17.4.7
Risk and odds
17.4.8
Hospitalization risk and odds
17.4.9
Odds ratio and risk ratios
17.4.10
When do we use RR vs OR?
17.5
Generalized linear models (GLMs)
17.5.1
Assumptions of the Generalized Linear Model
17.5.2
Components of a Generalized Linear Model
17.5.3
Generalized Linear Models
17.5.4
Ordinary Least Squares again
17.6
Binomial (or logistic) regression
17.6.1
Case study: Challenger disaster
17.7
GLMs: A unifying theory
17.7.1
Optional accompanying video
17.7.2
Readings
17.7.3
Recall
17.7.4
Exponential family forms
17.7.5
Canonical link functions
17.7.6
Exponential Family of Distributions
17.7.7
Iteratively re-weighted least squares algorithm (a sketch)
17.7.8
A quick note on the large sample distribution of \(\hat{\beta}\)
17.7.9
Deviance
17.7.10
Reading: Logistic regression
17.7.11
Logistic regression case study: Trying to lose weight
17.8
Poisson regression key concepts + reading
17.8.1
Reading
17.8.2
Key concepts for Poisson regression
17.8.3
Poisson regression case studies: (1) Household size in the Philippines and (2) Campus crime
17.8.4
Access the code for the case studies (optional)
17.9
Extra for the curious (NOT assessed)
17.9.1
Efficient maximization (for your reference only)
17.9.2
Numerical maximizers (for your reference only)
17.9.3
Automatic differentiation (for your reference only)
18
Module 5
18.1
Introduction
18.2
GLMMs
18.2.1
What were mixed effects, again?
18.2.2
Pros and cons of generalized linear mixed models
18.2.3
Assumptions
18.2.4
Example: Bacteria in blood samples 🦠💉
18.2.5
Inference for GLMMs
18.2.6
Problems with Likelihood that affect our inference for GLMMs
18.2.7
Back to the bacteria
18.2.8
Generalized Linear Mixed Models, more generally
18.2.9
Some key conclusions
18.2.10
GLMM Reading
18.3
Case control studies and conditional logistic regression (OPTIONAL)
18.4
GAMs
18.4.1
A one tweet GAM lesson
18.4.2
Some fake data
18.4.3
Linear model?
18.4.4
How do we get the wiggles?
18.4.5
Splines
18.4.6
Picturing basis functions
18.4.7
Wiggle, wiggle, wiggle
18.4.8
Choices to make
18.4.9
Generalized additive (mixed) models
18.4.10
Random effects
18.4.11
Case studies: Cherry trees and Portuguese larks
18.4.12
Further comments on GAMs
18.4.13
Conclusions
References
Appendix
19
Resources
19.1
Course tools overview
19.1.1
Admin
19.2
Using RStudio with the JupyterHub
19.3
Team Up!
19.3.1
To participate in these activities, you need:
19.3.2
Before the activity you should:
19.3.3
Some rules/logistics
19.3.4
Team Up! Instructions
19.4
Zoom, Zoom, Zoom, Zoom…
19.4.1
Make sure your Zoom is up to date
19.4.2
Customize!
19.4.3
VPN
19.4.4
Notes:
19.4.5
Changing your profile picture on Zoom and Quercus
19.4.6
What to do if you experience technical difficulties during class?
19.4.7
What to do if your instructor or TA is experiencing technical difficulties on Zoom
19.5
Student support services and resources
19.5.1
Mental health support
19.5.2
General University resources
19.5.3
Financial support
19.5.4
Arts & Science COVID-19 FAQ
20
FAQs and Errata
20.1
Frequently asked questions
20.1.1
Course admin
20.1.2
Assessments FAQ
20.1.3
Team Up!
20.2
Other
20.2.1
References
20.3
Errata
21
Bits and pieces
21.1
Code to generate course art
21.2
M1 supporting information on matrices (not assessed)
21.2.1
Background
21.2.2
Example
21.2.3
Regression NOT classic, actually ANOVA! (no intercept)
21.2.4
Further reading (if you want it)
21.3
p values (recap)
21.3.1
What if you get a value that is exactly one of these thresholds?
22
Announcements summary
22.0.1
Looking for student representatives for STA303
Department of Statistical Sciences, The University of Toronto
References