STA414 / STA2104 Winter 2022

Statistical Methods for Machine Learning II

The language of probability allows us to coherently and automatically account for uncertainty. This course will teach you how to build, fit, and do inference in probabilistic models. These models let us generate novel images and text, find meaningful latent representations of data, take advantage of large unlabeled datasets, and even let us do analogical reasoning automatically. This course will teach the basic building blocks of these models and the computational tools needed to use them.

What you will learn:

Standard statistical learning algorithms, when to use them, and their limitations.
The main elements of probabilistic models (distributions, expectations, latent variables, neural networks) and how to combine them.
Standard computational tools (Monte Carlo, Stochastic optimization, regularization, automatic differentiation).

Instructors:

David Duvenaud, Office: 384 Pratt
- Email: duvenaud@cs.toronto.edu (put “STA414” in the subject)
- Office hours: Fridays 1-2pm on zoom
Michal Malyska,
- Email: michal.malyska@mail.utoronto.ca (put “STA414” in the subject)
- Office hours: Thursdays 6-7pm on zoom or by appointment
The two instructors will sometimes teach each other’s sections.

Syllabus

Missed Assessment Form

Piazza

Teaching Assistants:

Siyue Yang
Zhibo Zhang
Jacob Kelly
Mufan Li
Lu Yu
TA Email: sta414tas@cs.toronto.edu

Location:

Online for now. Zoom link will be sent by Quercus.

Reading

No required textbooks.

(PRML) Christopher M. Bishop (2006) Pattern Recognition and Machine Learning
(DL) Ian Goodfellow, Yoshua Bengio and Aaron Courville (2016), Deep Learning
(MLPP) Kevin P. Murphy (2013), Machine Learning: A Probabilistic Perspective
(ESL) Trevor Hastie, Robert Tibshirani, Jerome Friedman (2009) The Elements of Statistical Learning
(ISL) Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani (2017) Introduction to Statistical Learning
(ITIL) David MacKay (2003) Information Theory, Inference, and Learning Algorithms

Tentative Schedule

Week 1 - Jan 10th & 11th - Course Overview and Graphical Model Notation

Coverage:

Class Intro
Topics covered
Quick review of probabilistic models
Graphical model notation: going from graphs to factorized joint probs and back

Materials:

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Week 2 - Jan 17th & 18th - Decision Theory and Parametrizing Probabilistic Models

Learning Outcomes:

Basic decision theory
Understand basics of Directed Graphical Models
Become comfortable working with conditional probabilities

Coverage:

Decision Theory
Conditional Probability Tables
Numbers of parameters in different tables
Plate notation.
Examples of meaningful graphical models
D-Separation, conditional independence in directed models
Bayes Ball algorithm for determining conditional independence

Materials:

Lecture recordings:

Monday Section
Tuesday Section (the better of the two) Part 1 Part 2

Lecture notes:

Tutorial:

Worked examples of decision theory
Worked examples of Directed Graphical Models

Helpful materials:

Pieter Abbeel going over the Bayes Ball algorithm

Week 3 - Jan 24th & 25th - Latent variables and Exact Inference

Assignment 1 released.

Learning Outcomes:

How to write the joint factorization implied by UGMs
How to reason about conditional indepdencies in UGMs
How to do exact inference in joint distributions over discrete variables
The time complexity of exact inference

Materials:

Lecture:

Video recording on Monday section
Video recording on Tuesday section
Undirected Graphical Models Slides notes
Exact Inference Slides notes

Tutorial:

Worked example of MLE for binary models
More examples of UGMs
Worked examples of variable elimination in chains and trees
Simple code for neural networks

Week 4 - Jan 31st & Feb 1st - Message Passing + Sampling

Assignment 1 due Feb 6th.

Learning Outcomes:

Understand Trueskill model
Understand the Message Passing algorithm on trees
Understand Loopy Belief Propagation
Understand Sampling methods and why we need them (Simple MC, Importance, Rejection, Ancestral)

Materials:

Draft lecture notes on message passing Annotated
Draft lecture notes on sampling Annotated
Worked example of message passing on a tree
Worked example of 2 messages passin in a loopy BP on a simple graph
Tutorial worksheet raw
Tutorial worksheet converted to equations, may have some typos
Tuesday Lecture Recording
Coverage:
1. Trueskill Model
2. Message Passing
3. Loopy BP
4. Sampling

Week 5 - Feb 7th & 8th - MCMC

Assignment 2 released Feb 7th.
Coverage:

Metropolis Hastings
Hamiltonian Monte Carlo
Gibbs Sampling
MCMC Diagnostics

Materials:

MCMC intro slides draft Annotated
MCMC Sampling and diagnostics draft Annotated
Tuesday Lecture recording

Week 6 - Feb 14th & 15th - Variational Inference

Assignment 2 due Feb 20th.

Materials:

Learning Outcomes:

Optimizing distributions
Optimizing expectations with simple Monte Carlo
The reparameterization trick
The evidence lower bound

Tutorial:

Hidden Markov Models
Extra readings:
- Machine Learning Trick of the Day (4): Reparameterisation Tricks
- Machine Learning Trick of the Day (5): Log Derivative Trick
- Schulman, John, et al. Gradient estimation using stochastic computation graphs. Advances in Neural Information Processing Systems. 2015.

Week 7 - Feb 21st & 22nd - No classes - Reading week

Enjoy!

Week 8 - Feb 28th & Feb 29th - Midterm

Week 9 - March 7th & 8th - Language Models and attention

Assignment 3 released.

Coverage

NLP Basics
Embeddings from BoW to Skip-Gram
Tasks in modern Natural Language Processing
Autoregressive models

Materials

Readings

Week 10 - March 14th & 15th - Amortized inference and Variational Autoencoders

Coverage:

Amortized inference and Variational Autoencoders

Week 11 - March 21st & 22nd - Kernel methods and Gaussian Processes

Week 12 - March 28th & 29th - Neural Networks

Assignment 4 due April 3rd

Content

Neural Networks intro
Building Blocks of NNs
Common Architectures
Attention and Transformers

STA414 / STA2104 Winter 2022

Statistical Methods for Machine Learning II

What you will learn:

Instructors:

Teaching Assistants:

Location:

Reading

Tentative Schedule

Week 1 - Jan 10th & 11th - Course Overview and Graphical Model Notation

Coverage:

Materials:

Related links:

Week 2 - Jan 17th & 18th - Decision Theory and Parametrizing Probabilistic Models

Learning Outcomes:

Coverage:

Materials:

Week 3 - Jan 24th & 25th - Latent variables and Exact Inference

Learning Outcomes:

Materials:

Week 4 - Jan 31st & Feb 1st - Message Passing + Sampling

Learning Outcomes:

Materials:

Coverage:

Week 5 - Feb 7th & 8th - MCMC

Coverage:

Materials:

Suggested Reading:

Related materials:

Week 6 - Feb 14th & 15th - Variational Inference

Materials:

Learning Outcomes:

Tutorial:

Related materials:

Week 7 - Feb 21st & 22nd - No classes - Reading week

Week 8 - Feb 28th & Feb 29th - Midterm

Week 9 - March 7th & 8th - Language Models and attention

Coverage

Materials

Readings

Week 10 - March 14th & 15th - Amortized inference and Variational Autoencoders

Coverage:

Related readings:

Week 11 - March 21st & 22nd - Kernel methods and Gaussian Processes

Week 12 - March 28th & 29th - Neural Networks

Content

Materials

Week 13 - April 4th & 5th - Final Exam Review, Contrastive Learning, Interpretability

Other resources: