CSC 2547 Fall 2019: Learning to Search

Overview

In planning, search, active learning, programming, and approximate inference, we usually face a series of similar tasks. We can often generalize from the problems solved so far, or even combine parts of previous solutions to solve a new one. This course will survey foundational ideas, recent work, and applications in this area. Specifically, it will cover self-improving tree-search methods such as AlphaZero, meta-learning, hypernetworks, self-tuning gradient estimators, amortized inference, self-improving theorem provers, and planning in POMDPs. Evaluation will be based mainly on a project involving original research by the students. Students should already be familiar with the basics of machine learning such as linear algebra, optimization, and probability.

The class will have a major project component, and will be run in a similar manner to Learning Discrete Latent Structure.

Prerequisites:

This course is designed to bring students to the current state of the art, so that ideally, their course projects can make a novel contribution. A previous course in machine learning such as CSC321, CSC411, CSC412, STA414, or ECE521 is strongly recommended. However, the only hard requirements are linear algebra, basic multivariate calculus, the basics of probability, and programming skills.

To check if you have the background for this course, try taking this Quiz. If more than half the questions are too difficult, you might want to put some extra work into preparation.

Where and When

Fall 2019
Instructor: David Duvenaud
Teaching Assistants: Shengyang Sun, Chris Cremer, Jonathan Lorraine
Email: learn.search.2547@gmail.com (accessible to the instructor and TAs)
Location: Bahen room 1190
Time: Fridays, 1-3pm
Instructor Office hours: Mondays 3-4pm, in Pratt room 384
Piazza: https://piazza.com/class/jxdq4fwzvb656y

Why Learn To Search?

Active Learning and Exploration - Many problems require choosing which data would be most useful to acquire, or which experiment would be most useful to run. This can be viewed as a search problem: finding a sequential plan that can be expected to provide useful information by the end. Because we can adjust our plan after every action, we end up running many similar searches. Thus there is scope to gradually optimize our planning strategy.
Approximate Inference and Inverse Design - In many situations, we know what a good explanation or design would look like, but need to search through a large discrete set to find one. For example, given a molecule we can often predict the mass spectra of its fragments, or how well it would perform a task. However, finding a molecule that matches a given requirement is a hard search problem that might benefit from experience finding matches for similar tasks.
Program generation - One of the hardest search tasks that humans regularly perform is programming, which can be viewed as a search for programs that meet a specification. One strategy for building large programs is to practice by building smaller programs to solve related problems. Programming is also a domain where one can often usefully re-use parts of other solutions.

Course Structure

Aside from the first two and last two lectures, each week a different group of students will present on a set of related papers covering an aspect of these methods. I’ll provide guidance to each group about the content of these presentations.

In-class discussion will center around understanding the strengths and weaknesses of these methods, their relationships, possible extensions, and experiments that might better illuminate their properties.

The hope is that these discussions will lead to actual research papers, or resources that will help others understand these approaches.

Grades will be based on:

[15%] One assignment, due Oct 3, to be handed in through MarkUs.
- Tex imsart.cls imsart.sty macros.tex
[15%] Class presentations. Rubric
[15%] Project proposal, due Oct 17th. Rubric
[15%] Project presentations, November 22nd and 29th. Rubric
[40%] Project report and code, due Dec 18th. Rubric

Project

Students can work on projects individually, or in groups of up to four. The grade will depend on the ideas, how well you present them in the report, how clearly you position your work relative to existing literature, how illuminating your experiments are, and how well supported your conclusions are. Full marks will require a novel contribution.

Each group of students will write a short (around 2 pages) research project proposal, which ideally will be structured similarly to a standard paper. It should include a description of a minimum viable project, some nice-to-haves if time allows, and a short review of related work. You don’t have to do what your project proposal says - the point of the proposal is mainly to have a plan and to make it easy for me to give you feedback.

Towards the end of the course everyone will present their project in a short, roughly 5 minute, presentation.

At the end of the class you’ll hand in a project report (around 4 to 8 pages), ideally in the format of a machine learning conference paper such as NIPS.

Calendar

Tentative Schedule

Week 1 - September 13th - Background, motivation, course setup

This lecture will set the scope of the course, the different settings where amortized search can be useful, and the main existing approaches. Slides

Week 2 - September 20th - Tutorial on main existing approaches, history of field

Slides
Basics of search, tree search
Discrete gradient estimators
Relaxations

Week 3 - September 27th - Monte Carlo Tree Search and applications

Modern MCTS: Slides

Chemistry Application: Slides

Robotics and planning applications: Slides

Recent advances: Slides 1 Slides 2

Related work:

Other resources:

Week 4 - October 4th - Learning to SAT Solve and Prove Theorems

Learning to SAT Solve:

Theorem-proving benchmarks and environments: Slides

Approaches to learning to efficiently prove theorems: Slides

Relaxation-based approaches:

End-to-End Differentiable Proving (2017)

Week 5 - October 11th - Nested continuous optimization

Plain nested optimization:

Learning best-response functions:

Implicit function theorem:

Reviving and Improving Recurrent Back-Propagation
Meta-Learning with Implicit Gradients (2019)
Deep Equilibrium Models (2019) - can in principle be sped up by regularizing their dynamics to be easy to solve.

Game theory: Slides

Other resources:

Week 6 - October 18th - Active Learning, POMDPs, and Bayesian Optimization

Active learning: Slides

POMDPs: Slides

Curiosity and intrinsic motivation: Slides

Other resources:

Slides on Efficient Nonmyopic Active Search by Zain Hasan and Daniel Hidru

Week 7 - October 25th - Evolutionary Approaches and Direct Optimization

Evolution Strategies as a Scalable Alternative to Reinforcement Learning - replaces the exact gradient inside of REINFORCE with another call to REINFORCE.
Evolvability ES: Scalable and Direct Optimization of Evolvability
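As a rough illustration of the Evolution Strategies entries above, the sketch below estimates a gradient purely from function evaluations by perturbing the parameters with Gaussian noise and weighting each perturbation by the change in the objective. It uses mirrored (antithetic) perturbations, a common variance-reduction choice. The function names, the toy `sphere` objective, and all hyperparameters are assumptions made for this example and are not taken from the papers.

```python
import random


def es_gradient(f, theta, sigma, num_pairs, rng):
    """Estimate the gradient of E[f(theta + sigma * eps)], eps ~ N(0, I),
    with respect to theta, using antithetic (mirrored) perturbations."""
    grad = [0.0] * len(theta)
    for _ in range(num_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        f_plus = f([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = f([t - sigma * e for t, e in zip(theta, eps)])
        # Score-function weight for this mirrored pair of samples.
        weight = (f_plus - f_minus) / (2.0 * sigma * num_pairs)
        for i, e in enumerate(eps):
            grad[i] += weight * e
    return grad


def sphere(x):
    """Toy objective (an assumption for illustration): squared distance to 0."""
    return sum(v * v for v in x)


def minimize_with_es(f, theta, steps=200, lr=0.1, sigma=0.1,
                     num_pairs=200, seed=0):
    """Plain gradient descent driven only by evaluations of f."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = es_gradient(f, theta, sigma, num_pairs, rng)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta
```

Calling `minimize_with_es(sphere, [1.0, -2.0])` should drive the parameters close to the origin without ever differentiating `sphere`. Each update needs `2 * num_pairs` evaluations of the objective, which is the price paid for not requiring gradients.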

Beam Search: Slides

Direct optimization: Slides

Related work:

Amortized Bethe Free Energy Minimization for Learning MRFs

Week 8 - November 1st - Learning to program

Search-based approaches:

Gradient-based approaches:

Neural Turing Machines
Reinforcement Learning Neural Turing Machines - tries training an NTM with REINFORCE, but it doesn’t work very well.
Programming with a Differentiable Forth Interpreter

Learning a library of functions:

More related work:

Week 9 - November 8th - Meta-reasoning

Other related work:

Logical Induction

Week 10 - November 15th - Asymptotically Optimal Algorithms

Optimizing search strategies:

Optimizing agents: Slides

AIXI: Slides

Other related papers:

Learning to Search in Branch-and-Bound Algorithms

Week 11 - November 22nd - Project presentations part I

The last two weeks will be a series of 5-minute project presentations.

Week 12 - November 29th - Project presentations part II

Extra related reading

Gradient Estimation Using Stochastic Computation Graphs
Backpropagation through the Void: Optimizing control variates for black-box gradient estimation (code)
The original REINFORCE paper.
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables - a simple trick: turn all the step functions into sigmoids, and use backprop to get a biased gradient estimate.
Categorical Reparameterization with Gumbel-Softmax - the exact same idea as the Concrete distribution, published simultaneously.
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models - fixes the Concrete estimator to make it unbiased, and also gives a way to tune the temperature automatically.
A Visual Guide to Evolution Strategies
Evolution Strategies as a Scalable Alternative to Reinforcement Learning - replaces the exact gradient inside of REINFORCE with another call to REINFORCE.
Optimization by Variational Bounding
Natural Evolution Strategies
On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent - shows that ES might work in high dimensions because most of the dimensions don’t usually matter.
The Generalized Reparameterization Gradient - shows how to partially reparameterize some otherwise un-reparameterizable distributions.
Developing Bug-Free Machine Learning Systems With Formal Mathematics - shows how to use formal tools to verify that a gradient estimator is unbiased.
Learning Hard Alignments with Variational Inference - in machine translation, the alignment between input and output words can be treated as a discrete latent variable.
Dynamic Planning Networks
A Model to Search for Synthesizable Molecules
Model-Based Planning in Discrete Action Spaces - “it is in fact possible to effectively perform planning via backprop in discrete action spaces”
Generating and designing DNA with deep generative models
Emergence of Grounded Compositional Language in Multi-Agent Populations
The Case for Learned Index Structures
Path Integral Networks: End-to-End Differentiable Optimal Control (2017)
Learning to Search Better than Your Teacher
GLASSES: Relieving The Myopia Of Bayesian Optimisation
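
The REINFORCE and Concrete / Gumbel-Softmax entries in the list above describe two ways to differentiate through a discrete sample. The sketch below contrasts them on a single Bernoulli variable with a hypothetical quadratic loss: the score-function (REINFORCE) estimator is unbiased but noisy, while the relaxed estimator replaces the step function with a sigmoid and backpropagates through it, giving a biased estimate. The loss, the target value 0.45, the temperature, and all function names are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def loss(b, target=0.45):
    """Toy loss (an assumption): squared distance of the sample from target."""
    return (b - target) ** 2


def loss_derivative(z, target=0.45):
    return 2.0 * (z - target)


def exact_grad(logit):
    """Exact d/d(logit) of E[loss(b)] for b ~ Bernoulli(sigmoid(logit))."""
    p = sigmoid(logit)
    return (loss(1.0) - loss(0.0)) * p * (1.0 - p)


def reinforce_grad(logit, num_samples, rng):
    """Score-function estimator: loss(b) * d log p(b) / d(logit)."""
    p = sigmoid(logit)
    total = 0.0
    for _ in range(num_samples):
        b = 1.0 if rng.random() < p else 0.0
        total += loss(b) * (b - p)  # d log p(b) / d(logit) = b - p
    return total / num_samples


def relaxed_grad(logit, num_samples, rng, temperature=0.5):
    """Relaxation estimator: swap the step function for a sigmoid and
    differentiate the loss of the relaxed sample with respect to logit."""
    total = 0.0
    for _ in range(num_samples):
        # Clamp u away from 0 and 1 so the logistic noise stays finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noise = math.log(u) - math.log(1.0 - u)
        z = sigmoid((logit + noise) / temperature)
        total += loss_derivative(z) * z * (1.0 - z) / temperature
    return total / num_samples
```

With enough samples, `reinforce_grad(0.0, 100000, random.Random(0))` should land close to `exact_grad(0.0)`, whereas `relaxed_grad` converges to a different value that depends on the temperature. Lower temperatures shrink that bias but increase the variance of the estimate.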
