# CSC 2547 Fall 2019: Learning to Search

## Overview

In planning, search, active learning, programming, and approximate inference, we usually face a series of similar tasks. We can often generalize from the problems solved so far, or even combine parts of previous solutions to solve a new one. This course will survey foundational ideas, recent work, and applications in this area. Specifically, it will cover self-improving tree-search methods such as alphazero, meta-learning, hypernetworks, self-tuning gradient estimators, amortized inference, self-improving theorem provers, and planning in POMDPs. Evaluation will be based mainly on a project involving original research by the students. Students should already be familiar with the basics of machine learning such as linear algebra, optimization, and probability.

The class will have a major project component, and will be run in a similar manner to Learning Discrete Latent Structure

## Prerequisites:

This course is designed to bring students to the current state of the art, so that ideally, their course projects can make a novel contribution. A previous course in machine learning such as CSC321, CSC411, CSC412, STA414, or ECE521 is strongly recommended. However, the only hard requirements are linear algebra, basic multivariate calculus, the basics of probability, and programming skills.

To check if you have the background for this course, try taking this Quiz. If more than half the questions are too difficult, you might want to put some extra work into preparation.

### Where and When

- Fall 2019
- Instructor: David Duvenaud
- Teaching Assistants: Shengyang Sun, Chris Cremer, Jonathan Lorraine
- Email: learn.search.2547@gmail.com is accessible to instructor and TAs.
- Location: Bahen room 1190
- Time: Fridays, 1-3pm
- Instructor Office hours: Mondays 3-4pm, in Pratt room 384
- Piazza: https://piazza.com/class/jxdq4fwzvb656y

## Why Learn To Search?

**Active Learning and Exploration**- Many problems require choosing which data would be most useful to acquire, or which experiment would be most useful to run. This can be viewed as a search problem: finding a sequential plan that can be expected to provide useful information by the end. Because we can adjust our plan after every action, we end up running many similar searches. Thus there is scope to gradually optimize our planning strategy.**Approximate Inference and Inverse Design**- In many situations, we know what a good explanation or design would look like, but need to search through a large discrete set to find one. For example, given a molecule we can often predict the mass spectra of its fragments, or how well it would perform a task. However finding a molecule that matches a given requirement is a hard search problem, that might benefit from experience of finding matches for similar tasks.**Program generation**- One of the hardest search tasks that humans regularly perform is programming, which can be viewed as a search for programs that meet a specification. One strategy for building large programs is to practice by building smaller programs to solve related problems. Programming is also a domain where one can often usefully re-use parts of other solutions.

## Course Structure

Aside from the first two and last two lectures, each week a different group of students will present on a set of related papers covering an aspect of these methods. I’ll provide guidance to each group about the content of these presentations.

In-class discussion will center around understanding the strengths and weaknesses of these methods, their relationships, possible extensions, and experiments that might better illuminate their properties.

The hope is that these discussions will lead to actual research papers, or resources that will help others understand these approaches.

Grades will be based on:

- [15%] One assignment, due Oct 3, to be handed in through MarkUs.
- [15%] Class presentations. Rubric
- [15%] Project proposal, due Oct 17th. Rubric
- [15%] Project presentations, November 22nd and 29th. Rubric
- [40%] Project report and code, due Dec 18th. Rubric

### Project

Students can work on projects individually, or in groups of up to four. The grade will depend on the ideas, how well you present them in the report, how clearly you position your work relative to existing literature, how illuminating your experiments are, and well-supported your conclusions are. Full marks will require a novel contribution.

Each group of students will write a short (around 2 pages) research project proposal, which ideally will be structured similarly to a standard paper.
It should include a description of a minimum viable project, some nice-to-haves if time allows, and a short review of related work.
You don’t have to do what your project proposal says - the point of the proposal is mainly to have *a* plan and to make it easy for me to give you feedback.

Towards the end of the course everyone will present their project in a short, roughly 5 minute, presentation.

At the end of the class you’ll hand in a project report (around 4 to 8 pages), ideally in the format of a machine learning conference paper such as NIPS.

## Calendar

## Tentative Schedule

### Week 1 - September 13th - Background, motivation, course setup

This lecture will set the scope of the course, the different settings where amortized search can be useful, and the main existing approaches. Slides

### Week 2 - September 20th - Tutorial on main existing approaches, history of field

- Slides
- Basics of search, tree search
- Discrete gradient estimators
- Relaxations

Recommended reading:

- Gradient Estimation Using Stochastic Computation Graphs
- Backpropagation through the Void: Optimizing control variates for black-box gradient estimation code

Related papers:

- The original REINFORCE paper.
- The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables - a simple trick: turn all the step functions into sigmoids, and use backprop to get a biased gradient estimate.
- Categorical Reparameterization with Gumbel-Softmax - the exact same idea as the Concrete distribution, published simultaneously.
- REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models - fixes the concrete estimator to make it unbiased, and also gives a way to tune the temperature automatically.
- Stochastic Backpropagation and Approximate Inference in Deep Generative Models - one of the modern explanations of the reparameterization trick.

### Week 3 - September 27th - Monte Carlo Tree Search and applications

Modern MCTS: Slides

- Thinking Fast and Slow with Deep Learning and Tree Search
- Mastering the game of Go without Human Knowledge
- Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Chemistry Application: Slides

- Learning to Plan Chemical Syntheses
- Planning chemical syntheses with deep neural networks and symbolic AI

Robotics and planning applications: Slides

- QMDP-Net: Deep Learning for Planning under Partial Observability
- Integrating Algorithmic Planning and Deep Learning for Partially Observable Navigation
- End-to-end Interpretable Neural Motion Planner (2019)

Recent advances: Slides 1 Slides 2

Related work:

- Combining Planning and Deep Reinforcement Learning in Tactical Decision Making for Autonomous Driving
- An investigation of model-free planning
- M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search
- Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games
- MPC-Inspired Neural Network Policies for Sequential Decision Making

Other resources:

- Roger Grosse and Jimmy Ba’s MCTS Slides
- Alex Adam and Fartash Faghri’s thinking fast and slow slides
- Ferenc Huszar’s blog post on Expert Iteration

### Week 4 - October 4th - Learning to SAT Solve and Prove Theorems

Learning to SAT Solve:

- Machine Learning for Combinatorial Optimization: a Methodological Tour d’Horizon
- Learning a SAT Solver from Single-Bit Supervision
- Learning to Solve NP-Complete Problems: A Graph Neural Network for Decision TSP
- Guiding High-Performance SAT Solvers with Unsat-Core Predictions

Theorem-proving benchmarks and environments: Slides

- HOList: An Environment for Machine Learning of Higher-Order Theorem Proving
- Learning to Prove Theorems via Interacting with Proof Assistants (2019)
- GamePad: A Learning Environment for Theorem Proving (2018)

Approaches to learning to efficiently prove theorems Slides:

- Learning to Reason in Large Theories without Imitation (2019)
- Automated Theorem Proving in Intuitionistic Propositional Logic by Deep Reinforcement Learning
- Reinforcement Learning of Theorem Proving
- Deep Network Guided Proof Search
- Generating Correctness Proofs with Neural Networks
- Premise Selection for Theorem Proving by Deep Graph Embedding
- Graph Representations for Higher-Order Logic and Theorem Proving (2019)
- Curriculum Learning and Theorem Proving
- DeepMath: Deep Sequence Models for Premise Selection

Relaxation-based approaches:

### Week 5 - October 11th - Nested continuous optimization

Plain nested optimization:

- Generic Methods for Optimization-Based Modeling
- Gradient-based Hyperparameter Optimization through Reversible Learning (2015)
- How to train your MAML

Learning best-response functions:

- SMASH: One-Shot Model Architecture Search through HyperNetworks (2017)
- Stochastic Hyperparameter Optimization through Hypernetworks (2018)
- Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions (2019)

Implicit function theorem:

- Reviving and Improving Recurrent Back-Propagation
- Meta-Learning with Implicit Gradients (2019)
- Deep Equilibrium Models (2019) can in principle be sped up by regularizing their dynamics to be easy to solve.

Game theory: Slides

- On Solving Minimax Optimization Locally: A Follow-the-ridge approach
- Convergence of Learning Dynamics in Stackelberg Games (2019)

Other resources:

### Week 6 - October 18th - Active Learning, POMDPs, and Bayesian Optimization:

Active learning: Slides

POMDPS: Slides

- Monte-Carlo Planning in Large POMDPs
- Deep Variational Reinforcement Learning for POMDPs (2018)
- Differentiable MPC for End-to-end Planning and Control (2018)

Curiosity and intrinsic motivation Slides:

- Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments
- VIME: Variational Information Maximizing Exploration
- Unifying Count-Based Exploration and Intrinsic Motivation (2016)

Other resources:

### Week 7 - October 25th - Evolutionary Approaches and Direct Optimization

- Evolution Strategies as a Scalable Alternative to Reinforcement Learning - replaces the exact gradient inside of REINFORCE with another call to REINFORCE.
- Evolvability ES: Scalable and Direct Optimization of Evolvability

Beam Search: Slides

- Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement
- Sequence-to-Sequence Learning as Beam-Search Optimization

Direct optimization: Slides

- Direct Loss Minimization for Structured Prediction
- Direct Optimization through argmax for Discrete Variational Auto-Encoder
- Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces, 2019

Related work:

### Week 8 - November 1st - Learning to program

Search-based approaches:

- Learning Compositional Neural Programs with Recursive Tree Search and Planning (2019)
- Neural-guided Deductive Search for Real-time Program Synthesis from Examples (2018)
- Neural Program Synthesis By Self-Learning

Gradient-based approaches:

- Neural Turing Machines
- Reinforcement Learning Neural Turing Machines - tries training an NTM with REINFORCE, but it doesn’t work very well.
- Programming with a Differentiable Forth Interpreter

Learning a library of functions:

- Bootstrap Learning via Modular Concept Discovery
- Library Learning for Neurally-Guided Bayesian Program Induction
- Program Synthesis and Semantic Parsing with Learned Code Idioms

More related work:

- Learning Programs: A Hierarchical Bayesian Approach
- Sampling for Bayesian Program Learning
- SPoC: Search-based Pseudocode to Code
- Neural Sketch Learning for Conditional Program Generation
- Robust Text-to-SQL Generation with Execution-Guided Decoding
- Neural Guided Constraint Logic Programming for Program Synthesis
- Learning Loop Invariants for Program Verification
- Program Synthesis for Character-Level Language Modeling

### Week 9 - November 8th - Meta-reasoning

- What can Neural Networks Reason about? (2019) Slides
- Supervising strong learners by amplifying weak experts Slides
- Reasoning about reasoning by nested conditioning: Modeling theory of mind with probabilistic programs Slides
- Principles of Metalevel Control PhD thesis of Nick Hay. Slides

Other related work:

### Week 10 - November 15th - Asymptotically Optimal Algorithms

Optimizing search strategies:

- Learning Transferable Graph Exploration (2019) Slides
- Learning to Prune: Speeding up Repeated Computations (2019) Slides
- Learning to Branch Slides

Optimizing agents: Slides

AIXI: Slides

- What is AIXI? — An Introduction to General Reinforcement Learning by Jan Leike
- Slides on AIXI from Marcus Hutter
- A Monte Carlo AIXI Approximation (2009)

Other related papers:

### Week 11 - November 22nd - Project presentations part I

The last two weeks will be a series of 5-minute project presentations.

### Week 12 - November 29th - Project presentations part II

## Extra related reading

- Gradient Estimation Using Stochastic Computation Graphs
- Backpropagation through the Void: Optimizing control variates for black-box gradient estimation code
- The original REINFORCE paper.
- The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables - a simple trick: turn all the step functions into sigmoids, and use backprop to get a biased gradient estimate.
- Categorical Reparameterization with Gumbel-Softmax - the exact same idea as the Concrete distribution, published simultaneously.
- REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models - fixes the concrete estimator to make it unbiased, and also gives a way to tune the temperature automatically.
- A Visual Guide to Evolution Strategies
- Evolution Strategies as a Scalable Alternative to Reinforcement Learning - replaces the exact gradient inside of REINFORCE with another call to REINFORCE.
- Optimization by Variational Bounding
- Natural Evolution Strategies
- On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent - shows that ES might work in high dimensions because most of the dimensions don’t usually matter.
- The Generalized Reparameterization Gradient - shows how to partially reparameterize some otherwise un-reparameterizable distributions.
- Developing Bug-Free Machine Learning Systems With Formal Mathematics - shows how to use formal tools to verify that a gradient estimator is unbiased.
- Learning Hard Alignments with Variational Inference - in machine translation, the alignment between input and output words can be treated as a discrete latent variable.
- Dynamic Planning Networks
- A Model to Search for Synthesizable Molecules
- Model-Based Planning in Discrete Action Spaces - “it is in fact possible to effectively perform planning via backprop in discrete action spaces”
- Generating and designing DNA with deep generative models
- Emergence of Grounded Compositional Language in Multi-Agent Populations
- The Case for Learned Index Structures
- Path Integral Networks: End-to-End Differentiable Optimal Control (2017)
- Learning to Search Better than Your Teacher
- GLASSES: Relieving The Myopia Of Bayesian Optimisation