When: Tuesday/Thursday 2:00pm-3:20pm
Where: 137 Loomis Laboratory
Who: Dr. Yingying Li
Office Hours: Wednesdays, 4:30pm-5:30pm (Tentative) at CSL 347 (1308 W Main St, Urbana)
Part 1: When and How to Model Problems as Markov Decision Processes (MDP) and Optimal Control (OC)
In this part, we will introduce the basic formulations of MDPs and OC, along with their variations, and discuss how to model problems in these forms.
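As a toy illustration of this modeling step, the sketch below writes out a small two-state MDP explicitly in Python. The states, actions, transition probabilities, and rewards are made up for illustration and are not an example from the course.

# A toy two-state MDP, written out explicitly.
# States: "high" and "low" battery of a robot; actions: "wait" and "work".
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    "high": {
        "wait": [(1.0, "high", 0.0)],
        "work": [(0.8, "high", 1.0), (0.2, "low", 1.0)],
    },
    "low": {
        "wait": [(1.0, "high", 0.0)],  # recharge back to "high"
        "work": [(0.6, "low", 1.0), (0.4, "high", -3.0)],  # risk of breakdown
    },
}
gamma = 0.9  # discount factor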
Part 2: How to Solve MDP and OC When Models are Known and Small
In this part, we will learn an important and commonly used tool for both MDPs and OC: Dynamic Programming (DP).
We will start with the simplest setting, the finite-horizon formulation, then cover the infinite-horizon formulation, and briefly discuss the partially observed case.
We will also introduce popular algorithms inspired by DP, such as value iteration, policy iteration, rollout algorithms, and model predictive control (MPC).
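For concreteness, here is a minimal value-iteration sketch on a small tabular MDP. It reuses the toy transition model P and discount gamma from the Part 1 example above; the tolerance and in-place update order are illustrative choices, not necessarily how the algorithm will be presented in lecture.

def value_iteration(P, gamma, tol=1e-8):
    # V[s] approximates the optimal value of state s.
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: best expected reward plus discounted next-state value.
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# A greedy policy can then be read off from the converged values:
def greedy_policy(P, gamma, V):
    return {
        s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
        for s in P
    }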
Part 3: When Models are Unknown: Reinforcement Learning (RL) and Learning-based Control (LBC)
We will discuss major directions in algorithm design for RL and LBC, including Q-learning, policy gradient, actor-critic, and certainty equivalence. We will also introduce a tool that is important for RL both algorithmically and theoretically: the multi-armed bandit.
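As one concrete instance of the model-free setting, a minimal tabular Q-learning sketch might look like the following. The environment interface (env.reset(), env.step(), env.actions) and all hyperparameters are assumptions made for illustration.

import random

def q_learning(env, num_episodes, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Q[(s, a)] estimates the return of taking action a in state s and acting greedily afterwards.
    Q = {}
    for _ in range(num_episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy exploration over the (assumed) finite action set env.actions.
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q.get((s, act), 0.0))
            s2, r, done = env.step(a)
            # Q-learning update toward the bootstrapped target r + gamma * max_a' Q(s', a').
            best_next = 0.0 if done else max(Q.get((s2, a2), 0.0) for a2 in env.actions)
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))
            s = s2
    return Q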
Part 4: When Models are Too Large: Approximations and Neural Networks
This part will introduce state, action, and function approximations for RL and LBC to tackle the curse of dimensionality.
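To make the idea concrete, here is a minimal sketch of replacing the tabular Q estimate with a linear function approximator over state-action features. The feature map phi and the step size are assumptions for illustration; neural-network approximators follow the same pattern with a different gradient.

import numpy as np

def semi_gradient_q_step(w, phi, s, a, r, s2, actions, done, alpha=0.01, gamma=0.99):
    # One semi-gradient Q-learning step with a linear approximator Q(s, a) = w . phi(s, a),
    # where phi(s, a) is an assumed feature map returning a fixed-length numpy vector.
    q_sa = w @ phi(s, a)
    best_next = 0.0 if done else max(w @ phi(s2, a2) for a2 in actions)
    td_error = r + gamma * best_next - q_sa
    # For a linear approximator, the gradient of Q(s, a) with respect to w is just phi(s, a).
    return w + alpha * td_error * phi(s, a)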
Part 5: State-of-the-art and Research Topics: Guest Lectures
This part will cover advanced research topics in RL and LBC; some sessions will feature invited speakers discussing state-of-the-art research progress. Tentative topics include safe RL, robust RL, online control, multi-agent RL, RL theory, and applications to robotics.
Tentative Guest Speakers: Prof. Laixi Shi (Johns Hopkins); Prof. Nan Jiang (UIUC); Prof. Kaiqing Zhang (University of Maryland); Prof. Guannan Qu (CMU); Prof. Vasileios Tzoumas (University of Michigan).
Dimitri Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming.
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction.
Assignments: 30%. (There will be three assignments, each worth 10%.)
Midterm Exam: 30%.
Final Project: 40%.
Will be posted on Canvas.
Please submit your answers to Gradescope.
One late submission is allowed.
Open book, but no computers, cell phones, or other electronic devices.
You can find all the details in the syllabus.pdf on Canvas.
Basically: 10% for the project proposal and proposal discussion, 15% for the in-class presentation, and 15% for the final project report.
Either a literature review of existing papers or a research project, such as applying RL or LBC to your own research.
In-class presentations will take place before Reading Day.