When: Tuesday/Thursday 2:00pm-3:20pm
Where: 137 Loomis Laboratory
Who: Dr. Yingying Li
Office Hours: Wednesdays, 4:30pm-5:30pm (Tentative) at CSL 347 (1308 W Main St, Urbana)
Part 1: When and How to Model Problems as Markov Decision Processes (MDP) and Optimal Control (OC)
In this part, we will introduce the basic formulations of MDPs and OC, along with their variations, and discuss how to model problems in these forms.
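As a toy illustration of this modeling step, the sketch below writes out a small two-state MDP explicitly in Python. The states, actions, transition probabilities, and rewards are made up for illustration and are not an example from the course.

# A toy two-state MDP, written out explicitly.
# States: "high" and "low" battery of a robot; actions: "wait" and "work".
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    "high": {
        "wait": [(1.0, "high", 0.0)],
        "work": [(0.8, "high", 1.0), (0.2, "low", 1.0)],
    },
    "low": {
        "wait": [(1.0, "high", 0.0)],  # recharge back to "high"
        "work": [(0.6, "low", 1.0), (0.4, "high", -3.0)],  # risk of breakdown
    },
}
gamma = 0.9  # discount factor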
Part 2: How to Solve MDP and OC When Models are Known and Small
In this part, we will learn an important and commonly used tool for both MDPs and OC: Dynamic Programming (DP).
We will start with the simplest setting, the finite-horizon formulation, then cover the infinite-horizon formulation, and briefly discuss the partially observed case.
We will also introduce popular algorithms inspired by DP, such as value iteration, policy iteration, rollout algorithms, and model predictive control (MPC).
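For concreteness, here is a minimal value-iteration sketch on a small tabular MDP. It reuses the toy transition model P and discount gamma from the Part 1 example above; the tolerance and in-place update order are illustrative choices, not necessarily how the algorithm will be presented in lecture.

def value_iteration(P, gamma, tol=1e-8):
    # V[s] approximates the optimal value of state s.
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: best expected reward plus discounted next-state value.
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# A greedy policy can then be read off from the converged values:
def greedy_policy(P, gamma, V):
    return {
        s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
        for s in P
    }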
Part 3: When Models are Unknown: Reinforcement Learning (RL) and Learning-based Control (LBC)
We will discuss major directions in algorithm design for RL and LBC, including Q-learning, policy gradient, actor-critic, and certainty equivalence. We will also introduce a tool that is important for RL both algorithmically and theoretically: the multi-armed bandit.
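As one concrete instance of the model-free setting, a minimal tabular Q-learning sketch might look like the following. The environment interface (env.reset(), env.step(), env.actions) and all hyperparameters are assumptions made for illustration.

import random

def q_learning(env, num_episodes, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Q[(s, a)] estimates the return of taking action a in state s and acting greedily afterwards.
    Q = {}
    for _ in range(num_episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy exploration over the (assumed) finite action set env.actions.
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q.get((s, act), 0.0))
            s2, r, done = env.step(a)
            # Q-learning update toward the bootstrapped target r + gamma * max_a' Q(s', a').
            best_next = 0.0 if done else max(Q.get((s2, a2), 0.0) for a2 in env.actions)
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))
            s = s2
    return Q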
Part 4: When Models are Too Large: Approximations and Neural Networks
This part will introduce state, action, and function approximations for RL and LBC to tackle the curse of dimensionality.
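To make the idea concrete, here is a minimal sketch of replacing the tabular Q estimate with a linear function approximator over state-action features. The feature map phi and the step size are assumptions for illustration; neural-network approximators follow the same pattern with a different gradient.

import numpy as np

def semi_gradient_q_step(w, phi, s, a, r, s2, actions, done, alpha=0.01, gamma=0.99):
    # One semi-gradient Q-learning step with a linear approximator Q(s, a) = w . phi(s, a),
    # where phi(s, a) is an assumed feature map returning a fixed-length numpy vector.
    q_sa = w @ phi(s, a)
    best_next = 0.0 if done else max(w @ phi(s2, a2) for a2 in actions)
    td_error = r + gamma * best_next - q_sa
    # For a linear approximator, the gradient of Q(s, a) with respect to w is just phi(s, a).
    return w + alpha * td_error * phi(s, a)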
Part 5: State-of-the-art and Research Topics: Guest Lectures
This part will cover advanced research topics in RL and LBC; some sessions will feature invited speakers discussing state-of-the-art research progress. Tentative topics include safe RL, robust RL, online control, multi-agent RL, RL theory, and applications to robotics.
Tentative Guest Speakers: Prof. Laixi Shi (Johns Hopkins); Prof. Nan Jiang (UIUC); Prof. Kaiqing Zhang (University of Maryland); Prof. Guannan Qu (CMU); Prof. Vasileios Tzoumas (University of Michigan).
Dimitri Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming.
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction.
Assignments: 30%. (There will be three assignments, each worth 10%.)
Midterm Exam: 30%.
Final Project: 40%.
Will be posted on Canvas.
Please submit your answers to Gradescope.
One late submission is allowed.
Open book, but no computers, cell phones, or other electronic devices.
You can find all the details in the syllabus.pdf on Canvas.
Basically: 10% for the project proposal and proposal discussion, 15% for the in-class presentation, and 15% for the final project report.
Either a literature review of existing papers or a research project, such as applying RL or LBC to your own research.
In-class presentations will take place before Reading Day.