Differences
This shows you the differences between two versions of the page.
Previous revision: sdmia_invited_speakers [2015/11/10 07:27] matthijs
Current revision: sdmia_invited_speakers [2015/11/13 12:07] matthijs
Line 39:
[[http://web.engr.oregonstate.edu/~afern/|Oregon State]]
- Title: **Learning to Speedup Planning: Filling the Gap Between Reaction and Thinking**\\
+ Title: **Kinder and Gentler Teaching Modes for Human-Assisted Policy Learning**\\
Abstract:\\
- The product of most learning algorithms for sequential decision
- making is a policy, which supports very fast, or reactive, decision
- making. Another way to compute decisions, given a domain model or
- simulator, is to use a deliberative planning algorithm, potentially
- at a high computational cost. One perspective is that algorithms
- for learning reactive policies are attempting to compile away the
- deliberative "thinking" process of planners into fast circuits.
- Intuition suggests, however, that such compilation will not support
- quality decision making in the most difficult domains (e.g. chess,
- logistics, etc.). In other words, some domains will always require
- some amount of deliberative planning. Is there a role for learning
- in such cases?
- In this talk, I will revisit the old idea of speedup learning for
- planning, where the goal of learning is to speed up deliberative
- planning in a domain, given experience in that domain. This speedup
- learning framework offers a bridge between learning for purely
- reactive behavior and pure deliberative planning. I will review
- some prior work and speculate about why it produced only limited
- successes. I will then review some of our own recent work in the
- area of speedup learning for MDP tree search and discuss potential
- future directions.
+ This talk considers the problem of teaching action policies to computers for sequential decision making. The vast majority of policy learning algorithms offer human teachers little flexibility in how policies are taught. In particular, one of two learning modes is typically considered: 1) Imitation learning, where the teacher demonstrates explicit action sequences to the learner, and 2) Reinforcement learning, where the teacher designs a reward function for the learner to autonomously optimize via practice. This is in sharp contrast to how humans teach other humans, where many other learning modes are commonly used besides imitation and practice. The talk will highlight some of our recent work on broadening the available learning modes for computer policy learners, with the eventual aim of allowing humans to teach computers more naturally and efficiently. In addition, we will sketch some of the challenges in this research direction for both policy learning and more general planning systems.
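
As a rough illustration of the two standard teaching modes contrasted in the current abstract, here is a minimal sketch. It is not taken from the talk; the function names, the lookup-table "fit", and the tabular Q-learning setup are all illustrative assumptions.

<code python>
import random

# Mode 1: imitation learning. The teacher hands the learner explicit
# (state, action) demonstrations; here the "fit" is simply a lookup
# table built from those demonstrated choices.
def learn_by_imitation(demonstrations):
    policy = {}
    for state, action in demonstrations:
        policy[state] = action
    return policy

# Mode 2: reinforcement learning. The teacher only designs reward_fn;
# the learner optimizes it autonomously via practice (tabular Q-learning
# over a generic environment interface: env_reset() -> state,
# env_step(state, action) -> (next_state, done)).
def learn_by_practice(env_reset, env_step, actions, reward_fn,
                      episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    q = {}
    for _ in range(episodes):
        state, done = env_reset(), False
        while not done:
            if random.random() < eps:
                action = random.choice(actions)  # explore
            else:  # exploit current value estimates
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            next_state, done = env_step(state, action)
            reward = reward_fn(state, action, next_state)
            best_next = max(q.get((next_state, a), 0.0) for a in actions)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
    return q
</code>

The contrast the abstract draws is visible in the signatures: the imitation learner consumes the teacher's actions directly, while the practice learner only ever sees the teacher through the reward function.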
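For comparison, the abstract removed in this revision concerned speedup learning for planning. A minimal sketch of that idea, again an illustrative assumption rather than the speaker's actual method, is a deliberative finite-horizon lookahead whose leaf evaluations are learned from earlier searches, so that later decisions need less deliberation:

<code python>
# Speedup learning for MDP tree search (illustrative sketch): values
# backed up by an expensive depth-limited search are cached, so later
# searches can cut "thinking" short by consulting the cache at leaves.
# model(state, action) -> (reward, [(next_state, probability), ...])
def plan(state, depth, model, actions, gamma, learned_value):
    if depth == 0:
        # The learned evaluator stands in for further deliberation.
        return learned_value.get(state, 0.0)
    best = float("-inf")
    for action in actions:
        reward, outcomes = model(state, action)
        value = reward + gamma * sum(
            p * plan(s2, depth - 1, model, actions, gamma, learned_value)
            for s2, p in outcomes)
        best = max(best, value)
    learned_value[state] = best  # this search speeds up the next one
    return best
</code>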