Differences

This shows you the differences between two versions of the page.


sdmia_invited_speakers [2015/11/10 07:27] matthijs
sdmia_invited_speakers [2015/11/13 12:07] matthijs
Line 39:
[[http://web.engr.oregonstate.edu/~afern/|Oregon State]]
  
-Title: **Learning to Speedup Planning: Filling the Gap Between Reaction and Thinking**\\
+Title: **Kinder and Gentler Teaching Modes for Human-Assisted Policy Learning**\\
Abstract:\\
-The product of most learning algorithms for sequential decision making is a policy, which supports very fast, or reactive, decision making. Another way to compute a decision, given a domain model or simulator, is to use a deliberative planning algorithm, potentially at a high computational cost. One perspective is that algorithms for learning reactive policies are attempting to compile away the deliberative "thinking" process of planners into fast circuits. Intuition suggests, however, that such compilation will not support quality decision making in the most difficult domains (e.g. chess, logistics, etc.). In other words, some domains will always require some amount of deliberative planning. Is there a role for learning in such cases?

-In this talk, I will revisit the old idea of speedup learning for planning, where the goal of learning is to speed up deliberative planning in a domain, given experience in that domain. This speedup learning framework offers a bridge between learning for purely reactive behavior and pure deliberative planning. I will review some prior work and speculate about why it produced only limited successes. I will then review some of our own recent work in the area of speedup learning for MDP tree search and discuss potential future directions.

+This talk considers the problem of teaching action policies to computers for sequential decision making. The vast majority of policy learning algorithms offer human teachers little flexibility in how policies are taught. In particular, one of two learning modes is typically considered: 1) Imitation learning, where the teacher demonstrates explicit action sequences to the learner, and 2) Reinforcement learning, where the teacher designs a reward function for the learner to autonomously optimize via practice. This is in sharp contrast to how humans teach other humans, where many other learning modes are commonly used besides imitation and practice. The talk will highlight some of our recent work on broadening the available learning modes for computer policy learners, with the eventual aim of allowing humans to teach computers more naturally and efficiently. In addition, we will sketch some of the challenges in this research direction for both policy learning and more general planning systems.
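
Purely as illustration, a minimal Python sketch of the two teaching modes contrasted in the new abstract, with hypothetical names and toy state/action types (not code from the talk):

<code python>
# A toy sketch of the two standard teaching modes the abstract contrasts.
# All names and types here are hypothetical illustrations, not APIs from the talk.
from typing import Callable, Dict, List, Tuple

State = int
Action = int

def teach_by_imitation(demonstrations: List[List[Tuple[State, Action]]]) -> Callable[[State], Action]:
    """Imitation learning: the teacher gives explicit (state, action) trajectories,
    and the learner builds a policy that mimics the demonstrated choices."""
    seen: Dict[State, Action] = {}
    for trajectory in demonstrations:
        for state, action in trajectory:
            seen[state] = action  # simplest possible imitator: memorize the teacher's action
    return lambda s: seen.get(s, 0)

def teach_by_reward(reward: Callable[[State, Action], float],
                    actions: List[Action]) -> Callable[[State], Action]:
    """Reinforcement-learning-style teaching: the teacher only designs a reward
    function; the learner must find good actions itself (here by brute force)."""
    return lambda s: max(actions, key=lambda a: reward(s, a))

# The same target behavior ("pick action 1 in even states") taught two ways.
demo = [[(0, 1), (1, 0), (2, 1)]]
imitation_policy = teach_by_imitation(demo)
rl_policy = teach_by_reward(lambda s, a: 1.0 if a == (s + 1) % 2 else 0.0, actions=[0, 1])
print(imitation_policy(2), rl_policy(2))  # both choose action 1 in state 2
</code>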
  
  