[[http://web.engr.oregonstate.edu/~afern/|Oregon State]]
Title: **Learning to Speedup Planning: Filling the Gap Between Reaction and Thinking**\\
Abstract:\\
The product of most learning algorithms for sequential decision making is a policy, which supports very fast, or reactive, decision making. Another way to compute decisions, given a domain model or simulator, is to use a deliberative planning algorithm, potentially at a high computational cost. One perspective is that algorithms for learning reactive policies attempt to compile away the deliberative "thinking" process of planners into fast circuits. Intuition suggests, however, that such compilation will not support quality decision making in the most difficult domains (e.g. chess, logistics, etc.). In other words, some domains will always require some amount of deliberative planning. Is there a role for learning in such cases?

In this talk, I will revisit the old idea of speedup learning for planning, where the goal of learning is to speed up deliberative planning in a domain, given experience in that domain. This speedup learning framework offers a bridge between learning for purely reactive behavior and pure deliberative planning. I will review some prior work and speculate about why it produced only limited successes. I will then review some of our own recent work in the area of speedup learning for MDP tree search and discuss potential future directions.
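As a minimal, hypothetical sketch of the kind of mechanism involved (not the algorithm from the talk): one common way a learned policy can speed up MDP tree search is by biasing action selection in the tree, as in PUCT-style rules, so search effort concentrates on branches the learned policy already favors. The function, action names, and numbers below are all illustrative.

<code python>
import math

# Hypothetical PUCT-style selection rule: a learned policy ("prior")
# scales the exploration bonus, so the search visits branches the
# learned policy favors more often than plain UCT would.
def select_action(node_stats, prior, c=1.4):
    """node_stats: {action: (visit_count, total_return)}, prior: {action: prob}."""
    total_visits = sum(n for n, _ in node_stats.values())
    best_action, best_score = None, float("-inf")
    for action, (n, total_return) in node_stats.items():
        q = total_return / n if n > 0 else 0.0  # empirical value estimate
        bonus = c * prior[action] * math.sqrt(total_visits) / (1 + n)
        if q + bonus > best_score:
            best_action, best_score = action, q + bonus
    return best_action

# Illustrative usage: the sharper the learned prior, the more the
# exploration bonus steers search toward the favored branch.
stats = {"left": (10, 6.0), "right": (2, 1.5)}
print(select_action(stats, prior={"left": 0.8, "right": 0.2}))
</code>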
=== Mykel Kochenderfer ===
[[http://teamcore.usc.edu/tambe/|USC]]
Joint work with Eric Rice, Amulya Yadav, and Robin Petering.

Title: **PSINET: Assisting HIV Prevention Amongst Homeless Youth using POMDPs**\\
Abstract:\\
Homeless youth are prone to Human Immunodeficiency Virus (HIV) infection due to their engagement in high-risk behavior such as unprotected sex, sex under the influence of drugs, etc. Many non-profit agencies conduct interventions to educate and train a select group of homeless youth about HIV prevention and treatment practices, and rely on word-of-mouth spread of information through their social network. Previous work on strategic selection of intervention participants does not handle uncertainties in the social network's structure and evolving network state, potentially causing significant shortcomings in the spread of information. Thus, we developed PSINET, a decision support system to aid the agencies in this task. PSINET includes the following key novelties: (i) it handles uncertainties in network structure and evolving network state; (ii) it addresses these uncertainties by using POMDPs in influence maximization; and (iii) it provides algorithmic advances to allow high-quality approximate solutions for such POMDPs. We are about to conduct a pilot study with homeless youth in Los Angeles; we will present a progress report.
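As a minimal, hypothetical sketch of one piece of the problem (not the actual PSINET system): one simple way to handle an uncertain network structure is to sample possible friendship networks from a belief over edge existence and greedily pick intervention participants that maximize average spread across the samples; PSINET itself addresses this with POMDP planning rather than the sampled-greedy shortcut shown here. All names and probabilities below are illustrative.

<code python>
import random

# Hypothetical spread model: information reaches everyone connected to
# a seed through edges that exist in this sampled network.
def spread(edges, seeds):
    reached, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for a, b in edges:
            v = b if a == u else a if b == u else None
            if v is not None and v not in reached:
                reached.add(v)
                frontier.append(v)
    return len(reached)

# Greedily pick k participants, scoring each candidate by its average
# spread over networks sampled from the edge-existence belief.
def greedy_select(edge_probs, nodes, k, n_samples=100):
    samples = [[e for e, p in edge_probs.items() if random.random() < p]
               for _ in range(n_samples)]
    chosen = set()
    for _ in range(k):
        best = max((v for v in nodes if v not in chosen),
                   key=lambda v: sum(spread(s, chosen | {v}) for s in samples))
        chosen.add(best)
    return chosen

# Illustrative belief over a 4-youth friendship network.
probs = {(0, 1): 0.9, (1, 2): 0.5, (2, 3): 0.8, (0, 3): 0.2}
print(greedy_select(probs, nodes=range(4), k=2))
</code>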
=== Jason Williams ===