SDMIA Fall Symposium: Invited Speakers
Craig Boutilier
Title: Large-scale MDPs in Practice: Opportunities and Challenges
Abstract:
Markov decision processes (MDPs) have been studied extensively in AI over
the past 20 years and offer great promise as a model for sophisticated
decision making. However, the practical application of MDPs and
reinforcement learning (RL), particularly in AI-based approaches, has
been somewhat limited. Indeed, the use of MDPs and RL in AI applications
pales in comparison to the wide-ranging applications of machine learning
across a variety of industrial sectors.
In this talk, I'll discuss:
- a sample of areas of direct industrial relevance where MDPs and RL have great promise;
- some speculation as to why ML methods in these areas have succeeded, while the application of sequential decision-making techniques has faltered;
- how we can bridge that gap, including: techniques for leveraging existing large-scale ML methods for modeling MDPs; the tension between model-based and model-free methods; and time permitting, some thoughts on solution methods for such models at industrial scale.
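To fix ideas, the following is a minimal value-iteration sketch for a small, fully specified tabular MDP. It is illustrative only: the talk concerns the industrial-scale regime, where exact tabular methods like this break down and ML-based modeling and approximation are needed. All names here are hypothetical.

    import numpy as np

    def value_iteration(P, R, gamma=0.95, tol=1e-6):
        # P[s][a] is a dict {next_state: probability}; R[s][a] is the
        # immediate reward. Returns optimal values and a greedy policy.
        n_states = len(P)
        V = np.zeros(n_states)
        while True:
            Q = np.array([[R[s][a] + gamma * sum(p * V[s2]
                                                 for s2, p in P[s][a].items())
                           for a in range(len(P[s]))]
                          for s in range(n_states)])
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)
            V = V_new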
Emma Brunskill
Title: Quickly Learning to Make Good Decisions
Abstract:
A fundamental goal of artificial intelligence is to create agents that
learn to make good decisions as they interact with a stochastic
environment. Some of the most exciting and valuable potential
applications involve systems that interact directly with humans, such as
intelligent tutoring systems or medical interfaces. In these cases,
sample efficiency is critical, as each decision, good or bad,
affects a real person. I will describe our research on tackling
this challenge, as well as its relevance to improving educational tools.
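Sample-efficient decision making is easiest to illustrate in the multi-armed bandit special case. Below is a standard UCB1-style sketch of optimism under uncertainty, a textbook baseline rather than the specific algorithms from this research; the interface is assumed for illustration.

    import math

    def ucb1(arms, horizon):
        # arms: list of zero-argument callables returning rewards in [0, 1].
        # Pull each arm once, then pick by mean reward plus a bonus that
        # shrinks as an arm is sampled more often.
        counts = [0] * len(arms)
        totals = [0.0] * len(arms)
        for t in range(1, horizon + 1):
            if t <= len(arms):
                a = t - 1
            else:
                a = max(range(len(arms)),
                        key=lambda i: totals[i] / counts[i]
                        + math.sqrt(2 * math.log(t) / counts[i]))
            totals[a] += arms[a]()
            counts[a] += 1
        return [totals[i] / counts[i] for i in range(len(arms))]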
Alan Fern
Title: Kinder and Gentler Teaching Modes for Human-Assisted Policy Learning
Abstract:
This talk considers the problem of teaching action policies to computers for sequential decision making. The vast majority of policy learning algorithms offer human teachers little flexibility in how policies are taught. In particular, one of two learning modes is typically considered: 1) Imitation learning, where the teacher demonstrates explicit action sequences to the learner, and 2) Reinforcement learning, where the teacher designs a reward function for the learner to autonomously optimize via practice. This is in sharp contrast to how humans teach other humans, where many other learning modes are commonly used besides imitation and practice. The talk will highlight some of our recent work on broadening the available learning modes for computer policy learners, with the eventual aim of allowing humans to teach computers more naturally and efficiently. In addition, we will sketch some of the challenges in this research direction for both policy learning and more general planning systems.
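To make the two standard modes concrete, here is a schematic contrast in Python. The environment interface (reset, step, actions) and the classifier object are hypothetical stand-ins, and nothing below is taken from the talk itself.

    import random
    from collections import defaultdict

    # Mode 1: imitation learning. The teacher's effort goes into
    # producing demonstrations; learning is supervised.
    def behavior_cloning(demos, classifier):
        states, actions = map(list, zip(*demos))   # demos: (state, action) pairs
        classifier.fit(states, actions)
        return classifier.predict                  # the learned policy pi(s) -> a

    # Mode 2: reinforcement learning. The teacher's effort goes into
    # designing reward_fn; the learner optimizes it by autonomous practice.
    def q_learning(env, reward_fn, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
        Q = defaultdict(float)
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                a = (random.choice(env.actions) if random.random() < eps
                     else max(env.actions, key=lambda a: Q[(s, a)]))
                s2, done = env.step(a)
                r = reward_fn(s, a, s2)
                target = r if done else r + gamma * max(Q[(s2, a2)]
                                                        for a2 in env.actions)
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s2
        return Q

The talk's premise is that these two interfaces bracket a much richer space of teaching interactions that current policy learners largely do not support.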
Mykel Kochenderfer
Title: Decision Theoretic Planning for Air Traffic Applications
Abstract:
Every large aircraft in the world is equipped with a collision avoidance system that alerts pilots to potential conflicts with other aircraft and recommends maneuvers to avoid collision. Due to the potentially catastrophic consequences of error in their operation, the complex decision making rules underlying these systems have received considerable scrutiny over the past few decades. Recently, the international safety community has been working to standardize a new system for worldwide deployment that is derived from a partially observable Markov decision process (POMDP) formulation. This talk will discuss the process used to develop the system and to build confidence in its safe operation. In addition, several other applications of POMDPs to air traffic problems will be outlined.
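At the heart of any POMDP formulation is the belief update that turns noisy observations into a distribution over hidden states (for example, the true configuration of an intruder aircraft). A minimal discrete sketch follows; all interfaces are assumed for illustration and bear no relation to the deployed system's actual code.

    def belief_update(belief, action, obs, states, T, O):
        # belief: dict state -> probability. T(s, a, s2) is the transition
        # model; O(s2, a, o) is the observation model.
        new_belief = {}
        for s2 in states:
            predicted = sum(belief[s] * T(s, action, s2) for s in states)
            new_belief[s2] = O(s2, action, obs) * predicted
        norm = sum(new_belief.values())
        return {s: p / norm for s, p in new_belief.items()}

The alerting policy then maps the belief, rather than a single point estimate, to an advisory, which is what lets a POMDP-derived system weigh the cost of unnecessary alerts against collision risk.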
Milind Tambe
Joint work with Eric Rice, Amulya Yadav, and Robin Petering.
Title: PSINET: Assisting HIV Prevention Amongst Homeless Youth using POMDPs
Abstract:
Homeless youth are at high risk of contracting the Human Immunodeficiency
Virus (HIV) due to their engagement in high-risk behavior such as
unprotected sex or sex under the influence of drugs. Many non-profit
agencies conduct interventions to educate and train a select group of
homeless youth about HIV prevention and treatment practices, relying on
word-of-mouth spread of information through their social network.
Previous work on the strategic selection of intervention participants
does not handle uncertainty in the social network's structure or its
evolving state, potentially causing significant shortcomings in the
spread of information. We therefore developed PSINET, a decision support
system to aid the agencies in this task. PSINET includes the following
key novelties: (i) it handles uncertainty in network structure and
evolving network state; (ii) it addresses these uncertainties by using
POMDPs for influence maximization; and (iii) it provides algorithmic
advances that allow high-quality approximate solutions for such POMDPs.
We are about to conduct a pilot study with homeless youth in Los Angeles
and will present a progress report.
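For context, the classical influence-maximization baseline assumes a known, static network and greedily selects seed nodes by marginal gain. A textbook sketch is below; PSINET's POMDP formulation extends this setting to uncertain structure and evolving state, and the function names here are illustrative.

    def greedy_seed_selection(nodes, expected_spread, k):
        # expected_spread(seeds) estimates how many nodes a seed set would
        # inform, typically via Monte Carlo simulation of a diffusion model.
        seeds = set()
        for _ in range(k):
            best = max((n for n in nodes if n not in seeds),
                       key=lambda n: expected_spread(seeds | {n}))
            seeds.add(best)
        return seeds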
Jason Williams
Title: Decision-theoretic control in dialog systems: recent progress and opportunities for research
Abstract:
Dialog systems interact with a person using natural language to help them achieve some goal. Dialog systems are now a part of daily life, with commercial systems including Microsoft Cortana, Apple Siri, Amazon Echo, Google Now, Facebook M, in-car systems, and many others. Because dialog is a sequential process, and because machine understanding of human language is error-prone, dialog has long been an important application for sequential decision making under uncertainty. In this talk, I will first present the dialog system problem through the lens of decision making under uncertainty. I'll then survey recent work that tailors methods for state tracking and action selection from the general machine learning literature to the dialog problem. Finally, I'll discuss open problems and current opportunities for research.
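State tracking in this setting amounts to maintaining a belief over the user's goal given noisy speech-understanding output. A minimal illustrative sketch follows; the update model, parameter names, and p_correct are all assumptions, not a description of any production system.

    def track_dialog_state(belief, slu_hypotheses, p_correct=0.8):
        # belief: dict goal -> probability. slu_hypotheses: list of
        # (goal, confidence) pairs from the speech/language understanding
        # component for the latest user turn.
        evidence = {g: 0.0 for g in belief}
        for goal, conf in slu_hypotheses:
            if goal in evidence:
                evidence[goal] += conf
        # Blend SLU evidence with a uniform error floor, then renormalize.
        new_belief = {g: belief[g] * (p_correct * evidence[g]
                                      + (1 - p_correct) / len(belief))
                      for g in belief}
        norm = sum(new_belief.values())
        return {g: p / norm for g, p in new_belief.items()}

Action selection then operates on this belief rather than on the single best recognition hypothesis, which is what makes the decision-theoretic framing pay off under recognition errors.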
Shlomo Zilberstein
Title: Do We Expect Too Much from DEC-POMDP Algorithms?
Abstract:
Sequential decision models such as DEC-POMDPs are powerful and elegant approaches for planning in situations that involve multiple cooperating decision makers. They are powerful in the sense that, in principle, they capture a rich class of problems. They are elegant in the sense that they include the minimal set of ingredients needed to analyze these problems and facilitate rigorous mathematical examination of their fundamental properties. An optimal solution of a DEC-POMDP explicitly answers the question of what each agent should do to maximize value. Implicitly, an optimal solution answers many other questions as well, including the appropriate assignment of meaning to internal memory states, the appropriate adoption of goals and subgoals, the appropriate assignment of roles to agents, and the appropriate assignment of meaning to the messages that agents exchange. In fact, an optimal policy optimizes all these choices implicitly. In this talk, I argue that this is simply too much to expect from a computational point of view. There is much to be gained by decomposing the planning problem so that some of these questions are answered first and a simplified planning problem is then solved. I discuss a few examples of such decompositions and examine their contribution to the scalability of planning algorithms.