使用可观察到的马尔可夫决策过程限制了主动分类

论文标题

使用可观察到的马尔可夫决策过程限制了主动分类

Constrained Active Classification Using Partially Observable Markov Decision Processes

论文作者

Wu, Bo, Lauffer, Niklas, Ahmadi, Mohamadreza, Bharadwaj, Suda, Xu, Zhe, Topcu, Ufuk

论文摘要

在这项工作中，我们研究了积极分类为有限的马尔可夫决策过程（MDP）模型的动态系统属性的问题。我们有兴趣寻找与动态系统积极相互作用并观察其反应的策略，以便有效地对感兴趣的属性有效地分类。我们提出了一个基于可观察到的马尔可夫决策过程（POMDP）的决策理论框架。所提出的框架依赖于将分类信念（概率分布）分配给感兴趣的属性。鉴于最初的信念，可以做出分类决策的信心水平，成本约束，安全的信念集和有限的时间范围，我们计算了POMDP策略，导致了分类决策。我们提出了三种不同的算法来计算此类策略。第一种算法通过价值迭代准确地计算出最佳策略。为了克服计算精确解决方案的计算复杂性，我们根据自适应采样提出了第二种算法，并基于蒙特卡洛树搜索的第三个算法，以近似达到分类决策的最佳概率。我们使用医学诊断，安全监视和野生动植物分类的示例说明了提出的方法。

In this work, we study the problem of actively classifying the attributes of dynamical systems characterized as a finite set of Markov decision process (MDP) models. We are interested in finding strategies that actively interact with the dynamical system and observe its reactions so that the attribute of interest is classified efficiently with high confidence. We present a decision-theoretic framework based on partially observable Markov decision processes (POMDPs). The proposed framework relies on assigning a classification belief (a probability distribution) to the attributes of interest. Given an initial belief, a confidence level over which a classification decision can be made, a cost bound, safe belief sets, and a finite time horizon, we compute POMDP strategies leading to classification decisions. We present three different algorithms to compute such strategies. The first algorithm computes the optimal strategy exactly by value iteration. To overcome the computational complexity of computing the exact solutions, we propose a second algorithm based on adaptive sampling and a third based on a Monte Carlo tree search to approximate the optimal probability of reaching a classification decision. We illustrate the proposed methodology using examples from medical diagnosis, security surveillance, and wildlife classification.

下载PDF全文

下载文献需遵守相关版权规定

论文标题