论文标题
链循环图最短转换的几乎精确的线性复杂算法
An Almost Exact Linear Complexity Algorithm of the Shortest Transformation of Chain-Cycle Graphs
论文作者
论文摘要
“基因组结构”是标记为1或2的顶点的标记的有向图。固定了此类图上的一组操作,并且每个操作都有一定的成本,严格的正数。转换问题在于以下内容:对于给定的结构A和B和给定的成本,请找到将A转换为B的最低总成本序列(“ A到B的最短转换为B”)。每个操作都对应于“事件”,后者是由执行其中一个操作引起的图表的更改。分配不同成本的可能性在应用中很重要,因为它允许区分频繁和罕见事件。显然,任意成本使问题NP-HARD会导致从一个限制成本到另一个成本的不平地,如果问题是通过线性或至少通过多项式算法解决的(假设P不相等的NP)。我们提出了一种新型的线性时间和空间算法,该算法构建了一系列操作,将A转化为B,总成本接近或等于绝对最小值。也就是说,如果所有所谓的DCJ操作都具有相同的成本W,并且如果删除和插入的成本既要大又小于W,则算法将A输出A到B中的转换为B,总成本与绝对最小值的总成本不同,最多在2W(在前一种情况下)或等于它(在后一种情况下)。在某些情况下,该算法会输出精确的溶液,例如,在循环基因组结构的情况下。可以省略关于删除和插入成本的条件(尽管下文所述的证明不包括此一般情况)。
A "genome structure" is a labeled directed graph with vertices of degree 1 or 2. A set of operations over such graphs is fixed, and each of the operations has a certain cost, a strictly positive number. The transformation problem consists in the following: for given structures a and b and given costs, find a minimum total cost sequence of operations transforming a into b ("the shortest transformation of a into b"). Each operation corresponds to an "event", the latter being a change in the graph caused by executing one of the operations over it. The possibility of assigning different costs is important in applications, since it allows to distinguish between frequent and rare events. Apparently, arbitrary costs make the problem NP-hard, which results in nontriviality of passing from one restriction on costs to another, if the problem is solved by a linear or at least polynomial algorithm (assuming that P is not equal NP). We propose a novel linear time and space algorithm which constructs a sequence of operations transforming a to b with total cost close or equal to the absolute minimum. Namely, if all the so-called DCJ operations have the same cost w and if deletions and insertions have costs either both larger or both less than w, then the algorithm outputs a transformation of a into b with the total cost differing from the absolute minimum by at most 2w (in the former case) or equal to it (in the latter case). In some cases, the algorithm outputs an exact solution, e.g., in the case of circular genome structures. The condition on the costs of deletions and insertions can be omitted (although the proof described below does not include this general case).