论文标题

布尔值和$ \ mathbb {f} _p $ -matrix分解:从理论到练习

Boolean and $\mathbb{F}_p$-Matrix Factorization: From Theory to Practice

论文作者

Fomin, Fedor, Panolan, Fahad, Patil, Anurag, Tanveer, Adil

论文摘要

布尔矩阵分解(BMF)旨在找到给定二进制基质作为两个低级二进制矩阵的布尔产物的近似值。二进制数据在许多领域都无处不在,并且通过二进制矩阵代表数据在医学,自然语言处理,生物信息学,计算机图形等方面很常见。不幸的是,BMF在计算方面是硬性的,并且使用启发式算法来计算布尔因子化。最近,两个研究小组独立地获得了理论突破。 Ban等。 (SODA 2019)和Fomin等。 (Trans。算法2020)表明,BMF接受有效的多项式近似方案(EPTAS)。然而,尽管理论上的重要性,但从等级的运行时间的高指数依赖性使这些算法在实践中无法实现。激发我们工作的主要研究问题是BMF的理论进步是否可能导致实际算法。 我们工作的主要概念性贡献是以下内容。尽管BMF的EPTA是一个纯粹的理论进步,但这些算法背后的一般方法可以作为设计更好的启发式方法的基础。我们还使用此策略来为相关的$ \ mathbb {f} _p $ -matrix分解开发新算法。在这里,鉴于有限场GF($ p $)的矩阵$ a $,其中$ p $是素数,而整数$ r $,我们的目标是在同一领域上找到一个矩阵$ b $($ p $)($ p $) - 最多$ r $最小化$ a-b $的某些规范。我们对合成和现实世界数据的实证研究表明,新算法比以前的作品对BMF和$ \ Mathbb {f} _p $ -matrix分解了。

Boolean Matrix Factorization (BMF) aims to find an approximation of a given binary matrix as the Boolean product of two low-rank binary matrices. Binary data is ubiquitous in many fields, and representing data by binary matrices is common in medicine, natural language processing, bioinformatics, computer graphics, among many others. Unfortunately, BMF is computationally hard and heuristic algorithms are used to compute Boolean factorizations. Very recently, the theoretical breakthrough was obtained independently by two research groups. Ban et al. (SODA 2019) and Fomin et al. (Trans. Algorithms 2020) show that BMF admits an efficient polynomial-time approximation scheme (EPTAS). However, despite the theoretical importance, the high double-exponential dependence of the running times from the rank makes these algorithms unimplementable in practice. The primary research question motivating our work is whether the theoretical advances on BMF could lead to practical algorithms. The main conceptional contribution of our work is the following. While EPTAS for BMF is a purely theoretical advance, the general approach behind these algorithms could serve as the basis in designing better heuristics. We also use this strategy to develop new algorithms for related $\mathbb{F}_p$-Matrix Factorization. Here, given a matrix $A$ over a finite field GF($p$) where $p$ is a prime, and an integer $r$, our objective is to find a matrix $B$ over the same field with GF($p$)-rank at most $r$ minimizing some norm of $A-B$. Our empirical research on synthetic and real-world data demonstrates the advantage of the new algorithms over previous works on BMF and $\mathbb{F}_p$-Matrix Factorization.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源