论文标题
符号回归是NP-HARD
Symbolic Regression is NP-hard
论文作者
论文摘要
符号回归(SR)是学习数学表达式形式的数据模型的任务。从本质上讲,SR模型有可能同时准确和人性化。不幸的是,找到这样的模型,即执行SR,似乎是一项计算密集的任务。从历史上看,SR一直通过诸如贪婪或遗传算法等启发式方法解决,尽管有些作品暗示了SR的可能硬度,但尚无证据证明SR实际上是NP-HARD。这就提出了一个问题:是否有确切的多项式时间算法来计算SR模型?我们提供的证据表明,通过证明SR是NP-HARD,答案可能是负面的。
Symbolic regression (SR) is the task of learning a model of data in the form of a mathematical expression. By their nature, SR models have the potential to be accurate and human-interpretable at the same time. Unfortunately, finding such models, i.e., performing SR, appears to be a computationally intensive task. Historically, SR has been tackled with heuristics such as greedy or genetic algorithms and, while some works have hinted at the possible hardness of SR, no proof has yet been given that SR is, in fact, NP-hard. This begs the question: Is there an exact polynomial-time algorithm to compute SR models? We provide evidence suggesting that the answer is probably negative by showing that SR is NP-hard.