Paper title
Symbolic Regression Driven by Training Data and Prior Knowledge
Paper authors
Paper abstract
In symbolic regression, the search for analytic models is typically driven purely by the prediction error observed on the training data samples. However, when the data samples do not sufficiently cover the input space, the prediction error does not provide sufficient guidance toward desired models. Standard symbolic regression techniques then yield models that are partially incorrect, for instance, in terms of their steady-state characteristics or local behavior. If these properties were considered already during the search process, more accurate and relevant models could be produced. We propose a multi-objective symbolic regression approach that is driven by both the training data and the prior knowledge of the properties the desired model should manifest. The properties given in the form of formal constraints are internally represented by a set of discrete data samples on which candidate models are exactly checked. The proposed approach was experimentally evaluated on three test problems with results clearly demonstrating its capability to evolve realistic models that fit the training data well while complying with the prior knowledge of the desired model characteristics at the same time. It outperforms standard symbolic regression by several orders of magnitude in terms of the mean squared deviation from a reference model.
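The two-objective idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the monotonicity constraint, and the two toy candidate models are all assumptions chosen for illustration. The sketch scores a candidate on (1) its mean squared error on training data that covers only part of the input space and (2) its violation of a prior-knowledge constraint, checked on a discrete set of constraint samples that covers the whole input range.

```python
from typing import Callable, List, Tuple

Model = Callable[[float], float]


def training_error(model: Model, data: List[Tuple[float, float]]) -> float:
    """Objective 1: mean squared prediction error on the training samples."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def constraint_violation(model: Model, pairs: List[Tuple[float, float]]) -> float:
    """Objective 2 (hypothetical constraint): the model should be
    non-decreasing. Each pair (a, b) with a < b is one constraint sample;
    the penalty is the squared amount by which model(a) exceeds model(b)."""
    return sum(max(0.0, model(a) - model(b)) ** 2 for a, b in pairs) / len(pairs)


def objectives(model: Model, data, pairs) -> Tuple[float, float]:
    """Objective vector used to compare candidates."""
    return (training_error(model, data), constraint_violation(model, pairs))


def dominates(p: Tuple[float, ...], q: Tuple[float, ...]) -> bool:
    """Pareto dominance for minimization: p is no worse in every objective
    and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


# Training data covers only x in [0, 1] (assumed reference model: y = x).
data = [(i / 10, i / 10) for i in range(11)]

# Constraint samples cover the wider range [0, 5].
grid = [i * 0.5 for i in range(11)]
pairs = list(zip(grid[:-1], grid[1:]))


def linear(x: float) -> float:
    return x


def cubic(x: float) -> float:
    # Fits the training range closely but turns downward for larger x.
    return x - 0.05 * x ** 3


obj_linear = objectives(linear, data, pairs)
obj_cubic = objectives(cubic, data, pairs)
print("linear:", obj_linear)
print("cubic: ", obj_cubic)
print("linear dominates cubic:", dominates(obj_linear, obj_cubic))
```

In this toy setting the cubic candidate has a small training error, so the data alone gives little reason to reject it; the constraint objective is what exposes its incorrect behavior outside the sampled region. A multi-objective search would use a comparison such as `dominates` to keep candidates that trade off the two objectives well.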