Paper Title
Linear Regularizers Enforce the Strict Saddle Property
Paper Authors
Paper Abstract
Satisfaction of the strict saddle property has become a standard assumption in non-convex optimization, and it ensures that many first-order optimization algorithms will almost always escape saddle points. However, functions exist in machine learning that do not satisfy this property, such as the loss function of a neural network with at least two hidden layers. First-order methods such as gradient descent may converge to non-strict saddle points of such functions, and no first-order method currently exists that reliably escapes non-strict saddle points. To address this need, we demonstrate that regularizing a function with a linear term enforces the strict saddle property, and we justify regularizing only locally, i.e., when the norm of the gradient falls below a certain threshold. We analyze the bifurcations that may result from this form of regularization, and we then provide a selection rule for regularizers that depends only on the gradient of the objective function. This rule is shown to guarantee that gradient descent escapes the neighborhoods around a broad class of non-strict saddle points, and this behavior is demonstrated on numerical examples of non-strict saddle points common in the optimization literature.
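
To make the mechanism described in the abstract concrete, below is a minimal sketch (not the paper's algorithm) of gradient descent with a locally applied linear regularizer. Adding the term r^T x to the objective simply adds the constant vector r to its gradient. The step size, the threshold eps, the fixed random direction r, and the test function f(x, y) = x^2 - y^4 are all illustrative assumptions; in particular, the paper's gradient-based selection rule for r is not reproduced here.

import numpy as np

def locally_regularized_gd(grad_f, x0, step=1e-2, eps=1e-3,
                           max_iters=100_000, escape_radius=1.0, seed=0):
    # Gradient descent on f with a linear regularizer r^T x that is
    # switched on only when ||grad f(x)|| < eps; adding r^T x to f
    # adds the constant vector r to the gradient.
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    # Illustrative choice only: a fixed random direction of norm eps.
    # The paper instead derives a selection rule for r that depends
    # only on grad_f; that rule is not reproduced here.
    r = rng.standard_normal(x.shape)
    r *= eps / np.linalg.norm(r)
    for k in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < eps:  # regularize only locally
            g = g + r
        x = x - step * g
        if np.linalg.norm(x - x0) > escape_radius:
            return x, k + 1  # escaped the neighborhood of x0
    return x, max_iters

# f(x, y) = x**2 - y**4 has a non-strict saddle at the origin: the
# Hessian there is diag(2, 0), which has no negative eigenvalue, yet
# the origin is not a local minimum.  Plain gradient descent started
# at the origin never moves; the regularized variant escapes.
grad = lambda v: np.array([2.0 * v[0], -4.0 * v[1] ** 3])
x_final, iters = locally_regularized_gd(grad, x0=[0.0, 0.0])
print(f"escaped to {x_final} after {iters} iterations")

Because the regularizer is switched on only when the gradient norm falls below eps, the optimizer's behavior away from critical points is unchanged; in this sketch the run stops once the iterate leaves a ball around its starting point, which plain gradient descent initialized at the origin of this example would never do.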