Paper Title
Debiasing Stochastic Gradient Descent to handle missing values
Paper Authors
Paper Abstract
The stochastic gradient algorithm is a key ingredient of many machine learning methods and is particularly appropriate for large-scale learning. However, a major caveat of large datasets is their incompleteness. We propose an averaged stochastic gradient algorithm that handles missing values in linear models. This approach has the merit of requiring no modeling of the data distribution and of accounting for heterogeneous missing proportions. In both the streaming and finite-sample settings, we prove that this algorithm achieves a convergence rate of $\mathcal{O}(\frac{1}{n})$ at iteration $n$, the same rate as in the absence of missing values. We demonstrate the convergence behavior and the relevance of the algorithm not only on synthetic data but also on real data sets, including data collected from a medical registry.
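The abstract does not reproduce the algorithm itself, but the debiasing idea can be illustrated for least-squares linear regression. Below is a minimal sketch, assuming Bernoulli missingness in which feature `j` is observed independently with a known probability `p[j]`: missing entries are zero-imputed and rescaled by `1/p[j]`, the resulting overestimate of the diagonal of the second-moment matrix is subtracted from the gradient, and Polyak-Ruppert averaging is applied. The function name and the constant step size are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def debiased_averaged_sgd(X_nan, y, p, n_epochs=1, step=0.01):
    """Averaged SGD for linear regression with missing covariates.

    Sketch under the assumption that feature j of X is observed
    independently with known probability p[j]; missing entries of
    X_nan are np.nan.
    """
    n, d = X_nan.shape
    beta = np.zeros(d)
    beta_avg = np.zeros(d)
    t = 0
    for _ in range(n_epochs):
        for i in np.random.permutation(n):
            t += 1
            # Zero-impute and rescale by inverse observation
            # probabilities, so z is unbiased for the complete row x_i.
            z = np.nan_to_num(X_nan[i]) / p
            # z (z^T beta - y_i) is biased on the diagonal of x x^T;
            # subtracting (1 - p_j) z_j^2 beta_j removes that bias.
            grad = z * (z @ beta - y[i]) - (1.0 - p) * z**2 * beta
            beta -= step * grad
            # Polyak-Ruppert averaging of the iterates.
            beta_avg += (beta - beta_avg) / t
    return beta_avg

# Illustrative usage on synthetic data (all names are hypothetical).
rng = np.random.default_rng(0)
n, d = 5000, 5
p = np.full(d, 0.7)                      # observation probabilities
X = rng.normal(size=(n, d))
beta_star = np.arange(1.0, d + 1.0)
y = X @ beta_star + 0.1 * rng.normal(size=n)
mask = rng.random((n, d)) < p            # True where observed
X_nan = np.where(mask, X, np.nan)
print(debiased_averaged_sgd(X_nan, y, p))  # should approach beta_star
```

The correction term is what makes the stochastic gradient unbiased despite naive zero imputation, and the averaging step is what delivers the $\mathcal{O}(\frac{1}{n})$ rate claimed in the abstract.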