Simpler Hyperparameter Optimization for Software Analytics: Why, How, When?

Paper Title

Simpler Hyperparameter Optimization for Software Analytics: Why, How, When?

Authors

Agrawal, Amritanshu; Yang, Xueqi; Agrawal, Rishabh; Shen, Xipeng; Menzies, Tim

Abstract

How can software analytics be made simpler and faster? One method is to match the complexity of the analysis to the intrinsic complexity of the data being explored. For example, hyperparameter optimizers find the control settings for data miners that improve the predictions generated via software analytics. Sometimes, very fast hyperparameter optimization can be achieved by just DODGE-ing away from things tried before. But when is it wise to use DODGE, and when must we use more complex (and much slower) optimizers? To answer this, we applied hyperparameter optimization to 120 SE data sets that explored bad smell detection, predicting GitHub issue close time, bug report analysis, and defect prediction, as well as to dozens of other non-SE problems. We find that DODGE works best for data sets with low "intrinsic dimensionality" (D = 3) and very poorly for higher-dimensional data (D over 8). Nearly all the SE data seen here were intrinsically low-dimensional, indicating that DODGE is applicable to many SE analytics tasks.
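
The abstract names two ideas without showing them: optimizing by "DODGE-ing away" from settings already tried, and measuring a data set's "intrinsic dimensionality" D. The Python below is a minimal sketch of each under stated assumptions; it is not the authors' implementation. `dodge_search` is a hypothetical helper built on a simplified reading of the idea: every hyperparameter value carries a weight, candidates are sampled with probability proportional to exp(weight), and a result landing within `epsilon` of an earlier result lowers the weights of the options that produced it (otherwise they are raised). `correlation_dimension` is also hypothetical; it uses the correlation dimension, one standard estimator of intrinsic dimensionality (the slope of log C(r) against log r, where C(r) is the fraction of point pairs within distance r), which may differ from the estimator used in the paper. The quantile range and step count are arbitrary choices.

```python
import math
import random


def dodge_search(options, evaluate, epsilon=0.05, budget=30, seed=0):
    """Hypothetical, simplified "dodge epsilon" search (not the paper's code).

    options:  dict mapping each hyperparameter name to a list of candidate values.
    evaluate: callable taking a dict of settings, returning a score (higher is better).
    Returns (best_setting, best_score).
    """
    rng = random.Random(seed)
    # One weight per (hyperparameter, value) option; all options start equal.
    weights = {(k, v): 0.0 for k, vals in options.items() for v in vals}
    seen_scores = []
    best_setting, best_score = None, -math.inf
    for _ in range(budget):
        # Sample each hyperparameter, biased towards higher-weight options.
        setting = {
            k: rng.choices(vals, weights=[math.exp(weights[(k, v)]) for v in vals])[0]
            for k, vals in options.items()
        }
        score = evaluate(setting)
        # Dodge: a score within epsilon of an earlier one taught us nothing new,
        # so penalize the options that produced it; otherwise reward them.
        redundant = any(abs(score - s) <= epsilon for s in seen_scores)
        for k, v in setting.items():
            weights[(k, v)] += -1.0 if redundant else 1.0
        seen_scores.append(score)
        if score > best_score:
            best_setting, best_score = setting, score
    return best_setting, best_score


def correlation_dimension(points, lo_q=0.01, hi_q=0.10, steps=8):
    """Estimate intrinsic dimensionality as the slope of log C(r) versus log r,
    where C(r) is the fraction of point pairs at distance <= r."""
    dists = sorted(
        math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]
    )
    n = len(dists)
    xs, ys = [], []
    for j in range(steps):
        # Radii are taken at evenly spaced quantiles of the pairwise distances.
        k = int((lo_q + (hi_q - lo_q) * j / (steps - 1)) * (n - 1))
        xs.append(math.log(dists[k]))
        ys.append(math.log((k + 1) / n))  # C(r) at r = dists[k]
    # Least-squares slope of the log-log curve.
    mx, my = sum(xs) / steps, sum(ys) / steps
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    rng = random.Random(1)
    # 300 points on a line and 300 points on a plane, both embedded in 3-D space.
    line = [(t, 2 * t, -t) for t in (rng.random() for _ in range(300))]
    plane = [(u, v, u + v) for u, v in ((rng.random(), rng.random()) for _ in range(300))]
    print("line  D ~", round(correlation_dimension(line), 2))
    print("plane D ~", round(correlation_dimension(plane), 2))

    opts = {"a": [1, 2, 3, 4], "b": [0.1, 0.5, 1.0]}
    print(dodge_search(opts, lambda s: -abs(s["a"] - 3) - abs(s["b"] - 0.5)))
```

With these assumptions the line should come out near 1 and the plane somewhat below 2 (edge effects pull the estimate down). Read against the abstract, a data set whose estimate sits around 3 would be the kind reported as a good fit for DODGE, while estimates above 8 fall in the regime where it performed poorly.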
