Paper title
Exploring ML testing in practice -- Lessons learned from an interactive rapid review with Axis Communications
Paper authors
Paper abstract
There is growing interest in machine learning (ML) testing in both industry and academia. We believe that industry and academia need to learn together to produce rigorous and relevant knowledge. In this study, we initiate a collaboration between stakeholders from one case company, one research institute, and one university. To establish a common view of the problem domain, we applied an interactive rapid review of the state of the art. Four researchers from Lund University and RISE Research Institutes and four practitioners from Axis Communications reviewed a set of 180 primary studies on ML testing. We developed a taxonomy for the communication around ML testing challenges and results and identified a list of 12 review questions relevant for Axis Communications. The three most important questions (data testing, metrics for assessment, and test generation) were mapped to the literature, and an in-depth analysis was made of the 35 primary studies matching the most important question (data testing). A final set of the five best matches was analysed, and we reflect on the criteria for applicability and relevance for the industry. The taxonomies are helpful for communication but not final. Furthermore, there was no perfect match to the case company's investigated review question (data testing). However, we extracted relevant approaches from the five studies on a conceptual level to support later context-specific improvements. We found the interactive rapid review approach useful for triggering and aligning communication between the different stakeholders.