Paper Title
LAGOON: An Analysis Tool for Open Source Communities
Authors
Abstract
This paper presents LAGOON, an open source platform for understanding the complex ecosystems of Open Source Software (OSS) communities. The platform currently uses spatiotemporal graphs to store and investigate the artifacts produced by these communities, and to help analysts identify bad actors who might compromise an OSS project's security. LAGOON ingests artifacts from several common sources, including source code repositories, issue trackers, mailing lists, and content scraped from project websites. Ingestion uses a modular architecture that supports incremental updates from data sources and provides a generic identity fusion process that can recognize the same community member across disparate accounts. A user interface is provided for visualizing and exploring an OSS project's complete sociotechnical graph, and scripts are provided for applying machine learning to identify patterns within the data. While the current focus is on identifying bad actors in the Python community, the platform's reusability makes it easily extensible with new data and analyses, paving the way for LAGOON to become a comprehensive means of assessing various OSS-based projects and their communities.
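To make the idea of identity fusion concrete, the following is a minimal sketch, not LAGOON's actual implementation. It assumes a simple rule (two accounts belong to the same person if they share a normalized email address or display name) and merges matches with a union-find structure. The function name `fuse_identities`, the account fields, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def fuse_identities(accounts):
    """Group account ids that share a normalized email or display name.

    Hypothetical matching rule for illustration only; a real system
    would likely use fuzzier matching and guard against false merges.
    """
    parent = list(range(len(accounts)))

    def find(i):
        # Follow parent links to the root, halving paths along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as root so results are deterministic.
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (field, normalized value) -> index of first account
    for idx, acct in enumerate(accounts):
        for field in ("email", "name"):
            value = (acct.get(field) or "").strip().lower()
            if not value:
                continue  # missing fields never cause a merge
            key = (field, value)
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx, acct in enumerate(accounts):
        groups[find(idx)].append(acct["id"])
    return sorted(groups.values())


# Hypothetical accounts drawn from three kinds of sources.
accounts = [
    {"id": "git:alice", "name": "Alice Smith", "email": "alice@example.org"},
    {"id": "ml:asmith", "name": "A. Smith", "email": "Alice@Example.org "},
    {"id": "issues:alice-s", "name": "alice smith", "email": None},
    {"id": "git:bob", "name": "Bob Jones", "email": "bob@example.org"},
]

fused = fuse_identities(accounts)
```

Here the three Alice accounts are linked transitively: the mailing-list account matches on email, and the issue-tracker account matches on display name, so all three end up in one group while Bob's account stays separate.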