Paper title
RNTuple performance: Status and Outlook
Paper authors
Paper abstract
Upcoming HEP experiments, e.g. at the HL-LHC, are expected to increase the volume of generated data by at least one order of magnitude. To retain the ability to analyze this influx of data, full exploitation of modern storage hardware and systems, such as low-latency high-bandwidth NVMe devices and distributed object stores, becomes critical. To this end, the ROOT RNTuple I/O subsystem has been designed to address performance bottlenecks and shortcomings of ROOT's current state-of-the-art TTree I/O subsystem. RNTuple provides a backwards-incompatible redesign of the TTree binary format and access API that evolves the ROOT event data I/O for the challenges of the upcoming decades. It focuses on a compact data format, on performance engineering for modern storage hardware, for instance by making parallel and asynchronous I/O calls by default, and on robust interfaces that are easy to use correctly. In this contribution, we evaluate the RNTuple performance for typical HEP analysis tasks. We compare the throughput delivered by RNTuple to that of popular I/O libraries outside HEP, such as HDF5 and Apache Parquet. We demonstrate the advantages of RNTuple for HEP analysis workflows and provide an outlook on the road to its use in production.