Paper title

An array-oriented Python interface for FastJet

Authors

Aryan Roy, Jim Pivarski, Chad Wells Freer

Abstract

Analysis on HEP data is an iterative process in which the results of one step often inform the next. In an exploratory analysis, it is common to perform one computation on a collection of events, then view the results (often with histograms) to decide what to try next. Awkward Array is a Scikit-HEP Python package that enables data analysis with array-at-a-time operations to implement cuts as slices, combinatorics as composable functions, etc. However, most C++ HEP libraries, such as FastJet, have an imperative, one-particle-at-a-time interface, which would be inefficient in Python and goes against the grain of the array-at-a-time logic of scientific Python. Therefore, we developed fastjet, a pip-installable Python package that provides FastJet C++ binaries, the classic (particle-at-a-time) Python interface, and the new array-oriented interface for use with Awkward Array. The new interface streamlines interoperability with scientific Python software beyond HEP, such as machine learning. In one case, adopting this library along with other array-oriented tools accelerated HEP analysis code by a factor of 20. It was designed to be easily integrated with libraries in the Scikit-HEP ecosystem, including Uproot (file I/O), hist (histogramming), Vector (Lorentz vectors), and Coffea (high-level glue). We discuss the design of the fastjet Python library, integrating the classic interface with the array-oriented interface and with the Vector library for Lorentz vector operations. The new interface was developed as open source.
