Paper title

Java Decompiler Diversity and its Application to Meta-decompilation

Authors

Nicolas Harrand, César Soto-Valero, Martin Monperrus, Benoit Baudry

Abstract

During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code are not symmetric. Consequently, decompilation, which aims at producing source code from bytecode, relies on strategies to reconstruct the information that has been lost. Different Java decompilers use distinct strategies to achieve proper decompilation. In this work, we hypothesize that the diverse ways in which bytecode can be decompiled have a direct impact on the quality of the source code produced by decompilers. We assess the strategies of eight Java decompilers with respect to three quality indicators: syntactic correctness, syntactic distortion, and semantic equivalence modulo inputs. Our results show that no single modern decompiler is able to correctly handle the variety of bytecode structures coming from real-world programs. The highest-ranking decompiler in this study produces syntactically correct output for 84% of the classes in our dataset and semantically equivalent output for 78% of them. Our results also demonstrate that each decompiler correctly handles a different set of bytecode classes. We propose a new decompiler called Arlecchino that leverages the diversity of existing decompilers. To do so, we merge partial decompilations into a new one, guided by compilation errors. Arlecchino handles 37.6% of the bytecode classes that no previous decompiler handled. We publish the sources of this new bytecode decompiler.
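The abstract states only that Arlecchino merges partial decompilations based on compilation errors. The sketch below illustrates that idea under simplifying assumptions of our own and is not Arlecchino's implementation: each decompiler's output for a class is assumed to be already split into per-member fragments keyed by signature, and each fragment carries a `compiles` flag that stands in for actually running `javac`. The names `MetaDecompilerSketch`, `Fragment` and `merge` are hypothetical. For every member, the merge keeps the fragment from the highest-ranked decompiler whose version compiles, so a class that no single decompiler handles can still be recovered.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model of meta-decompilation (not Arlecchino's actual code).
final class MetaDecompilerSketch {

    /** One member (method, field, ...) as produced by one decompiler.
     *  ASSUMPTION: `compiles` was precomputed, e.g. by compiling the fragment in context. */
    record Fragment(String decompiler, String source, boolean compiles) {}

    /**
     * Merges partial decompilations of a single class.
     *
     * @param ranked decompiler outputs in preference order; each maps member signature to fragment
     * @return member signature to chosen source, or empty if some member compiles in no output
     */
    static Optional<Map<String, String>> merge(List<Map<String, Fragment>> ranked) {
        // Collect every member signature seen by any decompiler, in first-seen order.
        Set<String> signatures = new LinkedHashSet<>();
        for (Map<String, Fragment> output : ranked) {
            signatures.addAll(output.keySet());
        }

        Map<String, String> merged = new LinkedHashMap<>();
        for (String sig : signatures) {
            Fragment chosen = null;
            // Take the first (highest-ranked) decompiler whose fragment compiles.
            for (Map<String, Fragment> output : ranked) {
                Fragment f = output.get(sig);
                if (f != null && f.compiles()) {
                    chosen = f;
                    break;
                }
            }
            if (chosen == null) {
                return Optional.empty(); // class stays unhandled
            }
            merged.put(sig, chosen.source());
        }
        return Optional.of(merged);
    }

    public static void main(String[] args) {
        // Decompiler A gets foo() right but breaks bar(); decompiler B does the opposite.
        Map<String, Fragment> a = new LinkedHashMap<>();
        a.put("foo()", new Fragment("A", "void foo() { /* from A */ }", true));
        a.put("bar()", new Fragment("A", "void bar() { /* from A, broken */ }", false));
        Map<String, Fragment> b = new LinkedHashMap<>();
        b.put("foo()", new Fragment("B", "void foo() { /* from B, broken */ }", false));
        b.put("bar()", new Fragment("B", "void bar() { /* from B */ }", true));

        // Neither output compiles on its own, but the member-wise merge does.
        System.out.println(merge(List.of(a, b)).orElseThrow());
    }
}
```

Choosing per member in a fixed preference order keeps the sketch deterministic; the granularity of the fragments and how compilation errors are mapped back to them are design choices the real tool makes that this sketch does not attempt to reproduce.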
