Paper Title
Leveraging Artificial Intelligence on Binary Code Comprehension
Paper Authors
Paper Abstract
Understanding binary code is an essential but complex software engineering task for reverse engineering, malware analysis, and compiler optimization. Unlike source code, binary code carries limited semantic information, which makes it challenging for humans to comprehend. At the same time, compiling source code to binary code, or transpiling among different programming languages (PLs), can provide a way to introduce external knowledge into binary comprehension. We propose to develop Artificial Intelligence (AI) models that aid human comprehension of binary code. Specifically, we propose to incorporate domain knowledge from large corpora of source code (e.g., variable names, comments) to build AI models that capture a generalizable representation of binary code. Lastly, we will investigate metrics to assess the performance of models applied to binary code, using human studies of comprehension.