Paper title

An overview of 11 proposals for building safe advanced AI

Author

Hubinger, Evan

Abstract

This paper analyzes and compares 11 different proposals for building safe advanced AI under the current machine learning paradigm, including major contenders such as iterated amplification, AI safety via debate, and recursive reward modeling. Each proposal is evaluated on the four components of outer alignment, inner alignment, training competitiveness, and performance competitiveness, of which the distinction between the latter two is introduced in this paper. While prior literature has primarily focused on analyzing individual proposals, or primarily focused on outer alignment at the expense of inner alignment, this analysis seeks to take a comparative look at a wide range of proposals including a comparative analysis across all four previously mentioned components.
