Paper Title
Binary Early-Exit Network for Adaptive Inference on Low-Resource Devices
Paper Authors
Paper Abstract
Deep neural networks have significantly improved performance on a range of tasks, at the cost of an increasing demand for computational resources, leaving deployment on low-resource devices (with limited memory and battery power) infeasible. Binary neural networks (BNNs) tackle the issue to an extent, with extreme compression and speed-up gains compared to real-valued models. We propose a simple but effective method to accelerate inference by unifying BNNs with an early-exit strategy. Our approach allows simple instances to exit early based on a decision threshold and utilizes output layers added to different intermediate layers to avoid executing the entire binary model. We extensively evaluate our method on three audio classification tasks and across four BNN architectures. Our method demonstrates favorable quality-efficiency trade-offs while being controllable with an entropy-based threshold specified by the system user. It also results in better speed-ups (latency less than 6 ms) with a single model based on existing BNN architectures, without retraining for different efficiency levels. It also provides a straightforward way to estimate sample difficulty and a better understanding of uncertainty around certain classes within the dataset.
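The mechanism the abstract describes (intermediate exit heads gated by an entropy threshold) can be illustrated with a short PyTorch-style sketch. This is not the authors' implementation: `blocks`, `exit_heads`, and `threshold` are hypothetical names, batch size 1 is assumed, and the binary layers are abstracted behind ordinary modules.

```python
import torch
import torch.nn.functional as F

def prediction_entropy(probs, eps=1e-12):
    # Shannon entropy of the softmax output; low entropy = confident prediction.
    return -(probs * (probs + eps).log()).sum(dim=-1)

@torch.no_grad()
def early_exit_forward(blocks, exit_heads, x, threshold):
    """Illustrative sketch: run the (binary) blocks sequentially and return the
    first exit head's prediction whose entropy falls below `threshold`.
    `blocks`, `exit_heads`, and `threshold` are placeholders (batch size 1)."""
    h = x
    probs = None
    for block, head in zip(blocks, exit_heads):
        h = block(h)                          # next intermediate representation
        probs = F.softmax(head(h), dim=-1)    # classifier attached at this depth
        if prediction_entropy(probs).item() < threshold:
            return probs                      # easy sample: skip remaining layers
    return probs                              # hard sample: full model executed
```

Under these assumptions, `threshold` controls the quality-efficiency trade-off: a value near 0 effectively disables early exiting and runs the full model, while a large value always accepts the first exit head's prediction.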