Effect of Architectures and Training Methods on the Performance of Learned Video Frame Prediction

Paper title

Effect of Architectures and Training Methods on the Performance of Learned Video Frame Prediction

Authors

Yilmaz, M. Akin; Tekalp, A. Murat

Abstract

We analyze the performance of feedforward vs. recurrent neural network (RNN) architectures and associated training methods for learned frame prediction. To this effect, we trained a residual fully convolutional neural network (FCNN), a convolutional RNN (CRNN), and a convolutional long short-term memory (CLSTM) network for next frame prediction using the mean square loss. We performed both stateless and stateful training for recurrent networks. Experimental results show that the residual FCNN architecture performs the best in terms of peak signal to noise ratio (PSNR) at the expense of higher training and test (inference) computational complexity. The CRNN can be trained stably and very efficiently using the stateful truncated backpropagation through time procedure, and it requires an order of magnitude less inference runtime to achieve near real-time frame prediction with an acceptable performance.
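The abstract compares the networks by PSNR after training with a mean square loss. As a minimal sketch of how the two quantities relate, the snippet below computes PSNR from the mean squared error between a predicted and a ground-truth frame. The function names `mse` and `psnr`, the flat-list frame representation, and the peak value of 255 (8-bit pixels) are assumptions made for illustration; the abstract does not specify them.

```python
import math


def mse(pred, target):
    """Mean squared error between two frames given as flat lists of pixel values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("frames must be non-empty and of equal size")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE).

    `peak` is the maximum possible pixel value; 255 assumes 8-bit frames.
    Identical frames have zero error, so PSNR is reported as infinity.
    """
    err = mse(pred, target)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)
```

Because PSNR is a monotonically decreasing function of MSE for a fixed peak value, a network that reaches a lower mean square error on a frame also reaches a higher PSNR on that frame.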
