Fully Convolutional Spatio-Temporal Models for Representation Learning in Plasma Science

Paper Title

Fully Convolutional Spatio-Temporal Models for Representation Learning in Plasma Science

Authors

Ge Dong, Kyle Gerard Felker, Alexey Svyatkovskiy, William Tang, Julian Kates-Harbeck

Abstract

We have trained a fully convolutional spatio-temporal model for fast and accurate representation learning in the challenging exemplar application area of fusion energy plasma science. The onset of major disruptions is a critically important fusion energy science (FES) issue that must be resolved for advanced tokamaks. While a variety of statistical methods have been used to address the problem of tokamak disruption prediction and control, recent approaches based on deep learning have proven particularly compelling. In the present paper, we introduce further improvements to the fusion recurrent neural network (FRNN) software suite. Up to now, FRNN has been based on the long short-term memory (LSTM) variant of recurrent neural networks to leverage the temporal information in the data. Here, we implement and apply the temporal convolutional neural network (TCN) architecture to the time-dependent input signals, thus rendering the FRNN architecture fully convolutional. This allows highly optimized convolution operations to carry the majority of the computational load of training, thus enabling a reduction in training time and the effective use of high-performance computing (HPC) resources for hyperparameter tuning. At the same time, the TCN-based architecture achieves equal or better predictive performance when compared with the LSTM architecture for a large, representative fusion database. Across data-rich scientific disciplines, these results have implications for the resource-effective training of general spatio-temporal feature extractors based on deep learning. Moreover, this challenging exemplar case study illustrates the advantages of a predictive platform with flexible architecture selection options capable of being readily tuned and adapted for responding to prediction needs that increasingly arise in large modern observational datasets.
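The abstract does not spell out the TCN layers used in FRNN, so the following is only a generic sketch of the building block that TCNs are commonly built on: a causal, dilated 1D convolution over a time series. All function names, kernel values, and dilation choices here are hypothetical illustrations and are not taken from the FRNN code base.

```python
from typing import List, Sequence


def causal_dilated_conv1d(signal: Sequence[float],
                          kernel: Sequence[float],
                          dilation: int = 1) -> List[float]:
    """Causal dilated 1D convolution: output[t] uses only signal[t], signal[t - d], ..."""
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation  # look back k * dilation time steps
            if idx >= 0:            # samples before t = 0 are treated as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of time steps visible to the last layer of a stack (one conv per layer)."""
    return 1 + (kernel_size - 1) * sum(dilations)


def toy_tcn_stack(signal: Sequence[float],
                  kernels: Sequence[Sequence[float]],
                  dilations: Sequence[int]) -> List[float]:
    """Hypothetical toy stack: causal dilated convolutions with ReLU in between."""
    x = list(signal)
    for kernel, d in zip(kernels, dilations):
        x = [max(0.0, v) for v in causal_dilated_conv1d(x, kernel, d)]
    return x


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 9
    response = toy_tcn_stack(impulse, [[1.0, 1.0]] * 3, [1, 2, 4])
    print(response)
    print(receptive_field(2, [1, 2, 4]))
```

Because each output depends only on current and past samples, every time step of a layer can be computed independently, which is what lets convolution-based models avoid the step-by-step recurrence of an LSTM during training. Doubling the dilation per layer, as in the toy usage above, grows the receptive field exponentially with depth.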
