Paper title
Untargeted Region of Interest Selection for GC-MS Data using a Pseudo F-Ratio Moving Window ($ψ$FRMV)
Paper authors
Paper abstract
There are many challenges associated with analysing gas chromatography-mass spectrometry (GC-MS) data. Many of them stem from the fact that electron ionisation produces a high degree of fragmentation with concomitant loss of the molecular ion signal, which makes molecular information difficult to recover. In GC-MS data, closely eluting peaks often share many common fragment ions, necessitating sophisticated methods of analysis. Some of these methods are fully automated but make assumptions about the data that can introduce artifacts during the analysis. Chemometric methods such as Multivariate Curve Resolution or Parallel Factor Analysis are particularly attractive because they are flexible and make relatively few assumptions about the data, ideally resulting in fewer artifacts. These methods do, however, require expert user intervention to determine the most relevant regions of interest and an appropriate number of components, $k$, for each region. Automated region of interest selection is needed to permit automated batch processing of chromatographic data with advanced signal deconvolution. Here, we propose a new method for automated, untargeted region of interest selection that accounts for the multivariate information present in GC-MS data: regions of interest are selected based on the ratio of the squared first and second singular values from the Singular Value Decomposition of a window that moves across the chromatogram. Assuming that the first singular value accounts largely for signal and the second largely for noise, the relationship between these two values can be interpreted as a probabilistic distribution of Fisher Ratios. The sensitivity of the algorithm was tested by investigating the concentration at which it can no longer pick out chromatographic regions known to contain signal.
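The core quantity described in the abstract, the ratio of the squared first and second singular values in a moving window, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pseudo_f_ratios`, the window width, the step size, the absence of any centring or scaling, and the synthetic zero-mean noise are all assumptions made for illustration. The abstract does not give the degrees of freedom needed to turn the ratio into a probability, so no threshold is applied here.

```python
import numpy as np


def pseudo_f_ratios(data, window=10, step=1):
    """Return (starts, ratios), where ratios[i] = s1^2 / s2^2 for window i.

    data: 2-D array, rows = scans (retention time), columns = m/z channels.
    window and step are hypothetical parameters, not values from the paper.
    """
    data = np.asarray(data, dtype=float)
    n_scans = data.shape[0]
    starts = np.arange(0, n_scans - window + 1, step)
    ratios = np.empty(len(starts))
    for i, start in enumerate(starts):
        block = data[start:start + window]
        # Singular values only; NumPy returns them in descending order.
        s = np.linalg.svd(block, compute_uv=False)
        # Guard against division by zero for rank-1 windows.
        ratios[i] = s[0] ** 2 / max(s[1] ** 2, np.finfo(float).tiny)
    return starts, ratios


# Synthetic example (assumed data): zero-mean noise plus one Gaussian peak
# centred on scan 100 with a random mass spectrum.
rng = np.random.default_rng(0)
n_scans, n_mz = 200, 50
noise = rng.normal(0.0, 1.0, size=(n_scans, n_mz))
elution = np.exp(-0.5 * ((np.arange(n_scans) - 100) / 3.0) ** 2)
spectrum = rng.uniform(0.0, 1.0, size=n_mz)
data = noise + 50.0 * np.outer(elution, spectrum)

starts, ratios = pseudo_f_ratios(data, window=10)
best = starts[np.argmax(ratios)]
print("window with the largest ratio starts at scan", best)
```

In this toy setting the ratio stays close to 1 in noise-only windows and rises sharply in windows that overlap the peak. Real GC-MS intensities are non-negative, so a baseline offset would inflate the first singular value even without an analyte; how the paper handles this is not stated in the abstract.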