Title
Mathematical Characterization of Private and Public Immune Repertoire Sequences
Authors
Abstract
Diverse T and B cell repertoires play an important role in mounting effective immune responses against a wide range of pathogens and malignant cells. The number of unique T and B cell clones is characterized by T and B cell receptors (TCRs and BCRs), respectively. Although receptor sequences are generated probabilistically by recombination processes, clinical studies found a high degree of sharing of TCRs and BCRs among different individuals. In this work, we use a general probabilistic model for T/B cell receptor clone abundances to define "publicness" or "privateness" and information-theoretic measures for comparing the frequency of sampled sequences observed across different individuals. We derive mathematical formulae to quantify the mean and the variances of clone richness and overlap. Our results can be used to evaluate the effect of different sampling protocols on abundances of clones within an individual as well as the commonality of clones across individuals. Using synthetic and empirical TCR amino acid sequence data, we perform simulations to study expected clonal commonalities across multiple individuals. Based on our formulae, we compare these simulated results with the analytically predicted mean and variances of the repertoire overlap. Complementing the results on simulated repertoires, we derive explicit expressions for the richness and its uncertainty for specific, single-parameter truncated power-law probability distributions. Finally, the information loss associated with grouping together certain receptor sequences, as is done in spectratyping, is also evaluated. Our approach can be, in principle, applied under more general and mechanistically realistic clone generation models.
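To make the quantities named in the abstract concrete, the following is a minimal Python sketch of expected richness and expected overlap under a deliberately simplified model. It is not the paper's derivation. It assumes that every individual draws receptor sequences independently and with replacement from the same clone probabilities p_i, and that those probabilities follow a truncated power law p_i ∝ i^(-exponent). All function names and parameter values here are hypothetical and chosen for illustration only.

```python
import random


def truncated_power_law(num_clones, exponent):
    """Assumed model: p_i proportional to i**(-exponent), i = 1..num_clones."""
    weights = [i ** (-exponent) for i in range(1, num_clones + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_richness(probs, sample_size):
    """Expected number of distinct clones among sample_size i.i.d. draws.

    A clone with probability p is missed by all draws with probability
    (1 - p)**sample_size, so it is observed with the complement.
    """
    return sum(1.0 - (1.0 - p) ** sample_size for p in probs)


def expected_overlap(probs, sample_sizes):
    """Expected number of clones observed in every individual's sample.

    Assumes individuals sample independently from the same probs, so the
    per-clone detection probabilities multiply across individuals.
    """
    total = 0.0
    for p in probs:
        shared = 1.0
        for n in sample_sizes:
            shared *= 1.0 - (1.0 - p) ** n
        total += shared
    return total


def simulate_overlap(probs, sample_sizes, trials, seed=0):
    """Monte Carlo estimate of the mean overlap, for comparison."""
    rng = random.Random(seed)
    clones = range(len(probs))
    total = 0
    for _ in range(trials):
        seen = [set(rng.choices(clones, weights=probs, k=n))
                for n in sample_sizes]
        total += len(set.intersection(*seen))
    return total / trials


if __name__ == "__main__":
    probs = truncated_power_law(num_clones=50, exponent=1.0)
    print("expected richness:", expected_richness(probs, 30))
    print("expected overlap :", expected_overlap(probs, [30, 30]))
    print("simulated overlap:", simulate_overlap(probs, [30, 30], trials=2000))
```

Under these assumptions the simulated mean overlap should approach the analytic value as the number of trials grows. A more realistic treatment would also need to account for individual-specific clone abundances and for sampling without replacement.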