Paper title
Describing Subjective Experiment Consistency by $p$-Value P-P Plot
Paper authors
Paper abstract
There are phenomena that cannot be measured without subjective testing. However, subjective testing is a complex issue with many influencing factors. These factors interact to yield either precise or incorrect results. Researchers require a tool to classify the results of a subjective experiment as either consistent or inconsistent. This is necessary in order to decide whether to treat the gathered scores as quality ground truth data. Knowing whether subjective scores can be trusted is key to drawing valid conclusions and building functional tools based on those scores (e.g., algorithms assessing the perceived quality of multimedia materials). We provide a tool that classifies a subjective experiment (and all its results) as either consistent or inconsistent. Additionally, the tool identifies stimuli that have an irregular score distribution. The approach is based on treating subjective scores as a random variable coming from the discrete Generalized Score Distribution (GSD). The GSD, in combination with a bootstrapped G-test of goodness-of-fit, allows constructing a $p$-value P-P plot that visualizes the experiment's consistency. The tool safeguards researchers from using inconsistent subjective data. In this way, it helps ensure that the conclusions they draw and the tools they build are more precise and trustworthy. The proposed approach works in line with expectations drawn solely from the experiment design descriptions of 21 real-life multimedia quality subjective experiments.
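To make the two building blocks named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It shows (a) the standard G-test statistic for one stimulus, and (b) how per-stimulus $p$-values could be turned into the points of a $p$-value P-P plot. The function names `g_statistic` and `pp_plot_points` are hypothetical. The sketch assumes the expected category probabilities have already been obtained from some fitted model; fitting the GSD and the bootstrap step that turns a G statistic into a $p$-value are deliberately left out.

```python
import math


def g_statistic(observed, expected_probs):
    """Standard G-test statistic: 2 * sum(O * ln(O / E)).

    observed       -- counts of scores per category for one stimulus
    expected_probs -- category probabilities from a fitted model
                      (assumed to be given; GSD fitting is not shown)
    """
    n = sum(observed)
    g = 0.0
    for o, p in zip(observed, expected_probs):
        if o > 0:  # empty cells contribute zero to the sum
            g += o * math.log(o / (n * p))
    return 2.0 * g


def pp_plot_points(p_values, alphas):
    """Return (alpha, fraction of stimuli with p-value <= alpha) pairs.

    p_values -- one goodness-of-fit p-value per stimulus
    alphas   -- significance levels to evaluate (the plot's x axis)
    """
    m = len(p_values)
    return [(a, sum(p <= a for p in p_values) / m) for a in alphas]


if __name__ == "__main__":
    # Made-up illustrative p-values, not data from the paper.
    example_p_values = [0.01, 0.2, 0.5, 0.9]
    for alpha, fraction in pp_plot_points(example_p_values, [0.05, 0.5]):
        print(alpha, fraction)
```

If the assumed model describes the scores well, the $p$-values are approximately uniformly distributed, so the plotted points should stay close to the diagonal. A fraction of small $p$-values well above the corresponding significance level suggests that more stimuli than expected have score distributions the model cannot describe.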