Paper Title
Toward Automated Classroom Observation: Multimodal Machine Learning to Estimate CLASS Positive Climate and Negative Climate
Paper Authors
Paper Abstract
In this work we present a multimodal machine learning-based system, which we call ACORN, to analyze videos of school classrooms for the Positive Climate (PC) and Negative Climate (NC) dimensions of the CLASS observation protocol, which is widely used in educational research. ACORN uses convolutional neural networks to analyze spectral audio features, the faces of teachers and students, and the pixels of each image frame, and then integrates this information over time using Temporal Convolutional Networks. The audiovisual ACORN's PC and NC predictions have Pearson correlations of $0.55$ and $0.63$ with ground-truth scores provided by expert CLASS coders on the UVA Toddler dataset (cross-validation on $n=300$ 15-minute video segments), and a purely auditory ACORN predicts PC and NC with correlations of $0.36$ and $0.41$ on the MET dataset (test set of $n=2000$ video segments). These numbers are similar to the inter-coder reliability of human coders. Finally, using Graph Convolutional Networks we make early strides (AUC=$0.70$) toward predicting the specific moments (45-90 sec clips) when PC is particularly weak/strong. Our findings inform the design of automated classroom observation systems as well as more general video activity recognition and summarization systems.
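To make the fusion architecture described in the abstract concrete, below is a minimal sketch, not the authors' implementation, of the general idea: per-modality CNN encoders (audio spectrogram, face crops, full frames) produce one feature vector per time step, the vectors are concatenated, and a stack of dilated 1-D convolutions (a simple stand-in for a Temporal Convolutional Network) integrates them over time to regress segment-level PC and NC scores. All module names (ModalityCNN, AcornLikeRegressor), layer sizes, and input shapes are illustrative assumptions, not values taken from the paper.

# Minimal PyTorch sketch of a CNN-per-modality + temporal-convolution fusion model.
# Hypothetical sizes and names; for illustration only.
import torch
import torch.nn as nn

class ModalityCNN(nn.Module):
    """Small 2-D CNN mapping one modality's input (e.g., a log-mel spectrogram
    patch, a face crop, or a downsampled frame) to a fixed-size feature vector."""
    def __init__(self, in_channels: int, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):  # x: (batch * time, channels, H, W)
        return self.fc(self.conv(x).flatten(1))

class AcornLikeRegressor(nn.Module):
    """Concatenate per-time-step modality features and apply dilated 1-D
    convolutions over time before regressing the two scores (PC, NC)."""
    def __init__(self, feat_dim: int = 128, n_modalities: int = 3):
        super().__init__()
        d = feat_dim * n_modalities
        self.tcn = nn.Sequential(
            nn.Conv1d(d, 256, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
        )
        self.head = nn.Linear(256, 2)  # outputs: [PC score, NC score]

    def forward(self, feats):  # feats: (batch, time, feat_dim * n_modalities)
        h = self.tcn(feats.transpose(1, 2))  # (batch, 256, time)
        return self.head(h.mean(dim=2))      # average-pool over time -> (batch, 2)

if __name__ == "__main__":
    B, T = 2, 16                              # batch size, number of time steps
    audio_enc = ModalityCNN(in_channels=1)    # spectrogram patches
    face_enc = ModalityCNN(in_channels=3)     # face crops
    frame_enc = ModalityCNN(in_channels=3)    # full image frames
    audio = torch.randn(B * T, 1, 64, 64)
    faces = torch.randn(B * T, 3, 64, 64)
    frames = torch.randn(B * T, 3, 64, 64)
    feats = torch.cat([audio_enc(audio), face_enc(faces), frame_enc(frames)], dim=1)
    model = AcornLikeRegressor()
    print(model(feats.view(B, T, -1)).shape)  # torch.Size([2, 2])

In such a setup, the two regression outputs would be compared against expert CLASS codes with Pearson correlation, mirroring the evaluation reported in the abstract; the clip-level weak/strong-PC prediction with Graph Convolutional Networks would require a separate graph-structured model not sketched here.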