Paper Title

Multistream Gaze Estimation with Anatomical Eye Region Isolation by Synthetic to Real Transfer Learning

Paper Authors

Mahmud, Zunayed, Hungler, Paul, Etemad, Ali

Paper Abstract

We propose a novel neural pipeline, MSGazeNet, that learns gaze representations by taking advantage of eye anatomy information through a multistream framework. Our proposed solution comprises two components: first, a network for isolating anatomical eye regions, and second, a network for multistream gaze estimation. The eye region isolation is performed with a U-Net style network, which we train using a synthetic dataset that contains eye region masks for the visible eyeball and the iris region. The synthetic dataset used in this stage is procured using the UnityEyes simulator and consists of 80,000 eye images. Following training, the eye region isolation network is transferred to the real domain to generate masks for real-world eye images. To make this transfer successful, we exploit domain randomization in the training process, which allows the synthetic images to benefit from a larger variance with the help of augmentations that resemble artifacts. The generated eye region masks, along with the raw eye images, are then used together as a multistream input to our gaze estimation network, which consists of wide residual blocks. The output embeddings from these encoders are fused in the channel dimension before being fed into the gaze regression layers. We evaluate our framework on three gaze estimation datasets and achieve strong performance. Our method surpasses the state-of-the-art by 7.57% and 1.85% on two datasets, and obtains competitive results on the other. We also study the robustness of our method with respect to noise in the data and demonstrate that our model is less sensitive to noisy data. Lastly, we perform a variety of experiments, including ablation studies, to evaluate the contribution of different components and design choices in our solution.
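The multistream idea described above (separate encoders for the raw eye image and the generated eye-region masks, with embeddings concatenated along the channel dimension before gaze regression) can be sketched minimally in PyTorch. This is an illustrative toy, not the paper's actual MSGazeNet architecture: the encoder depths, channel counts, input resolution, and the assumption of one grayscale image stream plus two mask channels (eyeball, iris) are all hypothetical choices for demonstration.

```python
import torch
import torch.nn as nn

def make_encoder(in_ch: int) -> nn.Module:
    # Tiny stand-in encoder; the paper uses wide residual blocks instead.
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),  # -> (B, 32, 1, 1) embedding
    )

class MultistreamGazeSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.image_enc = make_encoder(1)  # stream 1: grayscale eye image
        self.mask_enc = make_encoder(2)   # stream 2: eyeball + iris masks
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 2),  # gaze angles, e.g. (yaw, pitch)
        )

    def forward(self, image: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
        # Fuse the two stream embeddings along the channel dimension.
        z = torch.cat([self.image_enc(image), self.mask_enc(masks)], dim=1)
        return self.regressor(z)

model = MultistreamGazeSketch()
# Batch of 4 eye images (36x60 is a common eye-patch size) with matching masks.
out = model(torch.randn(4, 1, 36, 60), torch.randn(4, 2, 36, 60))
print(out.shape)  # torch.Size([4, 2])
```

In this sketch the fusion is a simple `torch.cat` on pooled embeddings; the key point it illustrates is that the mask stream supplies explicit anatomical structure alongside the raw appearance stream, so the regressor sees both jointly.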
