Human Silhouette and Skeleton Video Synthesis through Wi-Fi signals

Paper title

Human Silhouette and Skeleton Video Synthesis through Wi-Fi signals

Authors

Avola, Danilo; Cascio, Marco; Cinque, Luigi; Fagioli, Alessio; Foresti, Gian Luca

Abstract

The increasing availability of wireless access points (APs) is driving human sensing applications that use Wi-Fi signals as support for, or an alternative to, the widespread visual sensors, as these signals make it possible to address well-known vision-related problems such as illumination changes and occlusions. Indeed, using image synthesis techniques to translate radio frequencies into the visible spectrum can become essential for obtaining otherwise unavailable visual data. This domain-to-domain translation is feasible because both objects and people affect electromagnetic waves, causing variations in both radio and optical frequencies. In the literature, models capable of inferring radio-to-visual feature mappings have gained momentum in the last few years, since frequency changes can be observed in the radio domain through the channel state information (CSI) of Wi-Fi APs, which enables signal-based feature extraction (e.g., amplitude). On this account, this paper presents a novel two-branch generative neural network that effectively maps radio data into visual features, following a teacher-student design that exploits a cross-modality supervision strategy. This strategy conditions the signal-based features in the visual domain so that they can completely replace visual data. Once trained, the proposed method synthesizes human silhouette and skeleton videos using exclusively Wi-Fi signals. The approach is evaluated on publicly available data, where it obtains remarkable results for both silhouette and skeleton video generation, demonstrating the effectiveness of the proposed cross-modality supervision strategy.
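As a rough illustration of two ingredients the abstract names, amplitude features extracted from CSI and a teacher-student cross-modality supervision signal, the Python sketch below computes per-subcarrier amplitudes |H| from complex CSI values and fits a student so that its output matches fixed teacher features under a mean-squared-error loss. It is a minimal sketch under stated assumptions, not the authors' two-branch generative network. The CSI layout (one complex value per subcarrier), the single linear layer standing in for the radio (student) branch, the MSE feature-matching loss, the made-up teacher vectors standing in for a frozen visual (teacher) branch, and the names `csi_amplitude`, `LinearStudent`, and `train_student` are all hypothetical.

```python
import random


def csi_amplitude(csi_frame):
    """Amplitude |H| of each complex CSI entry (assumed: one per subcarrier)."""
    return [abs(h) for h in csi_frame]


def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


class LinearStudent:
    """Hypothetical stand-in for the radio (student) branch: y = W x + b."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def forward(self, x):
        return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(self.W, self.b)]

    def sgd_step(self, x, target, lr):
        """One SGD step on the feature-matching MSE; returns the pre-step loss."""
        y = self.forward(x)
        n = len(y)
        for i, (yi, ti) in enumerate(zip(y, target)):
            g = 2.0 * (yi - ti) / n  # dMSE/dy_i
            for j, xj in enumerate(x):
                self.W[i][j] -= lr * g * xj
            self.b[i] -= lr * g
        return mse(y, target)


def train_student(pairs, lr=0.01, epochs=300, seed=0):
    """Fit the student so its features match frozen teacher features.

    pairs: list of (csi_frame, teacher_features). The teacher features are
    assumed to be precomputed by a frozen visual (teacher) branch for the
    video frame synchronized with each CSI frame (hypothetical setup).
    """
    inputs = [(csi_amplitude(csi), t) for csi, t in pairs]
    student = LinearStudent(len(inputs[0][0]), len(inputs[0][1]), seed)

    def avg_loss():
        return sum(mse(student.forward(x), t) for x, t in inputs) / len(inputs)

    initial = avg_loss()
    for _ in range(epochs):
        for x, t in inputs:
            student.sgd_step(x, t, lr)
    return student, initial, avg_loss()


if __name__ == "__main__":
    # Made-up toy data: two CSI frames (3 subcarriers) and 2-D teacher features.
    pairs = [
        ([3 + 4j, 1j, 1 + 0j], [1.0, -0.5]),
        ([2j, 3 + 4j, 6 + 8j], [0.2, 0.8]),
    ]
    print(csi_amplitude(pairs[0][0]))  # [5.0, 1.0, 1.0]
    student, before, after = train_student(pairs)
    print(f"feature-matching loss: {before:.4f} -> {after:.6f}")
```

Run as a script, it prints the amplitudes of the first toy frame and the feature-matching loss before and after training. The only point of the toy is that supervision flows from the visual side to the radio side; in the setting the abstract describes, the student would be a deep generative branch rather than a linear map.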
