High-resolution efficient image generation from WiFi CSI using a pretrained latent diffusion model
arxiv.org
We present LatentCSI, a novel method for generating images of the physical environment from WiFi CSI measurements that leverages a pretrained latent diffusion model (LDM). Unlike prior approaches that rely on complex and computationally intensive techniques such as GANs, our method employs a lightweight neural network to map CSI amplitudes directly into the latent space of an LDM. We then apply the LDM's denoising diffusion model to the latent representation with text-based guidance before decoding using the LDM's pretrained decoder to obtain a high-resolution image. This design bypasses the challenges of pixel-space image generation and avoids the explicit image encoding stage typically required in conventional image-to-image pipelines, enabling efficient and high-quality image synthesis. We validate our approach on two datasets: a wide-band CSI dataset we collected with off-the-shelf WiFi devices and cameras; and a subset of the publicly available MM-Fi dataset. The results demonstrate that LatentCSI outperforms baselines of comparable complexity trained directly on ground-truth images in both computational efficiency and perceptual quality, while additionally providing practical advantages through its unique capacity for text-guided controllability.
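To make the first stage of the abstract concrete, here is a rough, shape-level sketch of a "lightweight network" that maps CSI amplitudes to an LDM latent. Everything specific in it is an assumption for illustration and not taken from the paper: the two-layer MLP, the 256 subcarriers, the 4x64x64 latent shape (typical of Stable-Diffusion-style models for 512x512 images), and the MSE loss against image latents. The denoising and decoding stages are left to the pretrained LDM and are not shown.

```python
import numpy as np

# Assumption: a Stable-Diffusion-style latent (4 channels, 64x64) for a 512x512 image.
LATENT_SHAPE = (4, 64, 64)

def init_mlp(n_in, n_hidden, n_out, seed=0):
    """Randomly initialise a two-layer MLP (hypothetical architecture)."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, n_in ** -0.5, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, n_hidden ** -0.5, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def csi_to_latent(params, csi_amplitudes):
    """Map a batch of CSI amplitude vectors to latent tensors."""
    hidden = np.maximum(csi_amplitudes @ params["w1"] + params["b1"], 0.0)  # ReLU
    flat = hidden @ params["w2"] + params["b2"]
    return flat.reshape(-1, *LATENT_SHAPE)

def latent_mse(predicted, target):
    """Assumed training loss: MSE against latents of the paired camera images."""
    return float(np.mean((predicted - target) ** 2))

n_subcarriers = 256  # hypothetical CSI vector length
params = init_mlp(n_subcarriers, 128, int(np.prod(LATENT_SHAPE)))
batch = np.abs(np.random.default_rng(1).normal(size=(2, n_subcarriers)))
latents = csi_to_latent(params, batch)
print(latents.shape)  # (2, 4, 64, 64)
```

In a real setup the predicted latent would be handed to the LDM's denoiser with a text prompt and then to its pretrained decoder, so no pixel-space generator or image encoder is needed at inference time.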
What I got from it is that it's not guessing the color. It recreates what it saw in a color training sample when it had a similar input: a guy walking around the room, creating interference. It could just as well be a moving barrel of similar proportions and surface characteristics. Changing the room geometry or adding another moving body at the same time means it would have to be retrained from scratch. So the picture is a total approximation, and a crude one. But what can be done blind, without a camera during training, is to train on an empty room and tell when someone is there. What is significant, though: if you have some way to timestamp the WiFi input against a view from outside a window, you can match the two and then tell approximately where someone is standing without having a good look at the scene.
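The camera-free "empty room" idea can be sketched in a few lines. This is only an illustration under my own assumptions, not anything from the paper: each CSI frame is a plain list of per-subcarrier amplitudes, the baseline is a per-subcarrier mean and standard deviation, and the threshold of 3.0 is arbitrary. All function names are made up.

```python
from statistics import mean, pstdev

def fit_empty_room(frames):
    """Per-subcarrier mean and std from frames recorded while the room is empty."""
    columns = list(zip(*frames))
    means = [mean(c) for c in columns]
    stds = [max(pstdev(c), 1e-6) for c in columns]  # floor avoids division by zero
    return means, stds

def deviation_score(frame, baseline):
    """Mean absolute z-score of a frame against the empty-room baseline."""
    means, stds = baseline
    return mean(abs(x - m) / s for x, m, s in zip(frame, means, stds))

def is_occupied(frame, baseline, threshold=3.0):
    """Flag a frame whose amplitudes deviate strongly from the empty room."""
    return deviation_score(frame, baseline) > threshold

# Toy calibration data: four empty-room frames with three subcarriers each.
empty = [
    [1.00, 2.00, 3.00],
    [1.10, 1.90, 3.05],
    [0.95, 2.05, 2.95],
    [1.05, 2.00, 3.00],
]
baseline = fit_empty_room(empty)
print(is_occupied([1.02, 1.98, 3.01], baseline))  # False
print(is_occupied([1.80, 1.20, 3.90], baseline))  # True
```

This only says that something changed the channel, not what or where. Locating a person would need the timestamped outside view as labels, as described above.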
Hmm that’s an interesting application.