Visual Geometry Grounded Novel-View Acoustic Synthesis

Jun 1, 2026
Jay Polra, Dhwanil Chauhan, Wenjun Huang, Kyle Toth, Xianhui Wang, Yang Ni
Abstract
A feed-forward pipeline for synthesizing spatially accurate binaural audio at novel viewpoints. The method eliminates the Structure-from-Motion dependency by combining VGGT geometry encoding with cross-attention acoustic retrieval. It outperforms the prior state of the art on all four metrics while using fewer parameters and 10× faster preprocessing.
Type
Publication
CVPR Workshop 2026