Gaussian Process Models for HRTF-Based 3D Sound Localization
Humans localize sound sources using just two receivers, a complex process of inferring direction from spectral cues in the sound arriving at the ears. While these cues can be described by the well-known head-related transfer function (HRTF), it is unclear how densely the HRTF must be sampled and whether a higher-order representation is employed in localization. We propose a class of binaural sound-source localization models to answer these two questions. First, from the sound received at the two ears, we derive several binaural features that are invariant to the sound-source signal. Second, these features are implicitly mapped to a high-dimensional \emph{reproducing kernel Hilbert space} via a Gaussian process regression model trained on feature-direction tuples. Lastly, the features most relevant to the model are found via an efficient forward subset-selection method. Experimental results are shown for HRTFs from the CIPIC database.
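As a rough illustration of the first two steps, the sketch below builds one source-invariant binaural feature and feeds it to a plain Gaussian process regressor. Everything in it is an assumption made for illustration: the log-magnitude ratio of the two ear spectra stands in for the features derived in the paper, the HRTFs are random synthetic values rather than CIPIC measurements, the squared-exponential kernel and its hyperparameters are arbitrary, and the helper names (`log_magnitude_ratio`, `rbf_kernel`, `gp_posterior_mean`) are hypothetical. Forward subset selection is not shown.

```python
import numpy as np


def log_magnitude_ratio(left_spectrum, right_spectrum):
    """Assumed feature: log|X_L| - log|X_R|.

    With X_L = H_L * S and X_R = H_R * S, the source spectrum S cancels,
    leaving log|H_L| - log|H_R|.
    """
    return np.log(np.abs(left_spectrum)) - np.log(np.abs(right_spectrum))


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior_mean(x_train, y_train, x_test, noise_var=1e-6, length_scale=1.0):
    """Zero-mean GP regression posterior mean: K_* (K + noise_var * I)^{-1} y."""
    k = rbf_kernel(x_train, x_train, length_scale) + noise_var * np.eye(len(x_train))
    k_star = rbf_kernel(x_test, x_train, length_scale)
    return k_star @ np.linalg.solve(k, y_train)


rng = np.random.default_rng(0)
n_dirs, n_bins = 20, 16

# Synthetic stand-in HRTFs, one row per direction (NOT CIPIC data).
h_left = rng.normal(size=(n_dirs, n_bins)) + 1j * rng.normal(size=(n_dirs, n_bins))
h_right = rng.normal(size=(n_dirs, n_bins)) + 1j * rng.normal(size=(n_dirs, n_bins))
directions = np.linspace(-80.0, 80.0, n_dirs)[:, None]  # hypothetical azimuths, degrees

# Two unrelated source spectra played from the same set of directions.
source_a = rng.normal(size=n_bins) + 1j * rng.normal(size=n_bins)
source_b = rng.normal(size=n_bins) + 1j * rng.normal(size=n_bins)

feat_a = log_magnitude_ratio(h_left * source_a, h_right * source_a)
feat_b = log_magnitude_ratio(h_left * source_b, h_right * source_b)

# The feature depends on direction only, not on the source signal.
invariant = np.allclose(feat_a, feat_b)

# Train on features from source A, predict directions for source B.
pred = gp_posterior_mean(feat_a, directions, feat_b)
max_error = float(np.max(np.abs(pred - directions)))
print(invariant, max_error)
```

Because the feature is unchanged by the source, a regressor trained on one source signal carries over to another; the kernel supplies the implicit mapping into a reproducing kernel Hilbert space.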