Aucouturier, J.-J., Defreville, B., Pachet, F. The bag-of-frame approach to audio pattern recognition: A sufficient model for urban soundscapes but not for polyphonic music. Journal of the Acoustical Society of America, 122(2):881-891, 2007.

Sony CSL authors: Jean-Julien Aucouturier, François Pachet

Abstract

The “bag of frames” approach (BOF) to audio pattern recognition represents signals as the long-term statistical distribution of their local spectral features. This approach has proved nearly optimal for simulating the auditory perception of natural and human environments (or soundscapes), and is also the predominant paradigm for extracting high-level descriptions from music signals. However, recent studies show that, contrary to its application to soundscape signals, BOF provides only limited performance when applied to polyphonic music signals. This paper proposes to explicitly examine the difference between urban soundscapes and polyphonic music with respect to their modelling with the BOF approach. First, the application of the same measure of acoustic similarity to both soundscape and music datasets confirms that the BOF approach can model soundscapes to near-perfect precision, and exhibits none of the limitations observed in the music dataset. Second, the modification of this measure by two custom homogeneity transforms reveals critical differences in the temporal and statistical structure of the typical frame distribution of each type of signal. Such differences may explain the uneven performance of BOF algorithms on soundscapes and music signals, and suggest that their human perception relies on cognitive processes of a different nature.
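As a rough illustration of what the abstract means by a "long-term statistical distribution of local spectral features", the sketch below cuts a signal into short frames, computes a small spectral feature vector per frame, and summarises all frames by a single Gaussian, discarding their order. Two signals are then compared with a symmetrised Kullback-Leibler divergence. This is a minimal sketch under stated assumptions, not the paper's implementation: the log band-energy features, the single-Gaussian model, and every function name and parameter value here are illustrative choices made for brevity.

```python
import numpy as np

def frame_features(signal, frame_len=1024, hop=512, n_bands=8):
    """Cut the signal into overlapping frames; return log band energies per frame.

    Assumption: log band energies stand in for whatever local spectral
    features a real system would use.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    feats = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        bands = np.array_split(power, n_bands)
        feats[i] = np.log([b.sum() + 1e-10 for b in bands])
    return feats

def bag_of_frames(feats, reg=1e-6):
    """Summarise all frames by mean and covariance; frame order is ignored."""
    mean = feats.mean(axis=0)
    # Small diagonal term keeps the covariance invertible (illustrative value).
    cov = np.cov(feats, rowvar=False) + reg * np.eye(feats.shape[1])
    return mean, cov

def kl_gauss(m0, c0, m1, c1):
    """Closed-form KL divergence between two multivariate Gaussians."""
    d = len(m0)
    c1_inv = np.linalg.inv(c1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(c0)
    _, logdet1 = np.linalg.slogdet(c1)
    return 0.5 * (np.trace(c1_inv @ c0) + diff @ c1_inv @ diff - d
                  + logdet1 - logdet0)

def bof_distance(sig_a, sig_b):
    """Symmetrised KL divergence between the frame distributions of two signals."""
    a = bag_of_frames(frame_features(sig_a))
    b = bag_of_frames(frame_features(sig_b))
    return kl_gauss(*a, *b) + kl_gauss(*b, *a)
```

Because `bag_of_frames` keeps only the mean and covariance, shuffling the frames of a signal leaves its model unchanged, which is the sense in which the representation is a "bag".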

Keywords: timbre

BibTeX entry

@ARTICLE { aucouturier:07b,
    AUTHOR="Aucouturier, J.-J. and Defreville, B. and Pachet, F.",
    JOURNAL="Journal of the Acoustical Society of America",
    NUMBER="2",
    PAGES="881-891",
    TITLE="The bag-of-frame approach to audio pattern recognition: A sufficient model for urban soundscapes but not for polyphonic music",
    VOLUME="122",
    YEAR="2007"
}