Publications on Transposition-Invariant Interval Representations at the ISMIR Conference 2018
Many music-theoretical constructs (such as scale types, modes, cadences, and chord types) are defined in terms of pitch intervals, i.e., relative distances between pitches. Therefore, when computer models are employed in music tasks, it can be useful to operate on interval representations rather than on the raw musical surface. Moreover, interval representations are transposition-invariant, which is valuable for tasks like audio alignment, cover song detection, and music structure analysis. However, most currently used neural network architectures are unable to learn interval representations implicitly.
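The transposition invariance of interval representations can be illustrated with a minimal sketch (not taken from the papers): shifting every pitch of a melody by the same amount leaves its interval sequence unchanged.

```python
def intervals(pitches):
    """Successive pitch intervals (in semitones) of a melody."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

melody = [60, 62, 64, 65, 67]          # MIDI pitch numbers
transposed = [p + 3 for p in melody]   # same melody, up a minor third

# Both melodies yield the identical interval sequence [2, 2, 1, 2].
assert intervals(melody) == intervals(transposed)
```

This is trivial for monophonic symbolic input; the contribution of the papers is learning such relative representations from polyphonic symbolic music and audio, where intervals are not directly readable from the input.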
The papers published at the ISMIR 2018 conference introduce neural network models that can learn musical intervals from polyphonic music, both in the symbolic domain (i.e., musical notes) and in audio. To this end, an unsupervised training method is proposed that yields a musically plausible organization of intervals in the representation space. Based on these representations, a transposition-invariant self-similarity matrix is constructed and used to determine repeated sections in symbolic music and audio, yielding competitive results in the MIREX task “Discovery of Repeated Themes and Sections”. Furthermore, the proposed models are used to predict musical sequences. They show better prediction accuracy and, remarkably, are also able to learn musical self-similarity structure, a capability that is challenging for common neural networks to learn. We improve the state of the art for general connectionist sequence models in learning to predict monophonic melodies, and ensembles of relative and absolute music processing models improve the results appreciably.
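As a hedged sketch of the general idea behind a transposition-invariant self-similarity matrix (the papers' actual models and features differ): once each time frame is described by a relative, interval-based feature vector, pairwise cosine similarity produces a matrix in which repeated sections appear as high off-diagonal values, regardless of the key they occur in.

```python
import numpy as np

def self_similarity(features, eps=1e-9):
    """Cosine self-similarity matrix of a (time, dim) feature sequence.

    If the features encode relative pitch (intervals), the resulting
    matrix does not change when the underlying music is transposed.
    """
    f = np.asarray(features, dtype=float)
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + eps)
    return f @ f.T

# Toy interval features for a sequence whose second half repeats the first:
seq = np.array([[2.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, 1.0]])
S = self_similarity(seq)
# Frames 0 and 2 are identical, so S[0, 2] is close to 1.0; the
# repetition shows up as a high-similarity diagonal offset by 2.
```

Repeated-section discovery then amounts to finding such off-main-diagonal stripes in S, which is the standard use of self-similarity matrices in music structure analysis.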
Authors: Stefan Lattner, Andreas Arzt, Maarten Grachten, Gerhard Widmer