In this talk, I will explain the mechanisms of transformation and invariance learning for symbolic music and audio, and I will describe different models based on this principle. Transformation Learning (TL) provides a novel approach to musical representation learning: instead of learning the musical patterns themselves, we aim to learn "rules" that define how a given pattern can be transformed into another.
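As a toy illustration of such a "rule" (an assumption for exposition, not an example from the talk), chromatic transposition of a pitch-class vector can be written as a circular shift, i.e., an orthogonal permutation matrix acting on the pattern; it is this transformation, rather than the triad itself, that TL aims to capture.

```python
import numpy as np

# Pitch-class profile of a C major triad (C, E, G) as a 12-dim binary vector.
c_major = np.zeros(12)
c_major[[0, 4, 7]] = 1.0

def transposition_matrix(semitones: int) -> np.ndarray:
    """Orthogonal permutation matrix implementing chromatic transposition."""
    return np.roll(np.eye(12), shift=semitones, axis=0)

# The "rule" (a transformation matrix), not the pattern itself, encodes the relation.
T = transposition_matrix(2)              # transpose up a whole tone
d_major = T @ c_major                    # D major triad (D, F#, A)
print(np.nonzero(d_major)[0])            # -> [2 6 9]
print(np.allclose(T.T @ T, np.eye(12)))  # orthogonality check -> True
```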
TL was initially proposed for image processing and had not previously been applied to music. In this talk, I summarize our experiments in TL for music. The models used throughout our work are based on Gated Autoencoders (GAEs), which learn orthogonal transformations between data pairs. We show that a GAE can learn chromatic transposition, tempo change, and retrograde movement in music, as well as more complex musical transformations, such as diatonic transposition.
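The following is a minimal sketch of a factored gated autoencoder in the spirit of Memisevic's formulation; layer sizes, variable names, and the training step are illustrative assumptions, not the exact models used in our experiments. Mapping units m encode the transformation relating a pair (x, y), and y is reconstructed from x and m via multiplicative interactions.

```python
import torch
import torch.nn as nn

class GatedAutoencoder(nn.Module):
    """Minimal factored gated autoencoder (sketch, hypothetical sizes)."""
    def __init__(self, n_in: int, n_factors: int, n_maps: int):
        super().__init__()
        self.U = nn.Linear(n_in, n_factors, bias=False)    # filters on x
        self.V = nn.Linear(n_in, n_factors, bias=False)    # filters on y
        self.W = nn.Linear(n_factors, n_maps, bias=False)  # mapping layer

    def encode(self, x, y):
        # Element-wise product of factor responses "gates" one input by the other;
        # the mapping units thus represent the transformation between x and y.
        return torch.sigmoid(self.W(self.U(x) * self.V(y)))

    def decode(self, x, m):
        # Reconstruct y from x and the inferred transformation m.
        factors = self.U(x) * (m @ self.W.weight)          # W^T m
        return factors @ self.V.weight                     # V^T factors

    def forward(self, x, y):
        m = self.encode(x, y)
        return self.decode(x, m), m

# Hypothetical usage: x, y could be pitch-class vectors or spectrogram frames.
gae = GatedAutoencoder(n_in=12, n_factors=64, n_maps=32)
x, y = torch.rand(8, 12), torch.rand(8, 12)
y_hat, mapping = gae(x, y)
loss = nn.functional.mse_loss(y_hat, y)  # trained by reconstructing y given x
```

Because the reconstruction of y is conditioned on x, the mapping units are pushed to represent the relation between the two inputs rather than their content.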
TL provides a different view of music data and yields features that complement other music descriptors (e.g., those obtained by autoencoder learning or hand-crafted features). Several research directions for TL in music follow from this: using the transformation features themselves, computing transformation-invariant features from TL models, and using TL models for music generation.
In particular, I will focus on DrumNet, a convolutional variant of a Gated Autoencoder, and show how TL leads to time- and tempo-invariant representations of rhythm. Importantly, learning transformations and learning invariances are two sides of the same coin, as specific invariances are defined with respect to specific transformations. I will introduce the Complex Autoencoder, a model derived from the Gated Autoencoder, which learns both a transformation-invariant and a transformation-variant feature space. Using transposition- and time-shift-invariant features, we obtain improved performance in audio alignment tasks.
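The invariance principle behind such models can be illustrated with a fixed Fourier basis standing in for the learned complex basis functions of the Complex Autoencoder (an assumption purely for illustration): projecting onto complex basis functions splits a signal into magnitudes, which stay constant under transposition or time shift, and phases, which rotate with the transformation.

```python
import numpy as np

# Toy illustration: a fixed DFT basis stands in for learned complex basis functions.
x = np.zeros(12)
x[[0, 4, 7]] = 1.0               # C major triad as a pitch-class vector
y = np.roll(x, 3)                # chromatic transposition by 3 semitones

X, Y = np.fft.fft(x), np.fft.fft(y)

print(np.allclose(np.abs(X), np.abs(Y)))  # True: magnitudes are transposition-invariant
print(np.round(np.angle(Y / X), 2))       # phase rotations encode the transposition
```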