Stefan Lattner PhD

Stefan Lattner PhD

Sony Computer Science Laboratories Paris

We are usually unaware of the enormous computing power needed by our brain when listening to music. When trying to make sense of music, we constantly have to classify, sort, remember, structure, and connect a vast number of musical events. Moreover, these events do not only consist of notes, chords, and rhythms but are also characterized by “colors of sound.” These ever-changing frequencies, resulting in complex soundscapes, are at the heart of our musical experiences. I use computer models to simulate the cognitive processes involved when listening to music, to create better tools for music production and music analysis. Creating compositions, musical arrangements, and unique sounds using machine learning and artificial intelligence will lead to a streamlined music production workflow and to entirely different ways to engage with music as a whole.

Learning Transformations and Invariances with Artificial Neural Networks

In this talk, I will explain the mechanisms of transformation- and invariance learning for symbolic music and audio, and I will describe different models that are based on this principle.Transformation Learning (TL) provides us with a novel way of musical representation learning. To that end, we do not aim to learn the musical patterns themselves, but some “rules” defining how a given pattern can be transformed into another pattern. TL was initially proposed for image processing and had not yet been applied to music. In this talk, I summarize our experiments in TL for music. The models used throughout our work are based on Gated Autoencoders (GAE) which learn orthogonal transformations between data pairs. We show that a GAE can learn chromatic transposition, tempo-change, and the retrograde movement in music, but also more complex musical transformations, like diatonic transposition. Transformation Learning (TL) provides us with a different view on music data, and yields features complementary to other music descriptors (e.g., such as obtained by autoencoder learning or hand-crafted features). There are different possible research directions regarding TL in music. They involve using the transformation features themselves, using transformation-invariant features computed from TL models, and using TL models for music generation. I will particularly focus on DrumNet, a convolutional variant of a Gated Autoencoder, and will show how TL leads to time and tempo-invariant representations of rhythm. Importantly, learning transformations and learning invariances are two sides of the same coin (as specific invariances are defined with respect to specific transformations). I will introduce the Complex Autoencoder, a model derived from a Gated Autoencoder, which learns both a transformation-invariant, and a transformation-variant feature space. Using transposition- and time-shift invariant features, we obtain improved performance for audio alignment tasks.