Stefan Lattner PhD

Sony Computer Science Laboratories Paris

We are usually unaware of the enormous computational effort our brain expends when listening to music. To make sense of music, we constantly have to classify, sort, remember, structure, and connect a vast number of musical events. Moreover, these events consist not only of notes, chords, and rhythms but are also characterized by "colors of sound." These ever-changing frequencies, resulting in complex soundscapes, are at the heart of our musical experience. I use computer models to simulate the cognitive processes involved in listening to music, in order to create better tools for music production and music analysis. Creating compositions, musical arrangements, and unique sounds using machine learning and artificial intelligence will lead to a streamlined music production workflow and to entirely new ways of engaging with music as a whole.

Neural Audio Synthesis and Restoration with Generative Adversarial Networks

Considering the impressive success of Generative Adversarial Networks (GANs) in image generation in recent years, it is only natural to apply these models to audio generation and restoration as well. Over the last three years, we have therefore performed several experiments on audio synthesis with GANs, involving drum sample generation, tonal synthesis, and MP3 restoration. In DrumGAN, we examined how neural synthesis can improve artistic processes in music production, using perceptual features for intuitive user control. In DarkGAN, we exploited the principle of dark knowledge in neural networks and distilled the knowledge of an audio classifier into a generative GAN architecture. In VQ-CPC GAN, we tackled the problem of generating variable-length audio content with GANs, resulting in non-autoregressive sequence generation. Finally, we studied audio transformation and restoration with GANs by restoring MP3-compressed popular music to its high-quality version. In my talk, I will give an overview of the underlying principles of our work, show some audio examples, and present a live demo of the DrumGAN prototype.
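The "dark knowledge" idea referenced above can be illustrated with a minimal sketch (not the actual DarkGAN pipeline; the function and values are hypothetical): a teacher classifier's logits are softened with a temperature, so the small probabilities it assigns to non-target classes, which encode perceptual similarity between sounds, survive and can be distilled into another model.

```python
import numpy as np

def soft_labels(logits, temperature=4.0):
    """Temperature-scaled softmax. Higher temperatures flatten the
    distribution, exposing the 'dark knowledge' in the small
    probabilities of non-target classes."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical classifier logits for one drum sample over three classes
# (e.g. kick, snare, tom).
logits = [8.0, 2.0, -1.0]

hard = soft_labels(logits, temperature=0.01)  # nearly one-hot: top class only
soft = soft_labels(logits, temperature=4.0)   # flatter: keeps similarity info
```

With a low temperature the label collapses to the winning class; with a higher temperature the runner-up classes retain visible probability mass, and it is this richer target that distillation exploits.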