Théis Bazin

Sony Computer Science Laboratories Paris

Driven by a strong interest in experimental music as well as electronic dance music, I have attended many live music performances over the years, looking for stimulating new ways of performing music. I often find computer interactions on stage unsatisfactory: the musician seems to lose freedom once the computer enters the loop, leading to formulaic, constrained, pre-structured music. This is where I think generative probabilistic models can play an exciting role, by enabling the computer to be not just a static tool producing predictable, repeated outputs for the same inputs, but a creative force on stage, presenting the musician with ever-changing propositions. In this context, the musician can once again be more than an operator and truly act as a creator. Yet a generative model in a creative context is only as powerful as the interface through which its user interacts with it. This calls for a constant back-and-forth between the design of powerful deep learning models and the design of the associated interfaces, in order to build meaningful – and useful – new systems.

Spectrogram Inpainting for Interactive Generation of Instrument Sounds

Sound synthesis, the generation of sound through analog electrical currents or digital software, has been a vibrant field since the second half of the 20th century. New synthesizers have expanded the scope of sounds that can be generated into previously unexplored territory. But with this expansion also comes complexity: modern synthesizers, sometimes with over 100 continuous-valued parameters, can be daunting to program! Our work is an attempt at bridging the gap between flexibility and ease of use in sound synthesis. Thanks to recent advances in neural network-based techniques, we provide users with the ability to edit and transform sounds through simple operations inspired by image-processing software à la Paint. Namely, we frame the sound synthesis process as an interactive inpainting task, in which portions of a sound are selectively transformed by the user. At each of these steps, our models are tasked with proposing new sonic content for the selected zones by analyzing their surrounding context. In this talk, I will present both a novel machine learning architecture for performing inpainting on spectrograms, developed with Gaëtan Hadjeres (Sony CSL Paris), as well as a new, open-source interactive web interface that allows musicians to readily make use of these new models in creative settings: NOTONO.
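
To make the inpainting framing concrete, here is a minimal sketch of the interaction loop it implies: the user selects a time-frequency region of a spectrogram, and a model regenerates only that region, conditioned on the untouched surrounding context. The `inpainting_model` below is a hypothetical stand-in for the actual models presented in the talk, and the array shapes are illustrative assumptions.

```python
# Minimal sketch of spectrogram inpainting (illustrative, not the actual NOTONO code).
import numpy as np


def inpaint(spectrogram: np.ndarray,
            mask: np.ndarray,
            inpainting_model) -> np.ndarray:
    """Regenerate only the masked time-frequency cells of `spectrogram`.

    `mask` is a boolean array of the same shape: True marks the zone
    selected by the user; the model proposes new content for that zone
    using the unmasked cells as context.
    """
    proposal = inpainting_model(spectrogram, mask)  # model sees the full context
    result = spectrogram.copy()
    result[mask] = proposal[mask]                   # everything outside the selection is kept
    return result


# Example: request new content for a rectangular time-frequency zone.
spec = np.random.rand(128, 256)           # (frequency bins, time frames) - assumed shape
mask = np.zeros_like(spec, dtype=bool)
mask[40:80, 100:160] = True               # user-selected region to transform

# Placeholder model: returns noise of the same shape, standing in for a trained network.
new_spec = inpaint(spec, mask, lambda s, m: np.random.rand(*s.shape))
```

In the interactive setting, this loop simply repeats: each accepted proposal becomes the new context for the next user selection, so the sound is refined incrementally rather than generated in one shot.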