We are usually unaware of the enormous computing power needed by our brain when listening to music. When trying to make sense of music, we constantly have to classify, sort, remember, structure, and connect a vast number of musical events. Moreover, these events do not only consist of notes, chords, and rhythms but are also characterized by “colors of sound.” These ever-changing frequencies, resulting in complex soundscapes, are at the heart of our musical experiences. I use computer models to simulate the cognitive processes involved when listening to music, to create better tools for music production and music analysis. Creating compositions, musical arrangements, and unique sounds using machine learning and artificial intelligence will lead to a streamlined music production workflow and to entirely different ways to engage with music as a whole.
Learning features from data has been shown to be more successful than using hand-crafted features for many machine learning tasks. In music information retrieval (MIR), features learned from windowed spectrograms are highly variant to transformations like transposition or time-shift. Such variances are undesirable when they are irrelevant for the respective MIR task. We propose an architecture called Complex Autoencoder (CAE), which learns features invariant to orthogonal transformations. Mapping signals onto complex basis functions learned by the CAE results in a transformation-invariant “magnitude space” and a transformation-variant “phase space”. The phase space is useful for inferring transformations between data pairs. By exploiting the invariance property of the magnitude space, we achieve state-of-the-art results in audio-to-score alignment and repeated section discovery for audio. A PyTorch implementation of the CAE, including the repeated section discovery method, is available online.
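The split into a magnitude space and a phase space can be illustrated with a minimal sketch that swaps the learned CAE basis for a fixed Fourier basis. This substitution is an assumption made purely for illustration; the code is not the CAE or its PyTorch implementation, and the helper names `dft` and `circular_shift` are hypothetical. A circular time-shift is a permutation and therefore an orthogonal transformation: the Fourier magnitudes should stay unchanged under it, while the phases carry enough information to recover the shift.

```python
import cmath
import math
import random


def dft(signal):
    """Project a real signal onto the complex Fourier basis (naive O(n^2) DFT)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def circular_shift(signal, shift):
    """Rotate the signal by `shift` samples (a permutation, hence orthogonal)."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift]


# Illustrative data: a random 16-sample signal and a shifted copy of it.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(16)]
shift = 5
y = circular_shift(x, shift)

X, Y = dft(x), dft(y)

# "Magnitude space": identical for both signals up to floating-point error.
max_mag_diff = max(abs(abs(a) - abs(b)) for a, b in zip(X, Y))

# "Phase space": the phase difference in bin 1 equals -2*pi*shift/n,
# so the transformation between the pair can be read off from it.
phase_delta = cmath.phase(Y[1] / X[1])
recovered_shift = round(-phase_delta * len(x) / (2 * math.pi)) % len(x)

print("largest magnitude difference:", max_mag_diff)
print("shift recovered from phase:", recovered_shift)
```

The Fourier basis handles only shifts along one axis; the point of learning the basis, as the abstract describes, is to obtain the same magnitude/phase behaviour for the transformations actually present in the data.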