Sony CSL

Kin Wai

nnAudio: a GPU audio processing tool and its application to music transcription

I will present nnAudio, a recently released neural-network-based audio processing toolbox. The toolbox uses 1D convolutional neural networks to generate spectrograms in real time (time-domain to frequency-domain conversion). Because the spectrograms are computed on the fly, none of them need to be stored on disk when training neural networks for audio-related tasks. In this talk, I will discuss one possible application of nnAudio: exploring suitable input representations for automatic music transcription (AMT).
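To make the idea of spectrogram generation through 1D convolution concrete, here is a minimal, dependency-free sketch. It is not nnAudio's code or API: the function names (`fourier_kernels`, `conv1d`, `magnitude_spectrogram`) and the parameters `n_fft` and `hop` are hypothetical and chosen only for illustration. The sketch assumes that each frequency bin can be computed by sliding a fixed cosine kernel and a fixed sine kernel over the waveform with a stride equal to the hop size, which is the operation a 1D convolution layer performs. No window function is applied here, although a practical implementation would typically use one.

```python
import math


def fourier_kernels(n_fft):
    """Build one cosine and one sine kernel per frequency bin (0 .. n_fft // 2)."""
    n_bins = n_fft // 2 + 1
    cos_k = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(n_bins)]
    sin_k = [[math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(n_bins)]
    return cos_k, sin_k


def conv1d(signal, kernel, stride):
    """Strided 1D cross-correlation without padding (the 'conv1d' of deep learning)."""
    width = len(kernel)
    return [sum(signal[start + i] * kernel[i] for i in range(width))
            for start in range(0, len(signal) - width + 1, stride)]


def magnitude_spectrogram(signal, n_fft=64, hop=32):
    """Return a magnitude spectrogram as a nested list indexed [bin][frame]."""
    cos_k, sin_k = fourier_kernels(n_fft)
    spec = []
    for ck, sk in zip(cos_k, sin_k):
        real = conv1d(signal, ck, hop)
        # The DFT uses -sin for the imaginary part; the sign does not affect magnitude.
        imag = conv1d(signal, sk, hop)
        spec.append([math.hypot(r, i) for r, i in zip(real, imag)])
    return spec


# Usage: a 128 Hz sine sampled at 1024 Hz for one second (illustrative values).
sample_rate = 1024
tone_hz = 128
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate)
          for t in range(sample_rate)]

spec = magnitude_spectrogram(signal, n_fft=64, hop=32)

# Each bin spans sample_rate / n_fft = 16 Hz, so the tone falls in bin 128 / 16 = 8.
peak_bin = max(range(len(spec)), key=lambda k: spec[k][0])
```

Because the kernels are fixed weights applied by a convolution, the same computation maps directly onto a convolution layer in a deep learning framework, where it can run on a GPU as part of the training pipeline.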