
- Science Park 3 - 4th - S3 402
- +43 732 2468 4700
- +43 732 2468 4705
- khaled.koutini(at)jku.at
My PhD project
Learning general-purpose audio representations with deep neural networks
Audio classification and tagging are central tasks in the field of machine listening. They are essential for machines to recognize the environment and identify events in their surroundings, and are thus a critical component of machine perception. These tasks are relevant in a wide range of applications, including content-based multimedia information retrieval, context-aware smart devices, and monitoring systems. One significant barrier to machine audio recognition is the high cost and scarcity of high-quality labeled data, particularly for powerful learners such as deep neural networks. We investigate the incorporation of inductive biases into neural network architectures and the training process in order to improve generalization when training on small audio datasets. We also investigate how to extract representations from models trained on large-scale, general-purpose datasets so that they can be transferred to specialized tasks.
Supervisor: Gerhard Widmer, JKU Linz, Austria
Dates: 01 Jan 2020 – Ongoing