Audio Processing and Indexing - Bird Audio Detection Challenge

Goals

Bring into practice some of the techniques learned in previous assignments
Introduce yourself to working with large and varied audio datasets
Familiarize yourself with machine learning toolboxes

Problem Introduction

"Detecting bird sounds in audio is an important task for automatic wildlife monitoring, as well as in citizen science and audio library management. The current generation of software tools require manual work from the user: to choose the algorithm, to set the settings, and to post-process the results. This is holding bioacoustics back in embracing its big data era: lets make this better!"

"The task is to design a system that, given a short audio recording, returns a binary decision for the presence/absence of bird sound (bird sound of any kind)." [1]

More information about the challenge and the motivation behind it can be found in the paper written by the challenge organizers [2].

Dataset

The datasets are available from the website of the challenge. However, for this assignment we only expect you to use the second dataset, provided by Warblr [3]. The dataset contains 8000 mono audio files of roughly 10 seconds each, sampled at 44.1 kHz, for a total of 4.6 GiB of WAV files. The .csv files containing the ground truths are available as separate downloads from the challenge website and are also included in the example below. The Warblr dataset contains about 75% positive examples and 25% negative examples.

A small subset (33MiB) of the dataset is available here, so you can start without having to wait on the full download.

No special distinction is made between data meant for training and data meant for testing. We recommend splitting the data yourself, for example 80% for training and 20% for testing. However you split your data, the testing and training sets must never overlap.
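A split like the one described above can be sketched in a few lines of Python. This is only an illustration: the `labels` dictionary, the clip names, and the function name `stratified_split` are made up for the example, and splitting per class (so both parts keep roughly the 75%/25% ratio) is a suggestion rather than a requirement of the assignment.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split item ids into disjoint train/test lists, class by class.

    labels: dict mapping an item id to 0 or 1 (hypothetical format).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for item_id, label in sorted(labels.items()):
        by_class[label].append(item_id)

    train, test = [], []
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.extend(ids[:n_test])    # first part goes to testing
        train.extend(ids[n_test:])   # the rest goes to training
    return train, test

# Toy labels with the same 75% / 25% class ratio as the Warblr set.
labels = {f"clip{i:04d}": (1 if i % 4 else 0) for i in range(8000)}
train_ids, test_ids = stratified_split(labels)
print(len(train_ids), len(test_ids))
```

Because every id is placed in exactly one of the two lists, the training and testing parts cannot overlap. Mentioning the seed in your report makes the split reproducible.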

Note that a separate testing dataset is available from the challenge website. We do not recommend using it, because it does not contain ground truths.

Positive example: this is a relatively typical example of bird sounds:

Positive example: (hard) this fragment contains a brief bird chirp and a lot of background noise:

Negative example: this sample only contains background sounds and noise:

Negative example: (hard) this sample contains a human mimicking a bird:

Techniques

You are free to use any machine learning method you want. You are encouraged not to implement a machine learning algorithm yourself but instead to use an off-the-shelf toolbox. A short list of common machine learning toolboxes for various platforms/languages is included at the end of this document. Feel free to use anything.

Specifically you are also free to split your feature extraction and machine learning parts into different projects. For example, you can extract some features in Python using scipy/numpy and do machine learning in Weka.
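As an illustration of such a split workflow, the sketch below writes feature vectors to a CSV file, a format that Weka can import. The feature names, the values, the `hasbird` column name, and the output location are all placeholders chosen for this example.

```python
import csv
import os
import tempfile

def write_feature_csv(path, rows, feature_names):
    """Write one row per audio file: the features followed by the class label."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(list(feature_names) + ["hasbird"])
        for features, label in rows:
            # Write the class as "yes"/"no" so it is read as a nominal
            # attribute rather than as a number.
            writer.writerow(list(features) + ["yes" if label else "no"])

# Two made-up feature vectors: (features, ground truth label).
rows = [([1523, 0.031], 1), ([212, 0.004], 0)]
out_path = os.path.join(tempfile.gettempdir(), "features.csv")
write_feature_csv(out_path, rows, ["zero_crossings", "rms"])
```

Writing separate files for the training and testing parts keeps the two sets apart when you load them in the other tool.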

We have no specific platform requirements, but your results should be reproducible in some way by us. This specifically means that your submission does not have to run on the computers at LIACS per se, and could even involve the installation of some toolboxes, virtual machines, or other software.

Task

Design a system that takes a training part of the Warblr dataset as input and attempts binary classification on the testing part of the dataset. This system does not have to be fully automated, but the idea is that you use some kind of automated method. For example, you can extract features in MATLAB and manually feed them into some other software, but the training itself should be automatic.

The original challenge states: "The output can be just "0" or "1", but we encourage weighted/probability outputs in the continuous range [0,1] for the purposes of evaluation." [1] We recommend a simple yes/no classification, but both approaches are allowed and valid.
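If your classifier produces scores in [0,1], they can be turned into yes/no decisions with a threshold. The sketch below assumes a threshold of 0.5 and uses plain accuracy as the measure. Both are choices made for this example, not something the challenge prescribes, and the scores are made up.

```python
def to_binary(scores, threshold=0.5):
    """Map scores in [0, 1] to 0/1 decisions."""
    return [1 if score >= threshold else 0 for score in scores]

def accuracy(predicted, truth):
    """Fraction of decisions that match the ground truth."""
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)

scores = [0.91, 0.40, 0.65, 0.08]  # made-up classifier outputs
truth = [1, 1, 1, 0]               # made-up ground truth
decisions = to_binary(scores)      # [1, 0, 1, 0]
print(accuracy(decisions, truth))  # 0.75
```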

Please also note that we do not expect any major results from these projects. Any result that is better than a weighted random classification should be considered a good result for the purposes of this assignment. Even a poor accuracy can be considered good as long as there was a good attempt with some solid reasoning in the report.
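To get a feeling for this baseline, the sketch below reads "weighted random classification" as guessing "bird" with the same probability as the fraction of positive examples. That reading is an assumption made for this example. Such a guesser is right with probability p² + (1 − p)², where p is the fraction of positives. Always answering with the majority class is included for comparison.

```python
def weighted_random_accuracy(p_positive):
    """Expected accuracy when guessing 'positive' with probability p_positive."""
    return p_positive ** 2 + (1 - p_positive) ** 2

def majority_accuracy(p_positive):
    """Accuracy when always answering with the most common class."""
    return max(p_positive, 1 - p_positive)

p = 0.75  # approximate fraction of positive examples in the Warblr set
print(weighted_random_accuracy(p))  # 0.625
print(majority_accuracy(p))         # 0.75
```

With roughly 75% positive examples, a reported accuracy is only informative when it is set against numbers like these.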

In general we expect this assignment to take about as long as it took you to complete the previous workshops, so keep that in mind.

Report

You should write a short (1 or 2 pages) report detailing what you did and why you did it. Explain which feature(s) you used and why you used them. Detail your results. Did you use any windowing? How did you split your dataset? The report should also include instructions for reproducing your results. Also include in your report an estimation for how many hours you have worked on this project.

Example

To simplify your task, we have a short example for you, which is focussed on managing the dataset in MATLAB. You are free to use this, and any of the work you have done for the previous workshops, as your basis to read and process your audio files. The example also includes the Netlab toolbox [4] for MATLAB, which the example uses to feed the feature vectors into a multi-layer perceptron (MLP) network.

In the example, the number of zero crossings is used as a one-dimensional feature vector per audio file. This is of course a very poor feature for detecting birds, but it serves as a simple example.
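For those not working in MATLAB, the same feature can be sketched in Python using only the standard library. This is not the code from the provided example. The function names are made up, and the reader assumes the WAV files are mono 16-bit PCM, which you should verify for the actual dataset.

```python
import array
import sys
import wave

def count_zero_crossings(samples):
    """Count sign changes between consecutive samples (0 counts as positive)."""
    signs = [sample >= 0 for sample in samples]
    return sum(a != b for a, b in zip(signs, signs[1:]))

def read_mono_16bit(path):
    """Read a mono 16-bit PCM WAV file (assumed format) into an array of ints."""
    with wave.open(path, "rb") as wav:
        assert wav.getnchannels() == 1 and wav.getsampwidth() == 2
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is stored little-endian
    return samples

# Tiny hand-made signal: the sign changes three times.
print(count_zero_crossings([3, -1, -4, 2, 0, -5]))  # 3
```

For a real file, `count_zero_crossings(read_mono_16bit(path))` gives the one-dimensional feature for that recording.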

Deadline

To make sure that this assignment does not interfere with your final project, and to give you space in your planning, we have decided to set the deadline to the same date as the final project and website. This date will be announced during class.

Short list of Machine Learning toolboxes

Loosely ordered by recommendation:

Caffe - http://caffe.berkeleyvision.org/
Weka - https://www.cs.waikato.ac.nz/ml/weka/
scikit-learn - http://scikit-learn.org/stable/index.html
Tensorflow - https://www.tensorflow.org/
SHOGUN - http://shogun-toolbox.org/
Keras - https://keras.io/
YAAFE - http://yaafe.sourceforge.net/
PRTools - http://prtools.org/
MATLAB Statistics and Machine Learning Toolbox - https://nl.mathworks.com/solutions/machine-learning/
Hidden Markov Model Toolkit (HTK) - http://htk.eng.cam.ac.uk/

References

[1] Bird Audio Detection challenge: http://machine-listening.eecs.qmul.ac.uk/bird-audio-detection-challenge/
[2] Dan Stowell, Mike Wood, Yannis Stylianou, and Hervé Glotin: Bird detection in audio: a survey and a challenge (2016)
[3] Warblr: the birdsong recognition app - https://www.warblr.co.uk/
[4] Netlab: Algorithms for Pattern Recognition - http://nl.mathworks.com/matlabcentral/fileexchange/2654-netlab