Seminar Audio Processing and Indexing

(Will be updated. Last updated: 29-4 2021)




Period: Spring February 4th - April 29th 2021

Time:   Thursday 11.15 - 13.00

Place:  Kaltura:
( Password on Brightspace. Chrome-browser is recommended. )





Dr Erwin M. Bakker ( )

Room 145 and LIACS Media Lab (LML)


Teaching assistant:

to be announced


NB E-mail your name and student number to



During this seminar the fundamentals of audio processing and indexing will be studied. Applications in the areas of speech recognition, audio synthesis, and content-based audio retrieval will be discussed. State-of-the-art work on content-based audio retrieval will be studied and presented by the participants.

The seminar starts with several lectures and accompanying assignments in the form of workshops, followed by literature selection, study, and presentations by all students. The seminar ends with final project demos and presentations.

Requirements: C, C++

Grading (6 ECTS): Presentations and Project (60% of grade). Class discussions, attendance, and workshops (40% of grade). You are required to attend every class and to complete every workshop. If you cannot be there, you must contact Dr. E.M. Bakker before class!



Lecture slides and further materials will be made available on this site.


List of recommended books:


Discrete-Time Speech Signal Processing: Principles and Practice by T.F. Quatieri (Publisher: Prentice Hall PTR; ISBN: 013242942X; 2002)


Fundamentals of Speech Recognition by Lawrence Rabiner and Biing-Hwang Juang (Hardcover, 507 pages; Publisher: Pearson Education POD; ISBN: 0130151572; 1st edition, April 12, 1993)


Spoken Language Processing: A Guide to Theory, Algorithm and System Development by Xuedong Huang, Alex Acero, Hsiao-Wuen Hon, Raj Reddy (Hardcover, 980 pages; Publisher: Prentice Hall PTR; ISBN: 0130226165; 1st edition, April 25, 2001)


Speech Recognition: Theory and C++ Implementation by Claudio Becchetti and Lucio Prina Ricotti (Hardcover, 407 pages; Publisher: John Wiley & Sons; ISBN: 0471977306; 1st edition, April 1999)




Schedule (tentative, visit regularly):
 Organization and Introduction.
11-2  Audio Production and Processing.
 ADC and an Algebraic Introduction to FT
25-2  FFT
4-3  Project Proposals (presentations by students)
 Audio Features and Data Sets
18-3  Audio Features workshop and data
 Machine Learning + Workshop
 Student Paper Presentations I. Schedule.
8-4  Student Paper Presentations II.
15-4  Student Paper Presentations III.
22-4  Project Progress Reports
29-4  Team Meetings
 Final Project Presentations Demo
27-5  Final Technical Project Paper (4-8 pages), code, and Web Site


Assignments (workshops):

  1. Vocal Tract Workshop. Due: 18-2 2021 before 11.15.
  2. FFT Workshop and audio_data. Due: 10-3 2021 before 23.59.
  3. Audio Features Workshop and data. Due: 24-3 2021, 23.59.
  4. Machine Learning Workshop. Due: 5-4 2021, 23.59.
Project Links

Student Paper-Presentations Session I

Session II

  • one

Session III

  • two

Previous Project Titles and Pages

  • Second Voice Generation
  • Robustness of Musical Genre Identification
  • Improved Mobile Song Recognition
  • An iOS App using Bliss for Improved Communication through Text-To-Speech
  • ScoreAid
  • Emotion Recognition
  • Instrument Detection
  • Musical Instrument Recognizer (Annotation)
  • Audio Feature Extraction with Deep Belief Networks
  • Audio Morphing
  • Audio Indexing the 1.000.000 song data set
  • Chord Recognition
  • Audio Phantom Materialization
  • Harmonic Model Based Audio Transformations
  • Content-Based Music Similarity, Visualization and Automatic Play-List Generation.
  • Indexing and Predicting Bands from Unknown Songs
  • Interpolation between Different Instruments
  • Modular Synthesizer
  • Hit Predictor
  • Pitch Perfector
  • Inter-Voice Morphing


During the seminar we will study state-of-the-art audio indexing methods and techniques using recent scientific publications from international journals, workshops, and conferences on content-based audio retrieval.

Each student will present a recent technical paper:

  • Each student will have to select one very recent scientific publication on audio processing and/or indexing.

  • Each student will study the selected paper in great detail and present it in a 12-minute talk, followed by 3 minutes of questions from a critical audience.

  • To ensure a critical audience, every student in class is expected to have read at least the title, abstract, and conclusions of the papers to be presented. Every student is also expected to have prepared at least one question per paper.


During the seminar each student has to do a project related to audio processing/synthesis/indexing.

The agenda for the projects is as follows:

  1. Project proposal presentation using the website for the project. It has to address:

    • Title of the proposal.

    • Reference(s) of the paper(s).

    • A short description of the problem(s) to be solved.

    • The state of the art with respect to these problems.

    • A (realistic) goal of the proposed project.

  2. Project status reports.

  3. Final project presentation / demo.

  4. Final technical project paper.

Project Web Pages

Every student has to maintain a project web page that tracks progress and collects the documents, code, links, etc. related to the project. Here you can find an example project page, but feel free to design your own. Do not forget to mail me the link to your project page.

Note: Using your university account you can put your web page in the directory \home\public_html.


You have electronic access to all of the listed journals by using your ULCN-account (for further details see ):