Seminar Audio Processing and Indexing

( Will be updated. Last updated: October 8th 2024. )

Contents

Period
Time
Place

Organizers
Abstract
Requirements
Materials
Schedule
Previous Projects

Links

Period: Fall, September 10th - December 17th 2024

Time: Tuesday 13.15 - 15.00

Place: online via Kaltura, or in room BW 0.39 (Gorlaeus Building). Please see the announcements.

Organizers:

Lecturer:

Dr Erwin M. Bakker ( erwin@liacs.nl )

LIACS Media Lab (LML)

Teaching assistant:

to be announced

NB: Register on Brightspace.

Abstract:

During this seminar the fundamentals of audio processing and indexing will be studied. Applications in the areas of speech recognition, audio synthesis, and content-based audio retrieval will be discussed. State-of-the-art work on content-based audio retrieval will be studied and presented by the participants.

The seminar starts with several lectures and accompanying assignments in the form of workshops, followed by literature selection, study, and presentations by all students. It ends with final project demos and presentations.

Requirements: C, C++

Grading (6 ECTS): Presentations and Project (60% of grade). Class discussions, attendance, and workshops (40% of grade). Attendance at every class and completion of every workshop are required. If you cannot attend, you must contact Dr. E.M. Bakker before class!

Materials:

Lecture slides and further materials will be made available on this site.

List of recommended books:

Discrete-Time Speech Signal Processing, Principles and Practice by T.F. Quatieri, Prentice Hall PTR; ISBN 013242942, 2002.

Fundamentals of Speech Recognition by Lawrence Rabiner and Biing-Hwang Juang (Hardcover, 507 pages; Publisher: Pearson Education POD; ISBN: 0130151572; 1st edition, April 12, 1993)

Spoken Language Processing: A Guide to Theory, Algorithm and System Development by Xuedong Huang, Alex Acero, Hsiao-Wuen Hon, Raj Reddy (Hardcover, 980 pages; Publisher: Prentice Hall PTR; ISBN: 0130226165; 1st edition, April 25, 2001)

Speech Recognition: Theory and C++ Implementation by Claudio Becchetti and Lucio Prina Ricotti (Hardcover, 407 pages; Publisher: John Wiley & Sons; ISBN: 0471977306; 1st edition, April 1999)

Links

Journals

You have access to all of the listed journals through your ULCN account (for further details see http://www.bibliotheek.leidenuniv.nl/ ).

Schedule (tentative, visit regularly):

10-9	Organization and Introduction (online)
17-9	Audio Production and Processing
24-9	ADC and an Algebraic Introduction to FT
1-10	FFT
8-10	Audio Features
15-10	Project Proposals I (in class presentations by students)
22-10	Project Proposals II & Student paper selection
29-10	Machine Learning
5-11	Student Paper Presentations I
12-11	Student Paper Presentations II
19-11	Student Paper Presentations III
26-11	Student Paper Presentations IV
3-12	No class.
10-12	Final Project Presentations and Demos
17-12	Project Deliverables: Final Project; Scientific/technical paper (4-8 pages); Code; Web Site (or GitHub)

Assignments (Workshops@Home):

Vocal Tract Workshop.

FFT Workshop and audio_data.

Audio Features Workshop and data.

Machine Learning Workshop.

Project Links Fall 2023

TBA

Student Paper-Presentations Session (examples from previous years)

Guitar Tablature Estimation with a Convolutional Neural Network.
Multimodal Classification of Emotions in Latin Music.
An Early Study on Intelligent Analysis of Speech under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety.
Speech Emotion Recognition with deep learning.
VocGAN: A High-Fidelity Real-Time Vocoder with a Hierarchically-Nested Adversarial Network.
DDSP: Differentiable Digital Signal Processing.
Speaker discrimination in humans and machines: Effects of speaking style variability.

Previous Project Titles and Pages
2022

2021

Audio Source Separation
AudioVibe
Heart Sound Classification with Deep Learning
Guitar Tabular Estimation
Rain Forest Connection Species Audio Detection
Voice Style Transfer ( github )
On automated clique detection and personalized music generation
Speech Emotion Recognition
Wave Generation for Electronic Music
Audio Translation and Style Transfer
Automatic Valence and Arousal Recognition in Music (code)

Earlier - 2020

Second Voice Generation

Robustness of Musical Genre Identification

Improved Mobile Song Recognition

An iOS App using Bliss for Improved Communication through Text-To-Speech

ScoreAid

Emotion Recognition

Instrument Detection

Musical Instrument Recognizer ( Annotation )

Audio Feature Extraction with Deep Belief Networks

Audio Morphing

Audio Indexing the 1.000.000 song data set

Chord Recognition

Audio Phantom Materialization

Harmonic Model Based Audio Transformations

Content-Based Music Similarity, Visualization and Automatic Play-List Generation.

Indexing and Predicting Bands from Unknown Songs

Interpolation between Different Instruments

Modular Synthesizer

Hit Predictor

Pitch Perfector

Inter-Voice Morphing

Presentations

During the seminar we will study state-of-the-art audio indexing methods and techniques using recent scientific publications from international journals, workshops, and conferences on content-based audio retrieval.

Each student will present a recent technical paper:

Each student selects one very recent scientific publication on an audio-related subject.
Each student studies the selected paper in great detail and presents it in a 12-minute talk, followed by 3 minutes of questions from a critical audience.
To ensure a critical audience, every student in class is expected to have read at least the title, abstract, and conclusions of the papers to be presented, and to have prepared at least one question per paper.

Projects

During the seminar each student has to do a project related to audio processing/synthesis/indexing.

The agenda for the projects is as follows:

Project proposal presentation (4 slides):
- Title of the proposal and group members (1-4 members).
- Problem description.
- Challenges.
- Goal for the final project presentation/demo.
- Note: If the group has more than one member, add a 5th slide with an initial global division of the work between project members. This slide does not have to be presented.
Project status reports.
Final project presentation / demo.
Final technical project paper.

Project Web Pages

Every student has to maintain a project web page or GitHub repository with progress, documents, code, links, etc. related to the project. Here you can find an example project page. Feel free to design your own project web page, though. Do not forget to mail me the link to your project page.

Note: Using your university account, you can put your web page in the directory \home\public_html.