Audio Processing and Indexing Project Page


(Last updated: 19 Jan 2018)


Project Members

Giorgos Kyziridis

Geerten Verweij


The goal was to create a way to transform singing into a MIDI signal, so that what was sung can be played back by an instrument. We had several steps in mind for the implementation. First, the pitch needs to be extracted from the input signal. Next, the pitch has to be converted to a MIDI signal. Finally, that signal needs to be sent to software or a device that can play MIDI.
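The conversion from a detected pitch to a MIDI note can be illustrated with the standard equal-temperament mapping, in which A4 (440 Hz) is MIDI note 69 and each semitone is a factor of 2^(1/12) in frequency. The following is a minimal sketch, not taken from the MIDIoke source; the function name and the 440 Hz tuning reference are assumptions made for illustration.

```python
import math

A4_FREQUENCY_HZ = 440.0  # assumed tuning reference (concert pitch)
A4_MIDI_NOTE = 69        # MIDI note number assigned to A4


def frequency_to_midi_note(frequency_hz):
    """Return the nearest MIDI note number (0-127) for a frequency in Hz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    # Distance from A4 in semitones; 12 semitones per octave (factor of 2).
    semitones_from_a4 = 12.0 * math.log2(frequency_hz / A4_FREQUENCY_HZ)
    note = int(round(A4_MIDI_NOTE + semitones_from_a4))
    # Clamp to the valid MIDI note range.
    return max(0, min(127, note))
```

For example, frequency_to_midi_note(261.63) gives 60, which is middle C. Rounding to the nearest semitone means slightly off-key singing still maps to the closest note.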


MIDIoke transforms human singing into a MIDI signal. It uses the fast Fourier transform and autocorrelation to estimate the fundamental frequency of the input. It outputs a MIDI file that can be used in other software. MIDIoke's current limitations are the constant note length, the lack of direct MIDI streaming, and the lack of consonant filtering.


There are many software plugins that offer the option of transforming sound to a MIDI signal. However, these are never really as good as a user would want them to be. The idea of MIDIoke is to try to improve on these existing solutions. We focus specifically on input from a singing user and on the possibility of having live MIDI output. That is why the name MIDIoke borrows from the word karaoke.


Step one is capturing the input from the user.

Step two is extracting pitch information from that signal using FFT-based autocorrelation.

Step three is outputting this pitch information as MIDI, either by streaming it directly to a MIDI consumer or by writing it to a MIDI file.
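To make step three more concrete, the sketch below builds a format 0 Standard MIDI File from a list of note numbers using only the Python standard library. It is an illustration under stated assumptions, not the MIDIoke implementation: the function names, the fixed note length in ticks, the velocity, and the use of channel 0 are all hypothetical choices.

```python
import struct


def encode_variable_length(value):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    if value < 0:
        raise ValueError("value must be non-negative")
    chunks = [value & 0x7F]
    value >>= 7
    while value:
        # Every byte except the last has its high bit set.
        chunks.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(chunks))


def build_midi_file(notes, ticks_per_note=240, ticks_per_beat=480, velocity=64):
    """Return the bytes of a single-track MIDI file playing notes in sequence."""
    track = bytearray()
    for note in notes:
        # Note on (status 0x90, channel 0) immediately after the previous event.
        track += encode_variable_length(0) + bytes([0x90, note, velocity])
        # Note off (status 0x80) after a fixed number of ticks.
        track += encode_variable_length(ticks_per_note) + bytes([0x80, note, 0])
    track += b"\x00\xff\x2f\x00"  # end-of-track meta event
    # Header chunk: length 6, format 0, one track, time division in ticks.
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)


# Usage: write a short C-D-E phrase to a file that a sequencer can open.
# with open("melody.mid", "wb") as midi_file:
#     midi_file.write(build_midi_file([60, 62, 64]))
```

Every note here has the same duration, which mirrors the fixed note length of the current version rather than the length of the sung notes.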


We have not yet implemented everything we wanted, but the pitch detection works. Our current version, which uses FFT-based autocorrelation, is capable of finding the correct fundamental frequency of the sung input. However, we still struggle with producing clean output. We have not yet set up streaming to link the MIDI output directly to a MIDI consumer; the MIDI signal is written to a file instead. To exclude background noise, only notes that are louder than a certain threshold are written to this file. The current version of MIDIoke also uses a fixed length for each note: when a singer holds a longer note, it simply becomes a sequence of short notes of the same pitch. It would be much more desirable for the length of the output notes to match the length of the sung notes.
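One possible way to make output note lengths follow the input is to merge runs of identical per-frame notes into a single longer note. The sketch below shows the idea and is not part of the current MIDIoke code. It assumes, hypothetically, that each analysis frame yields either a MIDI note number or None when the frame was below the loudness threshold, and that all frames have the same duration.

```python
from itertools import groupby


def merge_repeated_notes(frame_notes, frame_duration_s):
    """Collapse runs of equal frame notes into (note, start_s, duration_s)."""
    merged = []
    position = 0  # index of the first frame of the current run
    for note, run in groupby(frame_notes):
        run_length = len(list(run))
        if note is not None:  # None marks frames below the loudness threshold
            merged.append((note,
                           position * frame_duration_s,
                           run_length * frame_duration_s))
        position += run_length
    return merged
```

With a frame duration of 0.5 seconds, the input [60, 60, 60, None, 62, 62] becomes [(60, 0.0, 1.5), (62, 2.0, 1.0)]: one long C followed, after a silent frame, by a shorter D.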


Our experiment with MIDIoke was quite simple. We sang a basic melody, loaded the output file into a MIDI sequencer, and checked whether the result matched our intended melody. The pitch of the produced MIDI file did match the melody that was sung. Occasionally some extra notes were added that do not fall within the melody; these are probably caused by consonants or background noise.



Software Requirements

To run MIDIoke you only need Python 3.5 and some libraries.

  • Python 3.5. We chose this platform because it makes it easy to implement a prototype. If we ever run into performance issues, we might have to look at lower-level languages like C and C++.

  • pyaudio. This package is used to deal with direct audio input to our application.

  • scipy.signal. From this package we use the fftconvolve function. Convolving the input signal with its own time-reversed copy yields the autocorrelation, and fftconvolve computes that convolution efficiently using the FFT.

  • We also use common packages such as numpy and matplotlib's pyplot.

  • The core of MIDIoke was based on these examples we found: LINK
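
The autocorrelation step that fftconvolve is used for can be sketched as follows. This is a minimal illustration, not the MIDIoke source: the function name, the mean removal, and the 80 to 1000 Hz search range are assumptions chosen for this example. It picks the lag with the strongest autocorrelation inside that range and converts it to a frequency.

```python
import numpy as np
from scipy.signal import fftconvolve


def estimate_fundamental_hz(samples, sample_rate, min_hz=80.0, max_hz=1000.0):
    """Estimate the fundamental frequency of a mono signal by autocorrelation."""
    samples = np.asarray(samples, dtype=float)
    samples = samples - samples.mean()  # remove any DC offset
    # Convolving with the time-reversed signal gives the autocorrelation.
    autocorr = fftconvolve(samples, samples[::-1], mode="full")
    autocorr = autocorr[len(samples) - 1:]  # keep non-negative lags only
    # Restrict the search to lags that correspond to the assumed pitch range.
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(autocorr) - 1)
    if max_lag <= min_lag:
        raise ValueError("signal too short for the requested pitch range")
    best_lag = min_lag + int(np.argmax(autocorr[min_lag:max_lag + 1]))
    return sample_rate / best_lag
```

Because the lag is a whole number of samples, the estimate is quantized; at a 44100 Hz sample rate a 220 Hz tone comes out within a few hertz of the true value, which is usually close enough to round to the correct MIDI note.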

Hardware Requirements

  • Any modern personal computer should be able to run MIDIoke.

  • Your machine also needs an audio input, such as a microphone.


Project Links