Lab 4  Pitch Detection
Summary
In this lab, you will learn how to detect the pitch of a signal in real time via autocorrelation. An example of the final solution can be found here. Next lab will utilize this pitch detector in order to do pitch synthesis a la AutoTune.
Downloads
Python
As in Lab 3, your task for the Python portion of this lab will be to prototype your Android system. For every frame of audio data, you will determine whether the frame contains background noise or a voiced signal. You will then compute the autocorrelation using a fast frequencybased method. Finally, you will determine the pitch from the autocorrelation output. These blocks are described in further detail below.
Note
For the sake of completeness, the notes from the prelab describing voiced/unvoiced detection and autocorrelation are reproduced below.
Part 1  Voiced/Unvoiced Detector
Voiced/unvoiced signal classification is an incredibly wellstudied field with a number of vetted solutions such as Rabiner's pattern recognition approach or Bachu's zerocrossing rate approach. Pitch shifting (next lab) does not require highlyaccurate voiced/unvoiced detection however, so we will use a much simpler technique.
The energy of a signal can be a useful surrogate for voiced/unvoiced classification. Put simply, if a signal has enough energy, we assume it is voiced and continue our pitch analysis. The energy of a discretetime signal is given as follows:
In this block, you will have to determine a useful threshold for and classify frames as voiced or unvoiced depending on .
Part 2  Autocorrelation
Autocorrelation is the process of circularly convolving a signal with itself. That is, for a real signal, the discrete autocorrelation is given as:
where is the complex conjugate of the time reversal of . The output measures how selfsimilar a signal is if shifted by some lag . If normalized to 1 at zero lag, this can be written equivalently as:
For a periodic signal, the lag that maximizes indicates the frequency of the signal. In other words, the signal takes samples before repeating itself. This algorithm, combined with some additional modifications to prevent harmonics from being detected, comprises the most wellknown frequency estimator for speech and music.
Big O Notation
Unfortunately, computing in the time domain is an extremely slow operation. The complexity in Big O Notation is , meaning that if a signal has samples, computing the autocorrelation will require operations. The realworld ramifications of this is that computing the autocorrelation will be too slow to fit in our timing window.
Fortunately, there is a way around this. Recall that convolution in time is the same as multiplication in frequency. Convolution is generally an operation, but computing the FFT of a signal is only . Converting to the frequency domain then, we have the following equation:
or in pseudocode, autoc = ifft(fft(x) * conj(fft(x))
. This is an enormously powerful result. Computing the FFT or inverse FFT is only , and elementwise multiplication only . We can reuse the result of one of the FFTs to only require one FFT, one inverse FFT, and one full multiplication. Our total computational complexity is then . Big O notation typically drops all but the dominant term, so our final cost is considered .
Note
For visualization, the table below demonstrates the growth in of the two Big O costs.
number of operations  number of operations  Ratio of to  

10  100  10  0.1 
100  10000  200  0.02 
1000  1000000  3000  0.003 
10000  100000000  40000  0.0004 
100000  10000000000  500000  0.00005 
1000000  1000000000000  6000000  0.000006 
For our frame length of , we get a speedup of around 600x (approximately) using the frequency domain method.
Part 3  System Requirements
The Python test code has been given to mimic the final Android system. You will get an input frame of size 2048, corresponding to about 40 ms of data, and you will determine the pitch (if it is voiced). If the pitch is unvoiced, you should return 1. Think about the following questions.
Question
 The autocorrelation for speech signals will be periodic with many candidate peaks. How do you decide which peak to use?
 The autocorrelation for any signal will be maximal in the neighborhood surrounding zero lag. How do you decide what to ignore?
 Why did we choose 40 ms frames?
Assignment 1
Implement the Python system described above, and plot the output frequency over time. Also compute the output frequency for N
equals 512, 1024, 2048, 4096, and 8192. What is the resolution and detectable frequency range (minimum and maximum pitch) for a given N
? These plots and discussion on resolution/range will count for 2 demo points.
Do not rely on builtin functions such as np.correlate()
, and don't worry about frame boundaries.
Android
Part 4  Pitch detection in Android
If you've implemented your Python system without relying on many builtins, it should translate nicely to Android. Your code will reside in cpp/ece420_main.cpp:ece420ProcessFrame()
as before. The library ece420_lib.cpp
contains some functions that you may find useful. Namely, findMaxArrayIdx()
will return the index of the maximum value in a float
array, given some minimum index (inclusive) and some maximum index (exclusive). For example:
float arr[10] = {8, 2, 3, 4, 5, 4, 3, 2, 1, 10}; // Finds the index of the maximum value only considering indices 3, 4, 5, 6 // Remember, C++ and Python are both zeroindexed int maxIdx = findMaxArrayIdx(arr, 3, 7); // maxIdx == 5
Assignment 2
Implement the block diagram described in the Python section in cpp/ece420_main.cpp:ece420ProcessFrame()
. At the end of each processing block, write your detected frequency to lastFreqDetected
. If the buffer was unvoiced, write 1
to lastFreqDetected
.
The Android implementation will count for two demo points. The following will be considered when grading:
 Is the voiced/unvoiced detection reasonable?
 Are you accurately detecting the fundamental frequency and not the harmonics?
Submission Instruction
Refer to the Submission Instruction page for more information.
Grading
Lab 4 will be graded as follows:

Prelab [2 points]

Lab [4 points]
 Python:
 Assignment 1 [2 points]
Voiced/Unvoiced detection [0.5 point]
Autocorrelation implementation [1 point]
Short answer question [0.5 point]
 Assignment 1 [2 points]
 Android:
 Assignment 2 [2 points]
Voiced/Unvoiced detection [0.5 point]
Autocorrelation implementation [1 point]
Filter functionality and app demo [0.5 point]
 Assignment 2 [2 points]
 Python:

Quiz [2 points]