ECE398BD: Audio and Visual Analytics (Labs)

Schedule

Topic	Lab	Lab Preview	Quiz	Assigned	Due
Intro to Audio and Visual Analytics	download	Lab 9	none	March 30	April 6 @ 2359
Clustering with Image and Audio	download	Mel Frequency Cepstrum Image Segmentation		April 6	April 13 @ 2359 Extended April 15
Clustering with Image and Audio - Shazam	download	Shazam Audio Search		April 13	April 20 @ 2359 Extra Credit April 30
Clustering with Image and Audio - Images	download	SIFT		April 20	April 27 @ 2359
Neural Networks	download	Intro to Theano CNN MNIST		April 27	Star Wars Day @ 2359

Hints, Errata and Feedback

If any changes, hints or commentary are needed for the labs, they will be provided here.

Neural Networks Lab

Documentation for Theano and Keras can be found here:
Theano
Keras

The Theano lab is for understanding only and will not be graded; however, you might find it useful when trying more advanced things in the Keras lab. Please submit only the Keras lab.

SIFT Lab

This is why we have to go through so much trouble for this lab :/ See: Where did SIFT and SURF go in OpenCV

OpenCV Source Code:
Main OpenCV Project: http://github.com/Itseez/opencv
Extras and non-free stuff: http://github.com/Itseez/opencv_contrib

Basic Git instructions (do this for both opencv and opencv_contrib):

1. Clone the repository to your computer:

`git clone <repository>`

2. Check out the required version:

`git checkout 3.1.0`

Building Instructions

Create a build directory and cd into it. From within the build directory, run `cmake -DOPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules <opencv_source_directory>`. Check in the cmake output that Python support is enabled. If no errors occur, type `make` and go grab a cup of coffee. Afterwards, type `sudo make install` if you are on a computer you own, or simply copy `cv2.so` from `lib` to the directory where your notebook is. You should now be able to `import cv2` from within your notebook.
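Once the build finishes, a quick sanity check can save some head-scratching. The snippet below is a minimal sketch, not part of the lab: `describe_cv2_build` is a hypothetical helper name, and it only reports whether `cv2` can be imported, which version it is, and whether the `xfeatures2d` contrib module (where SIFT and SURF live in 3.x) is present.

```python
import importlib


def describe_cv2_build(module_name="cv2"):
    """Report whether OpenCV imports and whether the contrib module is present.

    describe_cv2_build is a hypothetical helper for illustration only.
    """
    try:
        cv2 = importlib.import_module(module_name)
    except ImportError:
        # Either OpenCV is not installed or cv2.so is not on the Python path.
        return {"imported": False, "version": None, "has_xfeatures2d": False}
    return {
        "imported": True,
        "version": getattr(cv2, "__version__", None),
        # xfeatures2d comes from opencv_contrib and holds SIFT/SURF in 3.x.
        "has_xfeatures2d": hasattr(cv2, "xfeatures2d"),
    }
```

Calling `describe_cv2_build()` in your notebook should show version 3.1.0 and `has_xfeatures2d` set to True if the contrib modules were picked up by cmake.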

I suggest that everyone give building the entire project a try, but I will not enforce it. I'll gladly assist anyone who needs more help in figuring this out. This would benefit you greatly for research if that's what you are interested in.

README.md from the opencv_contrib project contains more detailed instructions.

Alternative Instructions

If you choose not to compile OpenCV from source, the lab will also work with an older version of OpenCV, from before SIFT was removed from the core installation. If you are on Ubuntu, the system package should be 2.4.x. However, you might have to change a few lines of code in the lab. I've included notes in the lab itself on where some changes might be needed, but I may have missed some, since I wrote them purely from my memory of the old API. If you go this route, make sure to let me know the exact version when you submit the lab. Documentation for older versions is also available at http://docs.opencv.org/; just select the version that you are using. You will also notice that the documentation very often refers to the C/C++ API, so make sure that what you are reading is meant for Python.

Mac users: if you have Homebrew installed and are using it for everything Python, the most painless way to do this is `brew install opencv3 --with-contrib`.
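One of the lines that changes between versions is how the SIFT object is constructed. The sketch below covers only the two cases mentioned above: a 3.x build with opencv_contrib, where the constructor is `cv2.xfeatures2d.SIFT_create()`, and 2.4.x, where it is `cv2.SIFT()`. The helper name `create_sift` is hypothetical, and other versions may need a different call.

```python
def create_sift(cv2_module):
    """Return a SIFT detector using whichever constructor this build exposes.

    create_sift is a hypothetical helper; pass in the imported cv2 module.
    """
    # OpenCV 3.x built with opencv_contrib: SIFT lives in xfeatures2d.
    xfeatures2d = getattr(cv2_module, "xfeatures2d", None)
    if xfeatures2d is not None and hasattr(xfeatures2d, "SIFT_create"):
        return xfeatures2d.SIFT_create()
    # OpenCV 2.4.x: SIFT is a class on the top-level module.
    if hasattr(cv2_module, "SIFT"):
        return cv2_module.SIFT()
    raise RuntimeError("This OpenCV build does not expose SIFT")
```

In the notebook this would be used as `sift = create_sift(cv2)` after `import cv2`.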

MFCC lab

Hints:
Using getWindow from the first lab might save you some time.
When plotting the spectrogram, we are only interested in the amplitude (the magnitude of the complex Fourier coefficients), not the phase.
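To make the amplitude hint concrete, here is a minimal NumPy sketch. It does not use getWindow from the first lab (its signature is not reproduced here); the windowing is done inline instead, and the function name, the Hann window, and the parameter names are all assumptions made for illustration.

```python
import numpy as np


def amplitude_spectrogram(signal, window_size, step_size):
    """Return a 2D array of FFT magnitudes, one row per window.

    Hypothetical helper: the Hann window and the argument names are
    illustrative choices, not taken from the lab.
    """
    signal = np.asarray(signal, dtype=float)
    n_windows = 1 + (len(signal) - window_size) // step_size
    if n_windows < 1:
        raise ValueError("signal is shorter than one window")
    starts = step_size * np.arange(n_windows)
    frames = np.stack([signal[s:s + window_size] for s in starts])
    frames = frames * np.hanning(window_size)
    # rfft returns complex values; np.abs keeps the amplitude and drops phase.
    return np.abs(np.fft.rfft(frames, axis=1))
```

The result can be passed to something like `plt.imshow(spec.T, origin="lower", aspect="auto")` so that time runs along the x-axis and frequency along the y-axis.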

Suggestions: For image segmentation, one nice way of visualizing the result is to replace each pixel's value with that of its cluster center.
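The suggestion above can be sketched in a few lines of NumPy. This assumes you already have one cluster label per pixel and an array of cluster centers from whatever clustering you ran; `paint_with_centers` and its argument names are hypothetical.

```python
import numpy as np


def paint_with_centers(image, labels, centers):
    """Replace every pixel with the center of the cluster it belongs to.

    image:   (H, W) or (H, W, C) array
    labels:  one integer cluster index per pixel, in row-major order
    centers: (k,) or (k, C) array of cluster centers
    """
    image = np.asarray(image)
    centers = np.asarray(centers)
    labels = np.asarray(labels).reshape(-1)
    # Fancy indexing looks up the center for every pixel at once.
    painted = centers[labels].reshape(image.shape)
    if np.issubdtype(image.dtype, np.integer):
        painted = np.rint(painted)  # round before casting back to integers
    return painted.astype(image.dtype)
```

Displaying the returned array next to the original image makes it easy to see which regions ended up in the same cluster.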

Corrections: There's a mistake in the email about calcMFCC(). If you followed the code in the lab itself, you will be fine. calcMFCC takes 3 parameters: the signal, the sampling rate and the number of filterbanks. The output is a 2D array in which each row holds the MFC coefficients of a single window of the signal passed in, so the number of rows equals the number of windows. Window size and step size are not passed in. You can set them to values that work well, or use the values from the example code.
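To illustrate the interface described above (three inputs, one row of coefficients per window), here is a rough sketch. It is not the lab's reference code: the name `calc_mfcc_sketch`, the 25 ms window, the 10 ms step, the Hamming window, and the choice to keep all DCT coefficients are assumptions made only for this example.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, window_size, rate):
    """Triangular filters spaced evenly on the mel scale, one row per filter."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(rate / 2.0), n_filters + 2)
    bins = np.floor((window_size + 1) * mel_to_hz(mel_points) / rate).astype(int)
    bank = np.zeros((n_filters, window_size // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):      # rising edge of the triangle
            bank[i, k] = (k - left) / (center - left)
        for k in range(center, right):     # peak and falling edge
            bank[i, k] = (right - k) / (right - center)
    return bank


def calc_mfcc_sketch(signal, rate, n_filters):
    """Return a (number of windows, n_filters) array of MFC coefficients."""
    signal = np.asarray(signal, dtype=float)
    window_size = int(round(0.025 * rate))  # assumed 25 ms window
    step_size = int(round(0.010 * rate))    # assumed 10 ms step
    n_windows = 1 + (len(signal) - window_size) // step_size
    if n_windows < 1:
        raise ValueError("signal is shorter than one window")
    # Row w of index holds the sample positions of window w.
    index = (step_size * np.arange(n_windows)[:, None]
             + np.arange(window_size)[None, :])
    frames = signal[index] * np.hamming(window_size)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / window_size
    energies = power @ mel_filterbank(n_filters, window_size, rate).T
    log_energies = np.log(energies + 1e-10)  # small offset avoids log(0)
    # Unnormalized DCT-II written as a matrix product.
    k = np.arange(n_filters)
    dct_basis = np.cos(np.pi * np.outer(k, k + 0.5) / n_filters)
    return log_energies @ dct_basis.T
```

With a one-second signal at 16000 Hz and 26 filterbanks, this sketch returns an array with 98 rows (windows) and 26 columns.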

For extra credit, some goblins ate up certain instructions. It should read as follows:
Propose and implement something that will tell you if the speaker doesn't exist in the database. You may want to record your own voice for this part.
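As a starting point only, one possible direction is to reject a query whose features are too far from every known speaker. The sketch below assumes each speaker in the database is summarized by a mean MFCC vector and that a distance threshold has been picked by hand; `identify_speaker`, the data layout, and the threshold are all hypothetical, and you are free to propose something entirely different.

```python
import numpy as np


def identify_speaker(query_mfcc, speaker_means, threshold):
    """Return (name, distance), with name set to None for an unknown speaker.

    query_mfcc:    2D array, one row of MFC coefficients per window
    speaker_means: dict mapping speaker name to a mean MFCC vector
    threshold:     hand-picked maximum distance for accepting a match
    """
    query_mean = np.asarray(query_mfcc, dtype=float).mean(axis=0)
    best_name, best_dist = None, np.inf
    for name, mean_vec in speaker_means.items():
        dist = np.linalg.norm(query_mean - np.asarray(mean_vec, dtype=float))
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist  # too far from everyone: treat as unknown
    return best_name, best_dist
```

Recording your own voice gives you a query that should come back as unknown, which is a simple way to check whether the chosen threshold is reasonable.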

References

Shazam Paper

A tutorial on MFCC