Semivowels are [+sonorant,+continuant] sounds, like vowels. Prosodically (over the long term), they act like consonants, but segmentally (over the short term), they act like vowels. There are as many terms for this manner class (4: semivowel, approximant, glide, liquid) as there are phonemes in it (4: w, j, ɻ, l), so let's walk through this maze of terminology.
Here's a figure to describe it:
import urllib.request as request
import io
import soundfile as sf
import numpy as np
import scipy.fftpack as fft
import matplotlib.pyplot as plt
%matplotlib inline
import spectrogram as sg
import math
consonant_pathnames = {
    'w' : 'f/f2/Voiced_labio-velar_approximant',
    'j' : 'e/e8/Palatal_approximant',
    'ɻ' : '3/33/Postalveolar_approximant',
    'l' : 'b/bc/Alveolar_lateral_approximant',
    'ʕ' : 'c/cd/Voiced_pharyngeal_fricative',
    'ɰ' : '5/5c/Voiced_velar_approximant'
}
consonant_waves = {}
for c_ipa,c_pathname in consonant_pathnames.items():
    c_url = 'https://upload.wikimedia.org/wikipedia/commons/{}.ogg'.format(c_pathname)
    try:
        req = request.urlopen(c_url)
    except request.HTTPError:
        print('Unable to download {}'.format(c_url))
    else:
        c_wav,c_fs = sf.read(io.BytesIO(req.read()))
        c_filename = c_ipa + '.wav'
        sf.write(c_filename,c_wav,c_fs)
        consonant_waves[c_ipa] = c_wav
print('Downloaded these phones: {}'.format(list(consonant_waves.keys())))
consonant = 'ɰ'
(S,Ext)=sg.sgram(consonant_waves[consonant], int(0.001*c_fs), int(0.006*c_fs), 1024, c_fs, 4000)
plt.figure(figsize=(17,10))
plt.subplot(211)
plt.plot(consonant_waves[consonant])
plt.title('Waveform of /{}/'.format(consonant))
plt.subplot(212)
plt.imshow(S,origin='lower',extent=Ext,aspect='auto')
plt.title('Spectrogram of /{}/'.format(consonant))
That spectrogram has a huge voicebar, the same problem we had last time. I finally figured out what I was missing, and it should get rid of the oversized voicebar: pre-emphasis.
Pre-emphasis is a simple first-order high-pass filter (one delay tap), a.k.a. a differentiator or, more correctly, a differencer. Those are fancy words meaning that we compute the difference between each sample and a scaled copy of the one before it: $$y[n] = x[n] - \alpha x[n-1]$$ where $\alpha$, the pre-emphasis factor, can be set to something like 0.95, 0.97, or maybe even 0.99. Even 1.0 might work just fine. Tweak it until the spectrogram looks right to you.
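To see why this filter tames the voicebar, it can help to look at its gain as a function of frequency. The sketch below evaluates $H(e^{j\omega}) = 1 - \alpha e^{-j\omega}$, the frequency response of the difference equation above. The helper name `preemphasis_gain_db` and the 44100 Hz sampling rate are assumptions made for illustration; neither comes from spectrogram.py.

```python
import numpy as np

def preemphasis_gain_db(alpha, freq_hz, fs):
    """Gain (dB) of y[n] = x[n] - alpha*x[n-1] at freq_hz (hypothetical helper)."""
    omega = 2 * np.pi * np.asarray(freq_hz, dtype=float) / fs  # radians per sample
    response = 1 - alpha * np.exp(-1j * omega)                 # H(e^{jw})
    return 20 * np.log10(np.abs(response))

fs = 44100  # assumed sampling rate, samples/second
for f in (100, 1000, 4000):
    print('{:5d} Hz: {:6.1f} dB'.format(f, preemphasis_gain_db(0.97, f, fs)))
```

The gain approaches $20\log_{10}(1-\alpha)$ dB at DC, which is where the voicebar lives, and rises to $20\log_{10}(1+\alpha)$ dB (about 6 dB) at half the sampling rate. Note that $\alpha=1$ gives zero gain exactly at DC, so the dB value there is minus infinity.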
Since last week, I've added the following pre-emphasis function to spectrogram.py, which you can download from https://courses.engr.illinois.edu/ece590sip/sp2018/spectrogram.py.
def preemphasize(x,alpha):
    y = np.zeros(len(x))
    y[0] = x[0]
    for n in range(1,len(x)):
        y[n] = x[n] - alpha*x[n-1]
    return(y)
def sgram(x,frame_skip,frame_length,fft_length, fs, max_freq, alpha=0):
    '''USAGE:
    import spectrogram as sg
    (x_sgram,x_extent) = sg.sgram(x, frame_skip, frame_length, fft_length, fs, max_freq, alpha)
    plt.figure(figsize=(14,4))
    plt.imshow(x_sgram,origin='lower',extent=x_extent,aspect='auto')
    '''
    if len(x.shape) > 1: # If x is stereo, sum over the channels
        x = np.sum(x,axis=1)
    y = preemphasize(x,alpha)
    frames = enframe(y,frame_skip,frame_length)
    (spectra, freq_axis) = stft(frames, fft_length, fs)
    sgram = stft2level(spectra, int(max_freq*fft_length/fs))
    max_time = len(frames)*frame_skip/fs
    return(np.transpose(np.array(sgram)), (0,max_time,0,max_freq))
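The sample-by-sample loop in preemphasize is easy to read, but a pure-Python loop gets slow on long recordings. As a sketch only, the same difference equation can be computed with NumPy slicing. The name `preemphasize_vectorized` is hypothetical and is not part of spectrogram.py.

```python
import numpy as np

def preemphasize_vectorized(x, alpha):
    """Hypothetical alternative to preemphasize: y[n] = x[n] - alpha*x[n-1], with y[0] = x[0]."""
    x = np.asarray(x, dtype=float)
    y = x.copy()                 # the copy keeps y[0] = x[0]
    y[1:] -= alpha * x[:-1]      # subtract the scaled previous sample from every later sample
    return y

# Quick look at a constant (DC) signal: samples after the first shrink to 1 - alpha
x = np.ones(5)
print(preemphasize_vectorized(x, 0.97))
```

Because the first output sample is copied unchanged, this should match the loop version sample for sample on any non-empty mono signal.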
import spectrogram as sg
# Reuse the consonant selected above, so the two spectrograms can be compared directly
(S,Ext)=sg.sgram(consonant_waves[consonant], int(0.001*c_fs), int(0.006*c_fs), 1024, c_fs, 4000, 0.99)
plt.figure(figsize=(17,10))
plt.subplot(211)
plt.plot(consonant_waves[consonant])
plt.title('Waveform of /{}/'.format(consonant))
plt.subplot(212)
plt.imshow(S,origin='lower',extent=Ext,aspect='auto')
plt.title('Spectrogram of /{}/'.format(consonant))
Let's try spectrogram reading on an interesting source with data in lots of languages: Australia's Special Broadcasting Service (SBS), https://www.sbs.com.au/radio/yourlanguage
I have never found any other service making podcasts available in so many languages. YouTube does, but YouTube specifically forbids direct download of its files, which makes it useless to science. SBS tries to encourage you to listen on their site, but they have not yet completely forbidden direct download; I hope they won't.
There is no simple button to download a file, but if you click on a file to play it, a complex HTTP GET request will show up at the top. In that GET request, there is a field called fileuri. If you copy and paste that fileuri into another window, you will still get a streaming interface, but now there is a download button available. At the time of this writing, you can download a sample file from http://media.sbs.com.au/ondemand/audio/Tuesday_ONDemand_ARABIC24_06_00.mp3
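Copying the fileuri field out of the GET request by hand is error-prone, so here is a minimal sketch that does it with the standard library. The player address below is made up for illustration; only the fileuri value comes from the text above, and the real SBS query string may use different fields or encodings.

```python
from urllib.parse import urlsplit, parse_qs

def extract_fileuri(player_url):
    """Return the value of the 'fileuri' query field, or None if it is absent."""
    query = parse_qs(urlsplit(player_url).query)   # parse_qs also undoes percent-encoding
    values = query.get('fileuri')
    return values[0] if values else None

# Hypothetical player address; only the fileuri value is taken from the text above
player_url = ('https://player.example.com/listen?autoplay=1'
              '&fileuri=http%3A%2F%2Fmedia.sbs.com.au%2Fondemand%2Faudio%2F'
              'Tuesday_ONDemand_ARABIC24_06_00.mp3')
print(extract_fileuri(player_url))
```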
The file is still in MP3 format. I haven't been able to find a Python package I like that reads MP3 audio, so I converted it to WAV outside of Python and then used the following steps:
filename = 'Thursday_ONDemand_ARABIC24_06_00.wav'
x_wav_stereo,x_fs = sf.read(filename,frames=1323000) # Read only the first 30 seconds (30*44100=1323000)
x_wav_mono = np.sum(x_wav_stereo,axis=1)
print('Read audio from {}, {} samples at {} samples/second'.format(filename,len(x_wav_mono),x_fs))
(x_sgram,x_extent)=sg.sgram(x_wav_mono, int(0.006*x_fs), int(0.006*x_fs), 1024, x_fs, 4000, 0.99)
plt.figure(figsize=(17,10))
plt.subplot(211)
plt.plot(x_wav_mono)
plt.subplot(212)
plt.imshow(x_sgram,origin='lower',extent=x_extent,aspect='auto')
Looks like there's silence from 0 to 2 seconds, then music from about 2 to 5 seconds, then speech from about 5 to 17 seconds, then music again from 17 to 24 seconds, then speech from 24 seconds onward.
x_speech = x_wav_mono[int(25*x_fs):int(27*x_fs)]
sf.write('Thursday_speech.wav',x_speech,x_fs)
(x_sgram,x_extent)=sg.sgram(x_speech, int(0.001*x_fs), int(0.006*x_fs), 1024, x_fs, 4000, 0.97)
plt.figure(figsize=(17,10))
plt.subplot(211)
plt.plot(x_speech)
plt.subplot(212)
plt.imshow(x_sgram,origin='lower',extent=x_extent,aspect='auto')
We don't really have enough knowledge, yet, to try to read this from scratch. But if you listen to the audio, you should be able to find the semivowels.