My Publications

Master's Thesis

Wavelet Filter Bank With Scaling Factor Larger Than Two (Portuguese)
abstract, pdf, presentation

Traditional wavelet analysis is equivalent to a filter bank, formed by a low-pass and a high-pass filter, in which the frequency resolution changes by one octave between two successive stages. In some applications, notably the one that inspired this work (modeling of the peripheral human auditory system), a resolution of one octave is too coarse to represent auditory phenomena well: in human hearing the frequency resolution is approximately 1/3 of an octave. In this work the subspaces are partitioned using a scale factor greater than two and more than one wavelet, which makes it possible to achieve better frequency resolution in certain bands. Changing the scale factor requires designing new wavelets suited to the scale factor in use. To this end, the wavelet construction proposed by Ingrid Daubechies is generalized to the case in which the scale factor is no longer two. The results show how to design the scaling function and establish a property that the filter bank coefficients must satisfy in order to obtain a perfect-reconstruction analysis filter bank. It is shown analytically and numerically that certain of the results obtained are indeed solutions. Nevertheless, it remains to be determined how to design wavelet functions that satisfy the imposed restriction while using the desired scaling function.
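For orientation, the relations below are the standard textbook form of the scaling equation and orthonormality condition for an integer scale factor M > 2. They are given here as an assumption about the general setting, not as the specific property derived in the thesis; the symbols \phi, h_0 and M are notation chosen for this sketch.

```latex
% Scaling (dilation) equation for scale factor M (M = 2 recovers the classical case)
\phi(t) = \sqrt{M} \sum_{k} h_0[k] \, \phi(Mt - k)

% Normalization of the low-pass (scaling) filter
\sum_{k} h_0[k] = \sqrt{M}

% Orthonormality of the scaling filter under shifts by multiples of M
\sum_{k} h_0[k] \, h_0[k + M\ell] = \delta[\ell], \qquad \ell \in \mathbb{Z}
```

With scale factor M, the filter bank has one low-pass branch and M - 1 band-pass or high-pass branches, which is why more than one wavelet is needed.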


Articles

A System for Multimodal Speech Reproduction (English)
abstract, pdf

This paper describes a program for producing realistic animations of three-dimensional facial models in real time, based on LPC analysis of speech and on a previously determined characterization of the dynamics of the speaker model. The final program was able to render a model with 900 triangles in a 350x400-pixel frame at 60 frames per second, driven only by parameters extracted from a speech signal.

A Brief History of Auditory Models (English)
abstract, pdf

This work presents a brief description of the human auditory system together with the history of our understanding of auditory function, its main features, and the classic models used to represent it. First, a historical view of the hearing apparatus is presented. After that, the physiology of the peripheral auditory system is described. The process of acoustic propagation through the outer, middle and inner ear, as well as the mechanism that transforms the motion of the cochlear inner hair cells into neural spikes, are explained. Next, Flanagan’s mathematical representation (based on physiological data acquired by von Békésy) of the passive relation between the sound that reaches the outer ear and the motion of the cochlear basilar membrane is presented. Flanagan’s model is followed by Lyon’s model of the cochlea, Meddis’ model of the inner hair cell, and Patterson’s Auditory Image Model. Finally, the IPEM Toolbox is introduced as an example of a music analysis system that incorporates an auditory model to perform acoustic analysis of sound based on human perception.

Specialist System for Clarinet Timbre Detection (Portuguese)
abstract, pdf

This paper presents a method for automatic identification of the clarinet. The scheme is based on analysis of the acoustic signal decomposed using wavelet functions. For each sample of an instrument, the attack, sustain and release regions are identified automatically. A decomposition is carried out in each region, yielding the mean and variance of the wavelet coefficients. These values are the input data of the fuzzy or neural network used to identify the clarinet. The experiments show that a low error rate can be achieved.
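The feature extraction step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not name the wavelet or the number of decomposition levels, so a Haar wavelet and three levels are assumed here, the segmentation into attack, sustain and release regions is taken as already done, and the names `haar_step` and `wavelet_features` are hypothetical.

```python
import numpy as np

def haar_step(x):
    """One level of an orthonormal Haar decomposition (assumed wavelet)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = x[:-1]  # drop the last sample so that samples pair up
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def wavelet_features(region, levels=3):
    """Mean and variance of the detail coefficients at each level.

    `region` is one segment of the note (attack, sustain or release).
    Returns [mean_1, var_1, mean_2, var_2, ...].
    """
    features = []
    approx = np.asarray(region, dtype=float)
    for _ in range(levels):
        if len(approx) < 2:
            break  # not enough samples for another level
        approx, detail = haar_step(approx)
        features.extend([detail.mean(), detail.var()])
    return np.array(features)

# Toy region: an alternating signal has all its energy in the finest level.
toy_region = np.array([1.0, -1.0, 1.0, -1.0])
toy_features = wavelet_features(toy_region, levels=2)
```

The feature vectors of the three regions would then be concatenated and passed to the classifier.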

Blind Separation of Speech Convolutive Mixtures via Time-Frequency Masking
abstract, pdf

An ideal binary mask, which marks the regions of the time-frequency domain in which the energy of the signal of interest exceeds that of the interfering signals, is analyzed. The performance of signal separation when these ideal binary masks are applied is evaluated. In the tests, the ideal masks remove almost all the interference from the other source in convolutive mixtures generated with simulated room impulse responses. A method for blind signal separation in the time-frequency domain is proposed that uses only the relative amplitude information of the cells in each time-frequency cluster. In a reverberant environment the proposed method cannot identify the clusters, but the attenuation values associated with each source are found to be concentrated at the extremes of the relative attenuation histogram. Experimental results show that the proposed method can separate signals with little interference from the other source, even in a real reverberant environment.
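The ideal binary mask described above can be sketched in a few lines. This is an illustration only, under the assumption that time-frequency representations (for example, STFT magnitudes) of the target and the interference are available separately, which is what makes the mask "ideal"; the function names are hypothetical and the toy arrays are not data from the paper.

```python
import numpy as np

def ideal_binary_mask(target_tf, interference_tf):
    """1 where the target energy exceeds the interference energy, else 0."""
    target_energy = np.abs(target_tf) ** 2
    interference_energy = np.abs(interference_tf) ** 2
    return (target_energy > interference_energy).astype(float)

def apply_mask(mixture_tf, mask):
    """Keep only the time-frequency cells selected by the mask."""
    return mixture_tf * mask

# Toy 2x2 time-frequency grids (rows: frequency bins, columns: frames).
target = np.array([[2.0, 0.1],
                   [0.0, 3.0]])
interference = np.array([[0.5, 1.0],
                         [1.0, 0.2]])
mixture = target + interference

mask = ideal_binary_mask(target, interference)
separated = apply_mask(mixture, mask)
```

Cells dominated by the interference are zeroed, while the kept cells still contain whatever interference energy was present there, which is why the separation is close to but not exactly perfect.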