The paper investigates the interdependence between the fundamental frequency F0 and the perceptual identification of the vocalic quality of six isolated Polish vowels, a quality traditionally defined by the spectral envelope. The stimuli used in the listening experiments were natural female and male voices modified by changing the F0 values within a range of ±1 octave. The results were then compared with the outcome of experiments on fully synthetic voices. Despite the differences in how the investigated stimuli were generated and in their technical quality, consistent results were obtained. They confirmed the findings that the perceptual identification of vowels depends not only on the position of the formants on the F1 × F2 plane but also on their relationship to F0, on the connection between the formants and the harmonics, and on other factors. The paper presents, in quantitative terms, all possible kinds of perceptual shifts of Polish vowels from one phonetic category to another as a function of voice pitch. An additional perceptual experiment was conducted to check a broader range of F0 changes and their impact on the identification of vowels in CVC (consonant, vowel, consonant) structures. A mismatch between the formants and the glottal tone value can lead to a change in phonetic category.
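The relationship between formants and harmonics described above can be illustrated with a minimal sketch. It shifts a base F0 by a number of octaves and finds the harmonic of the glottal tone that lies closest to a given formant. The function names, the base F0 of 200 Hz, and the formant values (700 Hz and 1200 Hz) are illustrative assumptions, not data or methods from the paper.

```python
def shift_f0(f0_hz, octaves):
    """Return F0 after a shift of `octaves` (may be fractional or negative)."""
    return f0_hz * 2.0 ** octaves


def nearest_harmonic(formant_hz, f0_hz):
    """Return (harmonic number, harmonic frequency) closest to a formant.

    The harmonic number is clamped to 1 because the lowest spectral
    component of a voiced sound is F0 itself.
    """
    n = max(1, round(formant_hz / f0_hz))
    return n, n * f0_hz


if __name__ == "__main__":
    base_f0 = 200.0              # hypothetical F0 in Hz
    formants = (700.0, 1200.0)   # hypothetical F1 and F2 in Hz
    for octaves in (-1.0, -0.5, 0.0, 0.5, 1.0):
        f0 = shift_f0(base_f0, octaves)
        for formant in formants:
            n, harmonic = nearest_harmonic(formant, f0)
            print(f"F0={f0:7.1f} Hz  formant={formant:6.1f} Hz  "
                  f"nearest harmonic #{n} at {harmonic:7.1f} Hz")
```

When F0 rises, the harmonics become more widely spaced, so a formant peak is sampled less densely and may fall between harmonics or even below the first one. This is one simple way to picture the mismatch between formants and glottal tone that the abstract mentions.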
The present study consisted of two experiments. The goal of the first experiment was to establish the just noticeable difference for the fundamental frequency of the vowel /u/ using the two-alternative forced choice (2AFC) method. We obtained a threshold value of 27 cents. This value is larger than the motor reaction values observed in previous experiments (e.g. 9 or 19 cents). The second experiment was intended to provide neurophysiological confirmation of the detection of frequency shifts, using event-related potentials (ERPs). We concentrated on the mismatch negativity (MMN), the component elicited by a change in the pattern of stimuli. Its occurrence is correlated with the discrimination threshold. In our study, MMN was observed for changes greater than 27 cents, namely shifts of ±50 and 100 cents (effect size: Cohen’s d = 2.259). MMN did not appear for changes of ±10 and 20 cents. The results showed that the values at which motor responses can be observed are indeed lower than the perceptual thresholds.
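Since the thresholds above are expressed in cents, a short sketch of the conversion may help. One cent is 1/1200 of an octave, so an interval of c cents corresponds to a frequency ratio of 2^(c/1200). The function names and the example base frequency of 200 Hz are assumptions made for illustration; only the 27-cent threshold and the tested shift sizes come from the abstract.

```python
import math

JND_CENTS = 27.0  # perceptual threshold reported in the abstract


def cents_between(f_ref_hz, f_hz):
    """Interval from f_ref_hz to f_hz in cents (positive if f_hz is higher)."""
    return 1200.0 * math.log2(f_hz / f_ref_hz)


def shift_by_cents(f_hz, cents):
    """Frequency obtained by shifting f_hz by `cents` (may be negative)."""
    return f_hz * 2.0 ** (cents / 1200.0)


def exceeds_threshold(cents, threshold=JND_CENTS):
    """True if the magnitude of a shift is larger than the threshold."""
    return abs(cents) > threshold


if __name__ == "__main__":
    base_hz = 200.0  # hypothetical F0, not taken from the study
    for shift in (10, 20, 27, 50, 100):
        shifted = shift_by_cents(base_hz, shift)
        print(f"{shift:4d} cents -> {shifted:8.3f} Hz, "
              f"above threshold: {exceeds_threshold(shift)}")
```

Under this convention, the shifts of 50 and 100 cents lie above the 27-cent threshold and the shifts of 10 and 20 cents lie below it, which matches the pattern of MMN occurrence reported above.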
This paper proposes a speech enhancement method based on multi-scale, multi-threshold processing of the auditory perception wavelet transform, which is suitable for low SNR (signal-to-noise ratio) environments. The method reduces noise by thresholding the auditory perception wavelet transform parameters of the speech signal according to the auditory masking effect of the human ear. At the same time, in order to prevent high-frequency loss during noise suppression, we first make a voicing decision on the speech signal. We then process the unvoiced and voiced segments with different thresholds and different decision rules. Lastly, we perform objective and subjective tests on the enhanced speech. The results show that, compared with other spectral subtraction methods, our method keeps the components of unvoiced sound intact while suppressing the residual noise and the background noise. Thus, the enhanced speech has better clarity and intelligibility.
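The general idea of voicing-dependent wavelet thresholding can be sketched as follows. This is not the paper's method: a one-level Haar transform stands in for the auditory perception wavelet transform, a zero-crossing rate test stands in for the voicing decision, and the fixed threshold values replace the masking-based thresholds. All function names and parameter values are assumptions chosen for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(x):
    """One-level Haar transform of an even-length sequence.

    Returns (approximation, detail) coefficient lists.
    """
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the signal from both coefficient lists."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def enhance_frame(frame, voiced_t=0.2, unvoiced_t=0.05, zcr_limit=0.3):
    """Threshold the detail band with a voicing-dependent threshold.

    A low zero-crossing rate is treated as voiced (assumption). Unvoiced
    frames get the smaller threshold so that less high-frequency content
    is removed.
    """
    is_voiced = zero_crossing_rate(frame) < zcr_limit
    t = voiced_t if is_voiced else unvoiced_t
    approx, detail = haar_dwt(frame)
    return haar_idwt(approx, soft_threshold(detail, t))
```

Applying a gentler threshold to frames classified as unvoiced reflects the motivation stated above, since unvoiced sounds carry most of their energy at high frequencies and are easily mistaken for noise. A real implementation would operate on several decomposition scales and derive the thresholds from a masking model.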