The main aim of this essay is to examine three philosophical narratives. One of them, Hegel’s master-slave dialectic, clearly inspired the other two: Marx’s reflections in his Economic and Philosophic Manuscripts of 1844 and the interpretation of the Odyssey in Horkheimer and Adorno’s Dialectic of Enlightenment. Whereas Hegel’s dialectic opens a perspective of mutual recognition between individuals, permanently codified in their fundamental rights, the two remaining narratives lead to quite different conclusions. According to the young Marx, subjects not only fail to recognize one another but, under the influence of economic relations, even treat each other with disregard. In Adorno and Horkheimer’s view as well, the labor processes which according to Hegel led towards the freedom of individuals distort interpersonal relations and strengthen a growing coercion. Finally, the proposal of Jürgen Habermas is taken into consideration: he argues that communicative action, rather than labor processes, is the real emancipating factor.
Philosophers are motivated to do research on pattern recognition because of its wide range of applications. One of the pathfinders of research in this area was Satosi Watanabe, who has been frequently commented upon in the literature on the subject. The rule of decrease in entropy and the rule of simplicity are described in the context of pattern recognition. Although the concept of entropy was initially used in thermodynamics, it can also be adopted in the field of pattern recognition, provided the concept is suitably transformed. A few examples of the application of the entropy concept and the relationship between entropy and simplicity are discussed in the article. The simplicity considered by Watanabe should be treated mainly as polynomial curve simplicity; however, the issue is described in a wider context.
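As a minimal illustration of the entropy notion invoked above (a sketch, not the article's own formalism; the distributions are illustrative), Shannon entropy of a class-membership distribution decreases as the assignment of patterns to classes becomes more certain:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A maximally uncertain two-class assignment carries 1 bit of entropy...
print(shannon_entropy([0.5, 0.5]))   # 1.0
# ...while a distribution concentrated on one class has zero entropy,
# illustrating the "decrease in entropy" that accompanies recognition.
print(shannon_entropy([1.0]))        # 0.0
print(shannon_entropy([0.9, 0.1]))
```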
The human voice is one of the basic means of communication, thanks to which one also can easily convey the emotional state. This paper presents experiments on emotion recognition in human speech based on the fundamental frequency. AGH Emotional Speech Corpus was used. This database consists of audio samples of seven emotions acted by 12 different speakers (6 female and 6 male). We explored phrases of all the emotions – all together and in various combinations. Fast Fourier Transformation and magnitude spectrum analysis were applied to extract the fundamental tone out of the speech audio samples. After extraction of several statistical features of the fundamental frequency, we studied if they carry information on the emotional state of the speaker applying different AI methods. Analysis of the outcome data was conducted with classifiers: K-Nearest Neighbours with local induction, Random Forest, Bagging, JRip, and Random Subspace Method from algorithms collection for data mining WEKA. The results prove that the fundamental frequency is a prospective choice for further experiments.
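The FFT-based fundamental-frequency extraction step can be sketched as follows; the sampling rate, the synthetic tone, and the 50–500 Hz search band are assumptions for illustration, and the statistical features and WEKA classifiers are not reproduced:

```python
import numpy as np

fs = 8000                      # sampling rate (Hz), assumed
t = np.arange(fs) / fs         # 1 s of signal -> 1 Hz FFT resolution
f0_true = 220.0                # synthetic "voice" fundamental
signal = np.sin(2 * np.pi * f0_true * t) + 0.3 * np.sin(2 * np.pi * 2 * f0_true * t)

# Magnitude spectrum via FFT; restrict the peak search to a plausible F0 range.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
band = (freqs >= 50) & (freqs <= 500)          # typical speech F0 range
f0_est = freqs[band][np.argmax(spectrum[band])]
print(f0_est)   # 220.0
```

In real speech the spectrum is noisier and the harmonic may outweigh the fundamental, so production F0 trackers add smoothing and octave-error checks; the sketch only shows the magnitude-spectrum principle.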
Speech emotion recognition is an important part of human-machine interaction studies. The acoustic analysis method is used for emotion recognition through speech. An emotion does not cause changes in all acoustic parameters; rather, the acoustic parameters affected by an emotion vary depending on the emotion type. In this context, the emotion-based variability of acoustic parameters is still a current field of study. The purpose of this study is to investigate which acoustic parameters fear affects and the extent of its influence. For this purpose, various acoustic parameters were obtained from speech recordings containing fear and neutral emotions. The change of these parameters across emotional states was analyzed using statistical methods, and the parameters affected by the fear emotion and the degree of its influence were determined. According to the results obtained, the majority of acoustic parameters that fear affects vary with the data used. However, it has been demonstrated that formant frequencies, mel-frequency cepstral coefficients, and jitter parameters can define the fear emotion independently of the data used.
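One of the parameters named above, local jitter, has a simple definition that can be sketched directly; the period values below are illustrative, not measured data from the study:

```python
def local_jitter(periods):
    """Local jitter: mean absolute difference between consecutive
    glottal periods, relative to the mean period (dimensionless)."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

steady = [5.0, 5.0, 5.0, 5.0]        # ms, perfectly regular voicing
perturbed = [5.0, 5.2, 4.9, 5.1]     # ms, irregular voicing
print(local_jitter(steady))      # 0.0
print(local_jitter(perturbed))   # > 0
```

Elevated jitter reflects cycle-to-cycle irregularity of vocal-fold vibration, which is why it can serve as an emotion-sensitive parameter.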
The article reports three experiments conducted to determine whether musicians possess a better ability to recognise the sources of natural sounds than non-musicians. The study was inspired by reports indicating that musical training not only develops musical hearing but also enhances various non-musical auditory capabilities. Recognition and detection thresholds were measured for recordings of environmental sounds presented in quiet (Experiment 1) and against a noise masker (Experiment 2). The listener’s ability to recognise sound sources was inferred from the recognition-detection threshold gap (RDTG), defined as the difference in signal level between the thresholds of sound recognition and sound detection. Contrary to what was expected from reports of enhanced auditory abilities of musicians, the RDTGs were not smaller for musicians than for non-musicians. In Experiment 3, detection thresholds were measured with an adaptive procedure comprising three interleaved stimulus tracks with different sounds. It was found that the threshold elevation caused by stimulus interleaving was similar for musicians and non-musicians. The lack of superiority of musicians over non-musicians in the auditory tasks explored in this study is explained in terms of a listening strategy known as the causal listening mode, which is a basis for auditory orientation in the environment.
Biometric identification systems, i.e. the systems that are able to recognize humans by analyzing their physiological or behavioral characteristics, have gained a lot of interest in recent years. They can be used to raise the security level in certain institutions or can be treated as a convenient replacement for PINs and passwords for regular users. Automatic face recognition is one of the most popular biometric technologies, widely used even by many low-end consumer devices such as netbooks. However, even the most accurate face identification algorithm would be useless if it could be cheated by presenting a photograph of a person instead of the real face. Therefore, the proper liveness measurement is extremely important. In this paper we present a method that differentiates between video sequences showing real persons and their photographs. First we calculate the optical flow of the face region using the Farnebäck algorithm. Then we convert the motion information into images and perform the initial data selection. Finally, we apply the Support Vector Machine to distinguish between real faces and photographs. The experimental results confirm that the proposed approach could be successfully applied in practice.
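The intuition behind the optical-flow liveness cue can be sketched with synthetic flow fields; this is a simplified stand-in (a moved photograph produces nearly uniform, rigid flow, while a live face produces locally non-rigid flow), and the Farnebäck computation and SVM classifier from the paper are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_nonrigidity(flow):
    """Mean deviation of a dense flow field (H x W x 2) from its mean
    translation; near zero for rigid (photograph-like) motion."""
    mean = flow.reshape(-1, 2).mean(axis=0)
    return float(np.linalg.norm(flow - mean, axis=2).mean())

# Rigid motion: every pixel shifts by the same vector (a moved photograph).
photo_flow = np.tile([1.0, 0.5], (32, 32, 1))
# Live face: the same global motion plus local, non-rigid deformations.
face_flow = photo_flow + 0.3 * rng.standard_normal((32, 32, 2))

print(flow_nonrigidity(photo_flow))  # 0.0
print(flow_nonrigidity(face_flow))   # clearly > 0
```

A threshold (or, as in the paper, a trained classifier over motion-derived features) then separates the two cases.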
This study examined whether differences in reverberation time (RT) between typical sound field test rooms used in audiology clinics have an effect on speech recognition in multi-talker environments. Separate groups of participants listened to target speech sentences presented simultaneously with 0-to-3 competing sentences through four spatially-separated loudspeakers in two sound field test rooms having RT = 0.6 sec (Site 1: N = 16) and RT = 0.4 sec (Site 2: N = 12). Speech recognition scores (SRSs) for the Synchronized Sentence Set (S3) test and subjective estimates of perceived task difficulty were recorded. Obtained results indicate that the change in room RT from 0.4 to 0.6 sec did not significantly influence SRSs in quiet or in the presence of one competing sentence. However, this small change in RT affected SRSs when 2 and 3 competing sentences were present, resulting in mean SRSs that were about 8-10% better in the room with RT = 0.4 sec. Perceived task difficulty ratings increased as the complexity of the task increased, with average ratings similar across test sites for each level of sentence competition. These results suggest that site-specific normative data must be collected for sound field rooms if clinicians would like to use two or more directional speech maskers during routine sound field testing.
A phoneme segmentation method based on the analysis of discrete wavelet transform spectra is described. The localization of phoneme boundaries is particularly useful in speech recognition: it enables the use of more accurate acoustic models, since the lengths of phonemes provide more information for parametrization. Our method relies on the values of power envelopes and their first derivatives for six frequency subbands. Specific scenarios that are typical for phoneme boundaries are searched for. Discrete times with such events are noted and graded using a distribution-like event function, which represents the change of the energy distribution in the frequency domain. The exact definition of this method is given in the paper. The final decision on the localization of boundaries is taken by analysis of the event function; boundaries are therefore extracted using information from all subbands. The method was developed on a small set of hand-segmented Polish words and tested on another, large corpus containing 16 425 utterances. A recall and precision measure specifically designed to assess the quality of speech segmentation was adapted by using fuzzy sets. This yielded an F-score of 72.49%.
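The event-function idea can be sketched as follows; this is a simplified stand-in (two synthetic subband envelopes instead of six DWT subbands, and a plain derivative threshold instead of the paper's graded scenarios):

```python
def boundary_events(envelopes, threshold):
    """Event function: at each frame, sum over subbands of the absolute
    first derivative of the power envelope, counting only changes that
    exceed the threshold (a simplified stand-in for the paper's method)."""
    n_frames = len(envelopes[0])
    events = [0.0] * n_frames
    for env in envelopes:
        for t in range(1, n_frames):
            d = abs(env[t] - env[t - 1])
            if d > threshold:
                events[t] += d
    return events

# Two synthetic subband envelopes with a simultaneous jump at frame 3,
# as would occur when the energy distribution shifts at a phoneme boundary.
sub1 = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]
sub2 = [0.8, 0.8, 0.8, 0.2, 0.2, 0.2]
events = boundary_events([sub1, sub2], threshold=0.3)
print(events.index(max(events)))   # 3  -> detected boundary frame
```

Because the event function accumulates evidence across subbands, a boundary supported by several subbands stands out even when no single subband change is decisive.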
The paper presents the application of a differential electronic nose to dynamic (on-line) volatile measurement. First, we compare the classical nose, employing a single sensor array, with its differential extension, which contains two sensor arrays working in differential mode. We show that the differential nose performs better under changing environmental conditions, especially temperature, and performs well in the dynamic mode of operation. We show its application in the recognition of different brands of tobacco.
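The benefit of the differential mode can be sketched in a few lines; the readings and the additive-drift assumption are illustrative, not the paper's sensor model:

```python
def differential_response(sample_sensors, reference_sensors):
    """Differential nose reading: subtract the reference array (exposed to
    ambient air only) from the sample array, cancelling common-mode drift
    such as a shared temperature response."""
    return [s - r for s, r in zip(sample_sensors, reference_sensors)]

# Hypothetical readings: a true odour signature plus a shared thermal drift.
odour = [0.8, 0.2, 0.5]
drift = 0.3
sample = [x + drift for x in odour]      # sample array sees odour + drift
reference = [drift] * 3                  # reference array sees drift only
print(differential_response(sample, reference))   # recovers the odour signature
```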
A variety of algorithms allows gesture recognition in video sequences. Automatic recognition is of interest to hearing-impaired people, since it allows a great degree of self-sufficiency in communicating their intent to non-signers without the need for interpreters. State-of-the-art algorithms in this domain are capable of either real-time recognition of sign language in low-resolution videos or non-real-time recognition in high-resolution videos. This paper proposes a novel approach to real-time recognition of fingerspelling alphabet letters of American Sign Language (ASL) in ultra-high-resolution (UHD) video sequences. The proposed approach is based on adaptive Laplacian of Gaussian (LoG) filtering with local extrema detection using the Features from Accelerated Segment Test (FAST) algorithm, classified by a Convolutional Neural Network (CNN). The recognition rate of our algorithm was verified on real-life data.
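The LoG filtering stage can be sketched as follows; the kernel size, the synthetic blob, and the plain convolution loop are illustrative assumptions, and the adaptive scale selection, FAST detector and CNN classifier of the paper are not reproduced:

```python
import numpy as np

def log_kernel(sigma, size):
    """Discrete Laplacian-of-Gaussian kernel."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    k = (r2 - 2 * sigma**2) / sigma**4 * np.exp(-r2 / (2 * sigma**2))
    return k - k.mean()          # zero-mean, so flat regions give no response

def convolve2d(img, kernel):
    """'Valid' 2-D convolution (no padding), plain NumPy."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Synthetic frame: a bright Gaussian blob as a crude stand-in for a hand feature.
ax = np.arange(33) - 16
xx, yy = np.meshgrid(ax, ax)
img = np.exp(-(xx**2 + yy**2) / (2 * 3.0**2))

response = convolve2d(img, log_kernel(sigma=3.0, size=9))
peak = np.unravel_index(np.argmax(np.abs(response)), response.shape)
print(peak)   # the strongest LoG response sits at the blob centre
```

Local extrema of such a response map are the candidate points that a FAST-style test would then accept or reject before classification.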
Perception takes into account the costs and benefits of possible interpretations of incoming sensory data. This should be especially pertinent for threat recognition, where minimising the costs associated with missing a real threat is of primary importance. We tested whether recognition of threats has special characteristics that adapt this process to the task it fulfils. Participants were presented with images of threats and visually matched neutral stimuli, distorted by varying levels of noise. We found a threat superiority effect and a liberal response bias. Moreover, increasing the level of noise degraded the recognition of the neutral images to a greater extent than that of the threatening images. To summarise, recognising threats is special in that it is more resistant to noise and decline in stimulus quality, suggesting that threat recognition is a fast ‘all or nothing’ process, in which threat presence is either confirmed or negated.
This paper discusses five models and a methodology for constructing classifiers capable of recognizing, in real time and with accuracy acceptable in practical technical applications, the type of fuel injected into a diesel engine cylinder. Experimental research was carried out on a dynamic engine test facility. The in-cylinder and injection-line pressure signals of an internal combustion engine powered by mineral fuel, biodiesel or blends of these two fuel types were evaluated using the vibro-acoustic method. Computational intelligence methods such as classification trees, particle swarm optimization and random forests were applied.
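A minimal sketch of the ensemble-voting idea behind the random-forest classifier follows; the decision stumps stand in for full trees, the two feature names are hypothetical, and the particle swarm optimization step is not reproduced:

```python
def stump_predict(x, feature, threshold, left, right):
    """A decision stump: a single threshold test on a single feature."""
    return left if x[feature] <= threshold else right

def forest_predict(x, stumps):
    """Majority vote over an ensemble of stumps (random-forest style)."""
    votes = [stump_predict(x, *s) for s in stumps]
    return max(set(votes), key=votes.count)

# Hypothetical stumps over two pressure-derived features
# (feature 0: peak in-cylinder pressure, feature 1: injection-line pressure rise).
stumps = [
    (0, 6.0, "mineral", "biodiesel"),
    (1, 2.5, "mineral", "biodiesel"),
    (0, 5.5, "mineral", "biodiesel"),
]
print(forest_predict([7.1, 3.0], stumps))   # biodiesel
print(forest_predict([4.8, 1.9], stumps))   # mineral
```

In a real forest each tree is trained on a bootstrap sample with a random feature subset; the voting rule shown here is what makes the combined classifier fast enough for real-time use.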
Keypoint detection is a basic step in many computer vision algorithms aimed at recognition of objects, automatic navigation and analysis of biomedical images. Successful implementation of higher-level image analysis tasks, however, is conditioned by reliable detection of characteristic local image regions termed keypoints. A large number of keypoint detection algorithms have been proposed and verified. In this paper we discuss the most important keypoint detection algorithms. The main part of this work is devoted to the description of a keypoint detection algorithm we propose that incorporates depth information computed from stereovision cameras or other depth-sensing devices. It is shown that filtering out keypoints that are context-dependent, e.g. located at boundaries of objects, can improve the matching performance of the keypoints, which is the basis for object recognition tasks. This improvement is shown quantitatively by comparing the proposed algorithm to the widely accepted SIFT keypoint detector. Our study is motivated by the development of a system aimed at aiding the visually impaired in space perception and object identification.
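The depth-based filtering idea can be sketched as follows; the toy depth map, window size and spread threshold are illustrative assumptions, not the paper's parameters:

```python
def filter_boundary_keypoints(keypoints, depth, window, max_spread):
    """Discard keypoints whose local depth spread is large, i.e. those
    lying on object boundaries where depth is discontinuous."""
    h, w = len(depth), len(depth[0])
    kept = []
    for (r, c) in keypoints:
        r0, r1 = max(0, r - window), min(h, r + window + 1)
        c0, c1 = max(0, c - window), min(w, c + window + 1)
        patch = [depth[i][j] for i in range(r0, r1) for j in range(c0, c1)]
        if max(patch) - min(patch) <= max_spread:
            kept.append((r, c))
    return kept

# Toy depth map: a near object (depth 1.0 m) against a far background (5.0 m).
depth = [[1.0 if c < 4 else 5.0 for c in range(8)] for r in range(8)]
keypoints = [(4, 1), (4, 4), (4, 6)]   # on the object, on its boundary, on background
print(filter_boundary_keypoints(keypoints, depth, window=1, max_spread=0.5))
# the boundary keypoint (4, 4) is rejected
```

Boundary keypoints mix the appearance of an object and its background, so their descriptors change with viewpoint; removing them is what improves matching stability.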
Development of facial recognition or expression recognition algorithms requires input data to thoroughly test the performance of algorithms in various conditions. Researchers are developing various methods to face challenges like illumination, pose and expression changes, as well as facial disguises. In this paper, we propose and establish a dataset of thermal facial images, which contains a set of neutral images in various poses as well as a set of facial images with different posed expressions collected with a thermal infrared camera. Since the properties of the face in the thermal domain strongly depend on time, the collection of the dataset was repeated in order to show the impact of aging, and a corresponding set of data is provided. The paper describes the measurement methodology and database structure. We present baseline results of processing using state-of-the-art facial descriptors combined with distance metrics for thermal face reidentification. Three selected local descriptors (histogram of oriented gradients, local binary patterns and local derivative patterns) are used for an elementary assessment of the database. The dataset offers a wide range of capabilities, from thermal face recognition to thermal expression recognition.
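One of the baseline descriptors, the local binary pattern, can be sketched in its classic 3x3 form; the patch values are illustrative, and the histogram pooling and distance metrics used in the paper are not reproduced:

```python
def lbp_code(img, r, c):
    """Classic 3x3 local binary pattern: threshold the 8 neighbours
    at the centre value and read them as a byte, clockwise from top-left."""
    centre = img[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code

patch = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
print(lbp_code(patch, 1, 1))   # 120
```

In a full descriptor, the codes of all pixels in a face region are pooled into a histogram, which is then compared with a distance metric for reidentification.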
This paper concerns measurement procedures at an emotion monitoring stand designed for tracking human emotions in Human-Computer Interaction using physiological signals. The paper addresses the key problem of physiological measurements being disturbed by motion typical for human-computer interaction, such as keyboard typing or mouse movements. An original experiment is described that aimed at practical evaluation of the measurement procedures performed at the emotion monitoring stand constructed at GUT. Different sensor locations were considered and evaluated for suitability and measurement precision in Human-Computer Interaction monitoring. Alternative locations (ear lobes and forearms) for skin conductance, blood volume pulse and temperature sensors were proposed and verified. The alternative locations showed correlation with the traditional ones as well as lower sensitivity to movements such as typing or moving the mouse; they may therefore be a better choice for monitoring Human-Computer Interaction.
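The correlation check between sensor placements can be sketched with Pearson's coefficient; the two traces below are hypothetical skin-conductance readings, not the study's data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally sampled signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical skin-conductance traces: finger (traditional) vs. forearm
# (alternative) placement tracking the same arousal trend.
finger = [2.1, 2.3, 2.8, 3.0, 2.9, 2.5]
forearm = [1.1, 1.2, 1.5, 1.6, 1.5, 1.3]
print(round(pearson_r(finger, forearm), 3))   # close to 1
```

A correlation near 1 indicates that the alternative placement tracks the same physiological trend even though its absolute level differs.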
This article explores the role of recognition in State creation. Based on an analysis of the relations between effectiveness and legality in the process of State creation, it claims that recognition is constitutive of statehood as a subject of international law. The research revolves around the following themes: the role of effectiveness criteria and the conditions of recognition set by international law; the existence of “statehood without effectiveness” in cases of limited effectiveness but general recognition; the study of the acquisition of statehood as a process; and the notion of collective recognition, based on the cases of Kosovo and Palestine. The argumentation is also supported by an analysis of de facto entities and aspiring States in international practice. It draws on the distinction between legal non-recognition and political non-recognition, which can shed some light on the complexity of international practice in this area. The article concludes that recognition is a prerequisite of statehood, an essential criterion that may overcome weak effectiveness in certain legal contexts, though not a lack of independence. Conversely, effectiveness of government authority over population and territory does not lead to statehood in the meaning of international law in the absence of international recognition.
Affective computing studies and develops systems capable of detecting human affects. The search for universal, well-performing features for speech-based emotion recognition is ongoing. In this paper, a small set of features with support vector machines as the classifier is evaluated on the Surrey Audio-Visual Expressed Emotion database, the Berlin Database of Emotional Speech, the Polish Emotional Speech database and the Serbian emotional speech database. It is shown that a set of 87 features can offer results on par with the state of the art, yielding 80.21%, 88.6%, 75.42% and 93.41% average emotion recognition rates, respectively. In addition, an experiment is conducted to explore the significance of gender in emotion recognition using random forests. Two models, trained on the first and second database, respectively, and four speakers were used to determine the effects. It is seen that the feature set used in this work performs well for both male and female speakers, yielding approximately 27% average emotion recognition in both models. In addition, the emotions of female speakers were recognized 18% of the time in the first model and 29% in the second. A similar effect is seen with male speakers: the first model yields a 36%, the second a 28% average emotion recognition rate. This illustrates the relationship between the constitution of training data and emotion recognition accuracy.
Speakers’ emotional states are recognized from speech signals corrupted by additive white Gaussian noise (AWGN). The influence of white noise on a typical emotion recognition system is studied. The emotion classifier is implemented with a Gaussian mixture model (GMM). A Chinese speech emotion database is used for training and testing, which includes nine emotion classes (happiness, sadness, anger, surprise, fear, anxiety, hesitation, confidence and the neutral state). Two speech enhancement algorithms are introduced for improved emotion classification. In the experiments, the Gaussian mixture model is trained on clean speech data, while it is tested under AWGN at various signal-to-noise ratios (SNRs). Both the emotion class model and the dimension space model are adopted for the evaluation of the emotion recognition system. In the emotion class model, the nine emotion classes are classified. In the dimension space model, the arousal dimension and the valence dimension are classified into positive or negative regions. The experimental results show that the speech enhancement algorithms consistently improve the performance of our emotion recognition system at various SNRs, and that positive emotions are more likely to be misclassified as negative emotions in a white noise environment.
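The AWGN test condition can be sketched directly: noise is scaled so that the corrupted signal has a requested SNR. The synthetic tone and sampling rate below are illustrative, and the GMM classifier and enhancement algorithms are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(1)

def add_awgn(signal, snr_db):
    """Add white Gaussian noise so the result has the requested SNR (dB)."""
    p_signal = np.mean(signal**2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    noise = rng.standard_normal(len(signal)) * np.sqrt(p_noise)
    return signal + noise

clean = np.sin(2 * np.pi * 200 * np.arange(16000) / 16000)
noisy = add_awgn(clean, snr_db=10)
measured = 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean)**2))
print(round(measured, 1))   # close to 10 dB
```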
This paper describes the research behind a Large-Vocabulary Continuous Speech Recognition (LVCSR) system for the transcription of Senate speeches in the Polish language. The system utilizes several components: a phonetic transcription system, language and acoustic model training systems, a Voice Activity Detector (VAD), an LVCSR decoder, and a subtitle generator and presentation system. Some of the modules relied on already available tools and some had to be built from scratch, but the authors ensured that they used the most advanced techniques available at the time. Finally, several experiments were performed to compare the performance of both more modern and more conventional technologies.
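Of the components listed, the VAD has the simplest classical form, which can be sketched as a frame-energy threshold; this is a minimal stand-in for the paper's detector, with illustrative frame data and threshold:

```python
def energy_vad(frames, threshold):
    """Frame-level voice activity detection: a frame is marked as speech
    when its mean energy exceeds a fixed threshold (a minimal stand-in
    for the VAD component of a transcription pipeline)."""
    decisions = []
    for frame in frames:
        energy = sum(x * x for x in frame) / len(frame)
        decisions.append(energy > threshold)
    return decisions

silence = [0.01, -0.02, 0.01, 0.0]
speech = [0.5, -0.6, 0.4, -0.5]
print(energy_vad([silence, speech, silence], threshold=0.01))
# [False, True, False]
```

Production VADs add noise-floor tracking and hangover smoothing, but the energy test above is the core decision they refine.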
In this paper, a new feature-extraction method is proposed to achieve robustness in speech recognition systems. This method combines the benefits of phase autocorrelation (PAC) with the bark wavelet transform. PAC uses the angle to measure correlation instead of the traditional autocorrelation measure, whereas the bark wavelet transform is a special type of wavelet transform particularly designed for speech signals. The features extracted by this combined method are called phase autocorrelation bark wavelet transform (PACWT) features. The speech recognition performance of the PACWT features is evaluated and compared to the conventional feature-extraction method, mel-frequency cepstral coefficients (MFCC), using the TI-Digits database under different types and levels of noise. This database is divided into male and female data. The results show that the word recognition rate using the PACWT features for noisy male data (white noise at 0 dB SNR) is 60%, whereas it is 41.35% for the MFCC features under identical conditions.
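The angle-based correlation idea behind PAC can be sketched as follows; this is an assumed reading of the description above (the angle between the signal vector and its shifted copy), with a circular shift for simplicity, and the bark wavelet front end is not reproduced:

```python
import numpy as np

def phase_autocorrelation(x, max_lag):
    """Angle between the signal vector and its circularly shifted copy,
    used in place of the raw autocorrelation value at each lag."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x)
    pac = []
    for k in range(max_lag + 1):
        shifted = np.roll(x, k)
        cos_angle = np.dot(x, shifted) / (norm * norm)  # circular shift preserves the norm
        pac.append(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return pac

x = np.sin(2 * np.pi * np.arange(64) / 16)   # period-16 test signal
pac = phase_autocorrelation(x, max_lag=16)
print(pac[0], pac[16])   # both ~0: zero shift and a full period align exactly
```

Because the angle depends only on the direction of the signal vector, it is less sensitive to the energy changes that additive noise introduces, which is the motivation for using PAC in noisy conditions.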
This paper describes a hybrid of a Deep Belief Neural Network (DBNN) and a Bidirectional Long Short-Term Memory (LSTM) network used as an acoustic model for speech recognition. It has been demonstrated by many independent researchers that DBNNs exhibit performance superior to other known machine learning frameworks in terms of speech recognition accuracy; their superiority comes from the fact that they are deep learning networks. However, a trained DBNN is simply a feed-forward network with no internal memory, unlike Recurrent Neural Networks (RNNs), which are Turing complete and do possess internal memory, allowing them to make use of longer context. In this paper, an experiment is performed to create a hybrid of a DBNN with an advanced bidirectional RNN that processes its output. Results show that using the new DBNN-BLSTM hybrid as the acoustic model for Large Vocabulary Continuous Speech Recognition (LVCSR) increases word recognition accuracy. However, the new model has many parameters and in some cases may suffer performance issues in real-time applications.
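The bidirectional processing that motivates the hybrid can be sketched with a minimal recurrent pass; a plain tanh RNN stands in for the LSTM cells, the weights are random, and the DBNN front end is not reproduced:

```python
import numpy as np

def simple_rnn(inputs, w_in, w_rec):
    """Minimal tanh RNN pass; returns the hidden state at every step."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(w_in @ x + w_rec @ h)
        states.append(h)
    return states

def bidirectional(inputs, w_in, w_rec):
    """Concatenate forward and time-reversed backward hidden states,
    so every frame sees both past and future context."""
    fwd = simple_rnn(inputs, w_in, w_rec)
    bwd = simple_rnn(inputs[::-1], w_in, w_rec)[::-1]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
frames = [rng.standard_normal(13) for _ in range(5)]   # e.g. 13 coefficients per frame
w_in = rng.standard_normal((8, 13)) * 0.1
w_rec = rng.standard_normal((8, 8)) * 0.1
out = bidirectional(frames, w_in, w_rec)
print(len(out), out[0].shape)   # one 16-dimensional state per input frame
```

It is this access to future frames, absent in a feed-forward DBNN, that lets the bidirectional layer exploit longer context.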
The article addresses the issue of self-consciousness and the possibilities of its development. In this context it also defines the concepts of self-evaluation, self-respect, self-appreciation, self-recognition, self-confidence and self-realization. The text emphasizes that self-consciousness is related to the awareness of one's own psychophysical and social identity: oneself, the world, and one's place in it. An important means for the development of healthy self-consciousness is also praise. In the conclusion of the article, attention is paid to psycho-hygiene as a means of preventing failure.