Similar literature
1.
2.
Neural Computing and Applications - In this work, we conducted an empirical comparative study of the performance of text-independent speaker verification in emotional and stressful environments....

3.
This work investigates and analyzes speaker identification in both unbiased and biased emotional talking environments using a classifier called Suprasegmental Hidden Markov Models (SPHMMs). The first talking environment is unbiased towards any emotion, while the second is biased towards different emotions. Each of these talking environments is made up of six distinct emotions: neutral, angry, sad, happy, disgust and fear. The investigation and analysis show that speaker identification performance in the biased talking environment is superior to that in the unbiased talking environment. The results obtained in this work are close to those achieved in subjective assessment by human judges.

4.
Speaker recognition systems perform almost ideally in neutral talking environments; however, these systems perform poorly in emotional talking environments. This research is devoted to enhancing the low performance of text-independent and emotion-dependent speaker identification in emotional talking environments by employing Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s) as classifiers. This work has been tested on our speech database, which is composed of 50 speakers talking in six different emotional states: neutral, angry, sad, happy, disgust, and fear. Our results show that the average speaker identification performance in these talking environments based on CSPHMM2s is 81.50%, with an improvement rate of 5.61%, 3.39%, and 3.06% compared, respectively, to First-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM1s), Second-Order Left-to-Right Suprasegmental Hidden Markov Models (LTRSPHMM2s), and First-Order Circular Suprasegmental Hidden Markov Models (CSPHMM1s). Our results based on subjective evaluation by human judges fall within 2.26% of those obtained based on CSPHMM2s.

5.
6.
7.
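A speaker recognition method based on detecting and removing emotion-induced differences in speech is proposed. It overcomes the inconvenience of earlier methods, which require either emotional speech from the test speaker during training or the emotional state of the test utterance at test time, and it improves recognition performance by 4.7% over a traditional ASR system.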

8.
As communication technologies continue to evolve, more people will engage in virtual social interactions. With this trend comes an increasing need for research on behavior within virtual worlds. This study contributes to that agenda by focusing on the influence of physical attributes of a virtual setting and gender on verbal behavior expressed by mixed-gender dyads in a virtual world. Computerized text analyses revealed linguistic differences as a function of both the physical and social complexity of virtual settings and gender. The latter differences included both quantitative and qualitative features of written communication. These results add important new discoveries to the literature on virtual psychology and highlight the value of using text analysis tools to investigate virtual interactions.

9.
OBJECTIVE: Existing reports suggest that males significantly outperform females in navigating 3-D virtual environments. Although researchers have recognized that this may be attributable to males and females possessing different spatial abilities, most work has attempted to reduce the gender gap by providing more training for females. In this paper, we explore using large displays to narrow the gender gap within these tasks. BACKGROUND: While evaluating various interaction techniques, we found that large displays affording wider fields of view seemed to improve virtual navigation performance in general and, additionally, to narrow the gender gap that existed on standard desktop displays. METHOD: We conducted two experiments (32 and 22 participants) exploring the individual contributions of display and geometric fields of view to the observed effects as well as isolating factors explaining performance increases seen on the large displays. RESULTS: We show that wider fields of view on large displays not only increase performance of all users on average but also benefit females to such a degree as to allow them to perform as well as males do. We further demonstrate that these benefits can be attributed to better optical flow cues offered by the large displays. CONCLUSION: These findings provide a significant contribution, including recommendations for the improved presentation of 3-D environments, backed by empirical data demonstrating performance benefits during navigation tasks. APPLICATION: Results can be used to design systems that narrow the gender gap in domains such as teleoperation and virtual environments for entertainment, virtual training, or information visualization.

10.
In this study, strategies involving the use of message relevance and formatting cues were tested with the objective of enhancing media multitasking performance. Three memory measures (free recall, aided/cued recall, and recognition) were used as dependent variables for the study.

11.
The general aim of the contributions to this special issue was to foster learning in computer-supported collaborative learning environments by designing instructional interventions that enhance collaboration between learners. Scripts and external representations were used as instructional interventions to support social and cognitive processes, respectively, during collaborative learning. Although the interventions enhanced these social and cognitive processes, beneficial effects on learning outcomes were not always found. This discussion uses cognitive load theory, particularly the expertise reversal effect, to explain these results. It is concluded that the principles from this theory, which pertains to individual learning, show great promise for the design of collaborative learning environments.

12.
Traditional emotion models, when tagging single emotions in documents, often ignore the fact that most documents convey complex human emotions. In this paper, we join emotion analysis with topic models to find complex emotions in documents, as well as the intensity of the emotions, and study how the document emotions vary with topics. Hierarchical Bayesian networks are employed to generate the latent topic variables and emotion variables. On average, our model for single emotion classification outperforms traditional supervised machine learning models such as SVM and Naive Bayes. The other model, for complex emotion classification, also achieves promising results. We thoroughly analyze the impact of vocabulary quality and topic quantity on emotion and intensity prediction in our experiments. The distributions of topics such as Friend and Job are found to be sensitive to the documents’ emotions, which we call emotion topic variation in this paper. This reveals the deeper relationship between topics and emotions.

13.
In classification tasks, the error rate is proportional to the commonality among classes. In the conventional GMM-based modeling technique, since the model parameters of a class are estimated without considering other classes in the system, features that are common across various classes may also be captured, along with unique features. This paper proposes to use unique characteristics of a class at the feature level and at the phoneme level, separately, to improve the classification accuracy. At the feature level, the performance of a classifier has been analyzed by capturing the unique features while modeling, and removing common feature vectors during classification. Experiments were conducted on a speaker identification task, using speech data of 40 female speakers from the NTIMIT corpus, and on a language identification task, using speech data of two languages (English and French) from the OGI_MLTS corpus. At the phoneme level, the performance of a classifier has been analyzed on a speaker identification task by identifying a subset of phonemes that are unique to a speaker with respect to his/her most closely resembling speaker, in the acoustic sense. In both cases (feature level and phoneme level), considerable improvement in classification accuracy is observed over conventional GMM-based classifiers in the above-mentioned tasks. Among the three experimental setups, the speaker identification task using unique phonemes shows as much as 9.56% performance improvement over the conventional GMM-based classifier.

14.
Open learning environments, such as Massive Open Online Courses (MOOCs), often lack adequate learner collaboration opportunities; they are also plagued by high levels of drop-out. Introducing project-based learning (PBL) can enhance learner collaboration and motivation, but PBL does not easily scale up into MOOCs. To support definition and staffing of projects, team formation principles and algorithms are introduced to form productive, creative, or learning teams. These use data on the project and on learner knowledge, personality and preferences. A study was carried out to validate the principles and the algorithms. Students (n = 168) and educational practitioners (n = 56) provided the data. The principles for learning teams and productive teams were accepted, while the principle for creative teams was not. The algorithms were validated using team classifying tasks and team ranking tasks. The practitioners classify and rank small productive, creative and learning teams in accordance with the algorithms, thereby validating the algorithms' outcomes. When team size grows, forming teams quickly becomes complex for practitioners, as demonstrated by the increased divergence in ranking and classifying accuracy. Discussion of the results, conclusions, and directions for future research are provided.

15.
The performance of state-of-the-art speaker verification in uncontrolled environments is affected by different variabilities. Short duration variability is very common in these scenarios and causes the speaker verification performance to decrease quickly as the duration of verification utterances decreases. Linear discriminant analysis (LDA) is the most common session variability compensation algorithm; nevertheless, it presents some shortcomings when trained with insufficient data. In this paper we introduce two methods for session variability compensation to deal with short-length utterances in the i-vector space. The first method proposes to incorporate the short duration variability information in the within-class variance estimation process. The second proposes to compensate the session and short duration variabilities in two different spaces with LDA algorithms (2S-LDA). First, we analyzed the behavior of the within-class and between-class scatters in the first proposed method. Then, both proposed methods are evaluated on telephone sessions from NIST SRE-08 for different durations of the evaluation utterances: full (average 2.5 min), 20, 15, 10 and 5 s. The 2S-LDA method obtains good results on different short-length utterance conditions in the evaluations, with an average relative EER improvement of 1.58% compared to the best baseline (WCCN[LDA]). Finally, we applied the 2S-LDA method to speaker verification under reverberant environments, using different reverberant conditions from the Reverb challenge 2013, obtaining an improvement of 8.96% and 23% under matched and mismatched reverberant conditions, respectively.
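The equal error rate (EER) and the relative improvement quoted in this abstract can be illustrated with a minimal sketch. The function names, the threshold sweep, and the toy scores below are assumptions made for illustration; they are not taken from the paper, and evaluation toolkits typically interpolate the operating point rather than picking the nearest threshold.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where the false-acceptance and false-rejection rates
    are closest (illustrative approximation, no interpolation).
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: impostor trials scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction in percent, measured against the baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the functions.
    targets = [0.9, 0.8, 0.7, 0.4]
    impostors = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(targets, impostors))
    print(relative_improvement(10.0, 9.0))
```

A relative improvement is measured against the baseline error, so a small relative figure can correspond to an even smaller absolute change in EER.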

16.
This research assessed how emotive animated agents in a simulation-based training affect the performance outcomes and perceptions of the individuals interacting in real time with the training application. A total of 56 participants consented to complete the study. The material for this investigation included a nursing simulation in which participants interacted with three animated agents. The results of this investigation indicated that both experienced and novice participants focused more visual attention time on the body of the animated agent than on the other defined areas of interest in the simulated environment. The results also indicated that novice participants conveyed more neutral facial expressions during the interaction with the animated agents than experienced participants. The simulation performance scores indicated that novice participants achieved higher scores on the simulation task than experienced participants. Lastly, the results of the agent persona instrument showed that experienced and novice participants perceived the animated agents as facilitators of learning, credible, human-like and engaging.

17.
Managing the layout of multi-dimensional visualizations is a crucial concern for the development of effective visual analytic interfaces. In these environments, heterogeneous and multi-dimensional information must be structured and combined into data representations that demand low cognitive resources but yield accurate mental models and insights. In this paper, we use Information-Rich Virtual Environments (IRVE) to articulate crucial tradeoffs in the use of Depth and Gestalt cues in text label layouts. We present a design space and evaluation methodology to explore the usability effects of these tradeoffs and collect results from a series of user studies. These lessons are posed as a set of design guidelines to aid developers of new, advantageous interfaces and specifications.

18.
Multimodal identification and tracking in smart environments (total citations: 1; self-citations: 0; citations by others: 1)
We present a model for unconstrained and unobtrusive identification and tracking of people in smart environments and answering queries about their whereabouts. Our model supports biometric recognition based upon multiple modalities such as face, gait, and voice in a uniform manner. The key technical idea underlying our approach is to abstract a smart environment by a state transition system in which each state records a set of individuals who are present in various zones of the environment. Since biometric recognition is inexact, state information is inherently probabilistic in nature. An event abstracts a biometric recognition step, and the transition function abstracts the reasoning necessary to effect state transitions. In this manner, we are able to integrate different biometric modalities uniformly and also different criteria for state transitions. Fusion of biometric modalities is also supported by our model. We define performance metrics for a smart environment in terms of the concepts of ‘precision’ and ‘recall’. We have developed a prototype implementation of our proposed concepts and provide experimental results in this paper. Our conclusion is that the state transition model is an effective abstraction of a smart environment and serves as a good basis for developing practical systems.
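As a rough illustration of how precision and recall might be applied to a probabilistic presence state of this kind, the sketch below thresholds per-zone probabilities and compares the result with ground truth. The data layout, the 0.5 threshold, the function name, and the example names are all assumptions for illustration; the abstract does not give the paper's exact metric definitions.

```python
def presence_precision_recall(state, truth, zone, threshold=0.5):
    """Precision and recall for the query "who is in `zone`?".

    state: person -> {zone: probability}, a hypothetical probabilistic state.
    truth: person -> actual zone (ground truth).
    A person is reported as present when the probability meets the threshold.
    """
    reported = {p for p, zones in state.items()
                if zones.get(zone, 0.0) >= threshold}
    actual = {p for p, z in truth.items() if z == zone}
    hits = len(reported & actual)
    # By convention here, an empty report or an empty zone scores 1.0.
    precision = hits / len(reported) if reported else 1.0
    recall = hits / len(actual) if actual else 1.0
    return precision, recall


if __name__ == "__main__":
    # Invented example state and ground truth.
    state = {
        "alice": {"lab": 0.9, "hall": 0.1},
        "bob": {"lab": 0.6, "hall": 0.4},
        "carol": {"lab": 0.2, "hall": 0.8},
    }
    truth = {"alice": "lab", "bob": "hall", "carol": "lab"}
    print(presence_precision_recall(state, truth, "lab"))
```

Raising the threshold generally trades recall for precision, which is one reason to report both figures for an inexact recognizer.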

19.
This paper presents a simplified and supervised i-vector modeling approach with applications to robust and efficient language identification and speaker verification. First, by concatenating the label vector and the linear regression matrix at the end of the mean supervector and the i-vector factor loading matrix, respectively, the traditional i-vectors are extended to label-regularized supervised i-vectors. These supervised i-vectors are optimized to not only reconstruct the mean supervectors well but also minimize the mean square error between the original and the reconstructed label vectors to make the supervised i-vectors become more discriminative in terms of the label information. Second, factor analysis (FA) is performed on the pre-normalized centered GMM first order statistics supervector to ensure each gaussian component's statistics sub-vector is treated equally in the FA, which reduces the computational cost by a factor of 25 in the simplified i-vector framework. Third, since the entire matrix inversion term in the simplified i-vector extraction only depends on one single variable (total frame number), we make a global table of the resulting matrices against the frame numbers’ log values. Using this lookup table, each utterance's simplified i-vector extraction is further sped up by a factor of 4 and suffers only a small quantization error. Finally, the simplified version of the supervised i-vector modeling is proposed to enhance both the robustness and efficiency. The proposed methods are evaluated on the DARPA RATS dev2 task, the NIST LRE 2007 general task and the NIST SRE 2010 female condition 5 task for noisy channel language identification, clean channel language identification and clean channel speaker verification, respectively. 
For language identification on the DARPA RATS task, the simplified supervised i-vector modeling achieved 2%, 16%, and 7% relative equal error rate (EER) reduction on three different feature sets and ran more than 100 times faster than the baseline i-vector method on the 120 s task. Similar results were observed on the NIST LRE 2007 30 s task, with 7% relative average cost reduction. Results also show that the use of Gammatone frequency cepstral coefficients, Mel-frequency cepstral coefficients and spectro-temporal Gabor features in conjunction with shifted-delta-cepstral features improves the overall language identification performance significantly. For speaker verification, the proposed supervised i-vector approach outperforms the i-vector baseline by 12% and 7% relative in terms of EER and norm old minDCF values, respectively.
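The lookup-table idea in this abstract (precomputing a matrix term that depends only on the total frame number, indexed by the logarithm of that number) can be sketched as follows. The form (I + n·G)⁻¹, the 2×2 size, the grid resolution, and every name in the code are assumptions chosen for illustration; the abstract does not state the actual expression or table layout.

```python
import math


def inv2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def inversion_term(n, gram):
    """Hypothetical term (I + n * gram)^-1 that depends only on frame count n."""
    m = [[(1.0 if i == j else 0.0) + n * gram[i][j] for j in range(2)]
         for i in range(2)]
    return inv2x2(m)


def build_table(gram, max_frames, steps_per_octave=8):
    """Precompute the term on a grid that is uniform in log2(n)."""
    max_key = round(math.log2(max_frames) * steps_per_octave)
    return {key: inversion_term(2.0 ** (key / steps_per_octave), gram)
            for key in range(max_key + 1)}


def lookup(table, n, steps_per_octave=8):
    """Fetch the precomputed term whose grid point is nearest to log2(n)."""
    key = round(math.log2(n) * steps_per_octave)
    key = min(max(key, 0), max(table))  # clamp to the table's key range
    return table[key]


if __name__ == "__main__":
    gram = [[0.5, 0.1], [0.1, 0.3]]  # invented stand-in for the fixed matrix
    table = build_table(gram, max_frames=100000)
    print(lookup(table, 3000))           # quantized approximation
    print(inversion_term(3000, gram))    # exact value for comparison
```

With this grid, the frame count used for the lookup differs from the true one by at most a factor of 2^(1/16), which is where the quantization error comes from; a finer grid reduces the error at the cost of a larger table.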

20.
Video-game users represent 40% of the French population, and adolescents are the primary users. Yet excessive playing of video games has become a problem in modern society and is manifesting itself in treatment centers for adolescents. Before attempting to gain insight into this problematic use, we must understand video gaming itself and its implications for the gamer. The aim of this research is to propose an understanding of video-game playing based on some dimensions of emotional functioning such as emotion regulation, emotion intensity, emotion expression, and alexithymia. A total of 159 adolescents took part in the study. Regular gamers regulated their emotions more than irregular gamers did. They also felt their emotions more intensely. But regular gamers expressed their emotions less than irregular gamers did. Finally, the regular gamers' alexithymia level was higher than the irregular gamers' level. In particular, they had more difficulty being emotionally reactive. The avatar's evolution in the virtual environment may help mediate adolescents' problematic emotional experiences to give them meaning and enable their appropriation. As such, video games may act as a medium for projecting and experiencing one's emotional life by staging the emotional self, thereby explaining the engagement of adolescents in video gaming.
