Statistical Variation Analysis of Formant and Pitch Frequencies in Anger and Happiness Emotional Sentences in Farsi Language

Document Type: Research Article

Author

Assistant Professor, Department of Electrical Engineering, Shahid Beheshti University, Tehran, Iran

Abstract

The design of an emotion recognition or emotional speech recognition system depends directly on how emotion changes speech features. In this research, the influence of two emotions, anger and happiness, on speech features was evaluated, and the results were compared with neutral speech. The pitch frequency and the first three formant frequencies were used as the features under study. The experimental results showed that there are logical and reasonable relations between the emotions and the variations of these speech features. These results were also used to confirm the findings of our previous research on emotion recognition and emotional speech recognition.
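
The comparison described in the abstract can be illustrated with a minimal sketch that expresses the mean of each feature in an emotional condition as a percent change from its neutral mean. This is only one plausible way to quantify such variation, not necessarily the statistic used in the study. The function names `relative_change` and `variation_table` are hypothetical, and the numbers are invented placeholders, not measurements from the paper.

```python
from statistics import mean


def relative_change(emotional, neutral):
    """Percent change of the emotional mean relative to the neutral mean."""
    base = mean(neutral)
    return 100.0 * (mean(emotional) - base) / base


def variation_table(features, reference="neutral"):
    """Percent change of every feature, per emotion, against the reference."""
    table = {}
    for emotion, feats in features.items():
        if emotion == reference:
            continue
        table[emotion] = {
            name: relative_change(values, features[reference][name])
            for name, values in feats.items()
        }
    return table


# Placeholder values in Hz, invented for illustration only
# (NOT data from the study).
features = {
    "neutral": {"pitch": [100.0, 110.0, 120.0], "F1": [500.0, 520.0, 540.0]},
    "anger": {"pitch": [150.0, 165.0, 180.0], "F1": [550.0, 572.0, 594.0]},
}

if __name__ == "__main__":
    for emotion, changes in variation_table(features).items():
        for name, pct in changes.items():
            print(f"{emotion:>8} {name:>5}: {pct:+.1f}%")
```

Happiness, F2, and F3 could be added as further keys in the same dictionary. A real analysis would also need per-phoneme or per-vowel grouping and a significance test, which this sketch omits.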
