2005, 14

A. Kurematsu, M. Nakano-Miyatake, H. Perez-Meana, E. Simancas-Acevedo

Performance analysis of Gaussian Mixture Model speaker recognition systems with different speaker features

language: English

received 28.03.2005, published 14.05.2005

ABSTRACT

This paper analyzes the effect of the speaker feature vector characteristics on the performance of speaker recognition systems (SRS) based on the Gaussian Mixture Model (GMM). To this end, the performance of the SRS is analyzed using speaker features derived from: a) linear predictive cepstral coefficients (LPCepstral) extracted from the whole speech frame; b) LPCepstral derived from the voiced parts of the speech frame; c) LPCepstral extracted from the voiced segments of the speech frame together with the pitch information; and d) LPCepstral extracted from the voiced segments of each frame and normalized using Cepstral Mean Normalization (CMN). Evaluation results, using 2.5–3 second phrases of Japanese telephone speech, show that the GMM-based SRS achieves fairly good performance with most of the speaker feature vectors, in both closed tests and open tests, although the feature vector that provides the best recognition performance depends strongly on the particular speaker.
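Feature variants (a) and (d) and the GMM decision rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Hamming window, 256-sample frames with a 128-sample hop, predictor order 12, and 12 cepstral coefficients are assumed values, all helper names are hypothetical, and the voiced-segment selection, pitch feature, and EM training of the speaker GMMs are left out. LPC coefficients are obtained with the autocorrelation method and the Levinson-Durbin recursion, converted to cepstra with the standard LPC-to-cepstrum recursion, and optionally mean-normalized (CMN). A test utterance is then assigned to the speaker whose diagonal-covariance GMM yields the highest average frame log-likelihood.

```python
import math
import random

# Assumed analysis settings (not taken from the paper).
FRAME_LEN, HOP, ORDER, N_CEPS = 256, 128, 12, 12

def frames_of(signal, frame_len=FRAME_LEN, hop=HOP):
    """Split a signal into overlapping Hamming-windowed frames."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    return [[signal[s + n] * w[n] for n in range(frame_len)]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def autocorr(frame, order):
    """Autocorrelation r[0..order] of one windowed frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Predictor a_1..a_p with x[n] ~ sum_k a_k x[n-k], plus the residual energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

def lpc_to_cepstrum(a, n_ceps):
    """Standard LPC-to-cepstrum recursion; the gain term c_0 is omitted."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]

def cepstral_mean_normalize(feats):
    """CMN: subtract the per-dimension mean over all frames of the utterance."""
    dim = len(feats[0])
    mean = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in feats]

def lpcc_features(signal, order=ORDER, n_ceps=N_CEPS, cmn=True):
    """One LPCepstral vector per frame; zero-energy frames are skipped."""
    feats = []
    for frame in frames_of(signal):
        r = autocorr(frame, order)
        if r[0] <= 0.0:
            continue
        a, _ = levinson_durbin(r, order)
        feats.append(lpc_to_cepstrum(a, n_ceps))
    return cepstral_mean_normalize(feats) if cmn and feats else feats

def gmm_log_likelihood(x, weights, means, variances):
    """log p(x | GMM) with diagonal covariances, using log-sum-exp for stability."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        lp = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            lp -= 0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
        logs.append(lp)
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def identify(feats, models):
    """Return the speaker whose GMM gives the highest mean frame log-likelihood."""
    scores = {name: sum(gmm_log_likelihood(x, *m) for x in feats) / len(feats)
              for name, m in models.items()}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    rng = random.Random(0)
    signal = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # stand-in for real speech
    feats = lpcc_features(signal)
    # Hypothetical, untrained models: (weights, means, variances) per speaker.
    models = {
        "spk_A": ([0.5, 0.5], [[0.0] * N_CEPS, [0.1] * N_CEPS],
                  [[0.05] * N_CEPS, [0.05] * N_CEPS]),
        "spk_B": ([1.0], [[1.0] * N_CEPS], [[0.05] * N_CEPS]),
    }
    best, _ = identify(feats, models)
    print(len(feats), best)
```

The models above are placeholder parameters, not trained speaker models; in a working system each speaker's weights, means, and variances would be estimated from enrollment speech, typically with the EM algorithm.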

16 pages, 8 figures

Citation: A. Kurematsu, M. Nakano-Miyatake, H. Perez-Meana, E. Simancas-Acevedo. Performance analysis of Gaussian Mixture Model speaker recognition systems with different speaker features. Electronic Journal “Technical Acoustics”, http://www.ejta.org, 2005, 14.

Akira Kurematsu received the B.E. degree in electrical communication engineering from Waseda University, Tokyo, Japan, in 1961. In 1961, he joined the Research and Development Laboratories of KDD. He was engaged in research on pattern recognition, speech signal processing, and communications terminal systems. In 1971, he received the Ph.D. degree from Waseda University. In 1983, he was appointed deputy director of the KDD R&D Laboratories. From 1986 to 1993, he was the President of ATR Interpreting Telephony Research Laboratories. Since 1993, he has been a professor in the Department of Electronic Engineering of the University of Electro-Communications. He has authored many invited papers, journal and conference papers, and five books in his fields. He was named a Fellow of the Institute of Electronics, Information and Communication Engineers. He was chairman of the Tokyo Chapter of the IEEE Signal Processing Society (1997–1998). He is a senior member of IEEE. He is also a member of the Institute of Electronics, Information and Communication Engineers, the Information Processing Society of Japan, the Acoustical Society of Japan, the Japanese Society for Artificial Intelligence, and the Association for Natural Language Processing. His research topics include robust speech recognition, extraction of the meaning and intention of dialogue, signal processing for robust speech recognition, multi-modal pattern recognition, and character recognition in scene images.

Mariko Nakano-Miyatake received the M.E. degree in Electrical Engineering from the University of Electro-Communications, Tokyo, Japan, in 1985, and the Ph.D. degree in Electrical Engineering from the Universidad Autonoma Metropolitana (UAM), Mexico City, in 1998. From July 1992 to February 1997, she was with the Department of Electrical Engineering of UAM, Mexico. In February 1997, she joined the Graduate Department of the Mechanical and Electrical Engineering School of the National Polytechnic Institute of Mexico, where she is now a Professor. Her research interests are adaptive systems, neural networks, pattern recognition, and related fields. Dr. Nakano is a member of the IEEE, RISP, and the National Researchers System of Mexico.

Hector Perez-Meana received the M.S. degree from the University of Electro-Communications, Tokyo, Japan, and the Ph.D. degree in Electrical Engineering from Tokyo Institute of Technology, Tokyo, Japan, in 1989. In 1981, he joined the Electrical Engineering Department of the Metropolitan University, where he was a Professor. From March 1989 to September 1991, he was a visiting researcher at Fujitsu Laboratories Ltd., Kawasaki, Japan. In February 1997, he joined the Graduate Department of the Mechanical and Electrical Engineering School of the National Polytechnic Institute of Mexico, where he is now a Professor. In 1991, he received the IEICE Excellent Paper Award, and in 1999 and 2000 the IPN Research Award. In 1998, he was Co-Chair of ISITA’98. His principal research interests are adaptive filter systems, image processing, pattern recognition, and related fields. Dr. Perez-Meana is a member of the IEEE, the IEICE, the National Researchers System of Mexico, and the Mexican Academy of Science.

e-mail: hmpm(at)prodigy.net.mx

Eric Simancas-Acevedo is a Ph.D. student in the Postgraduate and Investigation Study Section (SEPI), Campus Culhuacan, of the National Polytechnic Institute (IPN) of Mexico. His research interests are pattern recognition, neural networks, speech and speaker verification and identification systems, and signal processing.