Algorithms for prosodic discourse feature interpretation in case of its processing using low-speed codecs

Maxim Bessonov
Natalia A. Bessonova
Mais P. Farkhadov

Abstract

In this article we propose two algorithms for interpreting the prosodic features of discourse, together with methodological recommendations for their use, as part of the problem of language identification in audio signals based on a prosodic approach. The first algorithm is based on wide phonetic categories; the second is based on cross-correlation functions of the melodic and short-time energy series of the audio signal. An experimental evaluation of both algorithms is presented. Neural networks are used as the decision rule. The initial wide phonetic categories were pause, pitch, and noise. We expanded them to pause, pitch, noise, five levels of pitch, sites of decreasing energy, main maximum, and adverse maximum, for a total of 14 categories. The algorithms can be applied to language identification or speaker identification. They do not require the speech signal to be restored after it has been processed by a low-speed codec; however, the codec frames must contain parameters such as pitch, the tone-noise parameter, and energy. The speech database covers 10 languages with 10 speakers per language, and the total speech time per speaker is 100 minutes, a duration that takes into account the statistical regularities of the languages. The algorithms were evaluated in tests with a multilayer perceptron.
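
The two feature types named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Frame` fields, the `pause_energy` threshold, and both function names are hypothetical. The sketch assumes that each codec frame exposes pitch, a tone-noise flag, and energy, and that the melodic series is the per-frame pitch contour. It covers only the three initial categories (pause, pitch, noise) and a plain normalized cross-correlation between two per-frame series.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Frame:
    """Hypothetical codec frame; the field names are assumptions."""
    pitch_hz: float  # 0.0 when the codec reports no pitch
    voiced: bool     # tone-noise parameter: True = tonal, False = noise-like
    energy: float    # frame energy on a linear scale


def label_frames(frames: List[Frame], pause_energy: float = 1e-3) -> List[str]:
    """Map each frame to one of the initial categories: pause, pitch, noise."""
    labels = []
    for f in frames:
        if f.energy < pause_energy:        # assumed threshold for silence
            labels.append("pause")
        elif f.voiced and f.pitch_hz > 0:  # tonal frame with a pitch value
            labels.append("pitch")
        else:                              # energetic but not tonal
            labels.append("noise")
    return labels


def normalized_cross_correlation(
    x: List[float], y: List[float], max_lag: int
) -> Dict[int, float]:
    """Cross-correlate two equal-length series for lags -max_lag..max_lag.

    Both series are mean-centred and the result is divided by the product
    of their norms, so the value at lag 0 for identical series is 1.0.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    norm = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if norm == 0.0:                    # constant series: no correlation
            result[lag] = 0.0
            continue
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:                 # skip pairs outside the series
                total += dx[i] * dy[j]
        result[lag] = total / norm
    return result
```

In this sketch, the category labels or the correlation values over a range of lags would form the input vector for a classifier such as a multilayer perceptron. The abstract does not specify the feature layout, so that step is left out.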

Article Details

How to Cite
Bessonov, M., Bessonova, N., & Farkhadov, M. (2018). Algorithms for prosodic discourse feature interpretation in case of its processing using low-speed codecs. Advances in Systems Science and Applications, 18(1), 1-11. https://doi.org/10.25728/assa.2018.18.1.524
Section
Translated articles