Digital studies of spoken speech: history, methodology, modern tools
|
Oxana Vladimirovna Goncharova
Peoples’ Friendship University of Russia named after Patrice Lumumba, Moscow
|
|
Submitted: August 19, 2025
|
|
Abstract.
This study aims to establish how the historical stages in the development of speech data processing technologies relate to the formation of the modern methodological base of instrumental phonetics. The paper traces the evolution of instrumental approaches to the study of spoken language, from the first mechanical devices of the 18th century to modern neural network architectures, and analyzes current software and hardware solutions, from general-purpose platforms for acoustic analysis to specialized systems for the automatic alignment of speech data. The study systematizes the methodological principles of each historical period and identifies their conceptual limitations and their potential for solving current linguistic problems. Its scientific novelty lies in a periodization of instrumental phonetics based on the interaction between the technological capabilities and the methodological concepts of each stage. As a result, three types of conceptual gaps (technological, semantic, and cognitive) were identified that hinder the effective integration of modern digital technologies with traditional linguistic categories. The paper substantiates the need for hybrid analytical platforms capable of bridging the divide between the quantitative parameters of automatic processing and the qualitative characteristics of phonological description.
|
Key words and phrases:
digital speech signal processing
automatic speech recognition
instrumental methods of phonetics
deep neural networks
speech synthesis and analysis
|
|