SOURCE: Philology. Theory & Practice. Tambov: Gramota, 2023. № 5. P. 1616-1620.
SCIENTIFIC AREA: Philological Sciences

https://doi.org/10.30853/phil20230252

Building a linguistic corpus based on natural language processing tools: Planning software solutions

Gorozhanov Alexey Ivanovich
Moscow State Linguistic University


Submitted: 17.03.2023
Abstract. The paper aims to build a model of a linguistic corpus generated according to the rules of the spaCy natural language processing library. The scientific novelty lies in applying the modelling method within humanities research in combination with a corpus approach, so that the technological (software) component is taken into account at the goal-setting stage itself. The research proceeded in three steps: first, a general structural model of a linguistic corpus as a sequence of blocks was defined and standard database queries were formulated; second, a model of a corpus manager interface capable of executing these standard queries was built; third, the proposed model was analysed with mini-programs that assess how technically feasible the queries are and how much practical value they have. At this stage, the linguistic material consisted of text collections of fiction by German-speaking (F. Kafka, E. M. Remarque) and English-speaking (A. C. Doyle, G. Orwell) writers. The results show that the model has a number of advantages and few disadvantages, is flexible with respect to further development, and can be implemented in software in the short term.
Key words and phrases: modelling, corpus linguistics, corpus manager, graphical user interface, spaCy
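The abstract describes a corpus modelled as a sequence of blocks, a set of standard queries against it, and a corpus manager that executes those queries. The Python sketch below illustrates that idea under explicit assumptions; it is not the author's implementation. The record schema `TokenRecord`, the class `Corpus`, the helper `toy_block` and the query methods `find_lemma`, `frequency` and `concordance` are hypothetical names chosen for illustration. Only the attributes read in `records_from_doc` (`i`, `text`, `lemma_`, `pos_` on a spaCy `Token`) come from spaCy; the demo uses hand-written toy tokens so that it runs without a spaCy model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRecord:
    """One token row; a hypothetical schema mirroring a few spaCy Token attributes."""
    block_id: str  # block (e.g. chapter) the token belongs to
    position: int  # token index within its block
    text: str      # surface form (spaCy: token.text)
    lemma: str     # lemma (spaCy: token.lemma_)
    pos: str       # coarse part-of-speech tag (spaCy: token.pos_)


def records_from_doc(doc, block_id):
    """Turn a spaCy Doc (or any iterable of Token-like objects) into records."""
    return [TokenRecord(block_id, tok.i, tok.text, tok.lemma_, tok.pos_) for tok in doc]


class Corpus:
    """A corpus as an ordered sequence of blocks, each a list of token records."""

    def __init__(self):
        self.blocks = {}  # block_id -> list[TokenRecord]; dicts keep insertion order

    def add_block(self, block_id, records):
        self.blocks[block_id] = list(records)

    def tokens(self):
        """Yield every token record, block by block, in text order."""
        for records in self.blocks.values():
            yield from records

    # Hypothetical "standard queries" a corpus manager interface could expose.
    def find_lemma(self, lemma, pos=None):
        """All occurrences of a lemma, optionally restricted to one POS tag."""
        return [r for r in self.tokens()
                if r.lemma == lemma and (pos is None or r.pos == pos)]

    def frequency(self, pos=None, top=10):
        """Most frequent lemmas, optionally restricted to one POS tag."""
        counts = Counter(r.lemma for r in self.tokens() if pos is None or r.pos == pos)
        return counts.most_common(top)

    def concordance(self, lemma, window=2):
        """Keyword-in-context lines: (block_id, left context, keyword, right context)."""
        lines = []
        for block_id, records in self.blocks.items():
            for i, r in enumerate(records):
                if r.lemma == lemma:
                    left = " ".join(t.text for t in records[max(0, i - window):i])
                    right = " ".join(t.text for t in records[i + 1:i + 1 + window])
                    lines.append((block_id, left, r.text, right))
        return lines


def toy_block(block_id, rows):
    """Build records from (text, lemma, pos) triples; stands in for spaCy output."""
    return [TokenRecord(block_id, i, *row) for i, row in enumerate(rows)]


# Toy data invented for the demo; not taken from the article's material.
corpus = Corpus()
corpus.add_block("b1", toy_block("b1", [
    ("The", "the", "DET"), ("dog", "dog", "NOUN"), ("sees", "see", "VERB"),
    ("the", "the", "DET"), ("cat", "cat", "NOUN"), (".", ".", "PUNCT")]))
corpus.add_block("b2", toy_block("b2", [
    ("A", "a", "DET"), ("cat", "cat", "NOUN"), ("sleeps", "sleep", "VERB"),
    (".", ".", "PUNCT")]))

print(corpus.frequency(pos="NOUN"))  # [('cat', 2), ('dog', 1)]
print(corpus.concordance("cat"))
# [('b1', 'sees the', 'cat', '.'), ('b2', 'A', 'cat', 'sleeps .')]
```

With a real pipeline, a block could be filled with something like `records_from_doc(nlp(chapter_text), "ch1")`, where `nlp` is a pipeline loaded with `spacy.load(...)` and `chapter_text` is a hypothetical string; which model and which token attributes the author's corpus actually stores is not specified here. Keeping blocks in an ordered mapping preserves text order for concordance lines while still allowing queries over the whole corpus.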
