
SOURCE:    Philology. Theory & Practice. Tambov: Gramota, 2022. № 10. P. 3382-3386.
SCIENTIFIC AREA:    Philological Sciences

https://doi.org/10.30853/phil20220563

Experimental Database Modelling of a Balanced Linguistic Corpus

Gorozhanov Alexey Ivanovich
Moscow State Linguistic University


Submitted: 04.09.2022
Abstract. The research aims to build a working experimental model of a relational database for handling a balanced linguistic corpus of a work of fiction. The scientific novelty is that, for the first time in a humanities study, a linguistic corpus database is modelled with a thorough description of the technical details and on the basis of the author’s concept of professionally oriented programming. The work involved three stages: (1) drawing up the technical specification (the structure of two relational database tables was developed, the SQLite format was selected, and additional table columns were provided for later expansion of the research content); (2) writing the source code for creating and populating the database (using the Python programming language and the spaCy natural language processing library); and (3) testing the code on the texts of three novels by F. Kafka, “The Castle”, “Amerika” and “The Trial” (three working databases were created). The findings show that modern natural language processing tools make it possible to create, automatically, full-fledged databases that can be queried with SQL and then expanded further, either manually or automatically.
Key words and phrases: relational database, corpus linguistics, professionally oriented programming, SQLite, spaCy
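The stages named in the abstract (a two-table SQLite structure with spare columns, code that populates it, and a test SQL query) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's code: the table names `sentences` and `tokens`, all column names, and the helpers `build_corpus_db`, `toy_analyze` and `lemma_frequencies` are hypothetical, since the abstract does not give the actual schema. To keep the sketch runnable without a language model, a whitespace splitter stands in for spaCy; a spaCy pipeline could be plugged in through the same `analyze` parameter.

```python
import sqlite3
from typing import Callable, Iterable, List, Tuple

# (surface form, lemma, part-of-speech tag) for one token
TokenRow = Tuple[str, str, str]

# Hypothetical two-table layout; the nullable "comment" columns play the
# role of the spare columns reserved for later expansion.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sentences (
    id      INTEGER PRIMARY KEY,
    text    TEXT NOT NULL,
    comment TEXT
);
CREATE TABLE IF NOT EXISTS tokens (
    id          INTEGER PRIMARY KEY,
    sentence_id INTEGER NOT NULL REFERENCES sentences(id),
    position    INTEGER NOT NULL,
    form        TEXT NOT NULL,
    lemma       TEXT NOT NULL,
    pos         TEXT NOT NULL,
    comment     TEXT
);
"""


def build_corpus_db(
    conn: sqlite3.Connection,
    sentences: Iterable[str],
    analyze: Callable[[str], List[TokenRow]],
) -> None:
    """Create both tables and fill them from already segmented sentences."""
    conn.executescript(SCHEMA)
    for sentence in sentences:
        cur = conn.execute(
            "INSERT INTO sentences (text) VALUES (?)", (sentence,)
        )
        sentence_id = cur.lastrowid
        rows = [
            (sentence_id, position, form, lemma, pos)
            for position, (form, lemma, pos) in enumerate(analyze(sentence))
        ]
        conn.executemany(
            "INSERT INTO tokens (sentence_id, position, form, lemma, pos) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    conn.commit()


def toy_analyze(sentence: str) -> List[TokenRow]:
    """Stand-in analyzer (assumption): whitespace split, lowercased form
    as the lemma, and a dummy tag. With spaCy one could instead return
    [(t.text, t.lemma_, t.pos_) for t in nlp(sentence)]."""
    return [(word, word.lower(), "X") for word in sentence.split()]


def lemma_frequencies(conn: sqlite3.Connection, limit: int = 10):
    """Example SQL query: the most frequent lemmas in the corpus."""
    return conn.execute(
        "SELECT lemma, COUNT(*) AS n FROM tokens "
        "GROUP BY lemma ORDER BY n DESC, lemma ASC LIMIT ?",
        (limit,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_corpus_db(conn, ["Das Schloss lag still", "K. sah das Schloss"], toy_analyze)
top = lemma_frequencies(conn, 3)
print(top)
```

Passing the analyzer as a parameter keeps the database code independent of the NLP library, so the same tables can later be refilled or extended by a different tool or by hand.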
