Archive of Scientific Articles

SOURCE: Almanac of Modern Science and Education. Tambov: Gramota, 2016. No. 12. P. 87-92.
SCIENTIFIC AREA: Technical Sciences

SOFTWARE TOOLS FOR INFORMATION EXTRACTION FROM NATURAL-LANGUAGE TEXTS

Rubailo Andrei Valer'evich, Kosenko Maksim Yur'evich
Chelyabinsk State University


Abstract. The article surveys existing tools for extracting named entities from natural-language texts. The tools are compared to identify the one best suited to extracting named entities from unformatted Russian-language texts. The authors substantiate the practical effectiveness of Tomita-parser for this task.
Key words and phrases: extraction of named entities, text processing, information processing, automation, Tomita-parser, Named Entity Recognition, GATE, PullEnti SDK, Eureka Engine
