Automatic terminology research tool: terminology extraction and processing with unambiguous transliteration

Summary of the project, expected results

First, programs for one-to-one (reversible) transliteration between Cyrillic and Latin scripts will be developed for the Ukrainian and Lithuanian languages, based on a publicly available Microsoft transliteration tool. The Ukrainian part will support conversion with two transliteration tables: a general one that uses only Latin letters without diacritics (ASCII codes 0–127), and an equivalent Slavic table with diacritics that follows Slavic orthographic practice more closely. To support transliteration of Old Ukrainian texts, the program will also cover the letters s, ȥ, ѹ, ϖ, b, s, ѣ, ѥ, ω, ѧ, ѩ, ѫ, ѭ, ѯ, ѱ, ѳ, ѵ, ё, џ, y, j, û, ê, ô, ŷ, which are not used in the modern Ukrainian alphabet. The Lithuanian part will be based on Noah Shemli’s transliteration system (https://www.omniglot.com/conscripts/lca.htm). The resulting programs will be tested on corresponding text corpora and corrected where needed.
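The idea of a one-to-one transliteration with two parallel tables can be illustrated with a minimal Python sketch. The two tables below are hypothetical toy mappings for a handful of lowercase letters, chosen only for illustration; they are not the project's actual tables, and the function names are likewise assumptions. Decoding uses greedy longest-match, which is sufficient for this toy table but is not guaranteed to work for an arbitrary one.

```python
# Hypothetical toy tables (NOT the project's real tables): a few lowercase
# Cyrillic letters mapped to ASCII-only and to diacritic Latin equivalents.
ASCII_TABLE = {"к": "k", "н": "n", "и": "y", "г": "gh", "а": "a", "ж": "zh", "з": "z"}
DIACRITIC_TABLE = {"к": "k", "н": "n", "и": "y", "г": "g", "а": "a", "ж": "ž", "з": "z"}


def to_latin(text, table):
    """Replace each Cyrillic letter found in the table; keep other characters."""
    return "".join(table.get(ch, ch) for ch in text)


def to_cyrillic(text, table):
    """Invert the table and decode with greedy longest-match."""
    reverse = {latin: cyr for cyr, latin in table.items()}
    if len(reverse) != len(table):
        # Two Cyrillic letters share one Latin form: the table is not reversible.
        raise ValueError("table is not one-to-one")
    longest = max(len(key) for key in reverse)
    out = []
    i = 0
    while i < len(text):
        for size in range(longest, 0, -1):
            chunk = text[i:i + size]
            if chunk in reverse:
                out.append(reverse[chunk])
                i += len(chunk)
                break
        else:
            # No table entry starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


for word in ("книга", "жага"):
    for table in (ASCII_TABLE, DIACRITIC_TABLE):
        latin = to_latin(word, table)
        # The round trip must return the original word.
        assert to_cyrillic(latin, table) == word
```

In this sketch the digraphs `gh` and `zh` decode unambiguously only because the toy table contains no standalone `h`; a full table would need a comparable check across all letter combinations.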

Once the software has been verified, a transliteration page will be created on the official site of the State Scientific and Technical Library of Ukraine (https://dntb.gov.ua). An online transliterator for the Ukrainian and Lithuanian languages will be built on top of the transliteration programs. This tool will be tested and, if necessary, corrected.

The next step will be to develop a neural-network tool for term extraction.

We will first analyze how the available online term extraction tools work (https://termnet.eu/terminology-tools); these will serve as a model for the Ukrainian tool. A corresponding program for the Ukrainian language will then be created. This tool will be trained and tested on copyrighted texts and other open text data in the relevant subject areas. The program will identify lexical units (words and phrases) in scientific, technical, and general-language texts that are possible terms, and output three lists: (1) new term candidates encountered for the first time; (2) previously verified terms already present in the relevant database; (3) terms that have already been marked as incorrect. The output will be available with an optional Latin transliteration for easier incorporation into other text processing tools.
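The three-list output described above can be sketched in Python as follows. This covers only the final sorting step, under the assumption that candidate terms have already been extracted (the neural extraction itself is not modeled) and that the verified and rejected terms are available as plain sets. The function name and the sample data are hypothetical.

```python
def sort_candidates(candidates, verified, rejected):
    """Split extracted candidates into (new, verified, incorrect) lists.

    Candidates are normalized by trimming and lowercasing; duplicates are
    dropped, and the order of first appearance is preserved.
    """
    new_terms, known_terms, incorrect_terms = [], [], []
    seen = set()
    for term in candidates:
        key = term.strip().lower()
        if not key or key in seen:
            continue
        seen.add(key)
        if key in rejected:
            incorrect_terms.append(key)
        elif key in verified:
            known_terms.append(key)
        else:
            new_terms.append(key)
    return new_terms, known_terms, incorrect_terms


# Hypothetical sample data, for illustration only.
verified_db = {"transliteration", "term extraction"}
rejected_db = {"translitteration"}
found = ["Transliteration", "semantic distance", "translitteration",
         "term extraction", "semantic distance "]

new, known, incorrect = sort_candidates(found, verified_db, rejected_db)
print(new, known, incorrect)
```

Checking the rejected set before the verified set is a design choice of this sketch: a term marked as incorrect is reported as such even if it also appears among the verified entries.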

The official website of the State Scientific and Technical Library of Ukraine (https://dntb.gov.ua) will feature an online transliterator for the Ukrainian and Lithuanian languages, as well as an automatic tool for the extraction and processing of terms. The transliteration tool will include the general and Slavic transliteration systems, as well as the Old Ukrainian letters. This will considerably simplify the search for and transfer of information in library and information science.

 

Status of the project: Completed.