Automatic terminology research tool: terminology extraction and processing with unambiguous transliteration

Summary of the project, expected results

First, one-to-one (reversible) programs for transliteration between the Cyrillic and Latin scripts will be developed for the Ukrainian and Lithuanian languages, based on a publicly available Microsoft transliteration tool. The Ukrainian part will support conversion with two transliteration tables: a general one that uses only Latin letters without diacritics (ASCII codes 0–127), and an equivalent Slavic table with diacritics that follows Slavic orthographic practice more closely. To support transliteration of Old Ukrainian texts, the program will also cover the letters s, ȥ, ѹ, ϖ, b, s, ѣ, ѥ, ω, ѧ, ѩ, ѫ, ѭ, ѯ, ѱ, ѳ, ѵ, ё, џ, y, j, û, ê, ô, ŷ, which are not used in the modern Ukrainian alphabet. The Lithuanian part will be based on Noah Shemli’s transliteration system. The resulting programs will be tested on corresponding text corpora and corrected.
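The idea of two equivalent, reversible tables can be illustrated with a minimal sketch. The mini-tables below are hypothetical and cover only a few lowercase letters; they are not the project's actual tables, and the caret notation in the ASCII table is an assumption chosen only so that the mapping stays one-to-one. The function names are likewise illustrative.

```python
# Hypothetical mini-tables for illustration only (NOT the project's tables).
# ASCII variant: codes 0-127 only; a caret marks letters that would
# otherwise need a diacritic, which keeps the mapping reversible.
ASCII_TABLE = {
    "а": "a", "к": "k", "о": "o", "з": "z", "с": "s",
    "ж": "z^", "ч": "c^", "ш": "s^",
}
# Slavic-style variant: the same letters written with diacritics.
DIACRITIC_TABLE = {
    "а": "a", "к": "k", "о": "o", "з": "z", "с": "s",
    "ж": "ž", "ч": "č", "ш": "š",
}


def invert(table):
    """Return the Latin-to-Cyrillic table, or fail if it is not one-to-one."""
    inverse = {latin: cyr for cyr, latin in table.items()}
    if len(inverse) != len(table):
        raise ValueError("table is not one-to-one")
    return inverse


def transliterate(text, table):
    """Replace each known Cyrillic letter; pass other characters through."""
    return "".join(table.get(ch, ch) for ch in text)


def back_transliterate(text, table):
    """Restore the Cyrillic text using longest-match lookup."""
    inverse = invert(table)
    tokens = sorted(inverse, key=len, reverse=True)  # longest tokens first
    out = []
    i = 0
    while i < len(text):
        for tok in tokens:
            if text.startswith(tok, i):
                out.append(inverse[tok])
                i += len(tok)
                break
        else:
            # Character not covered by the table: copy it unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ["чашка", "каша", "казка"]:
        for table in (ASCII_TABLE, DIACRITIC_TABLE):
            latin = transliterate(word, table)
            print(word, "->", latin, "->", back_transliterate(latin, table))
```

A round trip through either table returns the original word, which is the property meant by a one-to-one transliteration. A full table would also need to handle uppercase letters and the Old Ukrainian characters.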

Once the software has been verified, a transliteration page will be created on the official website of the State Scientific and Technical Library of Ukraine. An online transliterator for the Ukrainian and Lithuanian languages will be built on top of the transliteration programs. This tool will be tested and, if necessary, corrected.

The next step will be to develop a neural network term search tool.

We will first analyze how the available online term extraction tools work; these will serve as a model for the Ukrainian tool. A corresponding program for the Ukrainian language will then be created. It will be trained and tested on copyrighted texts and other open text data in the relevant subject areas. The program will identify lexical units (words and phrases) in scientific, technical, and general-language texts that are possible terms, and output three lists: (1) new term candidates encountered for the first time; (2) previously verified terms already present in the relevant database; (3) terms that have already been marked as incorrect. The output will be available with an optional Latin transliteration for easier incorporation into other text processing tools.
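The three-list output can be sketched as follows. This is only an illustration under stated assumptions: the neural extraction step is not modelled, so candidates are passed in directly; the verified and rejected databases are represented as plain in-memory sets; and the names `TermReport` and `sort_candidates` are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set


class TermReport(NamedTuple):
    """The three lists described above (hypothetical structure)."""
    new_candidates: List[str]  # (1) candidates seen for the first time
    verified: List[str]        # (2) already verified in the database
    rejected: List[str]        # (3) already marked as incorrect


def sort_candidates(candidates: Iterable[str],
                    verified_db: Set[str],
                    rejected_db: Set[str]) -> TermReport:
    """Distribute extracted candidates over the three output lists."""
    report = TermReport([], [], [])
    seen = set()
    for candidate in candidates:
        key = candidate.strip().lower()  # simple normalization (assumption)
        if not key or key in seen:
            continue  # skip empty strings and repeated candidates
        seen.add(key)
        if key in rejected_db:
            report.rejected.append(key)
        elif key in verified_db:
            report.verified.append(key)
        else:
            report.new_candidates.append(key)
    return report


if __name__ == "__main__":
    report = sort_candidates(
        ["Транслітерація", "нейронна мережа", "термін ", "транслітерація"],
        verified_db={"транслітерація"},
        rejected_db={"термін"},
    )
    print("new:", report.new_candidates)
    print("verified:", report.verified)
    print("rejected:", report.rejected)
```

In a real tool the candidates would come from the trained model and the two sets from the terminology database; each list could then be passed through the transliterator to produce the optional Latin output.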

The official website of the State Scientific and Technical Library of Ukraine will feature an online transliterator for the Ukrainian and Lithuanian languages, as well as an automatic tool for the extraction and processing of terms. The transliteration tool will include the general and Slavic transliteration systems, as well as Old Ukrainian letters. This will considerably simplify information search and transfer operations in library and information science.


Status of the project

Filed (Ukrainian-Lithuanian project for 2020-2021)