Availability, quality and unobvious peculiarities of SSH metadata: A case study of Ukrainian economic publications

The growing visibility of publications in the Crossref database makes it possible to collect large bibliographic datasets for quantitative analysis. This is especially important for non-English-speaking countries, whose scientific output is poorly represented in citation databases. The recently presented OUCI interface makes this task easier for Ukrainian Crossref data by providing dedicated search filters.

Open metadata for non-English-speaking countries

Ukraine is considered one of the countries with low scientific output. This usually means that a comparatively small number of publications by Ukrainian authors are indexed in databases such as Web of Science or Scopus. Moreover, not all disciplines are equally represented within this “visible portion” of output. This can be explained not only by the unequal disciplinary coverage of the databases, but also by the peculiarities of Ukrainian research itself. First, during the Soviet totalitarian era, communist censorship hindered the development of the social sciences and humanities in Ukraine, while the natural sciences were considered ideologically safe. Second, the natural sciences are traditionally better represented in the global information space. The use of Cyrillic script and of languages other than English adds further peculiarities to Ukrainian research publication data.

One potential way to address the lack of bibliographic data for scientometric analysis of non-English scientific publications, especially in the social sciences and humanities (SSH), is to use open Crossref data. Since 2018, assigning a DOI to each publication has been one of the requirements for Ukrainian journals to be included in the national List. This means that the relevant structured metadata are delivered to Crossref systematically. Moreover, many leading Ukrainian publishers actively support the Initiative for Open Citations (I4OC), and more than 500 million references are now openly available through the Crossref API.

Although Crossref avoids creating any metrics or special analytics, other toolmakers actively use this open citation metadata in their services (e.g., Dimensions, Lens.org, or COCI). The developers of the Open Ukrainian Citation Index (OUCI), a search engine and citation database built on open Crossref data, have followed the same path. The OUCI interface includes search filters specific to Ukrainian publication data.

Our study

These new opportunities to use open Crossref data allowed us to conduct a large-scale quantitative analysis of Ukrainian economic research. OUCI import tools and the Crossref API were used to extract open publication metadata. Nearly 24,000 Crossref records from 123 Ukrainian journals, covering the period 2002-2020, were collected for our study. Of these publications, 87% are not indexed in Scopus or Web of Science. Data processing and analysis were performed using custom Python code.
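To give a rough idea of what such an extraction step can look like, here is a minimal sketch in Python. It is not the code used in the study. It only builds a request URL for the public Crossref REST API (the `/journals/{issn}/works` route with date filters and cursor-based paging) and reduces one returned record to a few fields; no network request is made. The function names, the page size and the ISSN `0000-0000` are placeholders chosen for illustration.

```python
from urllib.parse import urlencode

CROSSREF_API = "https://api.crossref.org"


def journal_works_url(issn, from_year=2002, until_year=2020, rows=200, cursor="*"):
    """Build a Crossref REST API URL listing the works of one journal.

    `cursor="*"` starts deep paging; later pages would reuse the
    `next-cursor` value returned in the previous response.
    """
    params = {
        "filter": f"from-pub-date:{from_year},until-pub-date:{until_year}",
        "rows": rows,
        "cursor": cursor,
    }
    # Keep ':', ',' and '*' unescaped so the URL stays readable.
    return f"{CROSSREF_API}/journals/{issn}/works?{urlencode(params, safe=':,*')}"


def summarize_record(item):
    """Reduce one Crossref work record (a dict) to a few countable fields."""
    authors = item.get("author", [])
    return {
        "doi": item.get("DOI"),
        # Crossref stores titles as a list; take the first one if present.
        "title": (item.get("title") or [""])[0],
        "n_authors": len(authors),
        "cited_by": item.get("is-referenced-by-count", 0),
    }


# Hypothetical ISSN, used only to show the shape of the request.
print(journal_works_url("0000-0000"))
```

In a real run, the records would be read from `message.items` in each JSON response, and paging would continue until a page comes back empty.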

A typical Ukrainian economics article is written by one author (50%) or two authors (30%). Authors from other countries rarely publish in Ukrainian journals, especially in those that are not indexed in Scopus or Web of Science. Although complete citation statistics are not available, our results confirm that publications written by many co-authors and, in particular, those resulting from international collaboration have a higher citation impact.
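Statistics of this kind can be derived from per-article records with a few lines of Python. The sketch below is an illustration under stated assumptions, not the study's code: it assumes each record is a dict with hypothetical keys `n_authors` and `cited_by`, and the sample records are made up purely to show the mechanics.

```python
from collections import Counter, defaultdict


def author_count_shares(records):
    """Share of articles for each number of authors."""
    counts = Counter(r["n_authors"] for r in records)
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())}


def mean_citations_by_author_count(records):
    """Mean number of citations per article, grouped by author count."""
    sums = defaultdict(lambda: [0, 0])  # n_authors -> [citations, articles]
    for r in records:
        bucket = sums[r["n_authors"]]
        bucket[0] += r["cited_by"]
        bucket[1] += 1
    return {n: cites / k for n, (cites, k) in sorted(sums.items())}


# Made-up toy records, NOT data from the study.
toy = [
    {"n_authors": 1, "cited_by": 0},
    {"n_authors": 1, "cited_by": 2},
    {"n_authors": 2, "cited_by": 4},
    {"n_authors": 3, "cited_by": 6},
]
print(author_count_shares(toy))
print(mean_citations_by_author_count(toy))
```

Because citation coverage is incomplete, means computed this way are best read as lower bounds on citation impact rather than exact values.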

A more detailed description of our results is provided in the paper (a preprint is available on arXiv).

Prospects for future research

Our results show that open scholarly metadata (e.g., from the Crossref database) can be used efficiently to describe quantitatively a discipline that is underrepresented in databases such as Web of Science or Scopus. Moreover, this source of information currently seems crucial for analysis at the national level where an appropriate local information system has not yet been developed.

But even openly accessible research metadata are not “the end of the story”: they also need to be represented in a machine-readable format and, preferably, to rely on unique digital identifiers. This is especially important for publications by authors who are not native English speakers: numerous transliteration rules and the parallel use of the Latin and Cyrillic alphabets make name disambiguation more problematic. Ultimately, the results of quantitative analysis can be used to reveal both the universal and the particular characteristics of disciplinary research.
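The disambiguation problem can be made concrete with a small Python sketch. It is a simplified illustration, not the procedure used in the study. It prefers a unique identifier (the `ORCID` field that Crossref author records may carry) and falls back to a name key built from a romanized family name plus the first initial. The transliteration table is an assumption, loosely modelled on a common Ukrainian romanization, and the function names and sample values are hypothetical.

```python
import unicodedata

# Simplified Ukrainian-to-Latin table (an assumption, not an official standard).
UK_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "h", "ґ": "g", "д": "d", "е": "e",
    "є": "ie", "ж": "zh", "з": "z", "и": "y", "і": "i", "ї": "i", "й": "i",
    "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "shch", "ь": "", "ю": "iu", "я": "ia",
}


def romanize(text):
    """Lowercase, transliterate Cyrillic, and keep ASCII letters only."""
    text = unicodedata.normalize("NFC", text).lower()
    text = "".join(UK_TO_LATIN.get(ch, ch) for ch in text)
    # Decompose accented Latin letters, then drop marks and punctuation.
    text = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in text if ch.isascii() and ch.isalpha())


def name_key(family, given=""):
    """Family name plus first initial, e.g. 'shevchenko_t'."""
    return f"{romanize(family)}_{romanize(given)[:1]}"


def author_key(author):
    """Prefer the ORCID iD when present; otherwise fall back to a name key."""
    orcid = author.get("ORCID")
    if orcid:
        # Crossref gives ORCID as a URL; keep only the trailing iD.
        return ("orcid", orcid.rstrip("/").rsplit("/", 1)[-1])
    return ("name", name_key(author.get("family", ""), author.get("given", "")))


# Cyrillic and Latin spellings of the same name map to one key.
print(name_key("Шевченко", "Тарас"), name_key("Shevchenko", "Taras"))
```

The fallback remains fragile: spellings produced under different romanization rules (for example, “Iuliia” and “Yuliia”) still yield different keys, which is exactly why unique identifiers are preferable.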