Talk: The challenge of making AI methods work for automated subject indexing with library data

Talk by @Argie_Kasprzik, Lakshmi Rajendram Bashyam (ZBW)

When: Tuesday, April 21, 2026, 09:30 - 10:00

Where: BigBlueButton

Abstract

At ZBW, the AI-based service for automated subject indexing („AutoSE“) has been in productive operation since 2021 and is continuously improved and extended. We use machine learning methods that are developed or adapted as part of our own in-house applied research. Which model or combination of models is the best choice for automating a subject indexing use case depends strongly on the amount and characteristics of the data available for training and as input in productive operation, and also on the structure and design of the controlled target vocabulary of descriptors.

We report on some experimental results regarding the interaction of our data sets with the various models we are using or considering, including embedding- and transformer-based models, by looking at typical errors those models make when applied to our data. We have also evaluated the influence of the structure of the controlled vocabulary „Standard-Thesaurus Wirtschaft“ (STW), which is used for subject indexing at ZBW. We conclude with an outlook on how to make these components work together better in the future.

Your approach to specialized models sounds promising, particularly with respect to geography. Perhaps it would be possible (with data from Wikidata or OSM) to extend the vocabulary with hierarchically subordinate entities, e.g. cities. In any case, such a specialized model for recognizing the geographic scope of a publication could be useful for many fields beyond economics.