Talk by @Argie_Kasprzik, Lakshmi Rajendram Bashyam (ZBW)
When: Tuesday, 21 April 2026, 09:30 - 10:00
Where: BigBlueButton
Abstract
At ZBW, the AI-based service for automated subject indexing ("AutoSE") has been in production since 2021 and is continuously improved and extended. We use machine learning methods that we develop or adapt as part of our in-house applied research. Which model, or combination of models, is the best choice for automating a given subject indexing use case depends strongly on the amount and characteristics of the data available for training and as input in production, as well as on the structure and design of the controlled target vocabulary of descriptors.
We report some experimental results on how our data sets interact with the various models we use or are considering using, including embedding-based and transformer-based models, by looking at the typical errors those models make when applied to our data. We have also evaluated the influence of the structure of the controlled vocabulary "Standard-Thesaurus Wirtschaft" (STW), which is used for subject indexing at ZBW. We conclude with an outlook on how these components can work together better in the future.