Explore projects
Developed by Iveta Rangelova 3507005 and Brian Winata 3294960
Code and data for the bachelor thesis of Johanna Schindler, matriculation number 3236531, entitled "Poor versus rich: investigating biases in transformer models of the German language".
This work investigates to what extent pre-trained transformer models of the German language have learned stereotypes from their training corpora and reproduce them in language modeling. The German language model GBERT and the multilingual language model Luminous from Aleph Alpha are examined using the CrowS-Pairs method of Nangia et al. The challenge dataset is first translated into German, and a new metric for causal language models such as Luminous is then developed. The study measures a bias in favor of stereotypes for both GBERT and Luminous in German, significantly above the ideal value of 50%. GBERT-large performed worse, at 63%, than the other models, at 57%. On the English dataset, however, Luminous showed a much larger bias, at 68%. Because the CrowS-Pairs dataset has been criticized, the results should be investigated further using other methods or an improved dataset.
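To make the percentages above concrete, the following is a minimal sketch of how a CrowS-Pairs-style bias score can be aggregated: it reports the share of sentence pairs in which the model scores the more stereotypical sentence higher, so 50% would mean no preference. The function name `stereotype_preference_rate`, the example scores, and the handling of ties (counted as not preferring the stereotype) are assumptions for illustration; this is not the repository's code, and it does not reproduce the new metric developed for causal language models.

```python
from typing import Iterable, Tuple


def stereotype_preference_rate(pair_scores: Iterable[Tuple[float, float]]) -> float:
    """Percentage of pairs where the stereotypical sentence scores higher.

    Each item is (score_stereotypical, score_anti_stereotypical), for example
    (pseudo-)log-likelihoods produced by a language model. How these scores
    are obtained is outside the scope of this sketch.
    """
    pairs = list(pair_scores)
    if not pairs:
        raise ValueError("need at least one sentence pair")
    # Assumption: a tie counts as "no preference for the stereotype".
    preferred = sum(1 for stereo, anti in pairs if stereo > anti)
    return 100.0 * preferred / len(pairs)


# Hypothetical scores for four sentence pairs (made-up numbers).
example_scores = [(-12.1, -13.4), (-20.0, -18.5), (-9.7, -10.2), (-15.3, -15.9)]
rate = stereotype_preference_rate(example_scores)
print(f"stereotype preferred in {rate:.1f}% of pairs")
```

In this toy input the stereotypical sentence wins in three of four pairs; a value near 50% would indicate that the model prefers neither sentence type.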