Text Embedding Models Contain Bias (Google) — great to see this making its way to research outputs, instead of being the province of damage control and bad PR. The Developers section of the Semantic Experiences microsite talks about “unwanted associations”: In Semantris, the list of words we’re showing are hand curated and reviewed. To the extent possible, we’ve excluded topics and entities that we think particularly invite unwanted associations, or can easily complement them as inputs. In Talk to Books, while we can’t manually vet each sentence of 100,000 volumes, we use a popularity measure which increases the proportion of volumes that are published by professional publishing houses. There are additional measures that could be taken. For example, a toxicity classifier or sensitive topics classifier could determine when the input or the output is something that may be objectionable or party to an unwanted association. We recommend taking bias-impact mitigation steps when crafting end-user applications built with these models.
Learn faster. Dig deeper. See farther.
Join the O'Reilly online learning platform. Get a free trial today and find answers on the fly, or master something new and useful.