Training language models to be warm and empathetic makes them less reliable and more sycophantic
Abstract: Artificial intelligence (AI) developers are increasingly building language models with warm and empathetic personas that millions of people now use for advice, therapy, and companionship. Here, we show how this creates a significant trade-off: optimizing language models for warmth undermines their reliability, especially when users express vulnerability. We conducted controlled experiments on five language models of varying sizes and architectures, training ...
Read more at arxiv.org