"New Experiment Uses Large Language Models to Compress Text; Successfully Extracts and Reproduces Training Text from LLMs"

DRINK ME: (Ab)Using a LLM to compress text

Introduction

Large language models are trained on huge datasets of text to learn the relationships and contexts of words within larger documents. These relationships are what allow the model to generate text.

Recently I've read concerns about LLMs being trained on copyrighted text and reproducing it. This got me thinking: Can training text be extracted from an LLM? The answer, of course, is yes, and this isn't a new (or open) question. This led me to wonder what it would take to extract ...
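To make the extraction idea concrete, here is a deliberately tiny sketch. It assumes a word-level model that only counts which word follows each two-word context. That is a toy stand-in for the "relationships and contexts of words" an LLM learns, not how a real LLM works internally and not the approach used in this post. The function names `train` and `generate_greedy` are hypothetical.

```python
from collections import Counter, defaultdict


def train(text: str, context_size: int = 2) -> dict[tuple[str, ...], Counter]:
    """Count which word follows each run of `context_size` words in the text.

    Toy stand-in for an LLM's learned word relationships (an assumption for illustration).
    """
    words = text.split()
    model: dict[tuple[str, ...], Counter] = defaultdict(Counter)
    for i in range(len(words) - context_size):
        context = tuple(words[i : i + context_size])
        model[context][words[i + context_size]] += 1
    return model


def generate_greedy(model, prompt: str, context_size: int = 2, max_words: int = 50) -> str:
    """Extend the prompt by always appending the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        counts = model.get(tuple(words[-context_size:]))
        if not counts:
            break  # this context never appeared in the training text
        words.append(counts.most_common(1)[0][0])
    return " ".join(words)


training_text = "the quick brown fox jumps over the lazy dog"
model = train(training_text)
print(generate_greedy(model, "the quick"))
# prints: the quick brown fox jumps over the lazy dog
```

Because every two-word context in this short text is unique, greedy generation from a two-word prompt replays the training text verbatim. With larger and more varied training data, contexts are shared between many documents, so reproducing one specific source becomes much less direct.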