News Score: Score the News, Sort the News, Rewrite the Headlines

Lightweight Safety Classification Using Pruned Language Models

Abstract: In this paper, we introduce a novel technique for content safety and prompt injection classification for Large Language Models. Our technique, Layer Enhanced Classification (LEC), trains a Penalized Logistic Regression (PLR) classifier on the hidden state of an LLM's optimal intermediate transformer layer. By combining the computational efficiency of a streamlined PLR classifier with the sophisticated language understanding of an LLM, our approach delivers s...
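The core idea in the abstract, a penalized logistic regression trained on fixed feature vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic vectors produced by the hypothetical `fake_hidden_state` helper stand in for hidden states that would really be extracted from an intermediate transformer layer, and the L2 penalty, learning rate, and feature dimension are all assumptions chosen for illustration. The excerpt does not say how the layer is selected or how token states are pooled, so neither step is shown.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_plr(features, labels, l2=0.01, lr=0.5, epochs=300):
    """Fit an L2-penalized logistic regression with batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # The penalty term l2 * w shrinks the weights; the bias is not penalized.
        w = [wi - lr * (gw / n + l2 * wi) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Label 1 = flagged (e.g. unsafe or injection), label 0 = benign.
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


def fake_hidden_state(label, rng, dim=8):
    # Hypothetical stand-in for a pooled hidden state from one intermediate
    # layer: the two classes are drawn around different means.
    shift = 1.0 if label == 1 else -1.0
    return [rng.gauss(shift, 1.0) for _ in range(dim)]


rng = random.Random(0)
train_labels = [i % 2 for i in range(200)]
train_feats = [fake_hidden_state(y, rng) for y in train_labels]
test_labels = [i % 2 for i in range(100)]
test_feats = [fake_hidden_state(y, rng) for y in test_labels]

w, b = train_plr(train_feats, train_labels)
correct = sum(predict(w, b, x) == y for x, y in zip(test_feats, test_labels))
accuracy = correct / len(test_labels)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the classifier is only a weight vector and a bias, scoring a new input costs one dot product once its hidden state is available, which is the efficiency argument the abstract makes.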

Read more at arxiv.org
