News Score: Score the News, Sort the News, Rewrite the Headlines

GitHub - p-e-w/heretic: Fully automatic censorship removal for language models

Heretic is a tool that removes censorship (aka "safety alignment") from transformer-based language models without expensive post-training. It combines an advanced implementation of directional ablation, also known as "abliteration" (Arditi et al. 2024), with a parameter optimizer based on the Tree-structured Parzen Estimator (TPE) and powered by Optuna. This approach lets Heretic work completely automatically. Heretic finds high-quality abliteration parameters by co-minim...
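Directional ablation, as described by Arditi et al. (2024), rests on an orthogonal projection. Given a unit direction r in the model's residual stream, each weight matrix W that writes into the residual stream is replaced by (I - r r^T) W, so none of its outputs has a component along r. The sketch below shows only that projection step in plain Python. It is not Heretic's implementation, and the function names are hypothetical. It also leaves out how the direction is estimated and how Heretic's optimizer selects its parameters.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale v to unit length (hypothetical helper for this sketch)."""
    norm = sum(x * x for x in v) ** 0.5
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def ablate_direction(weight: Matrix, direction: Vector) -> Matrix:
    """Return (I - r r^T) W, removing the component along r from every column.

    `weight` has shape (d_model, d_in): each column is a vector written into
    the d_model-dimensional residual stream. `direction` has length d_model.
    """
    r = normalize(direction)
    d_model = len(weight)
    if len(r) != d_model:
        raise ValueError("direction length must match the number of rows")
    d_in = len(weight[0])
    # coeffs[j] = r . W[:, j], the component of column j along r
    coeffs = [sum(r[i] * weight[i][j] for i in range(d_model)) for j in range(d_in)]
    # Subtract r * coeffs[j] from column j, i.e. W - r (r^T W)
    return [[weight[i][j] - r[i] * coeffs[j] for j in range(d_in)]
            for i in range(d_model)]


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # Ablating the second basis direction zeroes the second row.
    print(ablate_direction(W, [0.0, 1.0, 0.0]))
```

Because the projection is applied to the weights rather than to activations at inference time, the edited matrix can be saved and used like any other checkpoint.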