Neural Networks Deciphered through the 'Polytope Lens': New Insights into AI Interpretability

Interpreting Neural Networks through the Polytope Lens

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Sid Black*, Lee Sharkey*, Leo Grinsztajn, Eric Winsor, Dan Braun, Jacob Merizian, Kip Parker, Carlos Ramón Guevara, Beren Millidge, Gabriel Alfour, Connor Leahy

*equal contribution

Research from Conjecture.

This post benefited from feedback from many staff at Conjecture including Adam Shimi, Nicholas Kees Dupuis, Dan Clothiaux, Kyle McDonell. Additionally, the post also benefited from inputs from Jessica Cooper, E...