Deep neural networks with multiple terabytes of parameters are about as transparent as human brains. Going through the scenario laid out in [1], I think I get the worry: sophisticated superhuman agents will probably smokescreen interpretability results sycophantically, giving the appearance of alignment with humans while really subverting it and psychopathically steering their ship toward their own aims, whatever those may be.
Yes, we may be the ants, clueless about the motives of superhuman agents. But what if we end up learning that the reason human brains are so space- and energy-efficient is simplicity itself? We already know that deep networks can often be distilled, and maybe the biggest breakthrough in machine learning research will be breaking superhuman intelligence down into highly interpretable (and therefore beautiful) networks that can be reasoned about.
For a long time I believed that TL;DR is a mistake and that irreducible complexity is inevitable. But maybe there is no such thing as "superhuman"; maybe there is only classic "security by obscurity", which can be made clear through good abstractions and separation of concerns [2].
References
1. https://ai-2027.com
2. https://michal.piekarczyk.xyz/post/2025-05-09-simple-easy/