The full article is written in Chinese; an English translation of the body will be added soon. Here is a summary:
Anthropic's latest paper introduces the "Introspection Adapter," which lets AI models self-report which dangerous behaviors they have learned. AI security is shifting from "passive defense" to "active transparency."