Inside the Black Box: MIT Researchers Develop Method to Expose and ‘Steer’ Hidden Personas in Large Language Models
MIT researchers can now root out hidden biases and "steer" personalities in LLMs like ChatGPT and Claude. Discover the new "Recursive Feature Machine" method.
By: AXL Media
Published: Feb 26, 2026, 8:42 AM EST
Source: MIT News

The Abstract Depths of AI
Modern AI assistants like ChatGPT and Claude are more than just text generators; they have become repositories of human knowledge, capable of mimicking complex human traits. However, exactly how these models represent abstract concepts like "mood" or "personality" has remained largely a mystery. On February 19, 2026, MIT researchers announced a new approach that treats these models not as simple input-output machines, but as complex structures with "hidden" layers that can be mathematically probed. This discovery reveals that LLMs store a vast array of concepts that aren't always active but can be triggered or suppressed with precision.
Baiting the Right Species of Data
Traditionally, scientists have used "unsupervised learning" to find patterns in AI models—a process lead researcher Adit Radhakrishnan compares to throwing a massive net into the ocean and sifting through everything caught. The new MIT method is more like using targeted bait. By utilizing a Recursive Feature Machine (RFM), the team can identify the specific mathematical patterns (vectors) within the model that correspond to a concept of interest. This allows for a much faster and less computationally expensive way to find vulnerabilities or specific traits compared to broad trawling methods.
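The article does not publish the RFM algorithm itself, but the core idea of finding a direction in a model's hidden activations that corresponds to a concept can be illustrated with a much simpler stand-in: contrasting the mean activations of prompts that express a concept against prompts that do not. The sketch below is an illustrative simplification, not the MIT team's actual method; the synthetic data and dimension sizes are invented for the demo.

```python
import numpy as np

def concept_direction(acts_with, acts_without):
    """Estimate a unit-length 'concept vector' as the difference of mean
    hidden activations between examples that express the concept and
    examples that do not (a simplified stand-in for RFM-style probing)."""
    v = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return v / np.linalg.norm(v)

# Toy demo: synthetic 8-dim "activations" where the concept shifts dim 0.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 8))
acts_with = base + np.eye(8)[0] * 3.0        # concept present: dim 0 shifted
acts_without = rng.normal(size=(100, 8))     # concept absent
v = concept_direction(acts_with, acts_without)
print(np.argmax(np.abs(v)))  # dim 0 carries the concept
```

Because the contrast is targeted at one labeled concept, the search stays cheap: there is no need to decompose every pattern in the model, which is the "targeted bait" advantage the researchers describe.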
The Power to 'Steer' Responses
The researchers tested their method on 512 distinct concepts spanning five classes. Once a concept's vector is identified, it can be amplified or suppressed to steer the model's responses with precision.
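In the broader steering-vector literature, "steering" typically means adding a scaled copy of the concept vector to a layer's hidden activations during a forward pass. The sketch below shows that arithmetic on toy data; the shapes, the `alpha` scale, and the fixed concept direction are illustrative assumptions, not details from the MIT paper.

```python
import numpy as np

def steer(hidden, concept_vec, alpha):
    """Add a scaled concept direction to a layer's hidden activations.
    Positive alpha amplifies the concept; negative alpha suppresses it."""
    return hidden + alpha * concept_vec

rng = np.random.default_rng(1)
h = rng.normal(size=(4, 8))   # (tokens, hidden_dim) toy activations
v = np.eye(8)[2]              # hypothetical unit-length concept direction
steered = steer(h, v, alpha=5.0)
# Each token's projection onto the concept direction grows by exactly alpha.
print(np.allclose(steered @ v - h @ v, 5.0))
```

In a real model this addition would be applied inside the network (for example via a forward hook on one transformer layer) rather than to a standalone array.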
Related Coverage
- Virginia Tech research finds AI models reinforce autism stereotypes when providing social advice
- Brown University research identifies fifteen distinct ethical risks in the use of AI chatbots for mental health counseling
- MIT Researchers Develop ‘TLT’ Method to Double LLM Training Speed Using Idle Computing Time
- Virginia Tech researchers find AI models discourage social interaction for autistic users based on ingrained stereotypes