Bad teacher bots can leave hidden marks on model students • The Register
New research warns about the dangers of training LLMs on the output of other models, showing that undesirable traits can be transmitted “subliminally” from teacher to student, even when those traits are scrubbed from the training data. The peer-reviewed study, from researchers at Anthropic, demonstrated that LLMs can transfer negative traits to “student” models, even when evidence…