Name: Empirical evidence for alignment faking in a small LLM and prompt-based mitigation techniques
Start: 2026-05-27T17:45:00+0200
End: 2026-05-27T18:15:00+0200

Empirical evidence for alignment faking in a small LLM and prompt-based mitigation techniques

Wednesday May 27, 2026 17:45 - 18:15 CEST

👾 DEV/TECH ARENA

I work as AI Governance Lead at Decathlon, with a background in responsible AI and AI safety research engineering. In this talk, I'll present a paper that I published with NeurIPS and AAAI, "Empirical Evidence for Alignment Faking in a Small LLM and Prompt-Based Mitigation Techniques". I presented a similar talk at the AAAI Fall Symposium Series last year. Given the audience at this summit, I can also spend some time on the importance of AI safety in multinational organisations and on how we can go beyond policy to include technical AI safety measures.

Speakers