About me
Hi! I’m Wei Jie Yeo, a PhD student at Nanyang Technological University, Singapore, supervised by Prof. Erik Cambria. My main research interests lie at the intersection of NLP and interpretability, with a focus on addressing the current lack of understanding of how AI systems model various complex behaviors. Lately, I have been deeply interested in using interpretability to address problems in AI safety, such as jailbreaks and prompt injection attacks.
Employment
I expect to graduate towards the end of 2025, and I will be actively seeking a full-time position in industry. Feel free to connect if you think I would be a suitable candidate! CV
Selected Publications
Understanding Refusal in Language Models with Sparse Autoencoders
Wei Jie Yeo, Nirmalendu Prakash, Clement Neo, Roy Ka-Wei Lee, Erik Cambria, Ranjan Satapathy
Preprint, 2025.
[Paper] [Code]
Debiasing CLIP: Interpreting and Correcting Bias in Attention Heads
Wei Jie Yeo, Rui Mao, Moloud Abdar, Erik Cambria, Ranjan Satapathy
Preprint, 2025.
[Paper] [Code]
A comprehensive review on financial explainable AI
Wei Jie Yeo, Wihan Van Der Heever, Rui Mao, Erik Cambria, Ranjan Satapathy, Gianmarco Mengaldo
AIRE Journal, 2025.
[Paper]
Self-training Large Language Models through Knowledge Detection
Wei Jie Yeo, Teddy Ferdinan, Przemyslaw Kazienko, Ranjan Satapathy, Erik Cambria
EMNLP, 2024.
[Paper] [Code]
How Interpretable are Reasoning Explanations from Prompting Large Language Models?
Wei Jie Yeo, Ranjan Satapathy, Goh Siow Mong, Erik Cambria
NAACL, 2024.
[Paper] [Code]
Plausible Extractive Rationalization through Semi-Supervised Entailment Signal
Wei Jie Yeo, Ranjan Satapathy, Erik Cambria
ACL, 2024.
[Paper] [Code]
