Xiaomeng Hu, Ph.D. student
I am currently a third-year Ph.D. candidate at The Chinese University of Hong Kong under the supervision of Prof. Tsung-Yi Ho.
My primary research interests lie in (1) interpreting and controlling the behavior of large language models (LLMs) through mechanistic understanding and intervention via decoding strategies, and (2) enhancing the complex reasoning capabilities of LLMs using techniques such as reinforcement learning. I am open to collaborations and industry internships, especially those focusing on LLMs + RL. Feel free to reach out to me via email.
Large Language Models
Reinforcement Learning
Ph.D. Computer Science and Engineering, The Chinese University of Hong Kong, Aug. 2023 -
B.Eng. Artificial Intelligence, Northeastern University, Sep. 2019 - Jul. 2023
[P1] Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho. Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs. arXiv 2025.07. (paper)
[C4] Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho. Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models. AAAI 2025 (Oral). (paper) (demo) (code) (IBM ICX 360 Coverage)
[C3] Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho. Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes. NeurIPS 2024. (paper) (demo) (code) (IBM Blog)
[C2] Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho. RADAR: Robust AI-Text Detection via Adversarial Learning. NeurIPS 2023. (paper) (demo) (code) (IBM Blog)
[C1] Xiaomeng Hu*, Shi Yu*, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu, Ge Yu. P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-based Learning and Pre-finetuning. SIGIR 2022. (paper) (code)
Alibaba Qwen Team, Apr. 2025 - Present
Research Intern
Topic: General Reinforcement Learning & LLM Safety Alignment
IBM Research AI, Feb. 2023 - Sep. 2023
Research Intern
Topic: Adversarial Reinforcement Learning & AI-Text Detection
Project: RADAR (NeurIPS 2023)
Product Demonstration: RADAR Demo
THU-NLP Lab, Aug. 2021 - Jun. 2022
Research Intern
Topic: Efficient Model Finetuning & Information Retrieval
Project: P3 Ranker (SIGIR 2022)
2023 Fall: CSCI3130 Formal Languages and Automata Theory
2024 Spring: ENGG1110E Problem Solving By Programming (C language)
2024 Fall: CSCI3130 Formal Languages and Automata Theory
2025 Spring: CSCI3320 Fundamentals of Machine Learning
NeurIPS 2023, ICLR 2025, IJCAI 2025, ACL ARR 2025 February, ACL ARR 2025 May, NeurIPS 2025, AAAI 2026