Xiaomeng Hu, Ph.D. student
I am currently a third-year Ph.D. candidate at The Chinese University of Hong Kong under the supervision of Prof. Tsung-Yi Ho.
My primary research interests are (1) interpreting and controlling the behavior of large language models (LLMs), and (2) enhancing the complex reasoning capabilities of LLMs using techniques such as reinforcement learning (RL). I am open to collaborations and industry internships, especially those focusing on LLMs + RL. Please feel free to reach out to me via email.
Large Language Models
Reinforcement Learning
Ph.D. Computer Science and Engineering, The Chinese University of Hong Kong, Aug. 2023 -
B.Eng. Artificial Intelligence, Northeastern University, Sep. 2019 - Jul. 2023
[T1] Qwen Team. Qwen3Guard Technical Report. (paper)
[P1] Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho. Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs. arXiv 2025.07. (paper)
[C5] Xiaomeng Hu, Fei Huang, Chenhan Yuan, Junyang Lin, Tsung-Yi Ho. CARE: Decoding Time Safety Alignment via Rollback and Introspection Intervention. NeurIPS 2025. (paper)
[C4] Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho. Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models. AAAI 2025 (Oral). (paper) (demo) (code) (IBM ICX 360 Coverage)
[C3] Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho. Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes. NeurIPS 2024. (paper) (demo) (code) (IBM Blog)
[C2] Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho. RADAR: Robust AI-Text Detection via Adversarial Learning. NeurIPS 2023. (paper) (demo) (code) (IBM Blog)
[C1] Xiaomeng Hu*, Shi Yu*, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu, Ge Yu. P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-based Learning and Pre-finetuning. SIGIR 2022. (paper) (code)
Alibaba Qwen Team, Apr. 2025 - Now
Research Intern
Topic: Model-based Reinforcement Learning & LLM Alignment
THU-NLP Lab, Aug. 2021 - Jun. 2022
Research Intern
Topic: Efficient Model Finetuning & Information Retrieval
Project: P3 Ranker (SIGIR 2022)
2023 Fall: CSCI3130 Formal Languages and Automata Theory
2024 Spring: ENGG1110E Problem Solving By Programming (C language)
2024 Fall: CSCI3130 Formal Languages and Automata Theory
2025 Spring: CSCI3320 Fundamentals of Machine Learning
NeurIPS 2023, ICLR 2025, IJCAI 2025, ACL ARR 2025 February, ACL ARR 2025 May, NeurIPS 2025, AAAI 2026