Xiaomeng Hu, Ph.D. student
I am currently a third-year Ph.D. candidate at The Chinese University of Hong Kong, supervised by Prof. Tsung-Yi Ho.
My research focuses on the deep integration of Large Language Models (LLMs) and Reinforcement Learning (RL) to tackle core challenges in LLM alignment. Building upon my previous work in adversarial training and reward modeling, my current research is centered on constructing the next generation of complex, reliable, and scalable reward systems for LLM reinforcement learning.
Specifically, my research follows two core directions: (1) exploring the construction of an "Agentic Reward System", in which the evaluation process itself acts as an intelligent agent that assesses complex tasks through autonomous planning and tool use; and (2) improving the reliability and robustness of the judge models used in the reward system through an adversarial self-play framework.
Large Language Models
Reinforcement Learning
Agentic Reward Systems
Ph.D. Computer Science and Engineering, The Chinese University of Hong Kong, Aug. 2023 - Present
B.Eng. Artificial Intelligence, Northeastern University, Sep. 2019 - Jul. 2023
[T1] Qwen Team. Qwen3Guard Technical Report. (paper) (Core Contributor)
[P1] Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho. Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs. arXiv 2025.07. (paper)
[P2] Jingqi Tong, Yurong Mou, Hangcheng Li, Mingzhe Li, Yongzhuo Yang, Ming Zhang, Qiguang Chen, Tianyi Liang, Xiaomeng Hu, Yining Zheng, Xinchi Chen, Jun Zhao, Xuanjing Huang, Xipeng Qiu. Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm. arXiv 2025.11. (paper) (code)
[C5] Xiaomeng Hu, Fei Huang, Chenhan Yuan, Junyang Lin, Tsung-Yi Ho. CARE: Decoding Time Safety Alignment via Rollback and Introspection Intervention. NeurIPS 2025. (paper)
[C4] Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho. Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models. AAAI 2025 (Oral). (paper) (demo) (code) (IBM ICX 360 Coverage)
[C3] Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho. Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes. NeurIPS 2024. (paper) (demo) (code) (IBM Blog)
[C2] Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho. RADAR: Robust AI-Text Detection via Adversarial Learning. NeurIPS 2023. (paper) (demo) (code) (IBM Blog)
[C1] Xiaomeng Hu*, Shi Yu*, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu, Ge Yu. P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-based Learning and Pre-finetuning. SIGIR 2022. (paper) (code)
Alibaba Qwen Team, Feb. 2025 - Present
Research Intern
THU-NLP Lab, Aug. 2021 - Jun. 2022
Research Intern
2023 Fall: CSCI3130 Formal Languages and Automata Theory
2024 Spring: ENGG1110E Problem Solving By Programming (C language)
2024 Fall: CSCI3130 Formal Languages and Automata Theory
2025 Spring: CSCI3320 Fundamentals of Machine Learning
NeurIPS 2023, ICLR 2025, IJCAI 2025, ACL ARR 2025 February, ACL ARR 2025 May, NeurIPS 2025, AAAI 2026