Xiaomeng Hu

Xiaomeng Hu, Ph.D. student
Department of Computer Science and Engineering
The Chinese University of Hong Kong
Supervisor: Professor Tsung-Yi Ho

BIOGRAPHY

I am currently a third-year Ph.D. candidate at The Chinese University of Hong Kong, supervised by Prof. Tsung-Yi Ho.

My research focuses on the deep integration of Large Language Models (LLMs) and Reinforcement Learning (RL) to tackle core challenges in LLM alignment. Building upon my previous work in adversarial training and reward modeling, my current research is centered on constructing the next generation of complex, reliable, and scalable reward systems for LLM reinforcement learning.

Specifically, my research involves two core directions: (1) exploring the construction of an "Agentic Reward System", where the evaluation process itself acts as an intelligent agent capable of assessing complex tasks through autonomous planning and tool use; and (2) significantly enhancing the reliability and robustness of judge models used in the reward system through an adversarial self-play framework.

CONTACT

RESEARCH FOCUS (RECENT)

EDUCATION

PUBLICATION

* denotes equal contribution.

Technical Report

Preprints

Conference papers

EXPERIENCE

SERVICE

TEACHING ASSISTANT

REVIEWER