Harold Chen

Ph.D. Student
The Hong Kong University of Science and Technology
Email: haroldchen328 [at] gmail.com

Biography

I'm a Ph.D. student at The Hong Kong University of Science and Technology (HKUST) and a student researcher at Knowin AI. I obtained my B.S. degree from Northwestern Polytechnical University (NPU) in 2025, where I worked closely with Prof. Dian Shao. I'm fortunate to work closely with Prof. Ying-Cong Chen, Prof. Harry Yang, Prof. Qifeng Chen, and Prof. Ser-Nam Lim.

I am always open to all forms of research collaboration. Feel free to contact me if you are interested in working with me! My research interests include Video Generation & Understanding, Agentic Systems, and Embodied AI.

News

[07/2025] FineQuest is accepted to MM'25!

[05/2025] VistaDPO is accepted to ICML'25!

[02/2025] FinePhys is accepted to CVPR'25!

[02/2025] Served as a reviewer for ICCV'25.

[02/2025] Served as a reviewer for Pattern Recognition.

[12/2024] Served as a reviewer for ICML'25.

[12/2024] SeFAR is accepted to AAAI'25!

[11/2024] Served as a reviewer for CVPR'25.

[08/2024] Served as a reviewer for ICLR'25.

[07/2024] FineCLIPER and CREST are accepted to MM'24!

[05/2024] Served as a reviewer for NeurIPS'24.

[02/2024] Served as a reviewer for ECCV'24.

[01/2024] Served as a reviewer for MM'24.

[01/2024] UrbanCLIP is accepted to WWW'24!

Experience

Visiting Student | HKUST
Time: 11/2024 - 6/2025. Advisor: Prof. Harry Yang

Research Intern | Everlyn AI
Time: 8/2024 - 5/2025. Mentor: Prof. Ser-Nam Lim

Research Intern | HKGAI
Time: 5/2024 - 8/2024. Mentor: Prof. Wenhan Luo

Selected Preprints

Temporal Regularization Makes Your Video Generator Stronger
Harold Haodong Chen, Haojian Huang, Xianfeng Wu, Yexin Liu, Yajing Bai, Wen-Jie Shu, Harry Yang^†, Ser-Nam Lim^†
arXiv, 2025
[arXiv] [Webpage]

LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization
Xianfeng Wu*, Yajing Bai*, Haoze Zheng*, Harold Haodong Chen* (co-first), Yexin Liu*, Zihao Wang, Xuran Ma, Wen-Jie Shu, Harry Yang^†, Ser-Nam Lim^†
arXiv, 2025
[arXiv] [Code]

Beyond Generation: Unlocking Universal Editing via Self-Supervised Fine-Tuning
Harold Haodong Chen, Harry Yang^†, Ser-Nam Lim^†
arXiv, 2024
[arXiv] [Webpage] [Code] [Benchmark]

GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting
Haodong Chen, Yongle Huang, Haojian Huang, Xiangsheng Ge, Dian Shao^†
arXiv, 2024
[arXiv] [Webpage] [Code]

Publications

FineQuest: Adaptive Knowledge-Assisted Sports Video Understanding via Agent-of-Thoughts Reasoning
Haodong Chen, Haojian Huang, Xinxiang Yin, Dian Shao^†
ACM International Conference on Multimedia (MM), 2025
[Paper] [arXiv]

VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
Haojian Huang*, Haodong Chen* (co-first), Shengqiong Wu, Meng Luo, Jinlan Fu, Xinya Du, Hanwang Zhang, Hao Fei^†
International Conference on Machine Learning (ICML), 2025
[Paper] [arXiv] [Code] [Dataset]

FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
Dian Shao^†, Mingfei Shi, Shengda Xu, Haodong Chen, Yongle Huang, Binglu Wang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
[Paper] [arXiv] [Webpage] [Code]

SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization
Yongle Huang*, Haodong Chen* (co-first), Zhenbang Xu, Zihan Jia, Haozhou Sun, Dian Shao^†
AAAI Conference on Artificial Intelligence (AAAI), 2025
[Paper] [arXiv] [Code] [Dataset]

FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs
Haodong Chen, Haojian Huang, Junhao Dong, Mingzhe Zheng, Dian Shao^†
ACM International Conference on Multimedia (MM), 2024
[Paper] [arXiv] [Webpage]

CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning
Haojian Huang, Xiaozhen Qiao, Zhuo Chen, Haodong Chen, Bingyu Li, Zhe Sun, Mulin Chen^†, Xuelong Li^†
ACM International Conference on Multimedia (MM), 2024
[Paper] [arXiv] [Code]

UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web
Yibo Yan, Haomin Wen, Siru Zhong, Wei Chen, Haodong Chen, Qingsong Wen, Roger Zimmermann, Yuxuan Liang^†
ACM International World Wide Web Conference (WWW), 2024 (Oral Presentation)
[Paper] [arXiv] [Video] [Code]

Awards and Honors

Outstanding Graduate, NPU	2025
Innovation and Entrepreneurship Advanced Individual Honor, NPU	2024
Outstanding University Student, NPU	2023-2024
School Scholarship, NPU	2023-2024
University Student Innovation Fund, Ministry of Education of P.R. China	2023
Academic Advancement Individual Honor, NPU	2023

Services

Conference Reviewer:
Computer Vision and Pattern Recognition (CVPR), 2025
International Conference on Computer Vision (ICCV), 2025
European Conference on Computer Vision (ECCV), 2024
International Conference on Machine Learning (ICML), 2025
International Conference on Learning Representations (ICLR), 2025
Neural Information Processing Systems (NeurIPS), 2024-2025
AAAI Conference on Artificial Intelligence (AAAI), 2026
ACM International Conference on Multimedia (MM), 2024
Artificial Intelligence and Statistics (AISTATS), 2025

Journal Reviewer:
Pattern Recognition
ACM TOMM