😊 About me

Resume: Resume (Updated on 6 Jan, 2026)

Contact: vanzl3386 [at] gmail.com (preferred), vanzl [at] u.nus.edu, 121090525 [at] link.cuhk.edu.cn

Hi! I am Zhenglin Wan (万政霖), a first-year CS Ph.D. student in the HPC-AI Lab at the National University of Singapore (NUS), advised by Prof. Yang You and supported by the NUS Research Scholarship. Previously, I worked as a full-time researcher at Nanyang Technological University (NTU) with Prof. Bo An. I received my B.Sc. from The Chinese University of Hong Kong (CUHK) in Fall 2025 (I completed my undergraduate studies at the Shenzhen campus). I have also spent a long time working with Prof. Ivor Tsang at the Centre for Frontier AI Research (CFAR), IHPC, A*STAR in Singapore, and interned at the Hong Kong Generative AI Research & Development Center (HKGAI), led by Prof. Yike Guo, Provost of HKUST.

I am an RL (Reinforcement Learning) believer. Previously, I mainly worked on improving RL or inverse RL from the algorithmic side, for example by improving policy diversity (as in EBC) and policy expressiveness (as in GoRL). Recently, I have been focusing on RL and LLM agents. Specifically:

The synergy/relationship between RL post-training and world models in LLM agents.
Efficiency in LLM agent systems and RL post-training infrastructure.

I am grateful to have received help and guidance in my career from so many people, including Flint, Song, David, Ivor, Bo, Xingrui, and many others. I also understand that many truly talented students may not have the opportunity to reach their full potential. So, if you are an undergraduate or master's student and believe I could offer advice, information, or opportunities that might help your career, feel free to reach out to me by email. I am also a student mentor at CFAR, IHPC, A*STAR and at the HPC-AI Lab at NUS. If you are looking for visiting, internship, or RA opportunities, or informal Ph.D. application advice, you are welcome to drop me an email for a chat.

It is my honor to have mentored and worked with these talented individuals:

Mentees (research-based, from CFAR, IHPC, A*STAR or NUS HPC-AI Lab):

Yaxin Zhou (Master's student @ CMU, America; Author of [[CaveAgent]])
Jingxuan Wu (Master's student @ UNC, America; Author of [[OSCAR]])
Xu Pan (Master’s student @ Wuhan University)
Chubin Zhang (Master’s student @ Beijing University of Posts and Telecommunications; Author of [[GoRL]])
Dongchu Xie (Undergrad @ CUHK(SZ))
Xin Yan (Undergrad @ Beijing Normal University)

🔥 News

2026.01 🎉 We released our new work CaveAgent (a new product-inspired function calling paradigm for LLM agents) ([Paper], [Source Code])!
2025.12 🎉 We released our new work GoRL for online Reinforcement Learning with generative policies. ([Paper], [Code])
2025.10 🎉 Our paper FM-IRL (Flow Matching for inverse RL) is on arXiv. ([Paper], [Code]).
2025.10 🎉 OSCAR (a training-free technique for diverse image generation) is on arXiv. ([Paper], [Code]).
2025.08 🎉🎉 Received my B.Sc. from The Chinese University of Hong Kong with first-class honors. Looking forward to a new journey!
2025.05 🎉🎉 EBC is accepted by ICML 2025 (Code available).
2025.03 🎉🎉 One paper accepted by ICLR 2025 (generative models for robot learning workshop).
2024.12 🎉🎉 One paper accepted by AAMAS 2025 (oral).
2024.12 🎉🎉 One paper accepted by AAAI 2025 (oral).
2024.09 🎉🎉 One invention patent is published.
2024.09 🎉🎉 Awarded the Academic Performance Scholarship (for the top 5% of students) for two consecutive years.
2024.05 🎉🎉 One invention patent is officially granted.
2023.12 🎉🎉 Co-founded the company “Metasequoia Intelligence”, based in Shenzhen, China, as its technical co-founder.
2021.09 🎉🎉 Awarded the Zhejiang Guolong Inspirational Scholarship and the Diligentia Bowen Scholarship to support my undergraduate studies at CUHK(SZ).
2019.09 🎉🎉 Lucky to win first prize in the Provincial Chinese Mathematics Olympiad (CMO). Thankful for this intellectually rewarding experience.

📝 Selected Publications

Conference papers and Preprints

Please scroll down to view all publications. * denotes joint first authors with equal contribution.

Technical Report

CaveAgent: Transforming LLMs into Stateful Runtime Operators

Maohao Ran*, Zhenglin Wan*, Cooper Lin, et al., Bo An, Yike Guo, Jun Song

Technical Report

Core product technology adopted by HKGAI and InnoHK. [website]

A new function calling paradigm for LLM agents featuring Stateful Runtime Management.

Paper Code Website

ICML 2025

Diversifying Policy Behaviors via Extrinsic Behavioral Curiosity

Zhenglin Wan*, Xingrui Yu*, David Bossens, Yueming Lyu, Qing Guo, Flint Xiaofeng Fan, Yew Soon Ong, Ivor Tsang

Forty-Second International Conference on Machine Learning

Enables agents to efficiently learn diverse and high-performing policies via Quality Diversity (Inverse) Reinforcement Learning.

Paper Code Website

preprint

GoRL: An Algorithm-Agnostic Framework for Online Reinforcement Learning with Generative Policies

Chubin Zhang*, Zhenglin Wan*, Feng Chen, Xingrui Yu, Ivor Tsang, Bo An

preprint

A generic framework for online Reinforcement Learning with generative policy classes (e.g., Diffusion Policy and Flow-Matching Policy).

Paper Code Website

preprint

OSCAR: Orthogonal Stochastic Control for Alignment-Respecting Diversity in Flow Matching

Jingxuan Wu*, Zhenglin Wan*, Xingrui Yu, Yuzhe Yang, Bo An, Ivor Tsang

preprint

Enables Flow-Matching models to generate diverse but alignment-respecting images in a training-free manner.

Paper Code Website

preprint

AgentOCR: Reimagining Agent History via Optical Self-Compression

Lang Feng*, Fuchao Yang*, Feng Chen, Xin Cheng, Haiyang Xu, Zhenglin Wan, Ming Yan, Bo An

preprint

Reinforcing Self-Compression for Optical Agent Memory.

Paper Code Website

AAMAS 2025 (oral)

Imitation From Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration

Xingrui Yu*, Zhenglin Wan*, David Bossens, Yueming Lyu, Qing Guo, Ivor Tsang

The 24th International Conference on Autonomous Agents and Multiagent Systems

Oral presentation

A new paradigm for imitation learning.

Paper Code Website

preprint

FM-IRL: Flow-Matching for Reward Modeling and Policy Regularization in Reinforcement Learning

Zhenglin Wan*, Jingxuan Wu*, Xingrui Yu, Chubin Zhang, Mingcong Lei, Bo An, Ivor Tsang

preprint

Equips Flow-Matching policies with exploratory strength when learning from demonstrations.

Paper Code Website

AAAI 2025 (oral)

POI Recommendation via Multi-Objective Adversarial Imitation Learning

Zhenglin Wan*, Anjun Gao*, Xingrui Yu, Pingfu Chao, Jun Song, Maohao Ran

The 39th Annual AAAI Conference on Artificial Intelligence

Oral presentation (4.65%)

Recommends the next point of interest from sparse and noisy data via inverse RL.

Paper Code Website

ICLR 2025 (workshop)

Generative Quality Diversity Imitation Learning for Robot Skill Acquisition

Zhenglin Wan, Xingrui Yu, David Bossens, Yueming Lyu, Qing Guo, Flint Xiaofeng Fan, Ivor Tsang

The Thirteenth International Conference on Learning Representations

Quality Diversity Imitation Learning techniques that let robots acquire a variety of skills.

Paper Code Website

preprint

RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Interactive Environmental Learning in Physical Embodied Systems

Mingcong Lei, Honghao Cai, et al., Zhenglin Wan, Zhen Li, Shuguang Cui, Yiming Zhao, Yatong Han

preprint

A brain-inspired framework for embodied agentic learning.

Paper Code Website

preprint

Graph-based Reinforcement learning Approach for influential Node Detection in airport delay networks

Chi Li, Zhenglin Wan, Kaize Wang, Yuxuan Huang, Chengxi Li, Lianmin Zhang, Xiongwen Qian, Jianfeng Mao

preprint

Uses RL to solve combinatorial optimization problems in transportation systems.

Paper Code Website

ADC 2024 (oral & best paper runner-up)

Hierarchical Spatial-Temporal Graph-Enhanced Model for Map-Matching

Anjun Gao*, Zhenglin Wan*, Pingfu Chao, Shunyu Yao

Australasian Database Conference

Oral presentation & Best Paper Runner-up

Map-matching techniques with strong awareness of spatial-temporal information.

Paper Code Website

Invention Patents

As these works are patented in China, all names below are translated directly from Chinese.

A Method, System, Terminal Device, and Storage Medium for Air Quality Spatial Inference (Granted)

Inventors: Jun Song, Yibo Xu, Yiwen Pan, Maohao Ran, Zhenglin Wan, Xiaoyun Yan, Yike Guo

A Single-UAV Atmospheric Pollutant Source Tracing Method Based on Gradient Ascent and Physical Kinematics (Published)

Inventors: Zhenglin Wan, Jun Song, Yibo Xu, Maohao Ran, Yike Guo

White Paper

As this work was presented in China, its name is translated directly from Chinese.

White Paper on Cross-Border Economic Large Language Model

📖 Education

Doctor of Philosophy (Ph.D)

National University of Singapore (NUS)
- Affiliation: Department of Computer Science, School of Computing
- Advisor: Prof. Yang You
2026.01 -

Bachelor of Science (B.Sc)

The Chinese University of Hong Kong (CUHK)
- First-class honors
- Major: Statistics & Data Science, GPA: 3.85/4.0, Rank: 7%
- Completed my undergraduate studies at the CUHK-Shenzhen campus; the degree is conferred by CUHK.
2021.09 - 2025.08

💻 Internships and Work Experiences

Full-time Research Staff

Nanyang Technological University, Singapore
- Affiliation: College of Computing and Data Science (CCDS)
- Advisor: Prof. Bo An
2025.09 - 2026.01

Intern (Remote)

Hong Kong Generative AI Research & Development Center (HKGAI), HKUST
- Affiliation: HKGAI
- Director: Prof. Yike Guo
- Proposed a new product-inspired Agentic Function Calling paradigm ([[CaveAgent]]).
2025.05 - 2025.09

Intern Researcher

Agency for Science, Technology and Research (A*STAR), Singapore
- Affiliation: Centre for Frontier AI Research (CFAR), Institute of High Performance Computing (IHPC)
- Advisors: Prof. Ivor Tsang, Dr. Xingrui Yu
2024.07 - 2025.09 (3 months full-time + 11 months part-time)

Research Assistant

The Chinese University of Hong Kong, Shenzhen
- Affiliation: School of Data Science
- Advisor: Prof. Jianfeng Mao
2023.06 - 2024.06

Machine Learning Algorithm Engineer Intern

HUIYINTONG, Shenzhen, China
- Mentor: Dr. Jun Song
2023.07 - 2024.01

🎈 Services

Reviewer for AAAI, ICLR, ICML, NeurIPS

🎖 Honors, Awards and Scholarships

NUS Research Scholarship (approx. ￥250000/year plus full tuition fee subsidy, for the entire Ph.D. program at NUS)
First-class honors undergraduate degree awarded by The Chinese University of Hong Kong
Yearly Academic Scholarship: B Class (for GPA Top 3%, ￥40000)
Yearly Academic Scholarship: C Class (for GPA Top 5%, ￥20000)
Yearly Dean's List Award (Outstanding first-class performance, for 3 years)
Diligentia Bowen Scholarship (￥120000, Undergraduate Admission Scholarship for first prize in the Provincial CMO)
Zhejiang Guolong Inspirational Scholarship (￥120000, Undergraduate Admission Scholarship for the top 0.5% of students in the Chinese College Entrance Exam)
First prize in the Chinese Mathematics Olympiad (CMO), Chongqing Province

💬 Press/Media

White Paper

Co-author of the first White Paper on Cross-Border Economic Large Language Model in Shenzhen, China. Shenzhen Satellite TV: Shenzhen releases its first white paper on a cross-border economic large language model

Miscellaneous

In my spare time, I’m a music enthusiast. I’ve been playing guitar for more than 10 years and began teaching myself the piano when I was 15. During my undergraduate studies, I played music in two bands: “Minor Blue” and “Major Pink.”

I have also played chess for 15 years and hold the title of “National Level-3 Athlete”. I love the process of comprehensive planning, logical thinking, and reasoning. Visit my Lichess profile.
I love playing basketball 🏀. Sports keep me energetic.
I play video games like League of Legends, where the “diamond” level is the highest rank I have reached. I also play AAA games like Elden Ring, Dark Souls, Nier Automata, and The Elder Scrolls.
I have a deep interest in philosophy of mind, particularly Buddhism and Taoism, as paths to explore the fundamental nature of human existence. I am also intrigued by the potential integration of these philosophical insights with modern artificial intelligence.

Zhenglin (Carlos) Wan
