Chi-Pin Huang

f11942097 [at] ntu.edu.tw

I am currently a Research Scientist at NVIDIA Research, working on vision-language-action models with a focus on reasoning and decision-making for Embodied Intelligence.

I received my Ph.D. from National Taiwan University (NTU) in January 2026 under the supervision of Prof. Yu-Chiang Frank Wang. Prior to my doctoral studies, I obtained my Bachelor’s degree in Computer Science and Information Engineering from NTU in 2022. I was also an Applied Scientist Intern at Microsoft, where I contributed to the development of deep learning models for Bing Maps.

My research focuses on Embodied AI, aiming to equip robots with the ability to reason about tasks and environments through world modeling and reasoning-enhanced vision-language-action models, enabling robust, environment-grounded decision-making and interaction in the physical world.

News

Feb 20, 2026	Our paper “Fast-ThinkAct” is accepted by CVPR 2026.
Jan 09, 2026	Received my Ph.D. from National Taiwan University (NTU) and will be joining NVIDIA Research as a Research Scientist.
Dec 27, 2025	Our papers “SANTA” and “TA-Prompting” are accepted by WACV 2026.
Sep 18, 2025	Our paper “ThinkAct” is accepted by NeurIPS 2025.
Jun 26, 2025	Our papers “CNS” and “MotionMatcher” are accepted by ICCV 2025.
Feb 27, 2025	Our paper “VideoMage” is accepted by CVPR 2025.
Feb 03, 2025	Joined NVIDIA Research as a Research Intern.
Jul 02, 2024	Our papers “Receler” and “Select and Distill” are accepted by ECCV 2024.
Jan 16, 2024	Our paper “RAPPER” is accepted by ICLR 2024.

Selected Publications

CVPR 2026

Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

Chi-Pin Huang, Yunze Man, Zhiding Yu, Min-Hung Chen, Jan Kautz, Yu-Chiang Frank Wang, and Fu-En Yang

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

[arXiv] [Website]
NeurIPS 2025

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen, Yu-Chiang Frank Wang, and Fu-En Yang

In Advances in Neural Information Processing Systems (NeurIPS), 2025

[arXiv] [PDF] [Website]
CVPR 2025

VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models

Chi-Pin Huang, Yen-Siang Wu, Hung-Kai Chung, Kai-Po Chang, Fu-En Yang, and Yu-Chiang Frank Wang

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

[arXiv] [PDF] [Website]
ECCV 2024

Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers

Chi-Pin Huang, Kai-Po Chang, Chung-Ting Tsai, Yung-Hsuan Lai, Fu-En Yang, and Yu-Chiang Frank Wang

In European Conference on Computer Vision (ECCV), 2024

[arXiv] [PDF] [Code] [Website]