|
Yisheng He (何益升)
Yisheng He is a researcher at Alibaba. He obtained his Ph.D. from HKUST,
where he was advised by Prof. Qifeng Chen, Prof. Long Quan, and Dr. Jian Sun.
Email  / 
Google Scholar  / 
GitHub
🔥 We are actively hiring research interns. Successful candidates will conduct research aimed at publication at leading international conferences. To apply, please email your CV to: ethanheysh@gmail.com.
|
|
|
Research
I'm interested in 3D Computer Vision, AIGC, Embodied AI, and Digital Avatars.
( * denotes equal contribution; ^ denotes intern student; ✉ denotes corresponding author.)
|
|
Forge4D: Feed-Forward 4D Human Reconstruction and Interpolation from Uncalibrated Sparse-View Videos
Yingdong Hu*^, Yisheng He*✉, Jinnan Chen, Weihao Yuan, Kejie Qiu, Zehong Lin, Siyu Zhu, Zilong Dong, Jun Zhang
Preprint, 2025
project page /
paper /
code
Forge4D is the first feed-forward model for 4D human Gaussian reconstruction at real-world metric scale. It enables novel-view and novel-time synthesis from uncalibrated sparse-view videos in an efficient streaming manner.
|
|
PanoLAM: Large Avatar Model for Gaussian Full-Head Synthesis from One-shot Unposed Image
Peng Li*^, Yisheng He*✉, Yingdong Hu^, Yuan Dong, Weihao Yuan, Yuan Liu, Siyu Zhu, Gang Cheng, Zilong Dong, Yike Guo
Preprint, 2025
project page /
paper /
code
PanoLAM is a large avatar model for Gaussian full-head reconstruction from a single unposed image. It uses a coarse-to-fine, dual-branch framework that creates a Gaussian full head within a second.
|
|
CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model
Ruohao Zhan*, Yijin Li*, Yisheng He, Shuo Chen, Yichen Shen, Xinyu Chen, Zilong Dong, Zhaoyang Huang, Guofeng Zhang
ACM MM, 2025
paper
CoProSketch offers strong controllability and fine detail for sketch generation with diffusion models.
|
|
LAM: Large Avatar Model for One-shot Animatable Gaussian Head
Yisheng He*, Xiaodong Gu*, Xiaodan Ye, Chao Xu, Zhengyi Zhao, Yuan Dong, Weihao Yuan, Zilong Dong, Liefeng Bo
SIGGRAPH, 2025
project page /
paper /
code
LAM creates animatable Gaussian heads from one-shot images in a single forward pass. The resulting heads can be reenacted and rendered in real time on various platforms, including mobile phones.
|
|
LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
Zhe Li^, Weihao Yuan, Yisheng He, Lingteng Qiu, Shenhao Zhu, Xiaodong Gu, Weichao Shen, Yuan Dong, Zilong Dong, Laurence T. Yang
ICLR, 2025
project page /
paper /
code
LaMP is a language-motion pretraining model that advances text-to-motion generation, motion-text retrieval, and motion captioning through aligned language-motion representation learning.
|
|
MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow
Zhe Li^, Yisheng He, Zhong Lei, Weichao Shen, Qi Zuo, Lingteng Qiu, Shenhao Zhu, Zilong Dong, Laurence T. Yang, Weihao Yuan
arXiv, 2025
paper
We build a bidirectional control flow between style and content for stylized motion generation, and enable multimodal style control from text, images, and style motions.
|
|
Gaussian-Informed Continuum for Physical Property Identification and Simulation
Junhao Cai^, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, Qifeng Chen
NeurIPS, 2024 (Oral Presentation)
project page /
paper /
code
We introduce a hybrid framework that leverages 3D Gaussian representation to advance physical property identification.
|
|
MoGenTS: Motion Generation based on Spatial-Temporal Joint Modeling
Weihao Yuan*, Yisheng He*, Weichao Shen, Yuan Dong, Xiaodong Gu, Zilong Dong, Liefeng Bo, Qixing Huang
NeurIPS, 2024
paper
We introduce a 2D joint VQ-VAE that quantizes each joint individually, rather than all joints together, into tokens. We also propose a spatial-temporal modeling framework with temporal-spatial 2D masking and 2D attention for motion generation.
|
|
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang
ECCV, 2024
project page /
paper
We enable high-fidelity, transferable neural field editing with intensity control.
|
|
Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation
Minglin Chen^, Longguang Wang, Weihao Yuan, Yukun Wang, Zhe Sheng, Yisheng He, Zilong Dong, Liefeng Bo, Yulan Guo
arXiv, 2024
paper
Our method synthesizes consistent 3D content with fine-grained sketch control.
|
|
OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation
Junhao Cai*^, Yisheng He*, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qifeng Chen
IEEE Robotics and Automation Letters (RA-L), 2024
project page /
paper /
code
We introduce a new problem, open-vocabulary 9D object pose and size estimation; a new dataset, OO3D-9D; and a new framework based on vision foundation models to tackle the problem.
|
|
Towards Self-Supervised Category-Level Object Pose and Size Estimation
Yisheng He, Haoqiang Fan, Haibin Huang, Qifeng Chen, Jian Sun
arXiv, 2022
project page /
paper
A self-supervised framework for category-level object pose and size estimation via differentiable shape deformation, registration, and rendering.
|
|
FS6D: Few-Shot 6D Pose Estimation of Novel Objects
Yisheng He, Yao Wang, Haoqiang Fan, Jian Sun, Qifeng Chen
CVPR, 2022
project page /
paper /
data /
code
A new open-set few-shot 6D object pose estimation problem: estimating the 6D pose of an unknown object from a few support views, without CAD models or extra training.
We also provide a large-scale synthetic dataset for pre-training and benchmarks for future research.
|
|
FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation
Yisheng He, Haibin Huang, Haoqiang Fan, Qifeng Chen, Jian Sun
CVPR, 2021 (Oral Presentation)
project page /
paper /
code /
video (youtube) /
video (bilibili)
A generic full-flow bidirectional fusion framework for RGB-D representation learning,
applied to joint instance semantic segmentation and 3D keypoint-based 6D pose estimation.
|
|
iShape: A First Step Towards Irregular Shape Instance Segmentation
Lei Yang, Ziwei Yan, Yisheng He, Wei Sun, Zhenhang Huang, Haibin Huang, Haoqiang Fan
arXiv, 2021
project page /
paper /
code /
dataset
A brand-new dataset to promote the study of instance segmentation for objects with irregular shapes, and
an affinity-based algorithm to tackle the task.
|
|
PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation
Yisheng He, Wei Sun, Haibin Huang, Jianran Liu, Haoqiang Fan, Jian Sun
CVPR, 2020
project page /
paper /
code /
video (youtube) /
video (bilibili)
The first deep learning 3D keypoint-based 6D pose estimation algorithm, and an overall framework for joint instance semantic segmentation and 3D keypoint detection.
|
|
Services
Program Committee / Reviewer: CVPR, ICCV, ECCV, NeurIPS, ICLR, AAAI, ACM MM, ICRA, IROS, TPAMI, IJCV, RA-L, Neurocomputing
Teaching Assistant @ HKUST: COMP 4201 (Spring 2019), COMP 1029 (Fall 2020), COMP 4201 (Spring 2021)
|
|