Yisheng He (何益升)

Yisheng He is a researcher at Alibaba. He obtained his Ph.D. at HKUST, advised by Prof. Qifeng Chen, Prof. Long Quan, and Dr. Jian Sun.

Email  /  Google Scholar  /  GitHub

🔥We are now actively hiring research interns. If you are interested in working with me, feel free to email me your CV.

Research

I'm interested in 3D Computer Vision, AIGC, Embodied AI, and Digital Avatars.


( * denotes equal contribution; ^ denotes intern student; ✉ denotes corresponding author.)
Forge4D: Feed-Forward 4D Human Reconstruction and Interpolation from Uncalibrated Sparse-View Videos
Yingdong Hu*^, Yisheng He*✉, Jinnan Chen, Weihao Yuan, Kejie Qiu, Zehong Lin, Siyu Zhu, Zilong Dong, Jun Zhang
Preprint, 2025
project page / paper / code

Forge4D is the first feed-forward model for 4D human Gaussian reconstruction at real-world metric scale. It enables novel-view and novel-time synthesis from uncalibrated sparse-view videos in an efficient streaming manner.

PanoLAM: Large Avatar Model for Gaussian Full-Head Synthesis from One-shot Unposed Image
Peng Li*^, Yisheng He*✉, Yingdong Hu^, Yuan Dong, Weihao Yuan, Yuan Liu, Siyu Zhu, Gang Cheng, Zilong Dong, Yike Guo
Preprint, 2025
project page / paper / code

PanoLAM is a large avatar model for Gaussian full-head reconstruction from a single unposed image. It uses coarse-to-fine and dual-branch frameworks to create a Gaussian full head within a second.

CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model
Ruohao Zhan*, Yijin Li*, Yisheng He, Shuo Chen, Yichen Shen, Xinyu Chen, Zilong Dong, Zhaoyang Huang, Guofeng Zhang
ACM MM, 2025
paper

CoProSketch offers strong controllability and fine detail for sketch generation with diffusion models.

LAM: Large Avatar Model for One-shot Animatable Gaussian Head
Yisheng He*, Xiaodong Gu*, Xiaodan Ye, Chao Xu, Zhengyi Zhao, Yuan Dong, Weihao Yuan, Zilong Dong, Liefeng Bo
SIGGRAPH, 2025
project page / paper / code

LAM creates animatable Gaussian heads from one-shot images in a single forward pass. The heads can be reenacted and rendered in real time on various platforms, including mobile phones.

LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
Zhe Li^, Weihao Yuan, Yisheng He, Lingteng Qiu, Shenhao Zhu, Xiaodong Gu, Weichao Shen, Yuan Dong, Zilong Dong, Laurence T. Yang
ICLR, 2025
project page / paper / code

LaMP is a language-motion pretraining model that advances text-to-motion generation, motion-text retrieval, and motion captioning through aligned language-motion representation learning.

MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow
Zhe Li^, Yisheng He, Zhong Lei, Weichao Shen, Qi Zuo, Lingteng Qiu, Shenhao Zhu, Zilong Dong, Laurence T. Yang, Weihao Yuan
arXiv, 2025
paper

We build a bidirectional control flow between the style and the content for stylized motion generation and enable multimodal style control including text, image, and style motions.

Gaussian-Informed Continuum for Physical Property Identification and Simulation
Junhao Cai^, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, Qifeng Chen
NeurIPS, 2024 (Oral Presentation)
project page / paper / code

We introduce a hybrid framework that leverages 3D Gaussian representation to advance physical property identification.

MoGenTS: Motion Generation based on Spatial-Temporal Joint Modeling
Weihao Yuan*, Yisheng He*, Weichao Shen, Yuan Dong, Xiaodong Gu, Zilong Dong, Liefeng Bo, Qixing Huang
NeurIPS, 2024
paper

We introduce a 2D joint VQVAE that quantizes each joint individually, rather than all joints together, into tokens. We also propose a spatial-temporal modeling framework with temporal-spatial 2D masking and 2D attention for motion generation.

Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang
ECCV, 2024
project page / paper

We enable high-fidelity, transferable, and intensity-controllable neural field editing.

Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation
Minglin Chen^, Longguang Wang, Weihao Yuan, Yukun Wang, Zhe Sheng, Yisheng He, Zilong Dong, Liefeng Bo, Yulan Guo
arXiv, 2024
paper

Our method synthesizes consistent 3D content with fine-grained sketch control.

OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation
Junhao Cai*^, Yisheng He*, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qifeng Chen
IEEE Robotics and Automation Letters (RA-L), 2024
project page / paper / code

We introduce a new problem, open-vocabulary 9D object pose and size estimation; a new dataset, OO3D-9D; and a new framework based on vision foundation models to tackle this problem.

Towards Self-Supervised Category-Level Object Pose and Size Estimation
Yisheng He, Haoqiang Fan, Haibin Huang, Qifeng Chen, Jian Sun
arXiv, 2022
project page / paper

A self-supervised framework for category-level object pose and size estimation via differentiable shape deformation, registration, and rendering.

FS6D: Few-Shot 6D Pose Estimation of Novel Objects
Yisheng He, Yao Wang, Haoqiang Fan, Jian Sun, Qifeng Chen
CVPR, 2022
project page / paper / data / code

A new open-set few-shot 6D object pose estimation problem: estimating the 6D pose of an unknown object from a few support views, without CAD models or extra training. Also a large-scale synthetic dataset for pre-training and benchmarks for future research.

FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation
Yisheng He, Haibin Huang, Haoqiang Fan, Qifeng Chen, Jian Sun
CVPR, 2021 (Oral Presentation)
project page / paper / code / video (youtube) / video (bilibili)

A generic full flow bidirectional fusion framework for RGBD representation learning, applied to joint instance semantic segmentation and 3D keypoint-based 6D pose estimation.

iShape: A First Step Towards Irregular Shape Instance Segmentation
Lei Yang, Ziwei Yan, Yisheng He, Wei Sun, Zhenhang Huang, Haibin Huang, Haoqiang Fan
arXiv, 2021
project page / paper / code / dataset

A brand-new dataset to promote the study of instance segmentation for objects with irregular shapes, and an affinity-based algorithm to tackle the task.

PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation
Yisheng He, Wei Sun, Haibin Huang, Jianran Liu, Haoqiang Fan, Jian Sun
CVPR, 2020
project page / paper / code / video (youtube) / video (bilibili)

The first deep learning 3D keypoint-based 6D pose estimation algorithm and an overall framework for joint instance semantic segmentation and 3D keypoint detection.

Academic Challenge
Ranked 2nd in OCRTOC: Open Cloud Robot Table Organization Challenge, 2020
Services

  • Program Committee Member / Reviewer: CVPR, ICCV, ECCV, NeurIPS, ICLR, AAAI, ACM MM, ICRA, IROS, TPAMI, IJCV, RA-L, Neurocomputing
  • Teaching Assistant @ HKUST: COMP 4201 (Spring 2019), COMP 1029 (Fall 2020), COMP 4201 (Spring 2021)

  • Last updated: October 2025.

    Thanks to Dr. Jon Barron for sharing the template code.