I am a Research Scientist at Google, working on the Veo project. Currently, I am focusing on real-time, interactive, high-resolution video generation.
Prior to Google, I was a Research Scientist at Tencent, where I trained the diffusion-autoregressive model Hunyuan-image 3.0.
My research lies in multimodal generative AI, especially image and video synthesis. I am interested in building AI systems that can simulate our dynamic visual world for creative control and bring real-time interactive experiences to people.
July 2023: FateZero was accepted to ICCV 2023 as an oral presentation.
Feb 2023: Two papers were accepted to CVPR 2023!
Research projects and products
Research Scientist, Google, Mountain View 2026 - Present
I am working on the Veo project, a high-resolution video generation model.
Research Scientist, Tencent 2024 - 2025
For the diffusion-autoregressive model Hunyuan-image 3.0, I was responsible for the semantic encoder, text-to-image pre-training, and identity-preserving instruction editing.
Publications
I am fortunate to collaborate with talented students and researchers around the world.
We work on multimodal generative models for image and video synthesis.
Instruction-based Image Editing with Planning, Reasoning, and Generation
Edit your video with a pretrained Stable Diffusion model, with no additional training.
(e.g., Replace the jeep with a Porsche car; Add Van Gogh style to the sunflower)
Identity-preserving talking head generation utilizing dense landmarks and spatial-temporal enhancement with GAN priors.
(e.g., Make Marilyn Monroe speak following the motions of another person in the driving video)