I am a Research Scientist at Google, working on the Gemini Omni project, where I focus on real-time, interactive, high-resolution video generation under the supervision of Yael Pritch and Peyman Milanfar. Before Google, I was a Research Scientist at Tencent, where I worked on training the diffusion–autoregressive model HunyuanImage 3.0.
My research lies in multimodal generative AI, especially image and video synthesis. I am interested in building AI systems that simulate our dynamic visual world with creative control and bring real-time interactive experiences to people.
I received my Ph.D. from the Hong Kong University of Science and Technology (HKUST), advised by Prof. Qifeng Chen. Before that, I earned my B.Eng. in Electrical Engineering from Zhejiang University, with a National Scholarship from Chu Kochen Honors College, advised by Prof. Wenyuan Xu.





Multimodal generative models for image and video synthesis. Full list on the Publications page or Google Scholar.

