I’m a second-year master’s student at Zhejiang University. I have published several first-author papers at top AI conferences such as ICLR, ICML, ACL and ACM MM. Previously, I was fortunate to intern in the Social Computing Group at Microsoft Research Asia (MSRA), where I worked on streaming video understanding under the mentorship of Jianxun Lian.

My research focuses on Multimodal Large Language Models, especially applications of Vision-Language Models and effective fine-tuning strategies. Recently, I’ve been particularly interested in streaming video understanding, which aims to enable models to continuously interpret live video streams with strong temporal reasoning and timely responses. My long-term goal is to build a truly user-friendly AI assistant (reliable, practical, and proactive) that can understand visual content, communicate naturally, and help users accomplish real-world tasks.

📖 Education

2024.09 - 2027.06 (now), Master’s Student, Software School, Software Engineering, Zhejiang University.
2020.09 - 2024.06, Undergraduate, Software College, Software Engineering (International, English), Northeastern University.