I’m a second-year master’s student at Zhejiang University. I have published several first-author papers at top AI conferences such as ICLR and ACM MM. My research focuses on Multimodal Large Language Models, especially applications of Vision-Language Models and parameter-efficient fine-tuning strategies. Recently, I’ve been particularly interested in streaming video understanding, where the aim is to enable models to continuously interpret live video streams with strong temporal reasoning and timely responses. My long-term goal is to build a truly user-friendly AI assistant (reliable, practical, and proactive) that can understand visual content, communicate naturally, and help users accomplish real-world tasks.

📖 Education

  • 2024.09 - 2027.06 (now), Master’s student, Software School, Software Engineering, Zhejiang University.
  • 2020.09 - 2024.06, Undergraduate, Software College, Software Engineering (International, English), Northeastern University.

🎖 Honors and Awards

  • 2026.02 MSRA Stars of Tomorrow Award.
  • 2025.10 National Scholarship.

📝 Publications

Streaming Video Understanding

Pre-print

Proact-VL: A Proactive VideoLLM for Real-Time AI Companions

Weicai Yan, Yuhong Dai, Qi Ran, Haodong Li, Wang Lin, Hao Liao, Xing Xie, Tao Jin, Jianxun Lian

Parameter-Efficient Fine-Tuning

ICLR 2025

Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision

Weicai Yan, Wang Lin, Zirun Guo, Ye Wang, Fangming Feng, Xiaoda Yang, Zehan Wang, Tao Jin

  • Code.
  • Prompt Visualization.

ACM MM 2024

Low-rank Prompt Interaction for Continual Vision-Language Retrieval

Weicai Yan, Ye Wang, Wang Lin, Zirun Guo, Zhou Zhao, Tao Jin

💻 Internships

Services

Reviewer: ACM MM 2025, ICLR 2025, ICLR 2026