Qinian Wang

Hi, I'm Qinian — a first-year Master’s student at Shanghai Jiao Tong University (SJTU), where I’m fortunate to be advised by Weidi Xie. My research focuses on video understanding and multimodal perception, driven by a passion for exploring the intersection between computer vision and brain intelligence.

Before joining SJTU, I completed my undergraduate studies at the University of Electronic Science and Technology of China (UESTC).

I truly enjoy sharing diverse perspectives and collaborating with people from different backgrounds. Please feel free to reach out if you're interested in my research or just want to have a chat!

Email  /  CV  /  GitHub

Research

Projects

NeSeg: An Agentic System for Video Segmentation with Positive and Negative Hints
GitHub

In prior work, a VLM prompted SAM with only positive sample points and bounding boxes to perform video object segmentation. In practice, however, on complex fine-grained segmentation, the absence of negative sample hints leaves the model with no mechanism for correcting its output. NeSeg builds on that framework and uses reinforcement learning (RL) to introduce negative sample hints, investigating their role in segmentation tasks.
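As a rough illustration of the prompting interface involved (a sketch, not the NeSeg implementation): SAM-style predictors accept point prompts as a coordinate array plus a label array, where label 1 marks a positive (foreground) point and 0 a negative (background) point. The function name below is hypothetical.

```python
import numpy as np

def build_point_prompt(positive_pts, negative_pts):
    """Stack positive and negative (x, y) hints into SAM-style prompt arrays.

    Returns coords of shape (N, 2) and labels of shape (N,),
    with 1 = foreground hint, 0 = background hint.
    """
    coords = np.array(list(positive_pts) + list(negative_pts), dtype=np.float32)
    labels = np.array([1] * len(positive_pts) + [0] * len(negative_pts),
                      dtype=np.int32)
    return coords, labels

# One click on the target, two clicks on distractor regions (made-up values)
coords, labels = build_point_prompt(
    positive_pts=[(120, 80)],
    negative_pts=[(40, 40), (200, 30)],
)
print(labels.tolist())  # [1, 0, 0]
```

Negative points give the model an explicit correction signal: regions the mask must exclude, which positive-only prompting cannot express.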

StreamTeller: Perception-First Memory with Event-Driven Visual Evidence
GitHub

StreamTeller is a training-free plug-in mechanism that addresses the memory-perception conflict in streaming video understanding. It organizes video streams into event nodes, each containing a structured caption and optional visual evidence. An event-driven gating strategy generates captions only upon semantic drift, and a perception-first retrieval strategy loads historical memory on demand. This design explicitly separates perception from memory, improving long-term recall while minimizing interference with current perception.
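The event-driven gating idea can be sketched in a few lines (a toy illustration under assumed details, not the released code): a new event node is opened only when the current frame embedding drifts, in cosine distance, far enough from the running centroid of the active event; the threshold value is illustrative.

```python
import numpy as np

def segment_into_events(frame_embs, drift_thresh=0.5):
    """Group consecutive frame embeddings into event nodes by cosine drift."""
    events = []
    for emb in frame_embs:
        emb = emb / np.linalg.norm(emb)
        if events:
            centroid = np.mean(events[-1], axis=0)
            centroid /= np.linalg.norm(centroid)
            drift = 1.0 - float(emb @ centroid)  # cosine distance to active event
            if drift < drift_thresh:
                events[-1].append(emb)  # same event: no new caption needed
                continue
        events.append([emb])  # semantic drift detected: open a new event node
    return events

# Two clusters of toy embeddings -> two event nodes
scene_a = [np.array([1.0, 0.0]), np.array([0.99, 0.05])]
scene_b = [np.array([0.0, 1.0]), np.array([0.05, 0.99])]
events = segment_into_events(scene_a + scene_b)
print(len(events))  # 2
```

Gating captioning on drift rather than on a fixed clock is what keeps the memory sparse: stable stretches of the stream collapse into one node, so retrieval later touches far fewer entries.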