
TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living

Sinha 2026-06-18
Arkaprava Sinha, Dominick Reilly, Siddharth Krishnan

Long Video Question Answering (LVQA) requires identifying sparse, query-relevant evidence within hours-long untrimmed videos. Existing approaches either process videos densely with large vision-language models (VLMs), incurring prohibitive computational cost, or rely on sparse caption-based reasoning, which often misses temporally localized and motion-centric evidence. We introduce TimeProVe, a co

Data aggregated and editorially reviewed by TrendMing.

Research Themes

AI, Research