ResearcharXivNEW

TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living

Sinha 2026-06-18

Arkaprava SinhaDominick ReillySiddharth Krishnan

Long Video Question Answering (LVQA) requires identifying sparse, query-relevant evidence within hours-long untrimmed videos. Existing approaches either process videos densely with large vision-language models (VLMs), incurring prohibitive computational cost, or rely on sparse caption-based reasoning, which often misses temporally localized and motion-centric evidence. We introduce TimeProVe, a co

Read on arXiv

Data aggregated and editorially reviewed by TrendMing.

Key Contributions

Long Video Question Answering (LVQA) requires identifying sparse, query-relevant evidence within hours-long untrimmed videos.
Existing approaches either process videos densely with large vision-language models (VLMs), incurring prohibitive computational cost, or rely on sparse caption-based reasoning, which often misses temporally localized and motion-centric evidence.
We introduce TimeProVe, a co

Research Themes

AIResearch

Back to AI Research