Our Projects

RAGEN

We introduce RAGEN to train LLM reasoning agents via reinforcement learning (RL) in multi-turn, stochastic environments. RAGEN is formulated as a Markov Decision Process (MDP) and optimized with Reasoning-Interaction Chain Optimization (RICO). RAGEN-0.5B, trained on three agentic tasks, exhibits intriguing reasoning patterns.

Blog Code

VAGEN

VAGEN is an RL framework that improves VLM agent training with the TRICO algorithm. By selectively focusing on critical tokens and strengthening cross-turn credit assignment, TRICO outperforms prior methods on visual agentic tasks.

Blog Code

Embodied Agent Interface

Current evaluations of LLMs in embodied AI lack standardization and detailed error analysis. We introduce a unified interface, the Embodied Agent Interface, that supports diverse tasks and LLM modules (planning, decomposition, etc.), together with fine-grained metrics that identify hallucinations, affordance errors, and other failure types. This enables systematic assessment that pinpoints specific LLM limitations and strengths, informing more effective integration of LLMs into embodied agents.

Paper Website Code

Long Video Haystack

We introduce LongVideoHaystack, a 480-hour video temporal search dataset with 15,092 human-annotated instances, on which the state-of-the-art (SOTA) model scores only 2.1% Temporal F1.

Paper Website Code
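To make the Temporal F1 figure above more concrete, here is a minimal sketch of a temporal F1 score between predicted and annotated keyframe timestamps. It is a simplified illustration, not the official LongVideoHaystack metric: the function name `temporal_f1`, the use of timestamps in seconds, and the fixed `tolerance` matching window are all hypothetical assumptions made for this example.

```python
from typing import Sequence


def temporal_f1(
    predicted: Sequence[float],
    ground_truth: Sequence[float],
    tolerance: float = 1.0,
) -> float:
    """Simplified temporal F1 between predicted and annotated keyframe timestamps.

    Hypothetical assumption: a timestamp counts as matched if it lies within
    `tolerance` seconds of any timestamp in the other set. This is an
    illustrative sketch, not the dataset's official metric definition.
    """
    if not predicted or not ground_truth:
        return 0.0

    # Precision: fraction of predicted frames that land near some annotated keyframe.
    pred_hits = sum(
        any(abs(p - g) <= tolerance for g in ground_truth) for p in predicted
    )
    # Recall: fraction of annotated keyframes covered by some predicted frame.
    gt_hits = sum(
        any(abs(g - p) <= tolerance for p in predicted) for g in ground_truth
    )

    precision = pred_hits / len(predicted)
    recall = gt_hits / len(ground_truth)
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Two of three predictions hit an annotated keyframe (precision 2/3),
    # and both keyframes are covered (recall 1), so F1 is roughly 0.8.
    print(temporal_f1([10.0, 50.0, 120.5], [10.4, 121.0]))
```

Because F1 rewards both precision and recall, a searcher that returns many frames scattered across a long video is penalized for false hits, while one that returns too few frames is penalized for missed keyframes.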

T*: Temporal Search Plug-in for Any VLM

Our temporal search framework T* boosts GPT-4o from 50.5% to 53.1% and LLaVA-OV from 56.5% to 62.4% on LongVideoBench XL.

Paper Website Code