Our Projects

  • RAGEN

    We introduce RAGEN to train LLM reasoning agents via reinforcement learning (RL) in multi-turn, stochastic environments. RAGEN is formulated as a Markov Decision Process (MDP) and optimized through Reasoning-Interaction Chain Optimization (RICO). RAGEN-0.5B is trained across three agentic tasks and shows intriguing reasoning patterns.

  • VAGEN

    VAGEN is an RL framework that improves VLM agent training with the TRICO algorithm. By selectively focusing on critical tokens and enhancing cross-turn credit assignment, TRICO outperforms prior methods on visual agentic tasks.

  • Embodied Agent Interface

    Current evaluations of LLMs in embodied AI lack standardization and detailed error analysis. Our proposed benchmark addresses this with a unified interface for diverse tasks and LLM modules (planning, decomposition, etc.) and fine-grained metrics (identifying hallucination, affordance errors, etc.). This enables systematic assessment, pinpointing specific LLM limitations and strengths to inform more effective integration into embodied agents.

  • T*

    We introduce LongVideoHaystack, a 480-hour video temporal search dataset with 15,092 human-annotated instances, on which state-of-the-art methods score 2.1% Temporal F1. Our temporal search framework, T*, improves GPT-4o from 50.5% to 53.1% and LLaVA-OV from 56.5% to 62.4% on LongVideoBench XL.