ICCV 2025 RIWM WORKSHOP

COMBAT: Conditional World Models for Behavioral Agent Training

Workshop on Reliable and Interactable World Models (RIWM)

Anmol Agarwal, Pranay Meshram, Sumer Singh, Saurav Suman, Andrew Lapp, Shahbuland Matiana, Louis Castricato, Spencer Frazier

Figure 1: COMBAT model architecture and key results on the RWIMM benchmark

Abstract

COMBAT introduces a novel world-modeling approach for robust visual understanding in realistic multi-object and multi-view scenarios. The model emphasizes generalization and efficiency for RWIMM datasets, bridging the gap between simulation and real-world imagery for downstream tasks such as localization, segmentation, and control. Our approach combines generative and discriminative components in a unified architecture, enabling efficient training and strong performance across diverse benchmarks.

Key Highlights

What We Did

  • Developed a novel world model architecture combining generative and discriminative components
  • Designed an efficient training pipeline for realistic multi-view imagery
  • Created a robust evaluation framework for RWIMM benchmarks

What We Achieved

  • State-of-the-art performance on RWIMM benchmarks
  • Strong generalization across diverse datasets
  • Extensive ablations validating design choices

Impact

  • Bridges simulation and real-world visual understanding
  • Enables improved localization and segmentation
  • Open-source code for community reproducibility

Keywords

World Models, RWIMM, Multi-View, Segmentation, Computer Vision, ICCV 2025

Real-Time Gameplay Demonstrations

Watch COMBAT in action with real-time agent interactions

Gameplay Video 1

Agent behavior demonstration

Gameplay Video 2

Multi-view tracking example

Gameplay Video 3

Real-time segmentation

Gameplay Video 4

World model predictions


Model Architecture

COMBAT's unified architecture for behavioral agent training


Figure: COMBAT architecture overview

🏗️ Core Components

COMBAT integrates conditional world modeling with behavioral agent training, featuring:

  • Multi-view encoder for realistic image understanding
  • Temporal dynamics predictor with latent state modeling
  • Policy network conditioned on world model representations
  • End-to-end differentiable architecture for joint optimization
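
As a rough illustration of how the components listed above could fit together, the following is a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the class name `CombatSketch`, the layer sizes, the mean-pooling over views, the one-hot action conditioning, and the random weights are all assumptions made for illustration. It shows only the data flow (views → latent state → action → predicted next latent state), and omits the training loop and gradient computation that joint optimization would require.

```python
import math
import random


def init_weights(rng, n_in, n_out):
    """Random (n_out x n_in) weight matrix; stands in for a trained layer."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]


def matvec(weights, x):
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class CombatSketch:
    """Hypothetical stand-in wiring an encoder, dynamics predictor, and policy."""

    def __init__(self, obs_dim, latent_dim, num_actions, seed=0):
        rng = random.Random(seed)
        self.num_actions = num_actions
        self.encoder = init_weights(rng, obs_dim, latent_dim)  # shared by all views
        self.dynamics = init_weights(rng, latent_dim + num_actions, latent_dim)
        self.policy = init_weights(rng, latent_dim, num_actions)

    def encode(self, views):
        """Encode each view with the shared encoder, then average (assumed fusion)."""
        per_view = [[math.tanh(v) for v in matvec(self.encoder, view)] for view in views]
        return [sum(column) / len(per_view) for column in zip(*per_view)]

    def predict_next(self, latent, action):
        """Predict the next latent state from the current latent and a one-hot action."""
        one_hot = [1.0 if i == action else 0.0 for i in range(self.num_actions)]
        return [math.tanh(v) for v in matvec(self.dynamics, latent + one_hot)]

    def action_probs(self, latent):
        """Policy conditioned on the latent state: softmax over action logits."""
        logits = matvec(self.policy, latent)
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Usage: two flattened 8-dimensional "views" of the same scene (synthetic data).
model = CombatSketch(obs_dim=8, latent_dim=4, num_actions=3)
data_rng = random.Random(1)
views = [[data_rng.random() for _ in range(8)] for _ in range(2)]
latent = model.encode(views)
probs = model.action_probs(latent)
action = max(range(model.num_actions), key=lambda i: probs[i])
next_latent = model.predict_next(latent, action)
```

In this sketch the policy reads only the latent state, never the raw views, which is one plausible reading of "conditioned on world model representations". In a real system each weight matrix would be a learned network in an autodiff framework so that all three parts can be trained together.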

Citation

If you find COMBAT useful in your research, please consider citing:

📄 BibTeX
@inproceedings{agarwal2025combat,
  title={{COMBAT}: Conditional World Models for Behavioral Agent Training},
  author={Anmol Agarwal and Pranay Meshram and Sumer Singh and Saurav Suman and Andrew Lapp and Shahbuland Matiana and Louis Castricato and Spencer Frazier},
  booktitle={1st Workshop on Reliable and Interactive World Model in Computer Vision Non Archival},
  year={2025},
  url={https://openreview.net/forum?id=i7afmqzE4b}
}

Full Paper

Read the complete paper with detailed methodology, experiments, and analysis
