ICCV 2025 RIWM WORKSHOP

COMBAT: Conditional World Models for Behavioral Agent Training

Workshop on Reliable and Interactable World Models (RIWM)

Anmol Agarwal, Pranay Meshram, Sumer Singh, Saurav Suman, Andrew Lapp, Shahbuland Matiana, Louis Castricato, Spencer Frazier


Abstract

COMBAT introduces a novel world-modeling approach for robust visual understanding in realistic multi-object and multi-view scenarios. The model emphasizes generalization and efficiency for RWIMM datasets, bridging the gap between simulation and real-world imagery for downstream tasks such as localization, segmentation, and control. Our approach combines generative and discriminative components in a unified architecture, enabling efficient training and strong performance across diverse benchmarks.

Key Highlights

What We Did

Developed a novel world model architecture that combines generative and discriminative components
Designed an efficient training pipeline for realistic multi-view imagery
Created a robust evaluation framework for RWIMM benchmarks

What We Achieved

State-of-the-art performance on RWIMM benchmarks
Strong generalization across diverse datasets
Extensive ablations validating design choices

Impact

Bridges simulation and real-world visual understanding
Enables improved localization and segmentation
Open-source code for community reproducibility

Keywords

World Models, RWIMM, Multi-View, Segmentation, Computer Vision, ICCV 2025

Real-Time Gameplay Demonstrations

Watch COMBAT in action with real-time agent interactions

Gameplay Video 1: Agent behavior demonstration
Gameplay Video 2: Multi-view tracking example
Gameplay Video 3: Real-time segmentation
Gameplay Video 4: World model predictions


Model Architecture

COMBAT's unified architecture for behavioral agent training

Figure: COMBAT architecture overview

🏗️ Core Components

COMBAT integrates conditional world modeling with behavioral agent training, featuring:

Multi-view encoder for realistic image understanding
Temporal dynamics predictor with latent state modeling
Policy network conditioned on world model representations
End-to-end differentiable architecture for joint optimization
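
The components listed above can be sketched as a single forward pass. The following is a minimal, hypothetical illustration in plain Python, not the authors' implementation. The class names (`MultiViewEncoder`, `DynamicsPredictor`, `Policy`), the layer sizes, the mean-pooling of views, and the discrete action space are all assumptions made for illustration. The weights are random stand-ins for learned parameters, and a real system would use an automatic-differentiation framework so that the components could be optimized jointly.

```python
import math
import random


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def init_matrix(rows, cols, rng):
    """Small random weight matrix; stands in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiViewEncoder:
    """Encodes each view separately, then mean-pools into one latent (assumed fusion)."""

    def __init__(self, obs_dim, latent_dim, rng):
        self.w = init_matrix(latent_dim, obs_dim, rng)

    def __call__(self, views):
        encoded = [[math.tanh(h) for h in linear(self.w, v)] for v in views]
        return [sum(col) / len(encoded) for col in zip(*encoded)]


class DynamicsPredictor:
    """Predicts the next latent state from the current latent and a one-hot action."""

    def __init__(self, latent_dim, num_actions, rng):
        self.w = init_matrix(latent_dim, latent_dim + num_actions, rng)

    def __call__(self, latent, action):
        return [math.tanh(h) for h in linear(self.w, latent + action)]


class Policy:
    """Maps a latent state to a softmax distribution over discrete actions (assumed)."""

    def __init__(self, latent_dim, num_actions, rng):
        self.w = init_matrix(num_actions, latent_dim, rng)

    def __call__(self, latent):
        logits = linear(self.w, latent)
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - peak) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def rollout(encoder, dynamics, policy, views, horizon):
    """Encode the views once, then step `horizon` times purely in latent space."""
    latent = encoder(views)
    actions = []
    for _ in range(horizon):
        probs = policy(latent)
        action = max(range(len(probs)), key=probs.__getitem__)  # greedy choice
        actions.append(action)
        latent = dynamics(latent, one_hot(action, len(probs)))
    return actions, latent


rng = random.Random(0)
OBS_DIM, LATENT_DIM, NUM_ACTIONS = 8, 4, 3  # illustrative sizes only
encoder = MultiViewEncoder(OBS_DIM, LATENT_DIM, rng)
dynamics = DynamicsPredictor(LATENT_DIM, NUM_ACTIONS, rng)
policy = Policy(LATENT_DIM, NUM_ACTIONS, rng)

# Two flattened "camera views" of the same scene, filled with random values.
views = [[rng.random() for _ in range(OBS_DIM)] for _ in range(2)]
actions, final_latent = rollout(encoder, dynamics, policy, views, horizon=5)
```

In this sketch the policy never sees raw observations after the first step: it acts only on latents produced by the dynamics predictor, which is one way to read "policy network conditioned on world model representations".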

Citation

If you find COMBAT useful in your research, please consider citing:

📄 BibTeX

@inproceedings{
  agarwal2025combat,
  title={{COMBAT}: Conditional World Models for Behavioral Agent Training},
  author={Anmol Agarwal and Pranay Meshram and Sumer Singh and Saurav Suman and Andrew Lapp and Shahbuland Matiana and Louis Castricato and Spencer Frazier},
  booktitle={1st Workshop on Reliable and Interactive World Model in Computer Vision Non Archival},
  year={2025},
  url={https://openreview.net/forum?id=i7afmqzE4b}
}
