Hello, I'm Pranay Meshram

Student Researcher

I'm Pranay, a pre-final year B.S. student in Data Science and Engineering at the Indian Institute of Science Education and Research (IISER) Bhopal. My research focuses on Computer Vision, Generative AI, Visual Place Recognition (VPR), Geolocalization, Vision-Language Models (VLMs), and World Models.

I recently contributed to COMBAT (Conditional World Models for Behavioral Agent Training), accepted at the ICCV Workshop 2025, where I worked on VAE training and evaluation. Currently, I am a Research Intern at IIT Roorkee under Dr. GK Nayak, developing novel methods for indoor VPR and building synthetic datasets using Infinigen. I also look forward to implementing world models for better agent control.

My project work involves implementing state-of-the-art (SOTA) architectures, including segmentation frameworks using SigLIP2, fine-tuning LLaMA 2 with QLoRA, and building the PaliGemma VLM from scratch. I also have experience deploying quantized YOLOv8 models for real-time applications.

Professional Experience

Project Intern | VITAL Lab

Indian Institute of Technology, Roorkee

Jun 2025 - Present

Roorkee, India

  • Developing novel methods for Visual Place Recognition (VPR) and image geolocalization under Dr. Gaurav Kumar Nayak.
  • Reviewing the literature on existing VPR and geolocalization techniques and datasets.
  • Implementing baseline models for performance evaluation.

Research Intern

MuSiC Lab, IISER Bhopal

May 2025 - Jul 2025

IISER Bhopal, India

  • Implemented Vision Language Models for segmentation tasks.
  • Trained the CAT-Seg (Cost Aggregation for Open-Vocabulary Semantic Segmentation) model with a SigLIP2 encoder on the COCO-Stuff dataset for VLM-based segmentation.
  • Reviewed the most recent A* conference papers on open-vocabulary segmentation.

Technical Skills

ML/DL & AI
PyTorch, TensorFlow, Hugging Face Transformers, LLMs, LoRA, Quantization
Computer Vision
OpenCV, Vision Transformers, Segmentation, Object Detection, VLMs, PaliGemma
Programming & Tools
Python, C++, Git, Docker, Linux, Raspberry Pi, Jupyter

Research Interests

World Models

Vision Language Models

Visual Place Recognition & Geolocalization

Diffusion Models