World Foundation Models
Augmenting video data and synthesizing trajectories for sample-efficient policy learning.
I'm Rugved Katole — PhD candidate at The Ohio State University researching vision-language-action systems and world foundation models for general-purpose robotics. Focused on closing the data gap between simulation and reality.
Trained as a mechanical engineer at BITS Pilani and now a PhD candidate in Computer Science at The Ohio State University, I work at the intersection of foundation models and embodied AI. Previously, at IIT Bombay's TIH and ARMS Lab, I led deployments in autonomous navigation and multi-agent planning.
I care about systems that work outside the lab — robots that handle 40° slips on vineyard slopes, UAV swarms that map fields without choreography, and policies trained from scarce, real-world data.
Augmenting video data and synthesizing trajectories for sample-efficient policy learning.
Closing the loop between perception, language, and embodied control across manipulation tasks.
Photoreal Omniverse digital twins, ROS 2 deployments, and field-tested multi-agent systems.
Six projects spanning foundation models, multi-agent autonomy, and embodied AI. Each connects to a deployed system, a published paper, or both.
Augment real-world video datasets for vision-language-action training. Synthesize counterfactual trajectories to make robotic policies sample-efficient.
Diffusion-based filtering and early-exit pipelines that detect inauthentic synthetic videos 9× faster, saving 75% of the compute spent on generation.
Photoreal NVIDIA Omniverse simulation of generative animal behaviors and herd dynamics. Drone algorithms validated before field deployment.
Communication-free, deadlock-free intersection coordination using graph theory and road-marking intent detection across 255 scenarios.
CNN + multi-agent RL for heterogeneous UAV crop scouting. Cuts scouting needs by 60%, reduces labor cost 4.8×, and lifts farmer profit by 36%.
Distributed online patrol with finite-time visit guarantees, balancing priority and non-priority site coverage. Sim-to-real validated.
Selected publications across world models, edge computing for conservation, and multi-agent autonomy. Google Scholar →
I'm open to industry research roles, collaborations on world models & VLA, speaking, and consulting. Best reached over email or LinkedIn.