VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
Submitted by Zirui Wang
Sky Computing Lab