PhoenixZ (Xiangyu Z)

authored 7 papers 3 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25, 2025 • 211

GenExam: A Multidisciplinary Text-to-Image Exam

Paper • 2509.14232 • Published Sep 17, 2025 • 21

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper • 2510.08540 • Published Oct 9, 2025 • 109

authored a paper 9 months ago

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Paper • 2504.02826 • Published Apr 3, 2025 • 68

authored 6 papers 10 months ago

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language

Paper • 2406.20085 • Published Jun 28, 2024 • 13

An Open and Comprehensive Pipeline for Unified Object Grounding and Detection

Paper • 2401.02361 • Published Jan 4, 2024

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published Mar 13, 2025 • 36

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM

Paper • 2503.14478 • Published Mar 18, 2025 • 48

Redundancy Principles for MLLMs Benchmarks

Paper • 2501.13953 • Published Jan 20, 2025 • 29

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25, 2025 • 74

authored 2 papers over 1 year ago

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

Paper • 2406.17770 • Published Jun 25, 2024 • 19

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

Paper • 2406.14515 • Published Jun 20, 2024 • 33

Xiangyu Z

AI & ML interests

Organizations

MM-IFEngine: Towards Multimodal Instruction Following

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs

Intern-S1: A Scientific Multimodal Foundation Model
