Perceptual Taxonomy: Evaluating and Guiding Hierarchical Scene Reasoning in Vision-Language Models Paper • 2511.19526 • Published Nov 24, 2025 • 1
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering Paper • 2406.00622 • Published Jun 2, 2024
3D-Aware Visual Question Answering about Parts, Poses and Occlusions Paper • 2310.17914 • Published Oct 27, 2023
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning Paper • 2212.00259 • Published Dec 1, 2022
PulseCheck457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models Paper • 2502.08636 • Published Feb 12, 2025
SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning Paper • 2504.20024 • Published Apr 28, 2025
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models Paper • 2510.15148 • Published Oct 16, 2025 • 2
KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation Paper • 2504.09656 • Published Apr 13, 2025
Hard Examples Are All You Need: Maximizing GRPO Post-Training Under Annotation Budgets Paper • 2508.14094 • Published Aug 15, 2025 • 1
Rapidly Adapting to New Voice Spoofing: Few-Shot Detection of Synthesized Speech Under Distribution Shifts Paper • 2508.13320 • Published Aug 18, 2025 • 2
Kreyòl-MT: Building MT for Latin American, Caribbean and Colonial African Creole Languages Paper • 2405.05376 • Published May 8, 2024
Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback Paper • 2506.11930 • Published Jun 13, 2025 • 53
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer Paper • 2409.08425 • Published Sep 12, 2024 • 10
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech Paper • 2506.02863 • Published Jun 3, 2025 • 8
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline Paper • 2505.19314 • Published May 25, 2025 • 4
Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits Paper • 2505.14648 • Published May 20, 2025 • 9
Noise-robust Speech Separation with Fast Generative Correction Paper • 2406.07461 • Published Jun 11, 2024