arxiv:2602.01660

CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation

Published on Feb 2

· Submitted by

zhongyuan peng on Feb 3

Multimodal Art Projection

Upvote

Authors:

Abstract

A novel framework called CoDiQ enables controllable difficulty generation for competition-level questions through test-time scaling, resulting in a corpus that significantly improves large reasoning model performance.

AI-generated summary

Large Reasoning Models (LRMs) benefit substantially from training on challenging competition-level questions. However, existing automated question synthesis methods lack precise difficulty control, incur high computational costs, and struggle to generate competition-level questions at scale. In this paper, we propose CoDiQ (Controllable Difficult Question Generation), a novel framework enabling fine-grained difficulty control via test-time scaling while ensuring question solvability. Specifically, first, we identify a test-time scaling tendency (extended reasoning token budget boosts difficulty but reduces solvability) and the intrinsic properties defining the upper bound of a model's ability to generate valid, high-difficulty questions. Then, we develop CoDiQ-Generator from Qwen3-8B, which improves the upper bound of difficult question generation, making it particularly well-suited for challenging question construction. Building on the CoDiQ framework, we build CoDiQ-Corpus (44K competition-grade question sequences). Human evaluations show these questions are significantly more challenging than LiveCodeBench/AIME with over 82% solvability. Training LRMs on CoDiQ-Corpus substantially improves reasoning performance, verifying that scaling controlled-difficulty training questions enhances reasoning capabilities. We open-source CoDiQ-Corpus, CoDiQ-Generator, and implementations to support related research.

View arXiv page View PDF GitHub 1 Add to collection

Community

happzy2633

Paper submitter about 18 hours ago

happzy2633

Paper submitter about 18 hours ago

The bottleneck of Inference-Time Compute often lies in high-quality data and infrastructure. We hope CoDiQ can provide a scalable new approach to addressing data scarcity.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 1

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.01660 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.