arxiv:2602.03709

No Shortcuts to Culture: Indonesian Multi-hop Question Answering for Complex Cultural Understanding

Published on Feb 3 · Submitted by Xingwei Tan on Feb 4

Abstract

AI-generated summary: The multi-hop question answering dataset ID-MoCQA assesses the cultural understanding of large language models through Indonesian traditions and diverse reasoning chains.

Understanding culture requires reasoning across context, tradition, and implicit social knowledge, far beyond recalling isolated facts. Yet most culturally focused question answering (QA) benchmarks rely on single-hop questions, which may allow models to exploit shallow cues rather than demonstrate genuine cultural reasoning. In this work, we introduce ID-MoCQA, the first large-scale multi-hop QA dataset for assessing the cultural understanding of large language models (LLMs), grounded in Indonesian traditions and available in both English and Indonesian. We present a new framework that systematically transforms single-hop cultural questions into multi-hop reasoning chains spanning six clue types (e.g., commonsense, temporal, geographical). Our multi-stage validation pipeline, combining expert review and LLM-as-a-judge filtering, ensures high-quality question-answer pairs. Our evaluation across state-of-the-art models reveals substantial gaps in cultural reasoning, particularly in tasks requiring nuanced inference. ID-MoCQA provides a challenging and essential benchmark for advancing the cultural competency of LLMs.
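The abstract does not describe how a single-hop question becomes a multi-hop chain. The sketch below is a minimal, hypothetical illustration that assumes each extra hop is created by replacing an entity mention with an indirect clue the model must resolve first. The names `Clue`, `MultiHopQuestion`, and `compose`, and the toy question about Bali, are illustrative assumptions, not the authors' schema or data. Only three of the six clue types are named in the abstract, so the allowed set is deliberately incomplete.

```python
from dataclasses import dataclass, field

# Clue types named in the abstract. The other three of the six are not
# listed there, so this hypothetical set is intentionally incomplete.
KNOWN_CLUE_TYPES = {"commonsense", "temporal", "geographical"}


@dataclass(frozen=True)
class Clue:
    clue_type: str    # must be one of KNOWN_CLUE_TYPES (assumed schema)
    entity: str       # the entity mention this clue replaces
    description: str  # indirect description the model has to resolve

    def __post_init__(self) -> None:
        if self.clue_type not in KNOWN_CLUE_TYPES:
            raise ValueError(f"unknown clue type: {self.clue_type}")


@dataclass
class MultiHopQuestion:
    question: str
    answer: str
    clues: list[Clue] = field(default_factory=list)

    @property
    def hops(self) -> int:
        # One hop per clue to recover the hidden entity, plus the original question.
        return len(self.clues) + 1


def compose(single_hop_question: str, answer: str, clues: list[Clue]) -> MultiHopQuestion:
    """Hypothetical transform: swap each entity mention for its indirect clue."""
    text = single_hop_question
    for clue in clues:
        if clue.entity not in text:
            raise ValueError(f"entity {clue.entity!r} not found in question")
        text = text.replace(clue.entity, clue.description, 1)
    return MultiHopQuestion(text, answer, list(clues))


# Toy usage: a geographical clue hides "Bali" behind one extra reasoning step.
q = compose(
    "What is the day of silence observed on Bali called?",
    "Nyepi",
    [Clue("geographical", "Bali", "the island directly east of Java")],
)
print(q.question)  # What is the day of silence observed on the island directly east of Java called?
print(q.hops)      # 2
```

The answer is unchanged by the rewrite; only the path to it gets longer. A real pipeline would also need the validation stages the abstract mentions (expert review and LLM-as-a-judge filtering) to check that each clue resolves unambiguously.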

Community

Comment from the paper author and submitter:

To move beyond simple fact recall, the researchers introduce ID-MoCQA, the first large-scale multi-hop reasoning dataset focused on Indonesian culture.

  • The Problem: Most AI benchmarks use "single-hop" questions that models can answer using surface-level patterns rather than true cultural understanding.
  • The Solution: ID-MoCQA uses a framework that turns single-hop questions into complex reasoning chains spanning six clue types (e.g., commonsense, temporal, and geographical), in both English and Indonesian.
  • The Finding: Current LLMs struggle significantly with these complex cultural inferences, highlighting a major gap in their "cultural intelligence."



Datasets citing this paper 1
