MEG-XL: Data-Efficient Brain-to-Text via Long-Context Pre-Training
Paper: arXiv:2602.02494
MEG-XL is a brain-to-text foundation model pre-trained with 2.5 minutes of MEG context per sample (equivalent to 191k tokens). By capturing this extended neural context, it decodes words from brain activity with high data efficiency.
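To get a feel for where a figure like 191k tokens per sample can come from, here is a back-of-the-envelope sketch. The sensor count and per-channel token rate below are illustrative assumptions (the real values depend on the MEG hardware and the BioCodec configuration), chosen only so the arithmetic lands near the reported number:

# Rough token count for one 2.5-minute MEG context window.
# The channel count and token rate are ASSUMPTIONS for illustration,
# not values taken from the MEG-XL paper or the BioCodec tokenizer.
context_seconds = 2.5 * 60       # 150 s of MEG per pre-training sample
n_channels = 300                 # assumed sensor count for a typical MEG system
tokens_per_channel_per_sec = 4.25  # assumed per-channel token rate
tokens = context_seconds * n_channels * tokens_per_channel_per_sec
print(f"~{tokens:,.0f} tokens per sample")  # ~191,250, near the reported 191k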
Instructions for environment setup, tokenizer (BioCodec) requirements, and data preparation are available in the official GitHub repository.
You can fine-tune or evaluate the model on word decoding tasks using the following command structure, replacing the placeholder with one of armeni, gwilliams, or libribrain:
python -m brainstorm.evaluate_criss_cross_word_classification \
--config-name=eval_criss_cross_word_classification_{armeni,gwilliams,libribrain} \
model.criss_cross_checkpoint=/path/to/your/checkpoint.ckpt
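For example, to evaluate on the Armeni dataset (the checkpoint path here is hypothetical; point it at your own checkpoint):
python -m brainstorm.evaluate_criss_cross_word_classification \
--config-name=eval_criss_cross_word_classification_armeni \
model.criss_cross_checkpoint=checkpoints/megxl.ckpt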
To perform linear probing (training only a linear classifier on top of frozen model features), use:
python -m brainstorm.evaluate_criss_cross_word_classification \
--config-name=eval_criss_cross_word_classification_linear_probe_{armeni,gwilliams,libribrain} \
model.criss_cross_checkpoint=/path/to/your/checkpoint.ckpt
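Conceptually, linear probing freezes the pre-trained encoder and fits only a linear word classifier on its embeddings. The minimal PyTorch sketch below illustrates that idea with random stand-in features; it is not the repository's implementation, and the feature dimension, class count, and training loop are all assumptions:

import torch
import torch.nn as nn

# Stand-ins for frozen MEG-XL embeddings: 256 windows of 512-dim features
# and a 50-way word classification target. All dimensions are ASSUMPTIONS.
features = torch.randn(256, 512)       # frozen encoder outputs (no gradients needed)
labels = torch.randint(0, 50, (256,))  # word-class labels

probe = nn.Linear(512, 50)             # the only trainable module
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    opt.zero_grad()
    loss = loss_fn(probe(features), labels)
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.3f}")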
If you find this work helpful in your research, please cite:
@article{jayalath2026megxl,
  title={{MEG-XL}: Data-Efficient Brain-to-Text via Long-Context Pre-Training},
  author={Jayalath, Dulhan and Parker Jones, Oiwi},
  journal={arXiv preprint arXiv:2602.02494},
  year={2026}
}