7.4 TFLOPS · 66 followers · 110 following
AI & ML interests: Multi-modal, Palmyra LLMs, Knowledge Graph
I’ve been diving into the iRoPE architecture from Llama 4 — a big step forward for long-context models. It interleaves local attention layers (with RoPE) for short-range context and global attention layers (with inference-time temperature scaling) for long-range reasoning, aiming toward effectively unbounded context. I’m going to try writing iRoPE — who wants to help? Code: https://github.com/wassemgtk/iRoPE-try/blob/main/iRoPE.ipynb
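To make the interleaving concrete, here is a minimal single-layer sketch of the idea in PyTorch. This is my own illustrative reconstruction, not the official Llama 4 code: the chunk size, `global_every`, and the exact query-temperature formula (`1 + attn_scale * log1p(floor(pos / floor_scale))`) are assumptions chosen for readability. Local layers apply RoPE and restrict attention to a fixed chunk; every Nth layer is a global layer that skips positional encoding and instead scales queries by a position-dependent temperature at inference time.

```python
# Illustrative sketch of iRoPE-style interleaved attention (assumptions mine,
# not the official Llama 4 implementation).
import math
import torch

def rope(x, base=10000.0):
    # Standard rotary position embedding; x: (seq, heads, dim), dim even.
    seq, h, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.outer(torch.arange(seq, dtype=torch.float32), freqs)
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def attention(q, k, v, mask):
    # Masked scaled dot-product attention; q, k, v: (seq, heads, dim).
    d = q.shape[-1]
    scores = torch.einsum("qhd,khd->hqk", q, k) / math.sqrt(d)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.einsum("hqk,khd->qhd", scores.softmax(dim=-1), v)

def irope_layer(q, k, v, layer_idx, global_every=4, chunk=8,
                attn_scale=0.1, floor_scale=256.0):
    seq = q.shape[0]
    pos = torch.arange(seq)
    causal = pos[:, None] >= pos[None, :]
    if (layer_idx + 1) % global_every == 0:
        # Global layer: full causal attention, no positional encoding;
        # queries get a position-dependent temperature (formula is an
        # assumption for illustration).
        temp = 1.0 + attn_scale * torch.log1p(
            torch.floor(pos.float() / floor_scale))
        q = q * temp[:, None, None]
        mask = causal
    else:
        # Local layer: RoPE plus attention confined to fixed-size chunks.
        q, k = rope(q), rope(k)
        same_chunk = (pos[:, None] // chunk) == (pos[None, :] // chunk)
        mask = causal & same_chunk
    return attention(q, k, v, mask[None, :, :])
```

Because the global layers carry no positional encoding, their attention pattern does not degrade at positions far beyond the training length; the temperature term is what keeps long-range scores sharp enough at inference time.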
Leveraging Hugging Face for complex generative AI use cases
Models (10)
wassemgtk/pruned-llama-9u0elslx · 0.5B · Updated Aug 28, 2025 · 6
wassemgtk/mergekit-ties-isswcgh · Text Generation · 15B · Updated Apr 9, 2025 · 10
wassemgtk/pruned-llama-u42y1jwh · 0.5B · Updated Mar 27, 2025 · 6
wassemgtk/mergekit-passthrough-dmvdobt · Text Generation · 81B · Updated Mar 25, 2025 · 1
wassemgtk/mergekit-linear-tdzebun · Text Generation · 156B · Updated Jun 16, 2024 · 2
(model name not captured) · Text Generation · 120B · Updated May 15, 2024 · 6
wassemgtk/merge-Meta-Llama-3-8B-Instruct-Nous-Hermes-2-Yi-34B · Text Generation · 27B · Updated May 6, 2024 · 7
wassemgtk/merge-Nous-Hermes-2-Yi-34B-Llama-3-8B-Instruct-12B · Text Generation · 12B · Updated May 6, 2024 · 5
wassemgtk/merge-passthrough-Meta-Llama-3-Instruct-10B · Text Generation · 10B · Updated May 6, 2024 · 8
wassemgtk/merge-NeuralHermes-2.5-Mistral-7B-MetaMath · Text Generation · 9B · Updated May 6, 2024 · 6