7.4 TFLOPS · 66 followers · 110 following
AI & ML interests: Multi-modal, Palmyra LLMs, Knowledge Graph
I’ve been diving into the iRoPE architecture from Llama 4 — a big step forward for long-context models. It interleaves local attention layers (with RoPE) for short-range context and global attention layers (with inference-time temperature scaling) for long-range reasoning, aiming toward effectively unbounded context. I’m going to try writing iRoPE — who wants to help? Code: https://github.com/wassemgtk/iRoPE-try/blob/main/iRoPE.ipynb
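To make the interleaving concrete, here is a minimal single-layer sketch of the idea in PyTorch. This is my own illustrative reconstruction, not the official Llama 4 code: the chunk size, `global_every`, and the exact query-temperature formula (`1 + attn_scale * log1p(floor(pos / floor_scale))`) are assumptions chosen for readability. Local layers apply RoPE and restrict attention to a fixed chunk; every Nth layer is a global layer that skips positional encoding and instead scales queries by a position-dependent temperature at inference time.

```python
# Illustrative sketch of iRoPE-style interleaved attention (assumptions mine,
# not the official Llama 4 implementation).
import math
import torch

def rope(x, base=10000.0):
    # Standard rotary position embedding; x: (seq, heads, dim), dim even.
    seq, h, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.outer(torch.arange(seq, dtype=torch.float32), freqs)
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def attention(q, k, v, mask):
    # Masked scaled dot-product attention; q, k, v: (seq, heads, dim).
    d = q.shape[-1]
    scores = torch.einsum("qhd,khd->hqk", q, k) / math.sqrt(d)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.einsum("hqk,khd->qhd", scores.softmax(dim=-1), v)

def irope_layer(q, k, v, layer_idx, global_every=4, chunk=8,
                attn_scale=0.1, floor_scale=256.0):
    seq = q.shape[0]
    pos = torch.arange(seq)
    causal = pos[:, None] >= pos[None, :]
    if (layer_idx + 1) % global_every == 0:
        # Global layer: full causal attention, no positional encoding;
        # queries get a position-dependent temperature (formula is an
        # assumption for illustration).
        temp = 1.0 + attn_scale * torch.log1p(
            torch.floor(pos.float() / floor_scale))
        q = q * temp[:, None, None]
        mask = causal
    else:
        # Local layer: RoPE plus attention confined to fixed-size chunks.
        q, k = rope(q), rope(k)
        same_chunk = (pos[:, None] // chunk) == (pos[None, :] // chunk)
        mask = causal & same_chunk
    return attention(q, k, v, mask[None, :, :])
```

Because the global layers carry no positional encoding, their attention pattern does not degrade at positions far beyond the training length; the temperature term is what keeps long-range scores sharp enough at inference time.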
Leveraging Hugging Face for complex generative AI use cases
Models (10)
wassemgtk/pruned-llama-9u0elslx · 0.5B · Updated Aug 28, 2025 · 6
wassemgtk/mergekit-ties-isswcgh · Text Generation · 15B · Updated Apr 9, 2025 · 10
wassemgtk/pruned-llama-u42y1jwh · 0.5B · Updated Mar 27, 2025 · 6
wassemgtk/mergekit-passthrough-dmvdobt · Text Generation · 81B · Updated Mar 25, 2025 · 1
wassemgtk/mergekit-linear-tdzebun · Text Generation · 156B · Updated Jun 16, 2024 · 2
(model name not captured) · Text Generation · 120B · Updated May 15, 2024 · 6
wassemgtk/merge-Meta-Llama-3-8B-Instruct-Nous-Hermes-2-Yi-34B · Text Generation · 27B · Updated May 6, 2024 · 7
wassemgtk/merge-Nous-Hermes-2-Yi-34B-Llama-3-8B-Instruct-12B · Text Generation · 12B · Updated May 6, 2024 · 5
wassemgtk/merge-passthrough-Meta-Llama-3-Instruct-10B · Text Generation · 10B · Updated May 6, 2024 · 8
wassemgtk/merge-NeuralHermes-2.5-Mistral-7B-MetaMath · Text Generation · 9B · Updated May 6, 2024 · 6