# Complexity Base
A Llama-style transformer with architectural improvements for efficiency and performance.
## Architecture: Llama + Improvements
Complexity builds on the Llama architecture with three key enhancements:
| Component | Llama | Complexity |
|---|---|---|
| MLP | Dense FFN | Token-Routed MLP (4 experts, 1 active) |
| Attention | Standard | Flash Attention via SDPA |
| Normalization | RMSNorm only | RMSNorm + QK Normalization |
## Token-Routed MLP
Unlike standard MoE, which routes tokens through a learned router based on hidden states, Token-Routed MLP routes on the token ID alone:

```python
expert_idx = token_id % num_experts  # deterministic routing, no learned router
output = experts[expert_idx](hidden_states)
```
Benefits:
- No router network overhead
- Deterministic, reproducible routing
- 4x total MLP capacity at the compute cost of a single expert (only 1 of 4 experts active per token); see the sketch below
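
As a concrete illustration, here is a minimal PyTorch sketch of the routing pattern. The module and its internals are assumptions for illustration, not the released implementation; in particular, the plain Linear-SiLU-Linear expert stands in for whatever FFN variant the model actually uses:

```python
import torch
import torch.nn as nn

class TokenRoutedMLP(nn.Module):
    """Illustrative sketch: route each token to experts[token_id % num_experts]."""

    def __init__(self, hidden_size: int, intermediate_size: int, num_experts: int = 4):
        super().__init__()
        self.num_experts = num_experts
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, hidden_states: torch.Tensor, token_ids: torch.LongTensor) -> torch.Tensor:
        # hidden_states: (batch, seq, hidden); token_ids: (batch, seq)
        expert_idx = token_ids % self.num_experts  # deterministic, reproducible routing
        output = torch.zeros_like(hidden_states)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                 # tokens assigned to expert i
            if mask.any():
                output[mask] = expert(hidden_states[mask])
        return output
```

Because the expert index depends only on the token ID, the routing adds no parameters and produces identical assignments on every run.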
## QK Normalization
Stabilizes attention at scale by normalizing Q and K before computing attention scores:
```python
q = self.q_norm(q)            # normalize queries before scoring
k = self.k_norm(k)            # normalize keys before scoring
attn = (q @ k.T) / sqrt(d)    # dot products stay well-scaled at depth
```
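
For context, here is a self-contained sketch of how QK normalization slots into an attention block, assuming per-head `nn.RMSNorm` (PyTorch >= 2.4) and plain multi-head attention; the released code may place the norms differently, and its grouped-query attention (4 KV heads) is omitted here for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Illustrative sketch: RMSNorm on Q and K before the dot product."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.k_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.q_norm = nn.RMSNorm(self.head_dim)
        self.k_norm = nn.RMSNorm(self.head_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # project and split into heads: (batch, heads, seq, head_dim)
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        # normalize Q and K per head so attention scores stay bounded
        q, k = self.q_norm(q), self.k_norm(k)
        # Flash Attention via SDPA, as noted in the architecture table
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))
```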
## Model Details
- Parameters: ~100M
- Hidden size: 768
- Layers: 12
- Attention heads: 12 (KV heads: 4)
- Experts: 4 (1 active per token)
- Vocabulary: 100K tokens
- Context: 2048 tokens
- Training steps: 10,000
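
For orientation, the numbers above would map onto a Llama-style config roughly as follows; the field names follow common conventions and are not necessarily the model's actual `config.json` keys:

```python
config = {
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "num_attention_heads": 12,
    "num_key_value_heads": 4,   # grouped-query attention
    "num_experts": 4,           # token-routed MLP, 1 active per token
    "vocab_size": 100_000,
    "max_position_embeddings": 2048,
}
```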
## Installation

```bash
pip install complexity-model pyllm-inference
```
## Usage

### With PyLLM

```bash
pyllm serve Pacific-Prime/complexity-tiny
```
### Python API

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Pacific-Prime/complexity")
model = AutoModelForCausalLM.from_pretrained(
    "Pacific-Prime/complexity",
    trust_remote_code=True,  # custom architecture ships its own modeling code
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```
## Comparison with Llama

```text
Llama:      embed -> [Attn + FFN] x L            -> output
Complexity: embed -> [Attn + TokenRoutedMLP] x L -> output
                       ↑ QK Norm  ↑ 4 experts (1 active)
```
Same active parameter count per token, but:
- 4x more total MLP parameters, distributed across experts (see the arithmetic sketch below)
- Faster training (QK norm stabilizes gradients)
- Better scaling (sparse activation)
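
A back-of-the-envelope check of the parameter claim, assuming a plain two-layer FFN with intermediate size 4x hidden; the model's real intermediate size is not stated here, so the absolute numbers are illustrative:

```python
hidden, intermediate, num_experts = 768, 4 * 768, 4

# one expert: up-projection + down-projection (biases ignored)
params_per_expert = hidden * intermediate + intermediate * hidden

total_mlp = num_experts * params_per_expert   # parameters stored per layer
active_mlp = params_per_expert                # parameters used per token

print(f"total MLP params per layer:  {total_mlp:,}")   # 18,874,368
print(f"active MLP params per token: {active_mlp:,}")  # 4,718,592
```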
## License
Apache 2.0
## Citation

```bibtex
@misc{complexity,
  title={Complexity: Token-Routed MLP Transformer},
  author={Pacific Prime},
  year={2025},
  url={https://huggingface.co/Pacific-Prime/complexity}
}
```