# Complexity Base
A Llama-style transformer with architectural improvements for efficiency and performance.
## Architecture: Llama + Improvements
Complexity builds on the Llama architecture with three key enhancements:
| Component | Llama | Complexity |
|---|---|---|
| MLP | Dense FFN | Token-Routed MLP (4 experts, 1 active) |
| Attention | Standard | Flash Attention via SDPA |
| Normalization | RMSNorm only | RMSNorm + QK Normalization |
## Token-Routed MLP
Unlike standard MoE, which routes tokens through a learned router based on hidden states, Token-Routed MLP routes on the token ID alone:

```python
expert_idx = token_id % num_experts  # deterministic routing, no learned router
output = experts[expert_idx](hidden_states)
```
Benefits:
- No router network overhead
- Deterministic, reproducible routing
- 4x total MLP capacity at the compute cost of a single expert (only 1 of 4 experts active per token); see the sketch below
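
As a concrete illustration, here is a minimal PyTorch sketch of the routing pattern. The module and its internals are assumptions for illustration, not the released implementation; in particular, the plain Linear-SiLU-Linear expert stands in for whatever FFN variant the model actually uses:

```python
import torch
import torch.nn as nn

class TokenRoutedMLP(nn.Module):
    """Illustrative sketch: route each token to experts[token_id % num_experts]."""

    def __init__(self, hidden_size: int, intermediate_size: int, num_experts: int = 4):
        super().__init__()
        self.num_experts = num_experts
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, hidden_states: torch.Tensor, token_ids: torch.LongTensor) -> torch.Tensor:
        # hidden_states: (batch, seq, hidden); token_ids: (batch, seq)
        expert_idx = token_ids % self.num_experts  # deterministic, reproducible routing
        output = torch.zeros_like(hidden_states)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                 # tokens assigned to expert i
            if mask.any():
                output[mask] = expert(hidden_states[mask])
        return output
```

Because the expert index depends only on the token ID, the routing adds no parameters and produces identical assignments on every run.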
## QK Normalization
Stabilizes attention at scale by normalizing Q and K before computing attention scores:
```python
q = self.q_norm(q)            # normalize queries before scoring
k = self.k_norm(k)            # normalize keys before scoring
attn = (q @ k.T) / sqrt(d)    # dot products stay well-scaled at depth
```
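
For context, here is a self-contained sketch of how QK normalization slots into an attention block, assuming per-head `nn.RMSNorm` (PyTorch >= 2.4) and plain multi-head attention; the released code may place the norms differently, and its grouped-query attention (4 KV heads) is omitted here for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Illustrative sketch: RMSNorm on Q and K before the dot product."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.k_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.q_norm = nn.RMSNorm(self.head_dim)
        self.k_norm = nn.RMSNorm(self.head_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # project and split into heads: (batch, heads, seq, head_dim)
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        # normalize Q and K per head so attention scores stay bounded
        q, k = self.q_norm(q), self.k_norm(k)
        # Flash Attention via SDPA, as noted in the architecture table
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))
```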
## Model Details
- Parameters: ~100M
- Hidden size: 768
- Layers: 12
- Attention heads: 12 (KV heads: 4)
- Experts: 4 (1 active per token)
- Vocabulary: 100K tokens
- Context: 2048 tokens
- Training steps: 10,000
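
For orientation, the numbers above would map onto a Llama-style config roughly as follows; the field names follow common conventions and are not necessarily the model's actual `config.json` keys:

```python
config = {
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "num_attention_heads": 12,
    "num_key_value_heads": 4,   # grouped-query attention
    "num_experts": 4,           # token-routed MLP, 1 active per token
    "vocab_size": 100_000,
    "max_position_embeddings": 2048,
}
```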
## Installation

```bash
pip install complexity-model pyllm-inference
```
## Usage

### With PyLLM

```bash
pyllm serve Pacific-Prime/complexity-tiny
```
### Python API

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Pacific-Prime/complexity")
model = AutoModelForCausalLM.from_pretrained(
    "Pacific-Prime/complexity",
    trust_remote_code=True,  # custom architecture ships its own modeling code
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```
## Comparison with Llama

```text
Llama:      embed -> [Attn + FFN] x L            -> output
Complexity: embed -> [Attn + TokenRoutedMLP] x L -> output
                       ↑ QK Norm  ↑ 4 experts (1 active)
```
Same active parameter count per token, but:
- 4x more total MLP parameters, distributed across experts (see the arithmetic sketch below)
- Faster training (QK norm stabilizes gradients)
- Better scaling (sparse activation)
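
A back-of-the-envelope check of the parameter claim, assuming a plain two-layer FFN with intermediate size 4x hidden; the model's real intermediate size is not stated here, so the absolute numbers are illustrative:

```python
hidden, intermediate, num_experts = 768, 4 * 768, 4

# one expert: up-projection + down-projection (biases ignored)
params_per_expert = hidden * intermediate + intermediate * hidden

total_mlp = num_experts * params_per_expert   # parameters stored per layer
active_mlp = params_per_expert                # parameters used per token

print(f"total MLP params per layer:  {total_mlp:,}")   # 18,874,368
print(f"active MLP params per token: {active_mlp:,}")  # 4,718,592
```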
## License
Apache 2.0
## Citation

```bibtex
@misc{complexity,
  title={Complexity: Token-Routed MLP Transformer},
  author={Pacific Prime},
  year={2025},
  url={https://huggingface.co/Pacific-Prime/complexity}
}
```