Complexity Base

A Llama-style transformer with architectural improvements for efficiency and performance.

Architecture: Llama + Improvements

Complexity builds on the Llama architecture with three key enhancements:

Component       Llama          Complexity
---------       -----          ----------
MLP             Dense FFN      Token-Routed MLP (4 experts, 1 active)
Attention       Standard       Flash Attention via SDPA
Normalization   RMSNorm only   RMSNorm + QK Normalization

Token-Routed MLP

Unlike standard MoE, which routes tokens through a learned router over hidden states, Token-Routed MLP routes deterministically on the token ID:

expert_idx = token_id % num_experts  # Deterministic routing
output = experts[expert_idx](hidden_states)

Benefits:

  • No router network overhead
  • Deterministic, reproducible routing
  • 4x the MLP capacity at the compute cost of a single expert (1 of 4 experts active per token)
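
For concreteness, here is a minimal PyTorch sketch of this routing. The module layout and the plain SiLU feed-forward experts are illustrative assumptions, not the released model's implementation (which may, for example, use gated SwiGLU experts):

import torch
import torch.nn as nn

class TokenRoutedMLP(nn.Module):
    """Illustrative sketch: token i is handled by experts[token_id_i % num_experts]."""

    def __init__(self, hidden_size: int, intermediate_size: int, num_experts: int = 4):
        super().__init__()
        self.num_experts = num_experts
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, hidden_states: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq, hidden); token_ids: (batch, seq)
        expert_idx = token_ids % self.num_experts  # deterministic, no learned router
        output = torch.zeros_like(hidden_states)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                 # positions assigned to expert i
            if mask.any():
                output[mask] = expert(hidden_states[mask])
        return output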

QK Normalization

Stabilizes attention at scale by normalizing Q and K before computing attention scores:

q = self.q_norm(q)
k = self.k_norm(k)
attn = (q @ k.T) / sqrt(d)
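
In full, the pattern looks roughly like the sketch below. It assumes PyTorch >= 2.4 for nn.RMSNorm and normalizes each head over head_dim; the model's actual norm type and placement may differ, and grouped-query attention (4 KV heads) is omitted for brevity:

import torch
import torch.nn.functional as F
from torch import nn

head_dim = 64
q_norm = nn.RMSNorm(head_dim)  # learnable scale over the head dimension
k_norm = nn.RMSNorm(head_dim)

def qk_norm_attention(q, k, v):
    # q, k, v: (batch, num_heads, seq, head_dim)
    # Normalizing q and k bounds the attention logits, which is what
    # stabilizes training at scale.
    q, k = q_norm(q), k_norm(k)
    return F.scaled_dot_product_attention(q, k, v)  # Flash Attention via SDPA

q = k = v = torch.randn(1, 12, 16, head_dim)
out = qk_norm_attention(q, k, v)  # shape (1, 12, 16, 64)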

Model Details

  • Parameters: ~100M
  • Hidden size: 768
  • Layers: 12
  • Attention heads: 12 (KV heads: 4)
  • Experts: 4 (1 active per token)
  • Vocabulary: 100K tokens
  • Context: 2048 tokens
  • Training steps: 10,000
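
These hyperparameters would correspond to a config roughly like the following. The key names are a guess modeled on Llama-style configs; check the repository's config.json for the actual schema:

# Hypothetical key names; consult config.json for the real ones.
config = {
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "num_attention_heads": 12,
    "num_key_value_heads": 4,      # grouped-query attention
    "num_experts": 4,              # token-routed MLP experts
    "num_active_experts": 1,       # one expert per token
    "vocab_size": 100_000,
    "max_position_embeddings": 2048,
}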

Installation

pip install complexity-model pyllm-inference

Usage

With PyLLM

pyllm serve Pacific-Prime/complexity-tiny

Python API

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Pacific-Prime/complexity")
model = AutoModelForCausalLM.from_pretrained(
    "Pacific-Prime/complexity",
    trust_remote_code=True
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))

Comparison with Llama

Llama:      embed -> [Attn + FFN] x L -> output
Complexity: embed -> [Attn + TokenRoutedMLP] x L -> output
                      ↑ QK Norm    ↑ 4 experts (1 active)

Same active parameter count per token, but:

  • 4x more total MLP parameters (distributed across experts; see the sketch below)
  • Faster training (QK norm stabilizes gradients)
  • Better scaling behavior (sparse activation)
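
A back-of-envelope sketch of the parameter trade-off. The intermediate size here is an assumption, not a number from this card:

# Per-layer MLP parameter count, ignoring biases and any gating projections.
hidden, intermediate, num_experts = 768, 2048, 4  # intermediate size is assumed

per_expert = 2 * hidden * intermediate   # up-projection + down-projection
total_mlp = num_experts * per_expert     # parameters stored
active_mlp = per_expert                  # parameters used per token (1 of 4)

print(f"total: {total_mlp/1e6:.1f}M, active: {active_mlp/1e6:.1f}M per layer")
# total: 12.6M, active: 3.1M -> 4x the capacity at dense-FFN compute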

License

Apache 2.0

Citation

@misc{complexity,
  title={Complexity: Token-Routed MLP Transformer},
  author={Pacific Prime},
  year={2025},
  url={https://huggingface.co/Pacific-Prime/complexity}
}