Geilim-1B-Instruct (忌廉)

Deep Causal Internal Reasoning: no verbose CoT, no <think> tags, just concise answers powered by implicit reasoning.


💡 Introduction

Recent advances in reasoning models (DeepSeek R1, o1) have demonstrated impressive capabilities through Chain-of-Thought (CoT) reasoning. However, we observe several critical drawbacks:

Problems with External CoT:

  1. Verbosity Tax: Models generate hundreds of tokens in <think> tags before answering, increasing latency and cost
  2. Autoregressive Dependency: Models must "see" their reasoning to follow it, forcing sequential token generation
  3. Token Inefficiency: Users pay for reasoning traces they often don't need when only the final answer matters
  4. Production Overhead: Verbose outputs are impractical for real-time APIs and edge deployment

Our Insight: What if reasoning could happen internally in the model's hidden states, without generating verbose traces?

Geilim-1B-Instruct addresses these limitations through a hybrid architecture combining:

  • ASPP (Adjacency-Structured Parallel Propagation): Graph-based causal chains for structured reasoning
  • π-flow (Probability Flow Dynamics): Internal refinement in probability space without token generation
  • Hybrid Gating: Learnable balance between structured and attention-based processing

The result: Deep reasoning capability with concise outputs - the best of both worlds.


🎯 Core Value Proposition

Geilim-1B-Instruct is the anti-verbose reasoning model.

| Model Type | Reasoning Approach | Output Style |
|---|---|---|
| Baseline (Llama-3.2-1B) | Limited reasoning | Direct but may lack depth |
| CoT Models (DeepSeek R1, o1) | External reasoning chains | Verbose <think> tags, long outputs |
| Geilim-1B-Instruct | Internal reasoning | Concise answers, reasoning in hidden states |

Key Differentiator: Geilim performs deep causal reasoning internally through the ASPP+π-flow architecture, then outputs only the final answer. You get the reasoning quality without the verbosity tax.


🏗️ Architecture Overview

Geilim-1B-Instruct combines three key components for implicit reasoning:

1. ASPP Operator (Adjacency-Structured Parallel Propagation)

  • Union-Find graph structure: Linear causal chain where each token only connects to its parent
  • Iterative message passing: h_i^(t+1) = φ(h_i^(t), h_parent[i]^(t))
  • K-step evolution: Adaptive 2-8 steps of causal propagation
  • Complexity: O(n) per propagation step, since each token reads only its parent - efficient linear-time reasoning

Why it matters: ASPP creates explicit causal relationships between tokens, allowing information to flow through a reasoning chain without generating output tokens.
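
The propagation rule above can be illustrated with a small pure-Python sketch. This is not the model's implementation: the update function φ is replaced by a hypothetical tanh blend with a fixed `mix` weight, the names `aspp_propagate`, `parent`, and `mix` are invented for illustration, and the root token is assumed to point to itself.

```python
import math

def aspp_propagate(hidden, parent, num_steps=2, mix=0.5):
    """Run `num_steps` rounds of parent-only message passing.

    hidden: list of per-token state vectors (lists of floats)
    parent: parent[i] is the index of token i's parent (assumed: root points to itself)
    mix:    hypothetical blend weight standing in for the learned phi
    """
    states = [list(h) for h in hidden]
    for _ in range(num_steps):
        # Each token reads only its parent's state from the previous step,
        # so one step costs O(n) and all positions update together.
        states = [
            [math.tanh((1 - mix) * own + mix * par)
             for own, par in zip(states[i], states[parent[i]])]
            for i in range(len(states))
        ]
    return states

# Linear causal chain: token i's parent is token i-1.
n = 4
parent = [max(i - 1, 0) for i in range(n)]
hidden = [[1.0, 0.0]] + [[0.0, 0.0] for _ in range(n - 1)]

out = aspp_propagate(hidden, parent, num_steps=2)
```

In this toy setup, the signal placed on the root reaches token 2 after two steps but has not yet reached token 3, which shows why the step count K bounds how far information travels along the chain.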

2. π-flow (Probability Flow Dynamics)

  • Velocity field learning: h' = h + α * v(h) where v(h) is a learned refinement
  • Multi-step refinement: Iterates in probability space to converge on the correct answer
  • Gated application: Model learns when to refine (complex questions) vs when to skip (simple questions)
  • Internal convergence: Reasoning happens in hidden states, not in generated text

Why it matters: π-flow eliminates the need for external CoT by performing iterative refinement internally. The model "thinks" in its hidden states and outputs only the final result.
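
A minimal sketch of the gated update h' = h + α * v(h) may help make this concrete. Several things here are assumptions rather than details from the model: the gate is taken to be a sigmoid of a learned scalar that multiplies the update, α is assumed to correspond to `pi_flow_scale`, and the velocity field is a hypothetical function that pulls the state toward a fixed target instead of a learned network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pi_flow_refine(h, velocity, gate_logit, steps=2, scale=0.5):
    """Apply h <- h + gate * scale * v(h) for a fixed number of steps."""
    gate = sigmoid(gate_logit)  # near 0: skip refinement, near 1: refine fully
    for _ in range(steps):
        v = velocity(h)
        h = [hi + gate * scale * vi for hi, vi in zip(h, v)]
    return h

# Hypothetical velocity field: points from the current state to a target.
target = [1.0, -1.0]

def toy_velocity(h):
    return [t - hi for t, hi in zip(target, h)]

h0 = [0.0, 0.0]
refined = pi_flow_refine(h0, toy_velocity, gate_logit=10.0)   # gate close to 1
skipped = pi_flow_refine(h0, toy_velocity, gate_logit=-10.0)  # gate close to 0
```

With the gate open, two steps at scale 0.5 move the state about three quarters of the way to the target; with the gate closed, the state is left essentially unchanged. No tokens are produced in either case.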

3. Hybrid Gating Mechanism

output = gate * ASPP(x) + (1-gate) * Attention(x)
  • Combines structured causal reasoning (ASPP) with flexible attention
  • Learnable balance between graph-based and sequence-based processing
  • Applied to all 30 layers of the base model (Llama-3.2-1B)
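
The blend formula above can be sketched in a few lines. The branch outputs below are stand-in vectors, and producing the gate as a sigmoid of a single learned scalar is an assumption; the actual gate may be per-token or per-channel.

```python
import math

def hybrid_mix(aspp_out, attn_out, gate_logit):
    """Blend two branch outputs: gate * ASPP(x) + (1 - gate) * Attention(x)."""
    gate = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid keeps the gate in (0, 1)
    return [gate * a + (1.0 - gate) * b for a, b in zip(aspp_out, attn_out)]

aspp_out = [1.0, 1.0]  # stand-in for ASPP(x)
attn_out = [0.0, 2.0]  # stand-in for Attention(x)

# A logit of 0 gives gate = 0.5, an equal blend of both branches.
balanced = hybrid_mix(aspp_out, attn_out, gate_logit=0.0)
```

Because the two weights sum to 1, the output stays on the same scale as either branch alone, so the layer can shift between structured and attention-based processing without rescaling.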

🧠 Why π-flow Eliminates Verbosity

The Problem with Traditional CoT

External Reasoning Models (DeepSeek R1, o1-style):

User: What is 15 * 8?

Model: <think>
Let me break this down step by step:
1. First, I'll multiply 15 by 8
2. 15 * 8 = 15 * (10 - 2)
3. Using distributive property: 15*10 - 15*2
4. 150 - 30 = 120
Therefore, the answer is 120.
</think>

The answer is 120.
  • Output: 200+ characters
  • Latency: High (many tokens to generate)
  • Cost: Expensive (charged per token)

Geilim's Internal Reasoning

Geilim-1B-Instruct (ASPP+π-flow):

User: What is 15 * 8?

Model: 120
  • Output: 3 characters
  • Latency: Low (minimal generation)
  • Cost: Minimal
  • Reasoning: Happened internally through:
    1. ASPP causal chain propagating arithmetic relationships
    2. π-flow refining probability distribution across answer space
    3. Convergence to correct answer in hidden states

🔬 Technical Mechanism

How π-flow Achieves Internal Reasoning

  1. Probability Space Operations

    • Instead of generating tokens to explore answers, π-flow refines probability distributions directly
    • v(h): Learned velocity field that corrects the model's initial judgment
    • Multi-step: h^(0) → h^(1) → h^(2) (2 refinement steps)
  2. Convergence Without Output

    • Traditional models need to "see" their reasoning to follow it (autoregressive dependency)
    • π-flow breaks this: reasoning occurs in parallel across all positions simultaneously
    • The model converges internally before generating any output token
  3. Adaptive Complexity

    • pi_flow_use_gate=True: Model learns when refinement is needed
    • Simple questions: Direct output (gate ≈ 0, skip refinement)
    • Complex questions: Internal multi-step refinement (gate ≈ 1, apply π-flow)
    • User always sees concise output regardless
  4. Synergy with ASPP

    • ASPP provides causal structure (parent-child dependencies)
    • π-flow refines along these dependencies
    • Result: Structured reasoning (not just attention) + probabilistic convergence = deep causal understanding

📊 Configuration

Model Architecture

  • Base Model: Llama-3.2-1B-Instruct (1.26B params)
  • Total Parameters: ~1.4B (140M additional ASPP+π-flow params)
  • Hybrid Layers: All 30 layers (universal reasoning capability)

ASPP Settings

aspp_hidden_dim: 512         # vs 2048 model hidden_size (reduce overfitting)
aspp_num_steps: 2-8          # learnable via sigmoid gating
aspp_dropout: 0.15
aspp_num_neighbors: 1        # Union-Find: parent-only connections
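
One way a sigmoid gate could yield a step count in the 2-8 range is sketched below. The mapping is a guess for illustration only: the function name `adaptive_num_steps`, the rounding, and the use of a single scalar logit are all assumptions, and the model may implement adaptive steps differently (for example, by softly weighting every step).

```python
import math

def adaptive_num_steps(step_logit, min_steps=2, max_steps=8):
    """Map an unbounded learned scalar to an integer step count in [min, max]."""
    gate = 1.0 / (1.0 + math.exp(-step_logit))
    return min_steps + round(gate * (max_steps - min_steps))

few = adaptive_num_steps(-10.0)   # gate near 0: minimum number of steps
mid = adaptive_num_steps(0.0)     # gate at 0.5: middle of the range
many = adaptive_num_steps(10.0)   # gate near 1: maximum number of steps
```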

π-flow Settings

pi_flow: True                # Enable probability flow refinement
pi_flow_steps: 2             # 2-step refinement
pi_flow_scale: 0.5           # Moderate refinement strength
pi_flow_use_gate: True       # Adaptive gating

🚀 Quick Start

Installation

pip install transformers torch

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model
model_path = "NoesisLab/Geilim-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate response
prompt = "A store has 120 apples. They sell 35 in the morning and 48 in the afternoon. How many are left?"
messages = [{"role": "user", "content": prompt}]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
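# The chat template already inserts special tokens, so don't add them a second time
inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)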

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
)

response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)  # Expected: "37" or "37 apples are left." (concise!)

Advanced Usage

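# For math problems where the steps should be shown
# Note: Geilim prefers concise outputs, but can show its work if prompted
# Reuses `model` and `tokenizer` from Basic Usage
prompt = "Explain how you would solve: What is 15 * 23?"
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).to(model.device)

# For best results with implicit reasoning
generation_config = {
    "max_new_tokens": 128,        # Keep low to encourage conciseness
    "temperature": 0.7,           # Moderate sampling
    "do_sample": True,
    "top_p": 0.9,
    "repetition_penalty": 1.1,    # Prevent loops
}

outputs = model.generate(**inputs, **generation_config)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))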

🎓 Training Details

Dataset

  • Mixed-Benchmark-Dataset (composite reasoning benchmarks)
    • 25% GSM8K (math reasoning)
    • 30% HellaSwag (commonsense)
    • 20% ARC (science QA)
    • 10% OpenHermes (high-quality responses)
    • 15% Capybara (multi-turn conversations)
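
The mixture proportions can be written as sampling weights, as in the sketch below. It assumes the percentages refer to shares of training examples (the card does not say whether they are counted by example or by token), and `sample_sources` is a hypothetical helper, not part of the training code.

```python
import random

# Mixture weights taken from the list above.
MIXTURE = {
    "GSM8K": 0.25,
    "HellaSwag": 0.30,
    "ARC": 0.20,
    "OpenHermes": 0.10,
    "Capybara": 0.15,
}

def sample_sources(num_examples, seed=0):
    """Draw a source dataset name for each training example."""
    rng = random.Random(seed)
    names = list(MIXTURE)
    weights = [MIXTURE[name] for name in names]
    return rng.choices(names, weights=weights, k=num_examples)

total_weight = sum(MIXTURE.values())  # should sum to 1.0
draws = sample_sources(10_000)
share = {name: draws.count(name) / len(draws) for name in MIXTURE}
```

Over a large number of draws, the observed share of each source should sit close to its configured weight.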

Training Configuration

  • Framework: TRL SFTTrainer with packing
  • Epochs: 2
  • Batch Size: Effective 8 (per_device=2, grad_accum=4)
  • Learning Rate: 2e-4 with 10% warmup
  • Precision: bfloat16 with gradient checkpointing
  • Optimizer: AdamW (weight_decay=0.1, max_grad_norm=1.0)
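
For reference, the settings above could be collected as keyword arguments in the style of TRL's `SFTConfig` (which extends the Transformers `TrainingArguments`). The mapping of each bullet to a field name is an assumption, as are the `adamw_torch` optimizer variant and the single-device setup used to compute the effective batch size; the original training script is not shown in this card.

```python
# Hyperparameters from the list above, keyed by SFTConfig-style field names.
training_args = {
    "num_train_epochs": 2,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "warmup_ratio": 0.10,            # 10% warmup
    "bf16": True,
    "gradient_checkpointing": True,
    "optim": "adamw_torch",          # assumed AdamW variant
    "weight_decay": 0.1,
    "max_grad_norm": 1.0,
    "packing": True,                 # SFTTrainer sequence packing
}

num_devices = 1  # assumption: single-GPU training
effective_batch_size = (
    training_args["per_device_train_batch_size"]
    * training_args["gradient_accumulation_steps"]
    * num_devices
)
```

With these values the effective batch size works out to 2 × 4 × 1 = 8, matching the figure stated above.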

Training Philosophy

Unlike CoT models trained on verbose reasoning chains, Geilim is trained on answer-focused data where:

  • Correct answers are rewarded
  • Reasoning quality is learned implicitly through ASPP+π-flow gradients
  • The model learns to converge internally rather than generate external reasoning

📈 Evaluation

Reasoning Quality Tests

Geilim is evaluated on:

  1. Math reasoning (GSM8K-style arithmetic)
  2. Commonsense reasoning (HellaSwag, PIQA)
  3. Logic puzzles (multi-hop deduction)
  4. Reading comprehension (information tracking)
  5. Causal reasoning (cause-effect relationships)

Key Metrics

  • Answer correctness (primary goal)
  • Response conciseness (< 150 chars = concise)
  • Reasoning traces (should be absent from output, present in hidden states)
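
The output-side metrics could be checked with a helper like the one below. It is a sketch, not the evaluation harness: `score_response` is a hypothetical name, correctness is a naive substring match, and only `<think>...</think>` blocks are treated as reasoning traces. Whether reasoning is present in hidden states cannot be measured from the output text and is out of scope here.

```python
import re

CONCISE_LIMIT = 150  # characters, from the "< 150 chars = concise" threshold
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def score_response(response, expected):
    """Score one model response on correctness, conciseness, and trace leakage."""
    text = response.strip()
    return {
        "correct": expected in text,             # naive substring match
        "concise": len(text) < CONCISE_LIMIT,
        "has_trace": bool(THINK_BLOCK.search(text)),
    }

concise = score_response("37 apples are left.", expected="37")
verbose = score_response(
    "<think>\n120 - 35 = 85\n85 - 48 = 37\n</think>\n\nThe answer is 37.",
    expected="37",
)
```

A stricter harness would parse the final number rather than rely on substring matching, which would also accept an answer such as "137".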

🎯 Use Cases

Ideal For:

  • Production APIs: Low latency, low token cost
  • Real-time applications: Minimal generation overhead
  • Cost-sensitive deployments: Pay only for the answer, not the reasoning
  • User-facing chat: Clean outputs without technical reasoning traces
  • Mobile/edge devices: Smaller token budgets

Not Ideal For:

  • Educational use cases: When you want to show reasoning steps to users
  • Debugging/verification: When explicit reasoning helps validate answers
  • Research: When analyzing reasoning chains is the goal

🆚 Comparison Table

| Feature | Geilim-1B-Instruct | DeepSeek R1 | Llama-3.2-1B |
|---|---|---|---|
| Model Size | 1.4B | 1.5B | 1.26B |
| Reasoning Type | Internal (ASPP+π-flow) | External (CoT) | Limited |
| Output Style | Concise answers | Verbose <think> tags | Direct answers |
| Latency | Low | High (many tokens) | Low |
| Cost per query | Low | High | Low |
| Reasoning depth | Deep (hidden states) | Deep (explicit) | Shallow |
| Token efficiency | High | Low | Medium |

📚 Technical References

Core Papers & Concepts

  • Union-Find Data Structure: Parent-only connections for efficient causal propagation
  • Probability Flow ODEs: Continuous refinement in probability space (inspired by diffusion models)
  • Hybrid Architectures: Combining structured (graph) and unstructured (attention) reasoning

Related Work

  • DeepSeek R1: External reasoning chains
  • o1 series: Long-form CoT reasoning
  • SmolLM2: Efficient small language models
  • Graph Neural Networks: Structured message passing

🔧 Development

Custom Model Registration

  • Model type: asterisk (registered with HuggingFace AutoModel)
  • Config class: AsteriskConfig (extends LlamaConfig)
  • Model class: AsteriskForCausalLM (extends LlamaForCausalLM)
  • Loading: Requires trust_remote_code=True

🌟 Key Takeaways

  1. No verbose CoT: Geilim performs reasoning internally, outputs concisely
  2. ASPP+π-flow: Causal graph structure + probability flow refinement
  3. Deep causal understanding: Reasoning happens in hidden states, not generated text
  4. Production-ready: Low latency, low cost, clean outputs
  5. Comparable reasoning depth: Designed to match CoT models without the verbosity

📝 Citation

If you use Geilim-1B-Instruct in your research or applications, please cite:

@misc{geilim2026,
  title={Geilim-1B-Instruct: Deep Causal Internal Reasoning via ASPP and Probability Flow},
  author={NoesisLab},
  year={2026},
  howpublished={HuggingFace Model Hub},
  url={https://huggingface.co/NoesisLab/Geilim-1B-Instruct}
}

🤝 Acknowledgments

  • Base Model: Llama-3.2-1B-Instruct by Meta
  • Training Framework: TRL by HuggingFace
  • Inspiration: DeepSeek R1, for demonstrating the value of reasoning; Geilim pursues the same goal with concise outputs

📄 License

Llama 3.2 Community License


Built with ❤️ for the era of efficient reasoning models.

Geilim (忌廉) - Cantonese for "cream" - smooth, concise, and rich in substance.
