# LexiMind: Multi-Task Transformer Model
LexiMind is a custom-built multi-task encoder-decoder Transformer that jointly performs abstractive summarization, emotion detection (multi-label, 28 classes), and topic classification (7 classes). It uses a FLAN-T5-base initialization with several architectural enhancements.
## Architecture
| Component | Detail |
|---|---|
| Base | FLAN-T5-base (272M parameters) |
| Encoder | 12 layers, 768 hidden dim, 12 heads |
| Decoder | 12 layers, 768 hidden dim, 12 heads |
| FFN | Gated-GELU, d_ff = 2048 |
| Position | Relative position bias (T5 style) |
| Vocab | 32,128 tokens (SentencePiece) |
| Summarization head | Decoder → linear projection → vocab |
| Emotion head | Attention-pooled encoder → 28-class sigmoid |
| Topic head | [CLS]-pooled encoder → 7-class softmax |
| Task sampling | Temperature-scaled proportional mixing (τ = 2.0) |
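Temperature-scaled mixing samples each task with probability proportional to its dataset size raised to 1/τ, so with τ = 2.0 smaller datasets are seen more often than pure proportional mixing would allow. A minimal sketch of the idea (dataset sizes below are illustrative placeholders, not the actual training counts):

```python
def task_probs(sizes: dict[str, int], tau: float = 2.0) -> dict[str, float]:
    """Temperature-scaled sampling: p_i ∝ n_i ** (1 / tau).

    tau = 1.0 recovers proportional mixing; larger tau flattens the
    distribution toward uniform, upweighting smaller datasets.
    """
    weights = {task: n ** (1.0 / tau) for task, n in sizes.items()}
    total = sum(weights.values())
    return {task: w / total for task, w in weights.items()}

# Illustrative sizes only.
sizes = {"summarization": 300_000, "emotion": 58_000, "topic": 120_000}
probs = task_probs(sizes)
```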
## Training
- Data: CNN/DailyMail + BookSum (summarization), GoEmotions (emotion), AG News (topic)
- Epochs: 8 (~9 hours on a single NVIDIA RTX 4070)
- Optimizer: AdamW, lr = 3e-4, weight decay = 0.01
- Scheduler: Linear warmup (500 steps) + cosine decay
- Gradient clipping: max norm = 1.0
- Mixed precision: FP16 via PyTorch AMP
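The schedule above (500 warmup steps into cosine decay) can be sketched as a step-indexed learning-rate function; this is an illustrative reimplementation, not the repository's actual scheduler code:

```python
import math

BASE_LR = 3e-4
WARMUP_STEPS = 500

def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup for WARMUP_STEPS steps, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```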
## Evaluation Results
| Task | Metric | Value |
|---|---|---|
| Summarization | ROUGE-1 | 0.309 |
| Summarization | ROUGE-L | 0.185 |
| Summarization | BLEU-4 | 0.024 |
| Topic Classification | Accuracy | 85.7% |
| Topic Classification | Macro F1 | 0.854 |
| Emotion Detection | Sample-Avg F1 | 0.352 |
| Emotion Detection | Micro F1 | 0.443 |
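The two emotion F1 variants reported above measure different things: micro F1 pools true/false positives across all 28 labels, while sample-averaged F1 scores each example separately and averages. A self-contained sketch on set-valued predictions (toy code, not the actual evaluation script):

```python
def micro_f1(y_true: list[set[int]], y_pred: list[set[int]]) -> float:
    """Pool TP/FP/FN over all examples and labels, then compute one F1."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def sample_avg_f1(y_true: list[set[int]], y_pred: list[set[int]]) -> float:
    """Compute F1 per example, then average over examples."""
    scores = []
    for t, p in zip(y_true, y_pred):
        tp = len(t & p)
        scores.append(2 * tp / (len(t) + len(p)) if (t or p) else 1.0)
    return sum(scores) / len(scores)
```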
## Files

| File | Description |
|---|---|
| `best.pt` | Full model checkpoint (state dict + optimizer + metadata) |
| `labels.json` | Emotion (28) and topic (7) label mappings |
| `tokenizer.json` | SentencePiece tokenizer (flat format) |
| `hf_tokenizer/` | HuggingFace-compatible tokenizer directory |
## Usage

```python
import torch

from src.models.factory import build_model
from src.utils.io import load_labels

# `config` is the model configuration used at training time;
# see the repository for how it is constructed.
labels = load_labels("labels.json")
model = build_model(config, labels)

ckpt = torch.load("best.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()
```
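Because the emotion head is multi-label, its 28 sigmoid outputs must be thresholded to produce label names. A minimal post-processing sketch, assuming `labels.json` yields an index-to-name mapping and using a 0.5 threshold (both are assumptions, not confirmed by the repository):

```python
import math

def decode_emotions(logits: list[float], id2label: dict[int, str],
                    threshold: float = 0.5) -> list[str]:
    """Apply a sigmoid to each emotion logit and keep labels above threshold.

    The mapping and threshold are illustrative; adapt to the repo's actual
    label format.
    """
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [id2label[i] for i, p in enumerate(probs) if p >= threshold]
```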
See the full codebase at [github.com/OliverPerrin/LexiMind](https://github.com/OliverPerrin/LexiMind) for inference scripts, the API server, and the Gradio demo.
## License

MIT