# LexiMind: Multi-Task Transformer Model

LexiMind is a custom-built multi-task encoder-decoder Transformer that jointly performs abstractive summarization, multi-label emotion detection (28 classes), and topic classification (7 classes). It is initialized from FLAN-T5-base and adds several architectural enhancements.

## Architecture

| Component | Detail |
| --- | --- |
| Base | FLAN-T5-base (272M parameters) |
| Encoder | 12 layers, 768 hidden dim, 12 heads |
| Decoder | 12 layers, 768 hidden dim, 12 heads |
| FFN | Gated-GELU, d_ff = 2048 |
| Position | Relative position bias (T5 style) |
| Vocab | 32,128 tokens (SentencePiece) |
| Summarization head | Decoder → linear projection → vocab |
| Emotion head | Attention-pooled encoder → 28-class sigmoid |
| Topic head | [CLS]-pooled encoder → 7-class softmax |
| Task sampling | Temperature-based (τ = 2.0) with proportional mixing |
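Temperature-based proportional mixing draws each training batch's task with probability proportional to the dataset size raised to 1/τ, so larger τ flattens the mix toward uniform. A minimal sketch of the sampling probabilities (the dataset sizes below are hypothetical, not from this model card):

```python
def task_sampling_probs(dataset_sizes: dict[str, int], tau: float = 2.0) -> dict[str, float]:
    """Temperature-based proportional mixing: p_i proportional to n_i ** (1 / tau).

    tau = 1 recovers pure proportional sampling; larger tau moves the
    distribution toward uniform, up-weighting smaller datasets.
    """
    weights = [n ** (1.0 / tau) for n in dataset_sizes.values()]
    total = sum(weights)
    return {task: w / total for task, w in zip(dataset_sizes, weights)}


# Hypothetical example sizes (for illustration only):
probs = task_sampling_probs(
    {"summarization": 300_000, "emotion": 58_000, "topic": 120_000}, tau=2.0
)
```

With τ = 2.0 this is square-root scaling, so the emotion and topic tasks are sampled more often than their raw share of the data would suggest.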

## Training

- Data: CNN/DailyMail + BookSum (summarization), GoEmotions (emotion), AG News (topic)
- Epochs: 8 (~9 hours on a single NVIDIA RTX 4070)
- Optimizer: AdamW, lr = 3e-4, weight decay = 0.01
- Scheduler: Linear warmup (500 steps) + cosine decay
- Gradient clipping: max norm = 1.0
- Mixed precision: FP16 via PyTorch AMP
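The learning-rate schedule above (500 linear warmup steps, then cosine decay) can be sketched as a plain function; `total_steps` here is an illustrative assumption, not a value from the card:

```python
import math


def lr_at(step: int, base_lr: float = 3e-4, warmup_steps: int = 500,
          total_steps: int = 20_000) -> float:
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        # Ramp linearly from 0 to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr (at end of warmup) to 0 (at total_steps).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The same shape is available off the shelf via `torch.optim.lr_scheduler` or HuggingFace's `get_cosine_schedule_with_warmup`.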

## Evaluation Results

| Task | Metric | Value |
| --- | --- | --- |
| Summarization | ROUGE-1 | 0.309 |
| Summarization | ROUGE-L | 0.185 |
| Summarization | BLEU-4 | 0.024 |
| Topic Classification | Accuracy | 85.7% |
| Topic Classification | Macro F1 | 0.854 |
| Emotion Detection | Sample-Avg F1 | 0.352 |
| Emotion Detection | Micro F1 | 0.443 |
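The two emotion metrics answer different questions: micro F1 pools every label decision across the test set, while sample-averaged F1 computes F1 per example and then averages, so examples with few gold labels weigh as much as label-dense ones. A toy illustration with scikit-learn (the arrays are made up, not model output):

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy multi-label targets and predictions over 4 labels (illustration only).
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 1],
                   [1, 0, 0, 1]])

micro = f1_score(y_true, y_pred, average="micro")        # pooled over all labels
sample_avg = f1_score(y_true, y_pred, average="samples")  # per-example, then mean
```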

## Files

| File | Description |
| --- | --- |
| `best.pt` | Full model checkpoint (state dict + optimizer + metadata) |
| `labels.json` | Emotion (28) and topic (7) label mappings |
| `tokenizer.json` | SentencePiece tokenizer (flat format) |
| `hf_tokenizer/` | HuggingFace-compatible tokenizer directory |

## Usage

```python
import torch

from src.models.factory import build_model
from src.utils.io import load_labels

# Load the emotion/topic label mappings shipped with the checkpoint.
labels = load_labels("labels.json")

# `config` is the model configuration used at training time (see the repo).
model = build_model(config, labels)

# The checkpoint stores the weights under "model_state_dict".
ckpt = torch.load("best.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()
```

See the full codebase at [github.com/OliverPerrin/LexiMind](https://github.com/OliverPerrin/LexiMind) for inference scripts, the API server, and the Gradio demo.

## License

MIT
