# Mamba Tamil Transliteration
A Mamba-based sequence-to-sequence model for English-to-Tamil transliteration.
## Model Details
- Architecture: Mamba encoder-decoder with cross-attention
- Parameters: 6.5M
- Training data: Dakshina Tamil lexicon (train split)
- Tokenization: Grapheme-level
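Grapheme-level tokenization here means the Tamil side is split into grapheme clusters (a base consonant plus any attached vowel sign or virama), not raw code points. A minimal sketch of that clustering, using a simplified mark-attachment rule rather than full UAX #29 segmentation (the card does not specify the actual tokenizer code):

```python
import unicodedata

def graphemes(text):
    """Approximate grapheme clustering: attach combining marks
    (e.g. Tamil vowel signs and the virama) to the preceding base
    character. A full implementation would follow UAX #29; this
    simplification is for illustration only."""
    clusters = []
    for ch in text:
        # Unicode categories Mn/Mc/Me are marks that modify the base character
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

print(graphemes("தமிழ்"))  # ['த', 'மி', 'ழ்'] — 3 clusters, not 5 code points
```

Clustering like this keeps each target token pronounceable on its own, which typically shortens output sequences compared to code-point tokenization.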
## Performance
| Metric | Dakshina Test |
|---|---|
| Top-1 Accuracy | 68.88% |
| Top-1 + Rerank (α=0.6) | 80.84% |
| Recall@10 | 88.71% |
With unigram LM reranking (α = 0.6), the model surpasses IndicXlit (78.53% top-1) on the Dakshina Tamil test set.
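The reranking step combines the seq2seq model's score for each candidate with a unigram language-model score. A minimal sketch of that interpolation, assuming a log-linear mix weighted by α (the card states only α = 0.6; the exact scoring formula and the `lm_logprob` callback are assumptions here):

```python
def rerank(candidates, lm_logprob, alpha=0.6):
    """Rerank n-best transliterations by interpolating scores.

    `candidates` is a list of (string, model_logprob) pairs, e.g. the
    model's top-10 outputs; `lm_logprob` is a hypothetical function
    returning a unigram LM log-probability for a candidate string.
    """
    scored = [
        (alpha * model_lp + (1 - alpha) * lm_logprob(cand), cand)
        for cand, model_lp in candidates
    ]
    scored.sort(reverse=True)          # highest combined score first
    return [cand for _, cand in scored]

# Toy example: the LM strongly prefers the model's second-best candidate.
cands = [("அ", -1.0), ("ஆ", -2.0)]
lm = {"அ": -5.0, "ஆ": -0.5}.get
print(rerank(cands, lm)[0])  # ஆ
```

Because Recall@10 (88.71%) is well above Top-1 accuracy (68.88%), the correct answer is usually somewhere in the n-best list, which is exactly the situation where a cheap LM rescorer can recover it.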
## Usage

```python
import json

import torch

from model import MambaTranslit  # see model.py

# Load config, vocab, and weights
with open("config.json") as f:
    config = json.load(f)
with open("vocab.json") as f:
    vocab = json.load(f)

model = MambaTranslit(
    len(vocab["src_vocab"]), len(vocab["tgt_vocab"]),
    config["d_model"], config["d_state"], config["d_conv"], config["expand"],
    config["num_encoder_layers"], config["num_decoder_layers"], 0.0,  # dropout off for inference
)
state = torch.load("best_model.pt", map_location="cpu")
model.load_state_dict(state["model"])
model.eval()

# Inference
def transliterate(word):
    src_vocab = vocab["src_vocab"]
    tgt_inv = {v: k for k, v in vocab["tgt_vocab"].items()}
    enc = [1] + [src_vocab.get(c, 3) for c in word] + [2]  # SOS=1, EOS=2, UNK=3
    src = torch.tensor([enc])
    with torch.no_grad():
        out = model.generate_greedy(src, max_len=64)
    result = []
    for t in out[0].tolist():
        if t == 2:  # EOS
            break
        if t not in (0, 1):  # skip PAD/SOS
            result.append(tgt_inv.get(t, ""))
    return "".join(result)

print(transliterate("tamil"))  # தமிழ்
```
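The implementation of `generate_greedy` is not shown in this card. Conceptually, greedy decoding starts from SOS and repeatedly feeds the decoder its own argmax prediction until EOS. A standalone sketch of that loop, where `step_fn` is a hypothetical stand-in for the decoder (it maps the token prefix generated so far to next-token logits), not the actual model API:

```python
import torch

def greedy_decode(step_fn, max_len=64, sos=1, eos=2):
    # Start from SOS; at each step ask the decoder for next-token
    # logits over the current prefix and append the argmax token.
    tokens = [sos]
    for _ in range(max_len):
        logits = step_fn(torch.tensor(tokens))
        nxt = int(logits.argmax())
        tokens.append(nxt)
        if nxt == eos:
            break
    return tokens

# Toy decoder: emit token 5 three times, then EOS.
def step_fn(prefix):
    logits = torch.zeros(10)
    logits[5 if len(prefix) < 4 else 2] = 1.0
    return logits

print(greedy_decode(step_fn))  # [1, 5, 5, 5, 2]
```

Greedy decoding returns a single hypothesis; producing the n-best lists behind the Recall@10 and reranked numbers above would instead require beam search or sampling.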
## Architecture

- `d_model`: 256
- `d_state`: 16
- `d_conv`: 4
- `expand`: 2
- `num_encoder_layers`: 8
- `num_decoder_layers`: 4
## Limitations

- Requires the `mamba_ssm` package (CUDA only)
- Single-word transliteration only
- Tamil script output only
## Citation

```bibtex
@misc{mamba-tamil-xlit,
  title={Mamba Tamil Transliteration},
  year={2024},
  url={https://huggingface.co/cloudrumbles/mamba-tamil-xlit}
}
```