# JuliaGPT-v2
A ~10M parameter character-level GPT trained on classical philosophy texts. Scaled-up successor to the original JuliaGPT (8K params), with an expanded 38-character vocabulary and a much larger architecture.
## Model Lineage
| Model | Params | Architecture | Vocab | Val Loss |
|---|---|---|---|---|
| MicroJulia | 4,992 | 1L/16d/4H, block=64 | 27 chars | 2.43 |
| JuliaGPT | 8,096 | 1L/16d/4H, block=256 | 29 chars | 2.34 |
| JuliaGPT-v2 | ~10M | 6L/384d/6H, block=256 | 38 chars | 2.91 |
## Architecture

```
GPT (GPT-2 style, scaled)
+-- wte: Embedding(38 -> 384)
+-- wpe: Embedding(256 -> 384) [learned position embeddings]
+-- blocks x 6:
|   +-- attn: CausalSelfAttention
|   |   +-- wq: Dense(384 -> 384) [6 heads, 64 dim each]
|   |   +-- wk: Dense(384 -> 384)
|   |   +-- wv: Dense(384 -> 384)
|   |   +-- wo: Dense(384 -> 384)
|   +-- ffwd: FeedForward
|       +-- Dense(384 -> 1536)
|       +-- Dense(1536 -> 384)
+-- lm_head: Dense(384 -> 38)
```
## Model Details
| Parameter | Value |
|---|---|
| Architecture | GPT-2 style Transformer |
| Parameters | ~10M |
| Embedding dim | 384 |
| Layers | 6 |
| Attention heads | 6 |
| Head dim | 64 |
| Context length | 256 characters |
| Vocabulary | 38 characters (a-z, space, punctuation) |
| Dropout | 0.1 |
| Weight tying | No (separate lm_head) |
| Framework | Julia + Flux.jl |
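As a sanity check on the "~10M" figure, the weight matrices implied by the table can be tallied directly. This is a rough sketch in Julia: it counts matrix weights only, omitting biases and any normalization parameters not listed above.

```julia
# Back-of-the-envelope parameter count from the Model Details table
# (weight matrices only; biases and normalization layers omitted).
n_embd, n_layer, vocab, block = 384, 6, 38, 256

wte  = vocab * n_embd          # token embedding: 38 -> 384
wpe  = block * n_embd          # learned position embedding: 256 -> 384
attn = 4 * n_embd^2            # wq, wk, wv, wo
ffwd = 2 * n_embd * 4n_embd    # 384 -> 1536 -> 384
head = n_embd * vocab          # lm_head: 384 -> 38

total = wte + wpe + n_layer * (attn + ffwd) + head
println(total)                 # ~10.7M, consistent with "~10M"
```

Each transformer block contributes about 1.77M weights, so the six blocks dominate the total; the embeddings and head together add only ~130K.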
## Vocabulary

38 characters: space plus `!"'(),-.:;?abcdefghijklmnopqrstuvwxyz`

Character-level tokenization with no BPE; each character is one token.
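A character-level tokenizer for this vocabulary reduces to two lookup tables. The sketch below uses the character ordering shown above for illustration; the canonical id mapping lives in `vocab.json`.

```julia
# Minimal character-level tokenizer for the 38-character vocabulary.
# Ordering here is illustrative; the canonical mapping is in vocab.json.
const CHARS = collect(" !\"'(),-.:;?abcdefghijklmnopqrstuvwxyz")
const STOI  = Dict(c => i for (i, c) in enumerate(CHARS))  # char -> 1-based id
const ITOS  = Dict(i => c for (i, c) in enumerate(CHARS))  # id -> char

encode(s::AbstractString) = [STOI[c] for c in s]
decode(ids) = String([ITOS[i] for i in ids])

@assert length(CHARS) == 38
decode(encode("know thyself."))  # round-trips to "know thyself."
```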
## Training

| Parameter | Value |
|---|---|
| Dataset | Classical philosophy corpus |
| Training steps | 14,739 |
| Best val loss | 2.91 |
| Hardware | NVIDIA RTX 3060 12GB |
| Precision | Float32 |
## Inference Settings
| Parameter | Value |
|---|---|
| vocab_size | 38 |
| context_length | 256 |
| temperature | 0.8 |
| top_k | 40 |
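The settings above combine temperature scaling with top-k filtering. A minimal sketch of that sampling step, assuming raw logits over the 38 tokens (note that with only 38 tokens, `top_k = 40` keeps the full distribution, so filtering only bites if the vocabulary grows):

```julia
# Temperature + top-k sampling over the model's 38-way logits.
function sample_token(logits::Vector{Float32}; temperature=0.8f0, top_k=40)
    k = min(top_k, length(logits))
    scaled = logits ./ temperature          # sharpen (T < 1) or flatten (T > 1)
    thresh = sort(scaled; rev=true)[k]      # k-th largest scaled logit
    scaled[scaled .< thresh] .= -Inf32      # drop everything outside the top k
    probs = exp.(scaled .- maximum(scaled)) # stable softmax
    probs ./= sum(probs)
    r, c = rand(), 0.0                      # inverse-CDF draw
    for (i, p) in enumerate(probs)
        c += p
        r <= c && return i
    end
    return length(probs)
end
```

With `top_k = 1` this degenerates to greedy decoding, since only the argmax survives the filter.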
## Checkpoint Format

JLD2 files containing:

| Key | Value |
|---|---|
| `model_state` | Flux model weights |
| `hyperparams` | `Dict("n_embd"=>384, "n_layer"=>6, "n_head"=>6, "vocab_size"=>38, "block_size"=>256, "dropout"=>0.1)` |
| `step` | 14,739 |
| `best_val_loss` | 2.91 |
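Reading such a checkpoint can be sketched as below, assuming the keys listed above; `build_gpt` is a hypothetical constructor standing in for whatever function in the repo rebuilds the architecture from the hyperparameters.

```julia
using JLD2, Flux

# Sketch: load a checkpoint and restore weights into a rebuilt model.
function load_checkpoint(path::AbstractString)
    ckpt = JLD2.load(path)       # Dict{String,Any} with the keys listed above
    hp   = ckpt["hyperparams"]   # Dict("n_embd"=>384, "n_layer"=>6, ...)
    # model = build_gpt(hp)      # hypothetical: rebuild architecture from hp
    # Flux.loadmodel!(model, ckpt["model_state"])
    println("step = ", ckpt["step"], ", best val loss = ", ckpt["best_val_loss"])
    return ckpt
end
```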
## Files

| File | Description |
|---|---|
| `final_model.jld2` | Final training checkpoint |
| `best_model.jld2` | Best validation loss checkpoint |
| `checkpoint_latest.jld2` | Latest periodic checkpoint |
| `vocab.json` | Character vocabulary (38 chars) |
## Provenance
- Author: LisaMegaWatts
- Source code: DavinciDreams/JuliaGPT
## License

MIT