JuliaGPT-v2

A ~10M parameter character-level GPT trained on classical philosophy texts. Scaled-up successor to the original JuliaGPT (8K params), using the same 38-character vocabulary but with a much larger architecture.

Model Lineage

| Model | Params | Architecture | Vocab | Val Loss |
|---|---|---|---|---|
| MicroJulia | 4,992 | 1L / 16d / 4H, block=64 | 27 chars | 2.43 |
| JuliaGPT | 8,096 | 1L / 16d / 4H, block=256 | 29 chars | 2.34 |
| JuliaGPT-v2 | ~10M | 6L / 384d / 6H, block=256 | 38 chars | 2.91 |

Architecture

GPT (GPT-2 style, scaled)
+-- wte: Embedding(38 -> 384)
+-- wpe: Embedding(256 -> 384)        [learned position embeddings]
+-- blocks x 6:
|   +-- attn: CausalSelfAttention
|   |   +-- wq: Dense(384 -> 384)     [6 heads, 64 dim each]
|   |   +-- wk: Dense(384 -> 384)
|   |   +-- wv: Dense(384 -> 384)
|   |   +-- wo: Dense(384 -> 384)
|   +-- ffwd: FeedForward
|       +-- Dense(384 -> 1536)
|       +-- Dense(1536 -> 384)
+-- lm_head: Dense(384 -> 38)
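A back-of-the-envelope parameter count for the tree above lands on the stated ~10M. The sketch below is Python (not the Julia/Flux implementation) and assumes GPT-2-style pre-LayerNorm blocks, i.e. two LayerNorms per block plus a final one; the diagram omits them, so their presence is an assumption.

```python
# Rough parameter count for JuliaGPT-v2's architecture.
# Assumption: two LayerNorms per block + one final LayerNorm (GPT-2 style);
# these are not shown in the diagram above.
n_embd, n_layer, n_vocab, block_size, ffwd_mult = 384, 6, 38, 256, 4

def dense(n_in, n_out):
    """Weights plus bias for a Dense(n_in -> n_out) layer."""
    return n_in * n_out + n_out

wte = n_vocab * n_embd                 # token embedding (no bias)
wpe = block_size * n_embd              # learned position embedding
attn = 4 * dense(n_embd, n_embd)       # wq, wk, wv, wo
ffwd = dense(n_embd, ffwd_mult * n_embd) + dense(ffwd_mult * n_embd, n_embd)
layernorms = 2 * 2 * n_embd            # gain + bias, two LayerNorms per block
per_block = attn + ffwd + layernorms
lm_head = dense(n_embd, n_vocab)       # separate head (no weight tying)

total = wte + wpe + n_layer * per_block + 2 * n_embd + lm_head
print(f"{total:,}")                    # roughly 10.8M, matching "~10M"
```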

Model Details

| Parameter | Value |
|---|---|
| Architecture | GPT-2 style Transformer |
| Parameters | ~10M |
| Embedding dim | 384 |
| Layers | 6 |
| Attention heads | 6 |
| Head dim | 64 |
| Context length | 256 characters |
| Vocabulary | 38 characters (a-z, space, punctuation) |
| Dropout | 0.1 |
| Weight tying | No (separate lm_head) |
| Framework | Julia + Flux.jl |

Vocabulary

38 characters: space plus !"'(),-.:;?abcdefghijklmnopqrstuvwxyz

Character-level tokenization with no BPE: each character is one token.
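With no BPE, encoding is just a character-to-index lookup. A minimal Python sketch (the actual tokenizer is Julia, and the index ordering below is an assumption, not read from vocab.json):

```python
# Minimal character-level encode/decode for the 38-char vocabulary.
# Assumption: space first, then punctuation, then a-z; the real ordering
# lives in vocab.json and may differ.
VOCAB = " !\"'(),-.:;?abcdefghijklmnopqrstuvwxyz"
assert len(VOCAB) == 38

stoi = {ch: i for i, ch in enumerate(VOCAB)}
itos = {i: ch for ch, i in stoi.items()}

def encode(text: str) -> list[int]:
    """Map each character to its token id; raises KeyError on unknown chars."""
    return [stoi[c] for c in text]

def decode(ids: list[int]) -> str:
    """Map token ids back to characters."""
    return "".join(itos[i] for i in ids)
```

Because every character is one token, the 256-token context window covers exactly 256 characters of text.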

Training

| Parameter | Value |
|---|---|
| Dataset | Classical philosophy corpus |
| Training steps | 14,739 |
| Best val loss | 2.91 |
| Hardware | NVIDIA RTX 3060 12GB |
| Precision | Float32 |

Inference Settings

| Parameter | Value |
|---|---|
| vocab_size | 38 |
| context_length | 256 |
| temperature | 0.8 |
| top_k | 40 |
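The settings above can be sketched as standard temperature plus top-k sampling. This is a generic Python illustration, not the model's Julia sampling loop; note that with only 38 vocabulary entries, top_k=40 never actually removes any candidates.

```python
# Temperature + top-k sampling over a logit vector.
# Sketch only: the real sampler is implemented in Julia.
import math
import random

def sample(logits, temperature=0.8, top_k=40, rng=random):
    # Keep only the top_k largest logits; mask the rest to -inf.
    k = min(top_k, len(logits))
    kth = sorted(logits, reverse=True)[k - 1]
    scaled = [l / temperature if l >= kth else float("-inf") for l in logits]

    # Numerically stable softmax over the surviving logits.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Draw one index from the resulting distribution.
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

Lower temperatures sharpen the distribution (0.8 mildly favors high-probability characters); top_k only matters when it is smaller than the vocabulary size.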

Checkpoint Format

JLD2 files containing:

  • model_state: Flux model weights
  • hyperparams: Dict("n_embd"=>384, "n_layer"=>6, "n_head"=>6, "vocab_size"=>38, "block_size"=>256, "dropout"=>0.1)
  • step: 14,739
  • best_val_loss: 2.91

Files

| File | Description |
|---|---|
| final_model.jld2 | Final training checkpoint |
| best_model.jld2 | Best validation loss checkpoint |
| checkpoint_latest.jld2 | Latest periodic checkpoint |
| vocab.json | Character vocabulary (38 chars) |

License

MIT
