Educational Simulation
Transformer Explainer
Model: GPT-2 Small (simulation)
Layer: 12 (simplified visual pass)
Heads: 12
Causal Mask: Active
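For reference, a minimal sketch of the configuration this header describes, using the standard GPT-2 Small hyperparameters (the dataclass and field names are illustrative, not the simulator's actual code):

```python
# Hypothetical config mirroring the header above; names are for illustration only.
from dataclasses import dataclass

@dataclass
class SimConfig:
    model_name: str = "gpt2-small"  # simulated pass, not the real weights
    n_layer: int = 12               # GPT-2 Small depth
    n_head: int = 12                # attention heads per layer
    d_model: int = 768              # hidden size (d_head = 768 // 12 = 64)
    causal_mask: bool = True        # future positions are masked out

cfg = SimConfig()
```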
Attention Core
Q/K/V + masked self-attention map
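A minimal NumPy sketch of what this panel visualizes: projecting a token sequence into Q/K/V for one head and applying a causal (masked) softmax to produce the attention map. The shapes and random weights are assumptions for illustration, not the simulator's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_model, d_head = 5, 768, 64          # sequence length, hidden size, per-head size

x = rng.normal(size=(T, d_model))        # placeholder token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) * 0.02 for _ in range(3))

Q, K, V = x @ W_q, x @ W_k, x @ W_v      # query/key/value vectors for one head

scores = Q @ K.T / np.sqrt(d_head)       # scaled dot-product scores, shape (T, T)
mask = np.triu(np.ones((T, T), dtype=bool), k=1)
scores[mask] = -np.inf                   # causal mask: hide future positions

attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True) # row-wise softmax -> attention map
out = attn @ V                           # weighted mixture of value vectors
```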
Query Token Details
Attention weights for the active query token
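The "active query token" view corresponds to a single row of the attention map. A sketch reusing `attn` and `T` from the block above (the query index is chosen for illustration):

```python
# One row of `attn`: how query token q_idx distributes its attention
# over itself and earlier tokens. Masked (future) positions are zero.
q_idx = 3                                 # illustrative active query position
weights = attn[q_idx]                     # shape (T,), sums to 1.0
for k_idx, w in enumerate(weights):
    print(f"key {k_idx}: {w:.3f}" + ("  (masked)" if k_idx > q_idx else ""))
```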
Q/K/V Snapshot
Vector values for the selected head
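When Q/K/V are computed for all heads in one projection, the snapshot for a selected head is a slice of that tensor. A sketch under assumed shapes (the fused `W_qkv` layout and head index are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_head, d_head = 5, 12, 64
d_model = n_head * d_head

x = rng.normal(size=(T, d_model))
W_qkv = rng.normal(size=(d_model, 3 * d_model)) * 0.02   # fused Q/K/V projection

qkv = (x @ W_qkv).reshape(T, 3, n_head, d_head)          # split per head
head = 7                                                 # illustrative selected head
q_h, k_h, v_h = qkv[:, 0, head], qkv[:, 1, head], qkv[:, 2, head]
print(q_h.shape, k_h[0, :4])                             # (T, d_head) vectors, sample values
```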
Generated Continuation
Autoregressive sampling of the next token from the latest position
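Finally, a minimal sketch of the sampling loop this panel describes: the logits at the latest position define a distribution over the next token, which is sampled and fed back in. The toy `logits_fn` is a stand-in for a real forward pass; the start token and temperature are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
vocab_size = 50257                         # GPT-2 BPE vocabulary size

def logits_fn(tokens):
    # Stand-in for a real forward pass: logits at the latest position only.
    local = np.random.default_rng(sum(tokens))
    return local.normal(size=vocab_size)

def sample_next(tokens, temperature=0.8):
    logits = logits_fn(tokens) / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(vocab_size, p=probs))

tokens = [50256]                           # illustrative start (GPT-2 endoftext id)
for _ in range(5):
    tokens.append(sample_next(tokens))     # append sampled token, repeat
print(tokens)
```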