Educational Simulation

Transformer Explainer

Model: GPT-2 Small (simulation)
Layer: 12 (simplified visual pass)
Heads:
Causal Mask: Active

Attention Core

Q/K/V + masked self-attention map

Pipeline view: Input → Attention (Q, K, V; masked dot-product) → MLP → Probabilities. For the example prompt "Data visualization empowers users to", the panel shows next-token probabilities of visualize 54.7%, create 20.8%, see 12.1%, and make 6.3%.
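To make the masked dot-product step concrete, here is a minimal NumPy sketch of single-head causal self-attention: project the embeddings into Q, K, and V, take the scaled dot product, apply the causal mask, and softmax into an attention map. Function and variable names are illustrative, not taken from the Explainer's code.

```python
import numpy as np

def masked_self_attention(x, W_q, W_k, W_v):
    """Single-head causal self-attention.

    x:             (seq_len, d_model) token embeddings
    W_q, W_k, W_v: (d_model, d_head) projection matrices
    Returns (attention weights, attended values).
    """
    Q, K, V = x @ W_q, x @ W_k, x @ W_v          # (seq_len, d_head) each

    d_head = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_head)           # (seq_len, seq_len)

    # Causal mask: a query may attend only to itself and earlier tokens.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Row-wise softmax turns the masked scores into the attention map.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights, weights @ V
```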

Query Token Details

Attention weights for the active query token
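In terms of the sketch above, this panel is one row of the attention map: the weights the active query token assigns to itself and to every earlier token. A small illustrative helper (hypothetical names) that pairs those weights with their tokens:

```python
def query_attention_row(weights, tokens, query_index):
    """Pair each visible token with the weight the active query gives it.

    weights: (seq_len, seq_len) causal attention weights
    tokens:  list of seq_len token strings
    """
    row = weights[query_index]
    # Under the causal mask, positions after query_index carry zero weight.
    return [(tok, float(w)) for tok, w in zip(tokens, row) if w > 0.0]
```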

Q/K/V Snapshot

Vector values for the selected head
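GPT-2 computes Q, K, and V for all heads with one fused projection and then splits the result per head; the snapshot shows one head's slice for one token. A sketch of that split, assuming a GPT-2-style fused weight matrix (the helper and its names are illustrative, not the Explainer's internals):

```python
import numpy as np

def head_snapshot(x, W_qkv, n_heads, head, token_index):
    """Return the Q, K, V vectors of one head for one token.

    x:     (seq_len, d_model) token embeddings
    W_qkv: (d_model, 3 * d_model) fused Q/K/V projection, GPT-2 style
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    qkv = x @ W_qkv                          # (seq_len, 3 * d_model)
    q, k, v = np.split(qkv, 3, axis=-1)      # three (seq_len, d_model) blocks

    # Each head owns a contiguous d_head-wide slice of every projection.
    pick = lambda m: m.reshape(seq_len, n_heads, d_head)[token_index, head]
    return pick(q), pick(k), pick(v)
```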

Generated Continuation

Autoregressive sampling from the latest token's output distribution
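Concretely, the continuation comes from a loop: run the model over the tokens so far, take the probability distribution at the latest position (the one shown in the Probabilities panel), sample a next token, append it, and repeat. A minimal sketch, assuming a model(tokens) callable that returns logits of shape (seq_len, vocab_size); the callable, temperature parameter, and helper name are placeholders rather than the Explainer's API.

```python
import numpy as np

def generate(model, tokens, n_new, temperature=1.0, rng=None):
    """Autoregressive sampling from the latest position's distribution."""
    rng = rng or np.random.default_rng()
    tokens = list(tokens)
    for _ in range(n_new):
        logits = model(tokens)[-1] / temperature   # latest token's logits
        probs = np.exp(logits - logits.max())      # numerically stable softmax
        probs /= probs.sum()
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return tokens
```

Greedy decoding corresponds to replacing the sample with an argmax; for the example prompt above, the 54.7% "visualize" would then always be chosen.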