Educational Simulation

Transformer Explainer

Model: GPT-2 Small (simulation)
Layer: 12 (simplified visual pass)
Heads:
Causal Mask: Active

Attention Core

Q/K/V + masked self-attention map

Pipeline view: Input → Attention (Q, K, V; masked dot-product) → MLP → Probabilities. For the example prompt "Data visualization empowers users to", the panel shows next-token probabilities of visualize 54.7%, create 20.8%, see 12.1%, and make 6.3%.
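To make the masked dot-product step concrete, here is a minimal NumPy sketch of single-head causal self-attention: project the embeddings into Q, K, and V, take the scaled dot product, apply the causal mask, and softmax into an attention map. Function and variable names are illustrative, not taken from the Explainer's code.

```python
import numpy as np

def masked_self_attention(x, W_q, W_k, W_v):
    """Single-head causal self-attention.

    x:             (seq_len, d_model) token embeddings
    W_q, W_k, W_v: (d_model, d_head) projection matrices
    Returns (attention weights, attended values).
    """
    Q, K, V = x @ W_q, x @ W_k, x @ W_v          # (seq_len, d_head) each

    d_head = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_head)           # (seq_len, seq_len)

    # Causal mask: a query may attend only to itself and earlier tokens.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Row-wise softmax turns the masked scores into the attention map.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights, weights @ V
```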

Query Token Details

Attention weights for the active query token
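In terms of the sketch above, this panel is one row of the attention map: the weights the active query token assigns to itself and to every earlier token. A small illustrative helper (hypothetical names) that pairs those weights with their tokens:

```python
def query_attention_row(weights, tokens, query_index):
    """Pair each visible token with the weight the active query gives it.

    weights: (seq_len, seq_len) causal attention weights
    tokens:  list of seq_len token strings
    """
    row = weights[query_index]
    # Under the causal mask, positions after query_index carry zero weight.
    return [(tok, float(w)) for tok, w in zip(tokens, row) if w > 0.0]
```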

Q/K/V Snapshot

Vector values for the selected head
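GPT-2 computes Q, K, and V for all heads with one fused projection and then splits the result per head; the snapshot shows one head's slice for one token. A sketch of that split, assuming a GPT-2-style fused weight matrix (the helper and its names are illustrative, not the Explainer's internals):

```python
import numpy as np

def head_snapshot(x, W_qkv, n_heads, head, token_index):
    """Return the Q, K, V vectors of one head for one token.

    x:     (seq_len, d_model) token embeddings
    W_qkv: (d_model, 3 * d_model) fused Q/K/V projection, GPT-2 style
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    qkv = x @ W_qkv                          # (seq_len, 3 * d_model)
    q, k, v = np.split(qkv, 3, axis=-1)      # three (seq_len, d_model) blocks

    # Each head owns a contiguous d_head-wide slice of every projection.
    pick = lambda m: m.reshape(seq_len, n_heads, d_head)[token_index, head]
    return pick(q), pick(k), pick(v)
```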

Generated Continuation

Autoregressive sampling from the latest token's output distribution
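Concretely, the continuation comes from a loop: run the model over the tokens so far, take the probability distribution at the latest position (the one shown in the Probabilities panel), sample a next token, append it, and repeat. A minimal sketch, assuming a model(tokens) callable that returns logits of shape (seq_len, vocab_size); the callable, temperature parameter, and helper name are placeholders rather than the Explainer's API.

```python
import numpy as np

def generate(model, tokens, n_new, temperature=1.0, rng=None):
    """Autoregressive sampling from the latest position's distribution."""
    rng = rng or np.random.default_rng()
    tokens = list(tokens)
    for _ in range(n_new):
        logits = model(tokens)[-1] / temperature   # latest token's logits
        probs = np.exp(logits - logits.max())      # numerically stable softmax
        probs /= probs.sum()
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return tokens
```

Greedy decoding corresponds to replacing the sample with an argmax; for the example prompt above, the 54.7% "visualize" would then always be chosen.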