
Simple Transformer example

Posted: Tue Mar 31, 2026 8:03 am
by yossik32
Tiny Transformer-Style Attention Example

What this example teaches:

This tiny demo shows the core transformer idea:

1. Each earlier word has a key

This helps decide whether the word is important.

2. Each earlier word also has a value

This carries useful meaning.

3. The current position creates a query

This asks:

“Which earlier word matters most right now?”

4. Attention weights are computed

The query is compared with each key (a dot product), and a softmax turns the scores into weights, so the more useful tokens get more weight.

5. The weighted information is combined

That combined result helps make a prediction.
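The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration with made-up numbers (the keys, values, and query below are hypothetical, not from the post): two earlier words each get a key and a value, the current position issues a query, the query/key dot products go through a softmax to become attention weights, and the weights combine the values.

```python
import numpy as np

# Two earlier words, embedding size 4 (hypothetical toy numbers).
keys = np.array([[1.0, 0.0, 0.0, 0.0],     # key of earlier word 1
                 [0.0, 1.0, 0.0, 0.0]])    # key of earlier word 2
values = np.array([[10.0,  0.0, 0.0, 0.0], # value of earlier word 1
                   [ 0.0, 20.0, 0.0, 0.0]])# value of earlier word 2

# The current position's query: it "resembles" word 2's key,
# so word 2 should end up mattering most.
query = np.array([0.1, 0.9, 0.0, 0.0])

# Compare the query with each key (scaled dot product).
scores = keys @ query / np.sqrt(keys.shape[1])

# Softmax turns the scores into attention weights that sum to 1.
weights = np.exp(scores) / np.exp(scores).sum()

# Combine the values, weighted by attention; this combined
# vector is what feeds the prediction.
output = weights @ values
print(weights)  # word 2 gets the larger weight
print(output)   # mostly word 2's value, a little of word 1's
```

Because the query points toward word 2's key, the softmax gives word 2 roughly 60% of the weight here, and the output is dominated by word 2's value vector.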