1.9 Connections
Wirings between Transformer inputs and outputs.
Diagram 1.0: The Transformer, Vaswani et al. (2017)
A first glance at the Transformer may raise the question of why both the encoder and the decoder receive inputs; this is not made entirely clear in the original Vaswani et al. (2017) paper. The following diagrams illustrate why: the decoder blocks take as input the encoded initial input sequence as well as the tokens the Transformer has generated so far, and each newly generated token is fed back into the decoder until the output sequence is finished.
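This feedback loop can be sketched in a few lines of Python. The encode and decode_step functions below are hypothetical placeholders standing in for a trained encoder and decoder; the point is only the shape of the loop, not the model internals.

# Minimal sketch of autoregressive generation, assuming hypothetical
# encode() and decode_step() functions in place of a trained model.

BOS, EOS = 0, 1        # hypothetical "start" and "end" token ids
MAX_LEN = 20           # safety cap on output length

def encode(src_tokens):
    # Hypothetical encoder: produces a "memory" the decoder attends to.
    return [t * 2 for t in src_tokens]            # placeholder computation

def decode_step(memory, generated):
    # Hypothetical decoder step: predicts the next token id from the
    # encoder memory plus everything generated so far.
    if len(generated) >= 5:                        # placeholder stopping rule
        return EOS
    return (sum(memory) + sum(generated)) % 5 + 2  # placeholder prediction

def generate(src_tokens):
    memory = encode(src_tokens)    # encoder runs once on the input sequence
    generated = [BOS]              # decoder starts from a "start" token
    while len(generated) < MAX_LEN:
        next_token = decode_step(memory, generated)
        generated.append(next_token)   # feed the new token back in as input
        if next_token == EOS:          # stop once the output is finished
            break
    return generated

print(generate([3, 7, 9]))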
Note also that Transformer blocks are typically stacked, and the logic behind this is akin to adding more layers to a neural network: each additional block gives the model more capacity to learn progressively more abstract representations of the sequence.
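A rough sketch of what stacking means in practice, using a placeholder EncoderBlock class rather than real attention layers: the output of one block simply becomes the input of the next.

# Sketch of block stacking, assuming a hypothetical EncoderBlock class;
# a real block would apply self-attention and a feed-forward layer.

class EncoderBlock:
    def __call__(self, x):
        # Placeholder transformation standing in for attention + feed-forward.
        return [v + 1 for v in x]

def stacked_encoder(x, num_blocks=6):
    for block in [EncoderBlock() for _ in range(num_blocks)]:
        x = block(x)          # each block's output feeds the next block
    return x

print(stacked_encoder([0, 0, 0]))   # -> [6, 6, 6] after six placeholder blocks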
Diagram 1.8.1: an LLM performing a machine translation task. Note that this model features stacked encoders and decoders, which is common. Source: HuggingFace blog

Diagram 1.8.2: an LLM performing a chatbot-style task, with the Transformer shown as a black box, illustrating how newly generated tokens are fed back into the Transformer as input. Source: Illustrated Guide to Transformers Neural Network: A step by step explanation