
World Model
Juan Vera
June 2025
The world model is a space-time factorized transformer with a hidden dimension of .
Here, a factorized transformer is a transformer that performs attention separately over different dimensions of the input, in this case over space and over time.
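To make the idea of factorized attention concrete, the following is a minimal NumPy sketch and not the model's actual implementation. It assumes the input has already been arranged as an array of shape (frames, spatial tokens, channels), uses a single head with identity projections instead of learned query, key, and value weights, and applies spatial attention before temporal attention. The function names, that ordering, and the sizes are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Single-head self-attention over the second-to-last axis of x.

    Identity projections (q = k = v = x) keep the sketch short; a real
    layer would use learned weight matrices here.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ x               # (..., L, D)

def factorized_block(x):
    """x has shape (T, S, D): T frames, S spatial tokens per frame, D channels."""
    # Spatial attention: each frame attends only over its own S tokens.
    x = x + self_attention(x)            # leading axis T acts as a batch axis
    # Temporal attention: each spatial location attends over the T frames.
    xt = np.swapaxes(x, 0, 1)            # (S, T, D)
    xt = xt + self_attention(xt)         # leading axis S acts as a batch axis
    return np.swapaxes(xt, 0, 1)         # back to (T, S, D)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 6, 8))  # hypothetical sizes: T=4, S=6, D=8
out = factorized_block(tokens)
print(out.shape)  # (4, 6, 8)
```

The motivation for factorizing is cost. Full attention over all T·S tokens builds a score matrix with (T·S)² entries, whereas the two factorized passes build T·S² + S·T² entries in total.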
Let
$$X \in \mathbb{R}^{T \times C \times H \times W \times D}$$
be the set of input latent vectors in the form of a tensor, where
- $T$ is the temporal window, i.e. the number of frames in the input,
- $C$ is the number of cameras,
- $H$ is the height of the frame,
- $W$ is the width of the frame,
- $D$ is the dimension of the individual latent vector.
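The five axes listed above can be illustrated with a short NumPy sketch. The names `T`, `C`, `H`, `W`, `D`, the axis order, and the sizes are placeholders chosen for illustration. Folding cameras, height, and width into a single spatial axis is one possible grouping, assumed here and not taken from the text.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
T, C, H, W, D = 4, 2, 3, 5, 8

# Input latents laid out as (frames, cameras, height, width, latent dim).
latents = np.arange(T * C * H * W * D, dtype=np.float32).reshape(T, C, H, W, D)

# Spatial view: one sequence per frame holding every camera/row/column token.
spatial = latents.reshape(T, C * H * W, D)   # (4, 30, 8)

# Temporal view: one sequence per spatial token, running over the T frames.
temporal = spatial.transpose(1, 0, 2)        # (30, 4, 8)

print(spatial.shape, temporal.shape)  # (4, 30, 8) (30, 4, 8)
```

With this row-major layout, the token at camera `c`, row `h`, column `w` sits at spatial index `(c * H + h) * W + w`, so `temporal[(c * H + h) * W + w, t]` is the same vector as `latents[t, c, h, w]`.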
Assume , then we simplify: