World Model
June 2025
Robotic Control via Embodied Chain-of-Thought Reasoning
April 2025
Swarm of Attention Variants
April 2025
RoPE
April 2025
Re-Mix: Optimizing Data Mixture for Large Scale Imitation Learning
April 2025
(Q?)KV Cache
April 2025
Tiny Stories
April 2025
VolGAN
March 2025
Generative Adversarial Networks
March 2025
Understanding the Implied Volatility Surface
March 2025
GRPO
February 2025
The Inner Mechanism of Byte Pair Encoding
January 2025
Improving Language Understanding by Generative Pre-Training
January 2025
Transformers
January 2025
Attention
January 2025
Neural Scaling Laws
January 2025
Mathematics of BPTT
January 2025
Residuals
November 2024
Backprop through Convolutions
October 2024