- scaling
- ml
- reflections
-
Grokking fast and slow
Whats, whys, and what-ifs of grokking
-
Celebration is the secret
Inspiration from someone I look up to
-
Ultra-Scale Playbook vol-3 - DeepSpeed ZeRO
On the three kinds of ZeRO used with Data Parallelism
-
Ultra-Scale Playbook vol-2 - Data Parallelism
Parallelising data batches across GPUs with Data Parallelism
-
Ultra-Scale Playbook vol-1 - Single GPU
Kick-off to a series of notes on LLM scaling