5 Comments

Thank you for the deep dive. Content like this inspires me to revisit computer engineering basics and not forget the importance of understanding the hardware platform. It's easy to lose sight of it when you spend all your time up the stack, worrying only about shipping features.

The point that "Some scaling challenges can be reduced to solving a math problem" reminds me of how Meta optimized its serverless platform, XFaaS. It's pretty cool to see less commonly discussed efficiency practices come up recently when it comes to scaling platforms.

Fantastic article. Perfect balance of breadth and depth IMO.

I think the statement "If you want the model to predict the 1,000th token, it needs to do about 1 million operations" is wrong. Generating the 1,000th token requires 1,000 operations; it's generating all 1,000 tokens leading up to it that is quadratic. So it should read: "If you want the model to predict *1,000 tokens*, it needs to do about 1 million operations."
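A rough way to check the arithmetic in this comment, as a sketch under a simplifying assumption: one "operation" is one attention comparison, and the token at position `i` attends to all `i` tokens up to and including itself. This ignores model width, layer count, and other real costs; the function names are hypothetical.

```python
def ops_for_token(position: int) -> int:
    """Attention 'operations' to produce the token at `position` (1-indexed),
    assuming it attends once to every token up to and including itself."""
    return position


def ops_for_sequence(n: int) -> int:
    """Total operations to generate tokens 1..n one at a time,
    summing the per-token cost: 1 + 2 + ... + n = n(n + 1) / 2."""
    return sum(ops_for_token(i) for i in range(1, n + 1))


print(ops_for_token(1000))     # 1000
print(ops_for_sequence(1000))  # 500500
```

The sequence total is roughly n²/2, the same quadratic order as the "about 1 million" figure. The per-token count of 1,000 assumes earlier keys and values are cached; without caching, each step recomputes attention for every earlier position, which makes a single step quadratic as well.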

A good piece. Why isn't OpenAI Triton mentioned here?
