2 Comments

Great article, Ben! One thing I'd like to add on scaling is quantization. I think that, together with pruning, it's going to determine how cost-effectively we can scale large language models.

author

Totally agree with you. Model compression (pruning, quantization, etc.) is an important technique for scaling LLM applications.
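For readers unfamiliar with quantization, here is a minimal sketch of the idea behind it: mapping float weights to low-precision integers plus a scale factor. This is a toy symmetric int8 example in NumPy, not any particular library's implementation.

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric per-tensor quantization: map float weights to int8.
    # The scale is chosen so the largest-magnitude weight maps to 127.
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    # Recover approximate float weights from the int8 values.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
# Storage drops 4x (float32 -> int8); the rounding error per weight
# is bounded by scale / 2.
print(np.max(np.abs(w - w_hat)))
```

Pruning is complementary: it zeroes out low-magnitude weights entirely, so the two techniques are often combined to shrink a model's memory and compute footprint.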
