Researchers at Meta and the University of Southern California have released a new deep learning model called Megalodon, positioned as a replacement for the Transformer, the architecture used in large language models (LLMs). Megalodon addresses the quadratic complexity problem of Transformers: because self-attention compares every token with every other token, the memory and compute requirements of Transformer models quadruple every time the size of their input doubles. This makes it difficult to scale them to very large context windows. Megalodon uses a different attention mechanism that has linear complexity, allowing it to scale to long contexts at much lower cost.
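To make the scaling difference concrete, here is a minimal NumPy sketch, not Megalodon's actual implementation: it contrasts full self-attention, which materializes an n-by-n score matrix, with a simple chunk-wise scheme in the spirit of linear-complexity attention. The chunk size, dimensions, and function names are illustrative assumptions.

```python
import numpy as np

def full_attention_scores(x):
    """Standard self-attention scores: an n x n matrix, so memory is O(n^2).
    Doubling the sequence length quadruples the number of score entries."""
    return x @ x.T  # (n, d) @ (d, n) -> (n, n)

def chunked_attention_scores(x, chunk=4):
    """Illustrative chunk-wise attention: tokens only attend within
    fixed-size chunks, so total score entries grow as O(n * chunk),
    i.e. linearly in the sequence length n."""
    n, _ = x.shape
    return [full_attention_scores(x[i:i + chunk]) for i in range(0, n, chunk)]

rng = np.random.default_rng(0)
for n in (8, 16, 32):  # each doubling of n...
    x = rng.standard_normal((n, 4))
    full = full_attention_scores(x).size                        # ...quadruples this
    chunked = sum(s.size for s in chunked_attention_scores(x))  # ...only doubles this
    print(f"n={n:3d}  full={full:5d}  chunked={chunked:4d}")
```

Running the loop prints 64, 256, 1024 score entries for full attention as n doubles, versus 32, 64, 128 for the chunked variant, which is the gap that makes very large context windows affordable.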