Researchers at Meta and the University of Southern California have released a new deep learning model called Megalodon, which could serve as a replacement for the Transformer, the architecture used in large language models (LLMs).
Megalodon addresses the quadratic complexity problem of Transformers: the memory and compute requirements of the attention mechanism quadruple every time the input length doubles, which makes it difficult to scale these models to very large context windows. Megalodon uses a different attention mechanism with linear complexity, so it can be scaled at much lower cost.
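To make the scaling difference concrete, here is a minimal sketch, not Megalodon's actual code: plain self-attention builds a full seq_len-by-seq_len score matrix (the quadratic bottleneck), while chunk-wise attention, the general family of techniques Megalodon draws on, caps the score matrix at a fixed chunk size so total cost grows linearly with sequence length. Function names are illustrative, and projections and the causal mask are omitted for brevity.

import numpy as np

def full_attention(x):
    # x: (seq_len, dim). The (seq_len, seq_len) score matrix is the
    # quadratic bottleneck: doubling seq_len quadruples its size.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def chunked_attention(x, chunk=512):
    # Attend only within fixed-size chunks: each chunk costs O(chunk^2),
    # and the number of chunks grows linearly with seq_len.
    out = [full_attention(x[i:i + chunk]) for i in range(0, len(x), chunk)]
    return np.concatenate(out)

x = np.random.randn(2048, 64)
print(chunked_attention(x).shape)  # (2048, 64); score memory stays O(seq_len * chunk)

With chunking, doubling the input only doubles the number of chunks, which is what keeps very long contexts affordable.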
Megalodon builds on MEGA, an architecture first introduced in 2022. It modifies MEGA to bring its performance on par with the Transformer block while making training and inference much more efficient.
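MEGA stands for Moving average Equipped Gated Attention; its core ingredient is an exponential moving average (EMA) that summarizes the sequence with constant-size state before attention is applied. The snippet below is only a rough sketch of that basic recurrence, with illustrative names; the real architecture uses learned, multi-dimensional damped EMAs combined with gating, and Megalodon extends them to complex values.

import numpy as np

def ema(x, alpha=0.9):
    # Basic recurrence: y_t = alpha * x_t + (1 - alpha) * y_{t-1}.
    # Each output summarizes the whole prefix in O(1) state, giving the
    # attention smoothed, locally contextualized representations to work on.
    y = np.zeros_like(x)
    prev = np.zeros(x.shape[-1])
    for t in range(len(x)):
        prev = alpha * x[t] + (1 - alpha) * prev
        y[t] = prev
    return y

tokens = np.random.randn(8, 4)  # (seq_len, dim)
print(ema(tokens).shape)        # (8, 4)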
The researchers' experiments show that a 7-billion-parameter Megalodon model outperforms Llama-2-7B and matches Llama-2-13B on several tasks. The experiments on long-context modeling are also promising, with Megalodon showing strong results on text that is millions of tokens long.
The project has been open-sourced and is available on GitHub, which means you can try it out and test it for yourself.
Learn more about Megalodon on VentureBeat.
Read the paper on arXiv.
The amount we've heard about transformers recently... people will lose their minds if this takes off...
Thanks for the early warning here, Ben!