MoRA is a new alternative to LoRA
MoRA replaces low-rank matrices with a square matrix that gives fine-tuned LLMs higher capacity for learning new knowledge
Parameter-efficient fine-tuning (PEFT) techniques have become very popular in recent years because they significantly reduce the cost of customizing large language models (LLMs).
One widely used PEFT method is “low-rank adaptation” (LoRA), which uses a pair of low-rank matrices to approximate the weight updates a downstream task requires, instead of modifying the model’s full high-dimensional weight matrices. Experiments show that LoRA can fine-tune LLMs for new tasks at a very small fraction of the cost of full fine-tuning (FFT), which updates all the parameters in the model.
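To make the idea concrete, here is a minimal NumPy sketch of a LoRA-style adapter (an illustration of the technique, not the authors’ code): a frozen weight matrix of size d×d is augmented with two trainable low-rank matrices, so only 2·d·r parameters are trained instead of d·d.

```python
import numpy as np

d, r = 1024, 8                           # model dimension, LoRA rank
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable, small random init
B = np.zeros((d, r))                     # trainable, zero init so the update starts at zero

def lora_forward(x):
    # Original path plus the low-rank update: W @ x + B @ (A @ x)
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d)
# With B initialized to zero, the adapted layer matches the base layer exactly.
assert np.allclose(lora_forward(x), W @ x)
```

The rank r bounds the rank of the update B·A, which is why LoRA’s capacity to absorb genuinely new knowledge is limited.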
However, while LoRA works well for behavioral fine-tuning tasks such as instruction-following and alignment, it is less effective for tasks that require the model to acquire new knowledge or perform complex reasoning.

MoRA, a new PEFT method developed by researchers at Microsoft and Beihang University, addresses this limitation of LoRA. It replaces the pair of low-rank matrices with a single square matrix that has the same number of parameters. This adjustment gives the model a higher capacity to learn new knowledge while preserving the parameter efficiency of LoRA.
MoRA adds non-parametric compression and decompression functions to bridge the gap between the model’s dimension and the square matrix’s smaller dimension. These functions are designed to preserve the information required for the fine-tuning task.
Experiments show that MoRA performs nearly on par with FFT on fine-tuning tasks that require the model to memorize new knowledge. At the same time, MoRA performs as well as LoRA on tasks such as instruction fine-tuning.
The researchers have released the code for MoRA, and it is compatible with existing LoRA libraries, which means you can use it as a drop-in replacement if you’re already using LoRA.
Fine-tuning is a big use case for enterprise LLM applications. Being able to customize LLMs with company-specific data can reduce costs by enabling organizations to use smaller models for tasks that previously required API calls to expensive frontier models.
It remains to be seen how successful MoRA will be in applied settings and if it will be integrated into existing LLM fine-tuning and serving platforms.
Read more about MoRA on VentureBeat.
Read the paper on Arxiv.
See the source code on GitHub.