LoRA is one of the most widely used parameter-efficient fine-tuning (PEFT) methods for large language models (LLMs). However, there remains a performance gap between LoRA and full fine-tuning. Previous work has proposed various improvements to LoRA to make it more memory-efficient or more accurate: DoRA, LoftQ, VeRA, etc.
In an article for The Kaitchup, I showed that it can be difficult to confirm the potential advantages of these alternatives. For instance, I didn’t see any difference between DoRA and LoRA.
We still don’t have a PEFT method clearly matching the performance of full fine-tuning.
MoRA is yet another method aiming to close this performance gap. This PEFT method is original in that it fine-tunes a high-rank adapter instead of a low-rank one. According to its authors, MoRA outperforms LoRA on several tasks such as continual pre-training and instruction fine-tuning.
In this article, I review MoRA. We will see how MoRA can fine-tune a high-rank adapter on top of an LLM with a number of trainable parameters similar to LoRA. Then, we will try MoRA with a quantized Llama 3 8B, i.e., QMoRA, to check its performance.
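Before diving in, here is a minimal sketch of the parameter-budget idea. It assumes the core design described by the authors: instead of LoRA's two low-rank factors, MoRA trains one square matrix whose side is chosen so that its parameter count matches LoRA's, which yields a much higher rank. The truncation/zero-padding compression shown here is only one simple option for bridging the dimension gap; the sizes and function names are illustrative, not the authors' implementation.

```python
import numpy as np

# Illustrative sizes: hidden dims of a Llama-3-8B-style layer and a LoRA rank
d, k, r = 4096, 4096, 8

# LoRA trains two low-rank factors: B (d x r) and A (r x k)
lora_params = r * (d + k)          # 65,536 trainable parameters

# MoRA instead trains one square matrix M whose side r_hat is chosen so
# that r_hat**2 matches LoRA's parameter budget -> a much higher rank
r_hat = int(np.sqrt(lora_params))  # 256 here, vs. LoRA's rank of 8
M = np.zeros((r_hat, r_hat))
assert M.size == lora_params       # same trainable-parameter budget

# Non-trainable compress/decompress maps bridge the dimension gap.
# Truncation and zero-padding are one simple scheme (an assumption in
# this sketch; the paper discusses several such operators).
def compress(x):                   # R^k -> R^r_hat
    return x[:r_hat]

def decompress(h):                 # R^r_hat -> R^d
    return np.pad(h, (0, d - r_hat))

x = np.random.randn(k)
delta = decompress(M @ compress(x))  # the adapter's contribution to W @ x
print(r_hat, M.size, delta.shape)    # 256 65536 (4096,)
```

The point to take away is that `M` has the same number of trainable parameters as LoRA's `B` and `A` combined, but its rank can be as high as 256 rather than 8.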
I made a notebook showing how to fine-tune Llama 3 with MoRA: