I originally wrote this as a LinkedIn post while exploring parameter-efficient fine-tuning methods. I’m sharing it here for reference and plan to revisit this topic with a deeper, more detailed blog post in the future.
Over the past few months, during my research in Machine Learning and Security, I kept running into the term LoRA (Low Rank Adaptation).
At first, I brushed it off as just another term and only looked at the high-level idea. But as I started reading more papers and designing my own experiments, I realised how important this concept actually is, which pushed me to take a deeper look.
Here is the basic intuition.
Most modern models begin with pretraining. A large model is trained on massive and diverse data, like large parts of the internet or huge image datasets, to learn general patterns. This pretrained model then acts as a base model.
To adapt this base model to a specific task, we usually rely on fine-tuning. Fine-tuning means updating all the model’s weights using task-specific data. While this works well, it comes with some clear drawbacks:
- Updating all weights requires massive GPU memory and storage
- Fine-tuning for every new task means repeating the entire expensive process
This is where LoRA becomes especially useful. Instead of modifying the entire weight matrix, LoRA adds a small, trainable update to it. Conceptually, this looks like:
W_new = W + ΔW
ΔW = A · B
Here, A and B are low-rank matrices added to a specific layer. During training, only these matrices are updated, while the original weights remain frozen. During inference, the layer simply uses the combined weights (W_new).
Why is this useful?
- We only train a small number of parameters, making training far more efficient
- For a new task, we can swap out A · B with another set for that task, without touching the base model
This makes adapting large models significantly cheaper and more scalable.
This is just a short summary, but understanding LoRA and the ideas that came before it really changed how I think about model adaptation.
To know more about the prior work and why it works, do refer to the original LoRA paper from Microsoft Research!
Below is an image I took from One Minute NLP.
Source: One Minute NLP - LoRA visualisation
Thanks for reading!