LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning (PEFT) technique. Instead of updating all weights of a pre-trained Large Language Model (LLM), LoRA freezes the original weights and injects trainable, low-rank matrices into specific layers of the Transformer architecture (typically the attention projections). The core idea is that the weight update Delta W needed for adaptation has low intrinsic rank and can be approximated as Delta W = BA: for a weight matrix W of shape d x k, B is d x r and A is r x k, with rank r << min(d, k), so B and A together hold far fewer parameters than W.
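As a minimal sketch of these mechanics, consider the following PyTorch module. The class name LoRALinear, the rank r, the scaling hyperparameter alpha, and the initialization choices are illustrative assumptions in the spirit of the original LoRA paper, not a reference implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (illustrative sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        # Freeze the pre-trained weights W (and bias): only A and B will train.
        for p in self.base.parameters():
            p.requires_grad = False
        # Trainable low-rank factors: Delta W = B @ A with rank r << min(d, k).
        # A (r x k) gets a small random init; B (d x r) starts at zero so that
        # Delta W = 0 at the beginning of training and the model is unchanged.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen output W x, plus the scaled low-rank correction (BA) x.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling
```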
This reduces the number of trainable parameters by several orders of magnitude, shrinking the gradients and optimizer states that must be kept in memory and making fine-tuning more memory-efficient and faster, while often achieving performance comparable to full fine-tuning.
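To make the savings concrete, here is a hypothetical parameter count for a single 4096 x 4096 projection with rank r = 8, using the sketch above (the layer sizes are assumptions for illustration):

```python
base = nn.Linear(4096, 4096)
lora = LoRALinear(base, r=8)

full = sum(p.numel() for p in base.parameters())
trainable = sum(p.numel() for p in lora.parameters() if p.requires_grad)

print(full)       # 16,781,312 (4096*4096 weights + 4096 bias)
print(trainable)  # 65,536 (8*4096 for A + 4096*8 for B), roughly 0.4% of full
```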