Zero-Shot Learning (ZSL) is a machine learning paradigm where a model is trained to recognize or perform tasks on classes or concepts it has not seen during training. Instead of relying on direct examples of unseen classes, ZSL models typically leverage auxiliary information that describes these unseen classes, often in the form of attributes, textual descriptions, or embeddings from other modalities.
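The attribute-based flavor of ZSL described above can be sketched in a few lines. In this illustrative example (the class names, attribute labels, and vectors are all hypothetical), an attribute predictor trained only on seen classes emits an attribute vector for a new input, and classification reduces to finding the unseen class whose hand-specified attribute signature is most similar:

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical attribute signatures for classes absent from training.
# Attributes: [striped, four_legged, can_fly, orange]
unseen_classes = {
    "zebra": [1.0, 1.0, 0.0, 0.0],
    "eagle": [0.0, 0.0, 1.0, 0.0],
    "tiger": [1.0, 1.0, 0.0, 1.0],
}

def zero_shot_classify(predicted_attributes):
    """Assign the unseen class whose attribute signature is most
    similar to the attribute vector predicted from the input."""
    return max(unseen_classes,
               key=lambda c: cosine(predicted_attributes, unseen_classes[c]))

# Suppose an attribute predictor (trained on seen classes only)
# emits this vector for a new image:
pred = [0.9, 0.8, 0.1, 0.1]
print(zero_shot_classify(pred))  # → zebra
```

The key point is that no image of a zebra was needed at training time; only its attribute description is required at inference.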
In a multimodal context, ZSL often involves transferring knowledge from a rich modality (like text, where class descriptions are available) to another modality (like vision) to classify images of unseen objects, or to generate content across modalities for unseen concepts.
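The cross-modal transfer described above can be sketched as follows. Real systems use pretrained dual encoders (e.g. CLIP's text and image towers) that map both modalities into one shared embedding space; the vectors below are illustrative stand-ins for those encoder outputs, not real model embeddings. Classification of an image of an unseen class then reduces to matching its embedding against the embeddings of textual class descriptions:

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Stand-ins for text-encoder outputs in a shared text-image space
# (illustrative values, not actual model outputs).
text_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}

def classify_image(image_embedding, prompts):
    # Pick the prompt whose text embedding lies closest to the
    # image embedding in the shared space.
    return max(prompts, key=lambda p: cosine(image_embedding, prompts[p]))

# Stand-in for an image-encoder output for a never-labeled image:
image_embedding = [0.8, 0.2, 0.1]
print(classify_image(image_embedding, text_embeddings))  # → a photo of a cat
```

Because the class "labels" are just text, new classes can be added at inference time simply by writing new prompts, which is what makes this setup zero-shot.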