Video understanding, also known as video analysis or video intelligence, is a subfield of computer vision and artificial intelligence focused on enabling machines to comprehend the content of videos. Unlike static image analysis, which treats each image in isolation, video understanding processes and interprets spatio-temporal data (sequences of frames that change over time) to recognize actions, activities, objects, scenes, and the interactions among them.
The goal is to extract meaningful information from video streams, much as humans perceive and interpret them. This includes not just what is happening (actions, events) but also who is involved, where it takes place, and, when it can be inferred, why.
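To make the spatio-temporal aspect concrete, the following minimal sketch contrasts a purely spatial statistic with a temporal one on a toy clip. It is an illustration only, not a video understanding method: the clip is synthetic, and the names `make_clip`, `mean_intensity`, and `motion_energy` are hypothetical helpers introduced here. It assumes a video can be modeled as a list of grayscale frames, each a grid of pixel intensities.

```python
from typing import List

Frame = List[List[int]]  # one grayscale frame: rows of pixel intensities (0-255)


def make_clip(num_frames: int, height: int, width: int) -> List[Frame]:
    """Build a toy clip: one bright pixel moves one column right per frame."""
    clip = []
    for t in range(num_frames):
        frame = [[0] * width for _ in range(height)]
        frame[height // 2][t % width] = 255  # position depends on time t
        clip.append(frame)
    return clip


def mean_intensity(frame: Frame) -> float:
    """Spatial statistic: average brightness of a single frame."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


def motion_energy(prev: Frame, curr: Frame) -> float:
    """Temporal statistic: mean absolute pixel change between two frames."""
    diffs = [abs(a - b) for r0, r1 in zip(prev, curr) for a, b in zip(r0, r1)]
    return sum(diffs) / len(diffs)


clip = make_clip(num_frames=4, height=3, width=4)

# Looking at each frame in isolation gives the same value every time.
spatial = [mean_intensity(f) for f in clip]

# Comparing consecutive frames reveals that something is moving.
temporal = [motion_energy(a, b) for a, b in zip(clip, clip[1:])]

print(spatial)   # [21.25, 21.25, 21.25, 21.25]
print(temporal)  # [42.5, 42.5, 42.5]
```

Every frame has the same average brightness, so a per-frame measurement of this kind cannot tell that anything is happening. The frame-to-frame difference is non-zero, which is the simplest form of the temporal signal that video understanding builds on.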