Summary
Computer vision is built on the analysis and manipulation of numeric pixel values in images. Machine learning models are trained on large volumes of images to enable common computer vision scenarios, such as image classification, object detection, automated image tagging, and optical character recognition.
While you can create your own machine learning models for computer vision, today's foundation models can be used to analyze images, including generating a descriptive caption, extracting relevant tags, identifying objects, and more.
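To make the idea of "numeric pixel values" concrete, here is a minimal sketch in plain Python. It is an illustration only, not a method taken from this module: the 5x5 sample image, the helper names `invert` and `apply_filter`, and the edge-detection kernel are all assumptions chosen for the example. It treats a grayscale image as a grid of numbers and manipulates those numbers directly.

```python
# A tiny 5x5 grayscale "image": each number is one pixel's brightness
# (0 = black, 255 = white). This sample data is made up for illustration.
image = [
    [0,   0,   0,   0, 0],
    [0, 255, 255, 255, 0],
    [0, 255, 255, 255, 0],
    [0, 255, 255, 255, 0],
    [0,   0,   0,   0, 0],
]


def invert(pixels):
    """Return a negative of the image by flipping each brightness value."""
    return [[255 - value for value in row] for row in pixels]


def apply_filter(pixels, kernel):
    """Slide a 3x3 kernel over the image and sum the weighted neighbors.

    Only positions where the kernel fits entirely inside the image are
    computed, so the result is 2 rows and 2 columns smaller than the input.
    """
    height, width = len(pixels), len(pixels[0])
    result = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            total = 0
            for ky in range(3):
                for kx in range(3):
                    total += kernel[ky][kx] * pixels[y + ky - 1][x + kx - 1]
            row.append(total)
        result.append(row)
    return result


# A hypothetical edge-highlighting kernel: it responds strongly where a
# pixel differs from its neighbors and returns 0 in uniform regions.
edge_kernel = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]

inverted = invert(image)
edges = apply_filter(image, edge_kernel)

for row in edges:
    print(row)
```

The filtered output is zero at the center of the white square, where every neighbor matches, and non-zero along its border, where brightness changes. Trained vision models build on this kind of arithmetic over pixel values, at far larger scale.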