Taxonomy Of Computer Vision
Difference Between Vision and Graphics
Computer vision focuses on interpreting and understanding images or video to extract meaningful information, essentially making sense of what a camera “sees.”
In contrast, computer graphics deals with generating visual content, creating images or 3D models from abstract data.
- Vision → From the real world to data; starts from images and, through processing, recovers the underlying geometry and photometry.
- Graphics → From data to real-world-like imagery; starts from geometry and photometry and synthesizes images.
Graphics is about creation, whereas vision is about understanding.
Relationship between Images, Geometry and Photometry
- Images (2D) provide 2D data for processing.
- Geometry (3D) aids in understanding 3D or spatial structures from 2D inputs.
- Photometry (light and color properties) studies light interaction, critical for tasks like texture mapping or scene rendering.
Images are 2D, which limits their ability to represent depth directly.
Sampling and Aliasing:
- Sampling: Converts continuous visual data into discrete form (e.g., pixel grids).
- Aliasing: Distortion that appears when the sampling resolution is too low to capture fine detail, producing artifacts like jagged edges or moiré patterns (a small sketch follows below).
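
As a rough sketch of the idea (assuming OpenCV and NumPy are installed, and a hypothetical image file `input.png` with fine detail), naive subsampling produces aliasing, while blurring first removes the frequencies the coarser grid cannot represent:

```python
import cv2

# Hypothetical input image with fine detail (assumption, not from the course).
img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# Naive sampling: keep every 4th pixel. Detail finer than the new grid
# folds back as jagged edges and moiré patterns (aliasing).
aliased = img[::4, ::4]

# Pre-filtering: blur away the frequencies the new grid cannot hold,
# then sample. This is the standard anti-aliasing step.
smoothed = cv2.GaussianBlur(img, (7, 7), sigmaX=2.0)
antialiased = smoothed[::4, ::4]

cv2.imwrite("aliased.png", aliased)
cv2.imwrite("antialiased.png", antialiased)
```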
Image Processing:
Image processing transforms images for tasks like noise reduction, contrast enhancement, and edge detection. It serves as the first step in most vision pipelines.
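
A minimal sketch of such a pipeline with OpenCV (the file name `photo.jpg` and the threshold values are illustrative assumptions):

```python
import cv2

img = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)

denoised = cv2.GaussianBlur(img, (5, 5), 1.0)                 # noise reduction
equalized = cv2.equalizeHist(denoised)                        # contrast enhancement
edges = cv2.Canny(equalized, threshold1=50, threshold2=150)   # edge detection

cv2.imwrite("edges.png", edges)
```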
Segmentation:
Segmentation divides an image into meaningful regions, such as separating a foreground object from its background. It is crucial for object detection and recognition.
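
One simple way to sketch this is Otsu thresholding with OpenCV, which splits an image into foreground and background by intensity (the input file `coins.png` is a hypothetical example):

```python
import cv2

gray = cv2.imread("coins.png", cv2.IMREAD_GRAYSCALE)

# Otsu picks the threshold that best separates the two intensity modes,
# dividing the image into foreground and background pixels.
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Each connected blob in the mask is one candidate object region.
num_labels, labels = cv2.connectedComponents(mask)
print(f"found {num_labels - 1} foreground regions")
```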
Computational Photography:
This area combines computer vision and graphics to enhance or create images, such as High Dynamic Range (HDR) imaging or panoramas.
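
A rough sketch of HDR merging with OpenCV, assuming three hypothetical photos of the same scene (`exp_short.jpg`, `exp_mid.jpg`, `exp_long.jpg`) taken at the listed exposure times; the values are placeholders:

```python
import cv2
import numpy as np

files = ["exp_short.jpg", "exp_mid.jpg", "exp_long.jpg"]
times = np.array([1 / 250, 1 / 30, 1 / 4], dtype=np.float32)  # exposure times in seconds
images = [cv2.imread(f) for f in files]

# Merge the exposures into one high-dynamic-range radiance map,
# then tone-map it back to a displayable 8-bit image.
hdr = cv2.createMergeDebevec().process(images, times)
ldr = cv2.createTonemap(gamma=2.2).process(hdr)
cv2.imwrite("hdr_result.jpg", np.clip(ldr * 255, 0, 255).astype("uint8"))
```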
Recognition:
Recognition tasks identify objects, faces, or patterns in images, typically by training machine learning models to classify and label visual content.
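
As a tiny illustration of the idea (the library and parameters are my own choice, not part of the course), the snippet below trains a support-vector classifier on scikit-learn's bundled 8×8 handwritten-digit images:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Train a classifier that maps raw pixel values to digit labels.
clf = SVC(kernel="rbf", gamma=0.001)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```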
Geometry is 3D
Geometric Image Formation:
Focuses on how 3D objects are projected into 2D images using principles like perspective and camera models.
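
A minimal sketch of the pinhole (perspective) projection behind this, using NumPy only; the focal length and principal point below are made-up values:

```python
import numpy as np

# Intrinsic matrix: focal lengths (fx, fy) and principal point (cx, cy).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# A few 3D points in camera coordinates (X right, Y down, Z forward), in metres.
points_3d = np.array([[ 0.1, 0.0, 2.0],
                      [ 0.0, 0.2, 4.0],
                      [-0.3, 0.1, 1.5]])

# Project: multiply by K, then divide by depth (the perspective divide).
homogeneous = points_3d @ K.T
pixels = homogeneous[:, :2] / homogeneous[:, 2:3]
print(pixels)  # 2D pixel coordinates of each 3D point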
Feature Based Alignment:
Aligns images by detecting and matching features (e.g., corners or edges) between frames. Common in stitching and motion tracking.
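
A rough sketch of feature-based alignment with OpenCV: detect ORB keypoints, match them, and fit a homography with RANSAC. The file names `frame1.jpg` and `frame2.jpg` stand in for two overlapping images:

```python
import cv2
import numpy as np

img1 = cv2.imread("frame1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match binary descriptors and keep the strongest correspondences.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC rejects mismatches while estimating the aligning homography.
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
aligned = cv2.warpPerspective(img1, H, (img2.shape[1], img2.shape[0]))
```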
Structure From Motion:
Determines the 3D structure of a scene from multiple 2D images captured from different viewpoints.
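
A two-view sketch of this idea under an entirely synthetic setup: the intrinsic matrix, camera motion, and "matched" points below are all generated, whereas a real pipeline would start from detected and matched features across many photographs:

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Synthetic scene: random 3D points in front of the first camera.
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(50, 3))

def project(points, R, t):
    cam = points @ R.T + t.ravel()   # world -> camera coordinates
    px = cam @ K.T
    return px[:, :2] / px[:, 2:3]    # perspective divide

# Two viewpoints: the second camera is shifted sideways.
pts1 = project(X, np.eye(3), np.zeros(3))
pts2 = project(X, np.eye(3), np.array([-0.5, 0.0, 0.0]))

# Relative pose from the matched points, then triangulate the 3D structure.
E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
X4 = cv2.triangulatePoints(P1, P2,
                           np.ascontiguousarray(pts1.T),
                           np.ascontiguousarray(pts2.T))
X_hat = (X4[:3] / X4[3]).T           # recovered 3D points, up to scale
```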
Stereo Correspondence:
Uses images from two slightly different perspectives to determine depth by matching corresponding points.
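
A minimal sketch with OpenCV's block-matching stereo; `left.png` and `right.png` stand in for a rectified stereo pair, and the focal length and baseline are placeholder values:

```python
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# For each pixel in the left image, search along the same row of the
# right image for the best-matching block; the shift is the disparity.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point output

# Depth is inversely proportional to disparity: Z = f * B / d.
focal_px, baseline_m = 700.0, 0.12
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = focal_px * baseline_m / disparity[valid]
```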
Shape Recovery:
Infers the 3D shape of an object from geometric and photometric cues, often for object modeling or augmented reality.
Photometry
Photometric Image Formation:
Explains how light interacts with objects, affecting color, brightness, and shading in images.
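
A minimal sketch of the simplest such model, Lambertian (diffuse) shading: observed brightness is the surface albedo times the cosine of the angle between the surface normal and the light direction. All values below are made up:

```python
import numpy as np

albedo = 0.8                           # fraction of incoming light reflected
normal = np.array([0.0, 0.0, 1.0])     # surface facing the camera
light = np.array([1.0, 1.0, 2.0])
light = light / np.linalg.norm(light)  # unit light direction

# n . l is the cosine of the incidence angle; clamp at 0 so surfaces
# facing away from the light receive no illumination.
intensity = albedo * max(0.0, float(normal @ light))
print(intensity)
```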
Texture Recovery:
Reconstructs surface textures from images, used in applications like 3D modeling and game design.
Things that float between 2D and 3D
Feature Detection:
Identifies key points or regions in images, like corners, blobs, or edges, which are crucial for tasks like alignment and tracking.
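
A quick sketch of corner detection with OpenCV's Harris detector; `scene.jpg` and the response threshold are illustrative assumptions:

```python
import cv2
import numpy as np

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# The Harris response is large where intensity changes in every direction,
# i.e. at corners rather than along edges or in flat regions.
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
corners = np.argwhere(response > 0.01 * response.max())  # (row, col) of corners
print(f"{len(corners)} corners detected")
```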
Motion Estimation:
Estimates the movement of objects or the camera itself by analyzing sequential frames. Applications include video stabilization and object tracking.
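
A minimal sketch of dense optical flow between two consecutive frames with OpenCV; the frame file names and Farneback parameters are placeholder choices:

```python
import cv2
import numpy as np

prev = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

# Farneback optical flow estimates a (dx, dy) motion vector for every pixel.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)

magnitude = np.linalg.norm(flow, axis=2)   # per-pixel motion magnitude
print("mean motion (pixels):", magnitude.mean())
```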
Stitching:
Combines overlapping images to create a seamless larger image or panorama, commonly used in computational photography.
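
OpenCV's high-level Stitcher wraps the whole pipeline (feature matching, alignment, blending); a minimal sketch, with hypothetical overlapping photos as input:

```python
import cv2

images = [cv2.imread(f) for f in ("pano_1.jpg", "pano_2.jpg", "pano_3.jpg")]

stitcher = cv2.Stitcher_create()
status, panorama = stitcher.stitch(images)

if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.jpg", panorama)
else:
    print("stitching failed with status", status)
```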
Image Based Rendering:
Creates realistic images or videos by manipulating existing images rather than modeling scenes from scratch, bridging the gap between graphics and vision.
Hope this helps in understanding the various terms commonly used in Computer Vision. Thank you for the read.